Lagrangian Multiplier

Naive Lagrangian Multiplier

class safepo.common.lagrange.Lagrange(cost_limit: float, lagrangian_multiplier_init: float, lagrangian_multiplier_lr: float, lagrangian_upper_bound: float | None = None)

Lagrange multiplier for constrained optimization.
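As background, here is the standard Lagrangian relaxation. It is not taken from the SafePO source. Consider a constrained policy-optimization problem with reward return J_R(π), cost return J_C(π) and cost limit d, where d corresponds to cost_limit. This problem is commonly relaxed into the saddle-point problem below. The multiplier λ rises when J_C(π) > d, falls otherwise, and is kept non-negative. The update on the right is a plain projected gradient step with step size η, shown for illustration only. The optimizer actually used is lambda_optimizer.

```latex
\max_{\pi} \min_{\lambda \ge 0} \; J_R(\pi) - \lambda \bigl( J_C(\pi) - d \bigr),
\qquad
\lambda \leftarrow \max\bigl(0,\; \lambda + \eta \, ( J_C(\pi) - d )\bigr)
```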

Parameters:

cost_limit – the cost limit, i.e. the threshold that the mean episode cost should not exceed
lagrangian_multiplier_init – the initial value of the Lagrangian multiplier
lagrangian_multiplier_lr – the learning rate of the Lagrangian multiplier
lagrangian_upper_bound – the upper bound of the Lagrangian multiplier; None (the default) leaves it unbounded above

Variables:

cost_limit – the cost limit, i.e. the threshold that the mean episode cost should not exceed
lagrangian_multiplier_lr – the learning rate of the Lagrangian multiplier
lagrangian_upper_bound – the upper bound of the Lagrangian multiplier
_lagrangian_multiplier – the internal Lagrangian multiplier parameter
lambda_range_projection – the function that projects the Lagrangian multiplier onto its admissible (non-negative) range
lambda_optimizer – the optimizer that updates the Lagrangian multiplier

Initialize an instance of Lagrange.

compute_lambda_loss(mean_ep_cost: float) → Tensor

Compute the loss used to update the Lagrangian multiplier.

Parameters: mean_ep_cost – the mean episode cost
Returns: the loss of the Lagrangian multiplier
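The following is a minimal sketch. It assumes the loss takes the common dual-ascent form -λ · (Jc - cost_limit); the formula itself is an assumption, since the documentation above does not state it. Minimizing this loss raises λ when the mean episode cost exceeds the limit. The helper names are hypothetical, and the code is plain Python rather than PyTorch so that the gradient can be written out explicitly:

```python
def lambda_loss(lagrangian_multiplier: float, mean_ep_cost: float, cost_limit: float) -> float:
    """Hypothetical helper: assumed dual-ascent loss for the Lagrangian multiplier."""
    return -lagrangian_multiplier * (mean_ep_cost - cost_limit)


def lambda_loss_grad(mean_ep_cost: float, cost_limit: float) -> float:
    """Hypothetical helper: d(loss)/d(lambda). A descent step changes lambda by -lr * grad."""
    return -(mean_ep_cost - cost_limit)


# Cost above the limit: the gradient is negative, so gradient descent increases lambda.
print(lambda_loss(0.5, 30.0, 25.0))   # prints -2.5
print(lambda_loss_grad(30.0, 25.0))   # prints -5.0
```

Because the loss is linear in λ, its gradient depends only on the constraint violation and not on the current value of λ.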

property lagrangian_multiplier: Tensor

The Lagrangian multiplier.

Returns: the current value of the Lagrangian multiplier

update_lagrange_multiplier(Jc: float) → None

Update the Lagrangian multiplier from the observed mean episode cost.

Parameters: Jc – the mean episode cost
Returns: None
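The following is a dependency-free sketch of how the pieces above could fit together, under stated assumptions. SimpleLagrange is a hypothetical stand-in, not SafePO's class. Plain gradient descent on the assumed loss -λ · (Jc - cost_limit) takes the place of lambda_optimizer, whose actual type is not documented here. Clipping to [0, lagrangian_upper_bound] plays the role of lambda_range_projection.

```python
from typing import Optional


class SimpleLagrange:
    """Hypothetical stand-in for safepo.common.lagrange.Lagrange (plain SGD, no PyTorch)."""

    def __init__(self, cost_limit: float, lagrangian_multiplier_init: float,
                 lagrangian_multiplier_lr: float,
                 lagrangian_upper_bound: Optional[float] = None) -> None:
        self.cost_limit = cost_limit
        self.lagrangian_multiplier_lr = lagrangian_multiplier_lr
        self.lagrangian_upper_bound = lagrangian_upper_bound
        # Start from a non-negative value, as required for an inequality constraint.
        self._lagrangian_multiplier = max(lagrangian_multiplier_init, 0.0)

    @property
    def lagrangian_multiplier(self) -> float:
        return self._lagrangian_multiplier

    def update_lagrange_multiplier(self, Jc: float) -> None:
        # Gradient of the assumed loss -lambda * (Jc - cost_limit) w.r.t. lambda.
        grad = -(Jc - self.cost_limit)
        value = self._lagrangian_multiplier - self.lagrangian_multiplier_lr * grad
        # Project back onto the admissible range: lambda >= 0, optionally capped above.
        value = max(value, 0.0)
        if self.lagrangian_upper_bound is not None:
            value = min(value, self.lagrangian_upper_bound)
        self._lagrangian_multiplier = value


lag = SimpleLagrange(cost_limit=25.0, lagrangian_multiplier_init=0.0,
                     lagrangian_multiplier_lr=0.01, lagrangian_upper_bound=1.0)
for jc in (40.0, 40.0, 10.0):
    lag.update_lagrange_multiplier(jc)
    print(round(lag.lagrangian_multiplier, 4))  # prints 0.15, then 0.3, then 0.15
```

Here λ grows while the cost stays above the limit and shrinks once the cost drops below it. The projection step keeps it within the admissible range.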

PID Lagrangian Multiplier

class safepo.common.lagrange.PIDLagrangian(cost_limit: float, lagrangian_multiplier_init: float = 0.005, pid_kp: float = 0.1, pid_ki: float = 0.01, pid_kd: float = 0.01, pid_d_delay: int = 10, pid_delta_p_ema_alpha: float = 0.95, pid_delta_d_ema_alpha: float = 0.95, sum_norm: bool = True, diff_norm: bool = False, penalty_max: int = 100.0)

PID Lagrangian multiplier for constrained optimization.

Parameters:

cost_limit – the cost limit, i.e. the threshold that the mean episode cost should not exceed
lagrangian_multiplier_init – the initial value of the Lagrangian multiplier
pid_kp – the proportional gain of the PID controller
pid_ki – the integral gain of the PID controller
pid_kd – the derivative gain of the PID controller
pid_d_delay – the delay, in updates, of the derivative term
pid_delta_p_ema_alpha – the smoothing coefficient (alpha) of the exponential moving average of delta_p
pid_delta_d_ema_alpha – the smoothing coefficient (alpha) of the exponential moving average of delta_d
sum_norm – whether to normalize the sum of the PID output
diff_norm – whether to normalize the difference of the PID output
penalty_max – the maximum value of the penalty
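The parameters above can be read as a PID controller acting on the constraint violation. The following is a dependency-free sketch after the PID Lagrangian of Stooke et al. (see References); it is not SafePO's implementation. Several details are assumptions:

- The class SimplePIDLagrangian and its update method are hypothetical.
- The integral term starts at lagrangian_multiplier_init.
- The derivative term compares the smoothed cost with its value up to pid_d_delay updates earlier, and is clipped at zero so that only rising cost is penalized.
- sum_norm and diff_norm are omitted entirely; the output is only clipped to [0, penalty_max].

```python
from collections import deque


class SimplePIDLagrangian:
    """Hypothetical PID Lagrangian sketch (after Stooke et al.); sum_norm/diff_norm omitted."""

    def __init__(self, cost_limit: float, lagrangian_multiplier_init: float = 0.005,
                 pid_kp: float = 0.1, pid_ki: float = 0.01, pid_kd: float = 0.01,
                 pid_d_delay: int = 10, pid_delta_p_ema_alpha: float = 0.95,
                 pid_delta_d_ema_alpha: float = 0.95, penalty_max: float = 100.0) -> None:
        self.cost_limit = cost_limit
        self.pid_kp, self.pid_ki, self.pid_kd = pid_kp, pid_ki, pid_kd
        self.a_p = pid_delta_p_ema_alpha
        self.a_d = pid_delta_d_ema_alpha
        self.penalty_max = penalty_max
        self._integral = lagrangian_multiplier_init  # I term (assumed initial value), kept >= 0
        self._delta_p = 0.0                          # EMA of the constraint violation
        self._cost_d = 0.0                           # EMA of the episode cost
        # Recent smoothed costs; the D term compares against the oldest entry.
        self._cost_ds = deque([0.0], maxlen=pid_d_delay)
        self._penalty = max(0.0, lagrangian_multiplier_init)

    @property
    def lagrangian_multiplier(self) -> float:
        return self._penalty

    def update(self, ep_cost_avg: float) -> None:  # hypothetical method name
        delta = ep_cost_avg - self.cost_limit
        self._integral = max(0.0, self._integral + self.pid_ki * delta)
        self._delta_p = self.a_p * self._delta_p + (1 - self.a_p) * delta
        self._cost_d = self.a_d * self._cost_d + (1 - self.a_d) * ep_cost_avg
        # Only a rising cost contributes to the derivative term.
        derivative = max(0.0, self._cost_d - self._cost_ds[0])
        output = self.pid_kp * self._delta_p + self._integral + self.pid_kd * derivative
        self._penalty = min(max(0.0, output), self.penalty_max)
        self._cost_ds.append(self._cost_d)


pid = SimplePIDLagrangian(cost_limit=25.0)
for _ in range(3):
    pid.update(35.0)  # cost above the limit: the multiplier grows
print(pid.lagrangian_multiplier)
```

Compared with the naive multiplier, the proportional and derivative terms let the penalty react to the current violation and its trend. The plain gradient step reacts only through the accumulated (integral) term.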

Variables:

cost_limit – the cost limit, i.e. the threshold that the mean episode cost should not exceed
lagrangian_multiplier_init – the initial value of the Lagrangian multiplier
pid_kp – the proportional gain of the PID controller
pid_ki – the integral gain of the PID controller
pid_kd – the derivative gain of the PID controller
pid_d_delay – the delay, in updates, of the derivative term
pid_delta_p_ema_alpha – the smoothing coefficient (alpha) of the exponential moving average of delta_p
pid_delta_d_ema_alpha – the smoothing coefficient (alpha) of the exponential moving average of delta_d
sum_norm – whether to normalize the sum of the PID output
diff_norm – whether to normalize the difference of the PID output
penalty_max – the maximum value of the penalty

References

Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel.
URL: CPPOPID

Initialize an instance of PIDLagrangian.

property lagrangian_multiplier: float

The Lagrangian multiplier.