Lagrangian Multiplier

Naive Lagrangian Multiplier

class safepo.common.lagrange.Lagrange(cost_limit: float, lagrangian_multiplier_init: float, lagrangian_multiplier_lr: float, lagrangian_upper_bound: float | None = None)

Lagrange multiplier for constrained optimization.

Parameters:
  • cost_limit – the cost limit, i.e., the threshold that the mean episode cost should not exceed

  • lagrangian_multiplier_init – the initial value of the Lagrangian multiplier

  • lagrangian_multiplier_lr – the learning rate of the Lagrangian multiplier

  • lagrangian_upper_bound – the upper bound of the Lagrangian multiplier; the default None means no upper bound

Variables:
  • cost_limit – the cost limit

  • lagrangian_multiplier_lr – the learning rate of the Lagrangian multiplier

  • lagrangian_upper_bound – the upper bound of the Lagrangian multiplier

  • _lagrangian_multiplier – the Lagrangian multiplier

  • lambda_range_projection – the projection function applied to the Lagrangian multiplier

  • lambda_optimizer – the optimizer used to update the Lagrangian multiplier

Initialize an instance of Lagrange.
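For context, the textbook Lagrangian relaxation that a multiplier of this kind is used for is shown below. This page does not state the formulation, so treat it as an assumption: d stands for cost_limit, J_R(π) and J_C(π) for the expected return and expected episode cost of policy π, and λ for the multiplier.

```latex
\max_{\pi} J_R(\pi) \quad \text{s.t.} \quad J_C(\pi) \le d
\quad\Longrightarrow\quad
\min_{\lambda \ge 0} \, \max_{\pi} \; J_R(\pi) - \lambda \bigl( J_C(\pi) - d \bigr)
```

For a fixed λ the policy maximizes the penalized return, while λ is pushed up when the constraint is violated (J_C > d) and down toward 0 otherwise.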

compute_lambda_loss(mean_ep_cost: float) → Tensor

Compute the loss of the Lagrangian multiplier.

Parameters:

mean_ep_cost – the mean episode cost

Returns:

the loss of the Lagrangian multiplier
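One common form for such a loss, assumed here because the page does not give the formula, is the negated constraint term. With \bar{J}_C the value passed as mean_ep_cost, d the cost_limit, and η a plain gradient step size (the class itself uses its lambda_optimizer, whose step may differ):

```latex
\mathcal{L}(\lambda) = -\lambda \, (\bar{J}_C - d),
\qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = -(\bar{J}_C - d),
\qquad
\lambda \leftarrow \lambda - \eta \, \frac{\partial \mathcal{L}}{\partial \lambda} = \lambda + \eta \, (\bar{J}_C - d)
```

Minimizing this loss therefore raises λ when the mean episode cost exceeds the limit and lowers it when the cost is below the limit.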

property lagrangian_multiplier: Tensor

The Lagrangian multiplier.

Returns:

the Lagrangian multiplier

update_lagrange_multiplier(Jc: float) → None

Update the Lagrangian multiplier.

Parameters:

Jc – the mean episode cost

Returns:

None
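A minimal pure-Python sketch of a single multiplier update is shown below. It assumes the loss form -λ(Jc - cost_limit), replaces the class's lambda_optimizer with a plain gradient step (an assumption; the real optimizer may behave differently), and clips the result to [0, lagrangian_upper_bound]. The helper name dual_ascent_step is hypothetical and not part of safepo.

```python
from typing import List, Optional


def dual_ascent_step(
    multiplier: float,
    mean_ep_cost: float,
    cost_limit: float,
    lr: float,
    upper_bound: Optional[float] = None,
) -> float:
    """Hypothetical helper: one plain gradient step on the assumed loss -lambda * (Jc - d)."""
    grad = -(mean_ep_cost - cost_limit)  # d/d(lambda) of -lambda * (Jc - d)
    multiplier = multiplier - lr * grad  # equivalent to lambda += lr * (Jc - d)
    multiplier = max(0.0, multiplier)  # a multiplier for an inequality constraint stays non-negative
    if upper_bound is not None:
        multiplier = min(multiplier, upper_bound)  # optional cap on the multiplier
    return multiplier


# Three iterations with the cost above the limit, then two below it.
lam = 0.0
history: List[float] = []
for cost in [40.0, 40.0, 40.0, 10.0, 10.0]:
    lam = dual_ascent_step(lam, cost, cost_limit=25.0, lr=0.01)
    history.append(lam)
print(history)
```

With lr=0.01 and a violation of ±15, the multiplier moves by about 0.15 per step: it climbs while the cost exceeds the limit and falls back once the cost drops below it.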

PID Lagrangian Multiplier

class safepo.common.lagrange.PIDLagrangian(cost_limit: float, lagrangian_multiplier_init: float = 0.005, pid_kp: float = 0.1, pid_ki: float = 0.01, pid_kd: float = 0.01, pid_d_delay: int = 10, pid_delta_p_ema_alpha: float = 0.95, pid_delta_d_ema_alpha: float = 0.95, sum_norm: bool = True, diff_norm: bool = False, penalty_max: int = 100.0)

PID Lagrangian multiplier for constrained optimization.

Parameters:
  • cost_limit – the cost limit

  • lagrangian_multiplier_init – the initial value of the Lagrangian multiplier

  • pid_kp – the proportional gain of the PID controller

  • pid_ki – the integral gain of the PID controller

  • pid_kd – the derivative gain of the PID controller

  • pid_d_delay – the delay of the derivative term

  • pid_delta_p_ema_alpha – the smoothing factor alpha of the exponential moving average (EMA) of delta_p

  • pid_delta_d_ema_alpha – the smoothing factor alpha of the exponential moving average (EMA) of delta_d

  • sum_norm – whether to normalize the sum of the PID output

  • diff_norm – whether to normalize the difference of the PID output

  • penalty_max – the maximum value of the penalty
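The basic control law that these gains parameterize, roughly as presented in the referenced paper by Stooke et al., is shown below. Here d is cost_limit, J_{C,k} the episode cost at iteration k, (x)_+ = max(0, x), and K_P, K_I, K_D correspond to pid_kp, pid_ki, pid_kd. The EMA smoothing, derivative delay, normalization, and penalty_max options listed above are refinements on top of this law and are not shown.

```latex
\Delta_k = J_{C,k} - d,
\qquad
I_k = \bigl( I_{k-1} + \Delta_k \bigr)_+,
\qquad
\partial_k = \bigl( J_{C,k} - J_{C,k-1} \bigr)_+,
\qquad
\lambda_k = \bigl( K_P \, \Delta_k + K_I \, I_k + K_D \, \partial_k \bigr)_+
```

The proportional term reacts to the current violation, the integral term removes steady-state violation, and the one-sided derivative term damps the multiplier's response when the cost is rising.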

Variables:
  • cost_limit – the cost limit

  • lagrangian_multiplier_init – the initial value of the Lagrangian multiplier

  • pid_kp – the proportional gain of the PID controller

  • pid_ki – the integral gain of the PID controller

  • pid_kd – the derivative gain of the PID controller

  • pid_d_delay – the delay of the derivative term

  • pid_delta_p_ema_alpha – the smoothing factor alpha of the exponential moving average (EMA) of delta_p

  • pid_delta_d_ema_alpha – the smoothing factor alpha of the exponential moving average (EMA) of delta_d

  • sum_norm – whether to normalize the sum of the PID output

  • diff_norm – whether to normalize the difference of the PID output

  • penalty_max – the maximum value of the penalty

References

  • Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

  • Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel

  • URL: CPPOPID

Initialize an instance of PIDLagrangian.
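The sketch below shows one way the constructor parameters could fit together in a single update. It is a hypothetical, simplified class (PIDLagrangianSketch is not the safepo implementation) and rests on several assumptions: the integral term is seeded with lagrangian_multiplier_init and accumulates pid_ki times the violation; the two EMA coefficients smooth the violation (proportional term) and the cost (derivative input); the derivative compares against the smoothed cost from up to pid_d_delay updates earlier; and the output is clipped to [0, penalty_max]. The sum_norm and diff_norm options are omitted.

```python
from collections import deque


class PIDLagrangianSketch:
    """Hypothetical, simplified PID-controlled Lagrange multiplier (not safepo's class)."""

    def __init__(
        self,
        cost_limit: float,
        lagrangian_multiplier_init: float = 0.005,
        pid_kp: float = 0.1,
        pid_ki: float = 0.01,
        pid_kd: float = 0.01,
        pid_d_delay: int = 10,
        pid_delta_p_ema_alpha: float = 0.95,
        pid_delta_d_ema_alpha: float = 0.95,
        penalty_max: float = 100.0,
    ) -> None:
        self.cost_limit = cost_limit
        self.kp, self.ki, self.kd = pid_kp, pid_ki, pid_kd
        self.alpha_p = pid_delta_p_ema_alpha
        self.alpha_d = pid_delta_d_ema_alpha
        self.penalty_max = penalty_max
        self.integral = lagrangian_multiplier_init  # assumption: integral seeded with the init value
        self.delta_p = 0.0  # EMA of the constraint violation (proportional term)
        self.cost_d = 0.0  # EMA of the episode cost (input to the derivative term)
        # Smoothed-cost history; its oldest entry is up to pid_d_delay updates old.
        self.cost_history = deque([0.0], maxlen=pid_d_delay)
        self.multiplier = max(0.0, lagrangian_multiplier_init)

    def update(self, mean_ep_cost: float) -> float:
        delta = mean_ep_cost - self.cost_limit
        # Integral: accumulate the scaled violation, never below zero.
        self.integral = max(0.0, self.integral + self.ki * delta)
        # Proportional: EMA-smoothed violation.
        self.delta_p = self.alpha_p * self.delta_p + (1.0 - self.alpha_p) * delta
        # Derivative: one-sided rise of the smoothed cost over the delay window.
        self.cost_d = self.alpha_d * self.cost_d + (1.0 - self.alpha_d) * mean_ep_cost
        derivative = max(0.0, self.cost_d - self.cost_history[0])
        output = self.kp * self.delta_p + self.integral + self.kd * derivative
        # Project the controller output onto [0, penalty_max].
        self.multiplier = min(max(0.0, output), self.penalty_max)
        self.cost_history.append(self.cost_d)
        return self.multiplier


pid = PIDLagrangianSketch(cost_limit=25.0)
rising = [pid.update(50.0) for _ in range(50)]  # cost stays above the limit
print(rising[0], rising[-1])
```

Because the integral keeps growing while the cost stays above the limit, the multiplier climbs steadily in this run; a cost well below the limit drives it straight to 0.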

property lagrangian_multiplier: float

The Lagrangian multiplier.