Lagrangian Multiplier
Naive Lagrangian Multiplier
- class safepo.common.lagrange.Lagrange(cost_limit: float, lagrangian_multiplier_init: float, lagrangian_multiplier_lr: float, lagrangian_upper_bound: float | None = None)
Lagrange multiplier for constrained optimization.
- Parameters:
cost_limit – the cost limit
lagrangian_multiplier_init – the initial value of the lagrangian multiplier
lagrangian_multiplier_lr – the learning rate of the lagrangian multiplier
lagrangian_upper_bound – the upper bound of the lagrangian multiplier
- Variables:
cost_limit – the cost limit
lagrangian_multiplier_lr – the learning rate of the lagrangian multiplier
lagrangian_upper_bound – the upper bound of the lagrangian multiplier
_lagrangian_multiplier – the lagrangian multiplier
lambda_range_projection – the projection function of the lagrangian multiplier
lambda_optimizer – the optimizer of the lagrangian multiplier
Initialize an instance of Lagrange.
- compute_lambda_loss(mean_ep_cost: float) -> Tensor
Compute the loss of the lagrangian multiplier.
- Parameters:
mean_ep_cost – the mean episode cost
- Returns:
the loss of the lagrangian multiplier
- property lagrangian_multiplier: Tensor
The lagrangian multiplier.
- Returns:
the lagrangian multiplier
- update_lagrange_multiplier(Jc: float) -> None
Update the lagrangian multiplier in place from the observed cost.
- Parameters:
Jc – the mean episode cost
- Returns:
None
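The update described above can be illustrated with a minimal, dependency-free sketch. It is not the safepo implementation: the class name `SimpleLagrange` and its attribute names are hypothetical, the loss form `-lambda * (mean_ep_cost - cost_limit)` is an assumption based on the standard Lagrangian formulation, and a plain gradient step stands in for whatever `lambda_optimizer` the library actually uses.

```python
class SimpleLagrange:
    """Hypothetical stand-in for a naive Lagrange multiplier (not safepo's class)."""

    def __init__(self, cost_limit, multiplier_init, multiplier_lr, upper_bound=None):
        self.cost_limit = cost_limit
        self.multiplier = max(0.0, multiplier_init)  # the multiplier must stay non-negative
        self.lr = multiplier_lr
        self.upper_bound = upper_bound

    def lambda_loss(self, mean_ep_cost):
        # Assumed loss: minimizing it raises the multiplier when the cost
        # exceeds the limit and lowers it when the cost is below the limit.
        return -self.multiplier * (mean_ep_cost - self.cost_limit)

    def update(self, mean_ep_cost):
        # Gradient of the assumed loss with respect to the multiplier.
        grad = -(mean_ep_cost - self.cost_limit)
        # Plain gradient-descent step (assumption: the library may use another optimizer).
        self.multiplier -= self.lr * grad
        # Project back onto [0, upper_bound].
        self.multiplier = max(0.0, self.multiplier)
        if self.upper_bound is not None:
            self.multiplier = min(self.multiplier, self.upper_bound)


lagrange = SimpleLagrange(cost_limit=25.0, multiplier_init=0.0,
                          multiplier_lr=0.1, upper_bound=2.0)
lagrange.update(30.0)  # cost above the limit: the multiplier grows
lagrange.update(10.0)  # cost below the limit: the multiplier shrinks, floored at 0
```

The projection step plays the role that `lambda_range_projection` is described as having: it keeps the multiplier non-negative and, when an upper bound is given, caps it.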
PID Lagrangian Multiplier
- class safepo.common.lagrange.PIDLagrangian(cost_limit: float, lagrangian_multiplier_init: float = 0.005, pid_kp: float = 0.1, pid_ki: float = 0.01, pid_kd: float = 0.01, pid_d_delay: int = 10, pid_delta_p_ema_alpha: float = 0.95, pid_delta_d_ema_alpha: float = 0.95, sum_norm: bool = True, diff_norm: bool = False, penalty_max: int = 100.0)
PID Lagrangian multiplier for constrained optimization.
- Parameters:
cost_limit – the cost limit
lagrangian_multiplier_init – the initial value of the lagrangian multiplier
pid_kp – the proportional gain of the PID controller
pid_ki – the integral gain of the PID controller
pid_kd – the derivative gain of the PID controller
pid_d_delay – the delay, in updates, used when computing the derivative term
pid_delta_p_ema_alpha – the smoothing factor (alpha) of the exponential moving average applied to delta_p
pid_delta_d_ema_alpha – the smoothing factor (alpha) of the exponential moving average applied to delta_d
sum_norm – whether to normalize the sum of the PID output
diff_norm – whether to normalize the difference of the PID output
penalty_max – the maximum value of the penalty
- Variables:
cost_limit – the cost limit
lagrangian_multiplier_init – the initial value of the lagrangian multiplier
pid_kp – the proportional gain of the PID controller
pid_ki – the integral gain of the PID controller
pid_kd – the derivative gain of the PID controller
pid_d_delay – the delay of the derivative term
pid_delta_p_ema_alpha – the exponential moving average alpha of the delta_p
pid_delta_d_ema_alpha – the exponential moving average alpha of the delta_d
sum_norm – whether to normalize the sum of the PID output
diff_norm – whether to normalize the difference of the PID output
penalty_max – the maximum value of the penalty
References
Title: Responsive Safety in Reinforcement Learning by PID Lagrangian Methods
Authors: Adam Stooke, Joshua Achiam, Pieter Abbeel.
URL: CPPOPID
Initialize an instance of PIDLagrangian.
- property lagrangian_multiplier: float
The lagrangian multiplier.
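To make the role of the PID parameters concrete, here is a minimal sketch of a PID-controlled multiplier, loosely following the scheme in the referenced paper. It is an illustration under stated assumptions, not the safepo source: the class `SimplePID`, its attribute names, and the exact update order are hypothetical, and the `sum_norm` and `diff_norm` options are left out because their precise behavior is not specified on this page.

```python
from collections import deque


class SimplePID:
    """Hypothetical PID-controlled multiplier (not safepo's PIDLagrangian)."""

    def __init__(self, cost_limit, multiplier_init=0.005, kp=0.1, ki=0.01, kd=0.01,
                 d_delay=10, p_ema_alpha=0.95, d_ema_alpha=0.95, penalty_max=100.0):
        self.cost_limit = cost_limit
        self.kp, self.ki, self.kd = kp, ki, kd
        self.p_ema_alpha = p_ema_alpha
        self.d_ema_alpha = d_ema_alpha
        self.penalty_max = penalty_max
        self.integral = multiplier_init   # the integral term starts at the initial multiplier
        self.delta_p = 0.0                # smoothed constraint violation (proportional term)
        self.cost_d = 0.0                 # smoothed cost used by the derivative term
        self.cost_history = deque([0.0], maxlen=d_delay)  # delayed smoothed costs
        self.multiplier = multiplier_init

    def update(self, mean_ep_cost):
        delta = mean_ep_cost - self.cost_limit
        # Integral term: accumulate the violation, never below zero.
        self.integral = max(0.0, self.integral + self.ki * delta)
        # Proportional term: exponential moving average of the violation.
        self.delta_p = self.p_ema_alpha * self.delta_p + (1 - self.p_ema_alpha) * delta
        # Derivative term: rise of the smoothed cost over the delay window.
        self.cost_d = self.d_ema_alpha * self.cost_d + (1 - self.d_ema_alpha) * mean_ep_cost
        derivative = max(0.0, self.cost_d - self.cost_history[0])
        output = self.kp * self.delta_p + self.integral + self.kd * derivative
        # Clamp the multiplier to [0, penalty_max].
        self.multiplier = min(max(0.0, output), self.penalty_max)
        self.cost_history.append(self.cost_d)
        return self.multiplier


pid = SimplePID(cost_limit=25.0)
for cost in (50.0, 40.0, 30.0, 20.0):
    pid.update(cost)  # the multiplier reacts to how far the cost is from the limit
```

In this sketch the derivative term only reacts to rising costs, so it can push the multiplier up when costs start to climb but never lowers it on its own.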