Training Curves

Safe reinforcement learning algorithms are designed to achieve high reward while satisfying safety constraints. This section evaluates SafePO's algorithms across the environments in Safety-Gymnasium.
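When reading a training curve against the criterion above (high reward, constraint satisfied), it can help to smooth the per-epoch values and compare the final cost to a limit. The following is a minimal sketch in plain Python, not SafePO's own evaluation code: the helper names `moving_average` and `summarize_run`, the `window` size, and the `cost_limit` value are all hypothetical and chosen for illustration only.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over fewer samples."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the trailing window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


def summarize_run(rewards, costs, cost_limit, window=3):
    """Summarize one run: smoothed final reward and cost, plus a constraint check.

    `cost_limit` is an assumed, user-supplied threshold (hypothetical here).
    """
    final_reward = moving_average(rewards, window)[-1]
    final_cost = moving_average(costs, window)[-1]
    return {
        "final_reward": final_reward,
        "final_cost": final_cost,
        "satisfies_constraint": final_cost <= cost_limit,
    }


if __name__ == "__main__":
    # Made-up numbers purely to exercise the helpers; not experimental results.
    print(summarize_run([0.0, 10.0, 20.0], [40.0, 30.0, 20.0], cost_limit=25.0, window=1))
```

A run counts as successful under this sketch only if the smoothed final cost is at or below the chosen limit; reward is reported alongside it rather than folded into a single score, so the trade-off stays visible.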

Single-Agent

First order

Second order

Multi-Agent