Trustworthy Implementation

To ensure that SafePO's implementation is trustworthy, we compared the performance of our algorithms with open-source implementations of the same algorithms. Because not every algorithm has an open-source implementation available, we selected PPO-Lag, TRPO-Lag, CPO, and FOCOPS for comparison.

Each algorithm was compared against the following reference implementations:

PPO-Lag: OpenAI Baselines: Safety Starter Agents
TRPO-Lag: OpenAI Baselines: Safety Starter Agents, RL Safety Algorithms
CPO: OpenAI Baselines: Safety Starter Agents, RL Safety Algorithms
FOCOPS: Original Implementation

We compared these algorithms on tasks from Safety-Gymnasium.
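As a rough illustration of what such a comparison can look like numerically, the sketch below summarizes per-seed final metrics (for example, episode return or episode cost) from two implementations and checks whether their means agree within a relative tolerance. The function names `summarize` and `within_tolerance` and the 10% default tolerance are hypothetical choices made for this example; they are not part of SafePO, and this page does not state the criterion actually used.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) of per-seed final metrics."""
    m = mean(values)
    # A single seed has no sample standard deviation; report 0.0 instead.
    s = stdev(values) if len(values) > 1 else 0.0
    return m, s


def within_tolerance(ours, reference, rel_tol=0.1):
    """Check whether the mean metrics of two implementations agree.

    `rel_tol` is an assumed threshold for illustration only.
    """
    m_ours, _ = summarize(ours)
    m_ref, _ = summarize(reference)
    # Guard against division by zero when the reference mean is 0.
    scale = max(abs(m_ref), 1e-8)
    return abs(m_ours - m_ref) / scale <= rel_tol
```

In practice the same check would be applied separately to reward and cost, since a safe RL implementation can match a reference on one metric while diverging on the other.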

Warning

It may take some time to load the results. If you cannot see the results, please visit wandb.ai directly.

The results for each algorithm are shown below.

PPO-Lag

TRPO-Lag

CPO

FOCOPS