Trustworthy Implementation
To ensure that SafePO’s implementation is trustworthy, we compared
the performance of our algorithms with open-source implementations of the same algorithms.
Because some of the algorithms have no open-source implementation, we selected
PPO-Lag, TRPO-Lag, CPO, and FOCOPS for comparison.
We have compared the following algorithms:
TRPO-Lag: OpenAI Baselines: Safety Starter Agents, RL Safety Algorithms
CPO: OpenAI Baselines: Safety Starter Agents, RL Safety Algorithms
FOCOPS: Original Implementation
We compared these algorithms on tasks from Safety-Gymnasium.
Warning
It may take some time to load the results. If you cannot see the results, please visit wandb.ai directly.
The results are shown as follows.