Trustworthy Implementation

To ensure that SafePO’s implementation is trustworthy, we compared the performance of our algorithms against open-source implementations of the same algorithms. Because open-source implementations are not available for every algorithm, we selected PPO-Lag, TRPO-Lag, CPO, and FOCOPS for comparison.

We compared these algorithms on tasks from Safety-Gymnasium.

Warning

It may take some time to load the results. If you cannot see the results, please visit wandb.ai directly.

The results are shown below.