Safe Policy Optimization Documentation

Training Curves

Safe reinforcement learning algorithms aim to achieve high reward while satisfying a safety constraint. In this section, we evaluate the performance of SafePO’s algorithms on a range of Safety-Gymnasium environments.
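As a rough guide to reading the curves, the sketch below averages the last few episodes of a run and checks the average cost against a limit. It is a minimal illustration, not part of SafePO: the function name `summarize_curve`, the `tail` window, the cost limit, and all numbers are hypothetical.

```python
from statistics import mean


def summarize_curve(rewards, costs, cost_limit, tail=10):
    """Average the last `tail` episodes and check the cost constraint.

    Hypothetical helper for illustration; not a SafePO function.
    """
    if len(rewards) != len(costs):
        raise ValueError("rewards and costs must have the same length")
    if not rewards:
        raise ValueError("curves must contain at least one episode")

    # Use only the final episodes, where training has (ideally) stabilized.
    avg_reward = mean(rewards[-tail:])
    avg_cost = mean(costs[-tail:])

    return {
        "reward": avg_reward,
        "cost": avg_cost,
        # The run counts as safe here if the average cost stays within the limit.
        "satisfied": avg_cost <= cost_limit,
    }


# Made-up per-episode values, for illustration only.
rewards = [1.0, 2.0, 3.0, 4.0]
costs = [40.0, 30.0, 20.0, 10.0]
summary = summarize_curve(rewards, costs, cost_limit=25.0, tail=2)
print(summary)
```

A curve is most informative when reward and cost are read together: a run with high reward whose cost stays above the limit has not solved the constrained task.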

Single-Agent

First order

CUP

CPPOPID

FOCOPS

PPO-Lag

Second order

CPO

PCPO

RCPO

TRPO-Lag

Multi-Agent

HAPPO

MACPO

MAPPO

MAPPO-Lag
