Algorithms Training#
SafePO implements each algorithm in a single file. The single-agent algorithms live in the safepo/single_agent
folder, while the multi-agent algorithms live in the safepo/multi_agent
folder.
Single-agent Algorithms#
To run an algorithm with its default configuration, you only need to specify the environment name. For example, to run the PPOLag algorithm in the SafetyPointGoal1-v0 environment, run the following commands:
cd safepo/single_agent
python ppo_lag.py --task SafetyPointGoal1-v0 --experiment ppo_lag_exp
Then you can check the results in the runs/ppo_lag_exp folder.
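Since each single-agent algorithm has its own script in the safepo/single_agent folder, other algorithms are launched the same way. As a sketch, assuming a cpo.py script exists alongside ppo_lag.py (check the folder for the exact file names):

cd safepo/single_agent
python cpo.py --task SafetyPointGoal1-v0 --experiment cpo_exp

The results would then appear in the runs/cpo_exp folder.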
Multi-agent Algorithms#
Running the multi-agent algorithms is similar to running the single-agent ones. For example, to run the MAPPOLag algorithm in the Safety2x4AntVelocity-v0 environment, run the following commands:
cd safepo/multi_agent
python mappolag.py --task Safety2x4AntVelocity-v0 --experiment mappo_lag_exp
Then you can check the results in the runs/mappo_lag_exp folder.
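The same pattern applies to the other multi-agent scripts. For instance, assuming a macpo.py script is provided next to mappolag.py (again, check the safepo/multi_agent folder for the exact file names):

cd safepo/multi_agent
python macpo.py --task Safety2x4AntVelocity-v0 --experiment macpo_exp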
Customizing Training#
We use a command line interface to support training customization. The following tables describe the command line arguments for the single-agent and multi-agent algorithms, respectively.
Single-agent arguments:

| Argument | Description | Default Value |
|---|---|---|
| seed | Seed of the experiment | 0 |
| device | Device to run the code | "cpu" |
| num-envs | Number of parallel game environments | 10 |
| total-steps | Total steps of the experiment | 10000000 |
| task | ID of the environment | "SafetyPointGoal1-v0" |
| use-eval | Toggles evaluation | False |
| steps-per-epoch | Number of steps to run in each environment per policy rollout | 20000 |
| critic-lr | Learning rate of the critic network | 1e-3 |
| log-dir | Directory to save agent logs | "../runs" |
| experiment | Name of the experiment | "single_agent_experiment" |
| write-terminal | Toggles terminal logging | True |
| use-tensorboard | Toggles tensorboard logging | False |
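As an illustration of combining these arguments, a customized single-agent run might look like the following. This is a sketch that assumes the boolean arguments accept explicit True/False values, matching the defaults listed above:

cd safepo/single_agent
python ppo_lag.py --task SafetyPointGoal1-v0 --seed 5 --num-envs 4 --total-steps 2000000 --use-tensorboard True --experiment ppo_lag_tb_exp

This would train with 4 parallel environments for 2 million total steps and log tensorboard data for the run under runs/ppo_lag_tb_exp.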
Multi-agent arguments:

| Argument | Description | Default Value |
|---|---|---|
| use-eval | Use evaluation environment for testing | False |
| task | The task to run | "Safety2x4AntVelocity-v0" |
| experiment | Experiment name. If used with the metadata flag, additional information about the physics engine, sim device, pipeline, and domain randomization will be added to the name | "multi_agent_experiment" |
| seed | Random seed | 0 |
| model-dir | Path to the model directory | "" |
| safety-bound | Cost limit | 25.0 |
| device | The device to run the model on | "cpu" |
| device-id | The device id to run the model on | 0 |
| write-terminal | Toggles terminal logging | True |
| headless | Toggles headless mode | False |
| total-steps | Total steps of the experiment | None |
| num-envs | Number of parallel game environments | None |
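Similarly, a customized multi-agent run might look like the following sketch, assuming the argument spellings in the table above:

cd safepo/multi_agent
python mappolag.py --task Safety2x4AntVelocity-v0 --seed 5 --safety-bound 25.0 --device cpu --experiment mappo_lag_tuned

Here safety-bound sets the cost limit used by the constrained optimization to 25.0, and the results would appear under runs/mappo_lag_tuned.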