Algorithms Training#

SafePO uses a single file to implement each algorithm. The single-agent algorithms live in the safepo/single_agent folder, while the multi-agent algorithms live in the safepo/multi_agent folder.

Single-agent Algorithms#

To run an algorithm with its default configuration, you only need to specify the environment name. For example, to run the PPOLag algorithm in the SafetyPointGoal1-v0 environment, run the following commands:

cd safepo/single_agent
python ppo_lag.py --task SafetyPointGoal1-v0 --experiment ppo_lag_exp

Then you can check the results in the runs/ppo_lag_exp folder.
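
Training metrics can also be visualized with TensorBoard once TensorBoard logging is enabled (see the use-tensorboard argument below). A minimal sketch, assuming the event files are written under the experiment's log folder:

tensorboard --logdir runs/ppo_lag_exp

Then open the printed URL (by default http://localhost:6006) in a browser.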

Multi-agent Algorithms#

Running the multi-agent algorithms is similar to running the single-agent ones. For example, to run the MAPPOLag algorithm in the Safety2x4AntVelocity-v0 environment, run the following commands:

cd safepo/multi_agent
python mappolag.py --task Safety2x4AntVelocity-v0 --experiment mappo_lag_exp

Then you can check the results in the runs/mappo_lag_exp folder.

Customizing Training#

We use a command line interface to support training customization. The tables below give a detailed description of the command line arguments, first for single-agent training and then for multi-agent training.

Single-agent arguments:

| Argument | Description | Default Value |
|---|---|---|
| seed | Seed of the experiment | 0 |
| device | Device to run the code | "cpu" |
| num-envs | Number of parallel game environments | 10 |
| total-steps | Total steps of the experiments | 10000000 |
| task | ID of the environment | "SafetyPointGoal1-v0" |
| use-eval | Toggles evaluation | False |
| steps-per-epoch | Number of steps to run in each environment per policy rollout | 20000 |
| critic-lr | Learning rate of the critic network | 1e-3 |
| log-dir | Directory to save agent logs | "../runs" |
| experiment | Name of the experiment | "single_agent_experiment" |
| write-terminal | Toggles terminal logging | True |
| use-tensorboard | Toggles TensorBoard logging | False |
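
For example, a customized single-agent run can combine several of these arguments. A sketch (the values are illustrative, and the boolean toggles are assumed to accept explicit True/False values):

cd safepo/single_agent
python ppo_lag.py --task SafetyPointGoal1-v0 --seed 5 --num-envs 4 --total-steps 2000000 --critic-lr 1e-3 --device cpu --experiment ppo_lag_custom --use-tensorboard True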

Multi-agent arguments:

| Argument | Description | Default Value |
|---|---|---|
| use-eval | Use evaluation environment for testing | False |
| task | The task to run | "Safety2x4AntVelocity-v0" |
| experiment | Experiment name. If used with the metadata flag, additional information about the physics engine, sim device, pipeline, and domain randomization will be added to the name | "multi_agent_experiment" |
| seed | Random seed | 0 |
| model-dir | The model directory | "" |
| safety-bound | Cost limit | 25.0 |
| device | The device to run the model on | "cpu" |
| device-id | The device id to run the model on | 0 |
| write-terminal | Toggles terminal logging | True |
| headless | Toggles headless mode | False |
| total-steps | Total steps of the experiments | None |
| num-envs | The number of parallel game environments | None |
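
Similarly, a sketch of a customized multi-agent run (values are illustrative; the same True/False assumption applies to the toggles):

cd safepo/multi_agent
python mappolag.py --task Safety2x4AntVelocity-v0 --seed 3 --safety-bound 25.0 --device cpu --device-id 0 --experiment mappo_lag_custom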