Algorithms Training#
SafePO implements each algorithm in a single file. The single-agent algorithms live in the safepo/single_agent
folder, while the multi-agent algorithms live in the safepo/multi_agent
folder.
Single-agent Algorithms#
To run an algorithm with its default configuration, you only need to specify the environment name. For example, to run the PPOLag algorithm in the SafetyPointGoal1-v0 environment, run the following commands:
cd safepo/single_agent
python ppo_lag.py --task SafetyPointGoal1-v0 --experiment ppo_lag_exp
Then you can check the results in the runs/ppo_lag_exp folder.
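Since each single-agent algorithm has its own script in the safepo/single_agent folder, other algorithms are launched the same way. As a sketch, assuming a cpo.py script exists alongside ppo_lag.py (check the folder for the exact file names):

cd safepo/single_agent
python cpo.py --task SafetyPointGoal1-v0 --experiment cpo_exp

The results would then appear in the runs/cpo_exp folder.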
Multi-agent Algorithms#
Running the multi-agent algorithms is similar to running the single-agent ones. For example, to run the MAPPOLag algorithm in the Safety2x4AntVelocity-v0 environment, run the following commands:
cd safepo/multi_agent
python mappolag.py --task Safety2x4AntVelocity-v0 --experiment mappo_lag_exp
Then you can check the results in the runs/mappo_lag_exp folder.
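The same pattern applies to the other multi-agent scripts. For instance, assuming a macpo.py script is provided next to mappolag.py (again, check the safepo/multi_agent folder for the exact file names):

cd safepo/multi_agent
python macpo.py --task Safety2x4AntVelocity-v0 --experiment macpo_exp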
Customizing Training#
We use a command line interface to support training customization. The following tables describe the command line arguments for the single-agent and multi-agent algorithms, respectively.
Single-agent arguments:

| Argument | Description | Default Value |
|---|---|---|
| seed | Seed of the experiment | 0 |
| device | Device to run the code | "cpu" |
| num-envs | Number of parallel game environments | 10 |
| total-steps | Total steps of the experiment | 10000000 |
| task | ID of the environment | "SafetyPointGoal1-v0" |
| use-eval | Toggles evaluation | False |
| steps-per-epoch | Number of steps to run in each environment per policy rollout | 20000 |
| critic-lr | Learning rate of the critic network | 1e-3 |
| log-dir | Directory to save agent logs | "../runs" |
| experiment | Name of the experiment | "single_agent_experiment" |
| write-terminal | Toggles terminal logging | True |
| use-tensorboard | Toggles tensorboard logging | False |
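As an illustration of combining these arguments, a customized single-agent run might look like the following. This is a sketch that assumes the boolean arguments accept explicit True/False values, matching the defaults listed above:

cd safepo/single_agent
python ppo_lag.py --task SafetyPointGoal1-v0 --seed 5 --num-envs 4 --total-steps 2000000 --use-tensorboard True --experiment ppo_lag_tb_exp

This would train with 4 parallel environments for 2 million total steps and log tensorboard data for the run under runs/ppo_lag_tb_exp.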
Multi-agent arguments:

| Argument | Description | Default Value |
|---|---|---|
| use-eval | Use evaluation environment for testing | False |
| task | The task to run | "Safety2x4AntVelocity-v0" |
| experiment | Experiment name. If used with the metadata flag, additional information about the physics engine, sim device, pipeline, and domain randomization will be added to the name | "multi_agent_experiment" |
| seed | Random seed | 0 |
| model-dir | Path to the model directory | "" |
| safety-bound | Cost limit | 25.0 |
| device | The device to run the model on | "cpu" |
| device-id | The device id to run the model on | 0 |
| write-terminal | Toggles terminal logging | True |
| headless | Toggles headless mode | False |
| total-steps | Total steps of the experiment | None |
| num-envs | Number of parallel game environments | None |
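Similarly, a customized multi-agent run might look like the following sketch, assuming the argument spellings in the table above:

cd safepo/multi_agent
python mappolag.py --task Safety2x4AntVelocity-v0 --seed 5 --safety-bound 25.0 --device cpu --experiment mappo_lag_tuned

Here safety-bound sets the cost limit used by the constrained optimization to 25.0, and the results would appear under runs/mappo_lag_tuned.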