Evaluating Trained Models#

Model Evaluation#

To evaluate the trained model, you can run:

cd safepo/
python evaluate.py --benchmark-dir ./runs/ppo_lag_exp --eval-episodes 10 --save-dir ./results/ppo_lag_exp

This evaluates the model from the last checkpoint of training and saves the evaluation results in safepo/results/ppo_lag_exp.
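
If you want to script a quick evaluation yourself, the loop below is a minimal sketch of what such a step does: load a checkpoint, roll out a fixed number of episodes with a deterministic policy, and average the episodic reward and cost. The environment id, checkpoint path, policy interface, and cost-in-info convention are illustrative assumptions, not the internals of SafePO's evaluate.py (a generic Gymnasium environment is used here for simplicity).

import torch
import gymnasium as gym

# Sketch only: the env id, checkpoint path, and policy interface are
# illustrative assumptions, not SafePO's evaluate.py internals.
env = gym.make("CartPole-v1")
policy = torch.load("./runs/ppo_lag_exp/model.pt")  # hypothetical checkpoint path
policy.eval()

returns, costs = [], []
for _ in range(10):  # mirrors --eval-episodes 10
    obs, _ = env.reset()
    done, ep_ret, ep_cost = False, 0.0, 0.0
    while not done:
        with torch.no_grad():
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        action = int(torch.argmax(logits))  # deterministic action for evaluation
        obs, reward, terminated, truncated, info = env.step(action)
        ep_ret += reward
        ep_cost += info.get("cost", 0.0)  # assumed convention: cost reported in info
        done = terminated or truncated
    returns.append(ep_ret)
    costs.append(ep_cost)

print(f"avg return: {sum(returns) / len(returns):.2f}, "
      f"avg cost: {sum(costs) / len(costs):.2f}")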

Training Curve Plotter#

Training curves show the episodic reward and cost over time, which is useful for evaluating the performance of the algorithms.

Suppose you have run the training script from the algorithms training section and saved the training log in safepo/runs/ppo_lag_exp. You can then plot the training curve by running:

cd safepo/
python plot.py --logdir ./runs/ppo_lag_exp

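Under the hood, a training-curve plotter of this kind reads the logged progress file and draws reward and cost against training steps. The stand-alone sketch below assumes the log directory contains a progress.csv with Steps, EpRet, and EpCost columns; those file and column names are assumptions for illustration, not SafePO's exact logging schema.

import os
import pandas as pd
import matplotlib.pyplot as plt

# Assumed log layout: runs/<exp>/progress.csv with Steps/EpRet/EpCost columns.
df = pd.read_csv("./runs/ppo_lag_exp/progress.csv")

fig, (ax_ret, ax_cost) = plt.subplots(1, 2, figsize=(10, 4))
ax_ret.plot(df["Steps"], df["EpRet"])
ax_ret.set(xlabel="Steps", ylabel="Episodic Reward")
ax_cost.plot(df["Steps"], df["EpCost"])
ax_cost.set(xlabel="Steps", ylabel="Episodic Cost")
fig.tight_layout()

os.makedirs("./results/ppo_lag_exp", exist_ok=True)
fig.savefig("./results/ppo_lag_exp/training_curve.png")
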
Note

This plotter also supports multi-agent algorithms. However, in our experiments we found that the cost training curves of safe and unsafe multi-agent algorithms differ greatly in scale, which makes a combined plot hard to read. We therefore recommend plotting multi-agent training curves with the dedicated plotter safepo/multi_agent/plot_for_benchmark.py:

cd safepo/multi_agent
python plot_for_benchmark.py --logdir ./runs/mappo_lag_exp

Danger

Make sure you have run at least one unsafe multi-agent algorithm and one safe multi-agent algorithm; otherwise the plotter will raise an error.
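
If you want to fail fast with a clearer message, you can check the log directory before invoking the plotter. The sketch below assumes one sub-directory per run, prefixed with the algorithm name; the safe/unsafe name sets and the directory naming are illustrative assumptions, not the plotter's actual detection logic.

import os

logdir = "./runs/mappo_lag_exp"
# Illustrative algorithm name sets -- not the plotter's actual detection logic.
SAFE = {"mappo_lag", "macpo"}
UNSAFE = {"mappo", "happo"}

algos = {d.split("-")[0] for d in os.listdir(logdir)}  # assumed "<algo>-..." naming
if not (algos & SAFE and algos & UNSAFE):
    raise RuntimeError(
        "Need at least one safe and one unsafe multi-agent run to plot; "
        f"found: {sorted(algos)}"
    )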

For the single-agent example above, you can find the resulting plot in safepo/results/ppo_lag_exp/.