Cooperative Cuisine can be used to train a reinforcement learning agent. In this implementation, [stable_baselines](https://github.com/hill-a/stable-baselines) is used to load the RL algorithms.
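As a rough, hedged sketch of what such a training run could look like, assuming the stable-baselines3 interface (whose parameter names match the hyperparameters recommended below); `CartPole-v1` is only a runnable stand-in for an actual Cooperative Cuisine environment:

```python
# Minimal training sketch, assuming the stable-baselines3 interface.
# "CartPole-v1" is a runnable stand-in, not the Cooperative Cuisine env.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("ppo_cooperative_cuisine")

# Roll out the trained policy for one episode.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```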
The layout files for the project are stored in the `cooperative_cuisine/config` directory.
### Using Overcooked-AI Levels and Configs in Cooperative Cuisine
All layouts from [**Overcooked-AI**](https://github.com/HumanCompatibleAI/overcooked_ai) can be used within Cooperative Cuisine. Dedicated configs are defined and can be loaded via Hydra. To use Overcooked-AI layouts:
1. Set the [`overcooked-ai_environment_config.yaml`](./config/environment/overcooked-ai_environment_config.yaml) as the environment config.
2. Define any layout from Overcooked-AI under `layout_name`.
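A hedged sketch of these two steps using Hydra's compose API follows. The `environment` config group is inferred from the config path above; the top-level config name `rl_config` and the exact `layout_name` override syntax are assumptions about the project's Hydra setup, not its verified API:

```python
# Sketch: select the Overcooked-AI environment config and a layout via Hydra.
# The "environment" group is inferred from the config path above; the
# top-level config name "rl_config" is a hypothetical placeholder.
from hydra import compose, initialize

with initialize(version_base=None, config_path="config"):
    cfg = compose(
        config_name="rl_config",  # hypothetical
        overrides=[
            "environment=overcooked-ai_environment_config",
            # Any Overcooked-AI layout name works here, e.g. cramped_room.
            # (May need a leading "+" if layout_name is not in the defaults.)
            "layout_name=cramped_room",
        ],
    )
```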
...
...
The cutting board presents a major challenge for the agent.
PPO can be unstable, showing good progress and then plateauing. A recommended game time limit is between `150` and `300` seconds, depending on the complexity of the task. For faster training, a lower time limit can be effective.
#### Recommended PPO Hyperparameters:
- **Entropy coefficient (`ent_coef`):** between 0 and 0.01 to aid exploration.
- **Batch size:** 256
- **Number of environments (`n_envs`):** 32
- **Learning rate:** 0.0006
...
...
The number of timesteps varies significantly based on the task's complexity.
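Under the same stable-baselines3 assumption, the recommended settings above map onto the PPO constructor roughly as follows; again, `CartPole-v1` is only a runnable stand-in for the actual Cooperative Cuisine environment:

```python
# Sketch: the recommended hyperparameters wired into PPO (stable-baselines3
# API assumed). "CartPole-v1" stands in for the Cooperative Cuisine env.
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("CartPole-v1", n_envs=32)  # 32 parallel environments

model = PPO(
    "MlpPolicy",
    vec_env,
    learning_rate=0.0006,
    batch_size=256,
    ent_coef=0.01,  # between 0 and 0.01 to aid exploration
    verbose=1,
)
model.learn(total_timesteps=2_000_000)  # budget varies with task complexity
```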