The reinforcement learning folder has four key python files.
2. train_single_agent.py: This trains a single agent on the pre-defined configs (those are managed with hydra). It also supports multirun and a hyperparameter sweeper, if defined in rl_config.yaml.
3. run_single_agent.py: This loads a trained agent and lets it play in the environment.
4. play_gym.py: This lets you play a character in the gym yourself, which is helpful for inspecting the observation representation or trying out different hooks and their rewards.
There is also a subfolder called obs_converter, where several converters from the environment state to a vector representation are defined; these can be used in the gym_env. When developing new converters, pay attention to flattening the observations properly, as only flattened arrays are processed correctly by PPO. Additionally, the CNNPolicy is only supported for images and not for multi-dimensional vectors.
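A minimal sketch of such a converter is shown below. The class and method names as well as the state fields are assumptions for illustration, not the actual obs_converter interface; the important part is that the result is a flat 1-D float32 array matching a flat Box observation space, which is what PPO with an MLP policy expects.

```python
# Sketch of a custom observation converter; FlatStateConverter, its methods,
# and the state fields ("players", "counters", ...) are hypothetical and only
# illustrate the flattening, they are not the real cooperative_cuisine API.
import numpy as np
from gymnasium import spaces


class FlatStateConverter:
    """Turns a (hypothetical) nested state dict into one flat float32 vector."""

    def __init__(self, num_players: int, num_counters: int):
        # 2 coordinates per player plus one occupancy flag per counter (assumed layout).
        self._size = 2 * num_players + num_counters

    def get_observation_space(self) -> spaces.Box:
        # A flat Box space; nested or multi-dimensional spaces are what PPO trips over.
        return spaces.Box(low=-np.inf, high=np.inf, shape=(self._size,), dtype=np.float32)

    def convert(self, state: dict) -> np.ndarray:
        player_pos = np.asarray(
            [coord for p in state["players"] for coord in p["pos"]], dtype=np.float32
        )
        counter_occupied = np.asarray(
            [1.0 if c["occupied_by"] else 0.0 for c in state["counters"]], dtype=np.float32
        )
        # Concatenate everything into a single 1-D array before handing it to the policy.
        return np.concatenate([player_pos, counter_occupied]).astype(np.float32)
```

If a converter produces a 2-D grid that is not an actual image, it should either be flattened like this or reshaped into an image-like array before a CNN-based policy is even considered.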
# Overcooked-AI and Cooperative Cuisine
...
...
Therefore, the parameters used for overcooked-AI are simply used in the dedicated ...
The layout format is different, which is why a mapping is defined that converts the overcooked-AI layout into the cooperative cuisine layout.
The layout file has to be present in cooperative_cuisine/reinforcement_learning/layouts/overcooked_ai_layouts.
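The snippet below illustrates what such a character mapping can look like. The overcooked-AI characters are the commonly used ones (X counter, O onion dispenser, P pot, D dish dispenser, S serving, space for floor, 1/2 for player starts); the cooperative cuisine target characters and the helper function are hypothetical and only meant to show the idea, not the mapping actually shipped with the project.

```python
# Illustrative overcooked-AI -> cooperative cuisine layout mapping.
# The target characters on the right-hand side are made up for this sketch.
OVERCOOKED_TO_COOPERATIVE_CUISINE = {
    "X": "#",  # plain counter
    "O": "N",  # onion dispenser
    "T": "T",  # tomato dispenser
    "P": "U",  # pot / cooking station
    "D": "P",  # plate (dish) dispenser
    "S": "W",  # serving window
    " ": " ",  # free floor tile
    "1": "A",  # player start position
    "2": "A",  # player start position
}


def convert_layout(overcooked_layout: str) -> str:
    """Translate an overcooked-AI layout string line by line."""
    return "\n".join(
        "".join(OVERCOOKED_TO_COOPERATIVE_CUISINE.get(ch, ch) for ch in line)
        for line in overcooked_layout.splitlines()
    )
```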
## Results on the overcooked-AI layouts
As the overcooked-AI project includes neither a cutting board nor random environments, we were able to replicate the overcooked-AI results in our environment. We were also able to achieve good performance on several overcooked-AI layouts with random counter placement.
# Experiences with Reinforcement Learning on the cooperative cuisine environment
## Introducing intermediate rewards
The introduction of intermediate rewards is the most important step: there are 6 possible actions per iteration, and a meal might need up to 20 moves in the correct order, which makes the probability of the agent finding this sequence by chance very small. Therefore, small intermediate rewards should be given to stimulate good actions; compared to the final rewards, however, they should remain small. The trashcan is especially difficult to handle: a high negative reward on trashcan usage might lead to the agent not interacting at all and therefore not learning anything, while not punishing trashcan usage may lead to the agent endlessly cutting items and throwing them away.
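The following sketch illustrates the kind of reward shaping described here. The hook names and exact values are made up for this example; what matters is that the intermediate rewards stay well below the final serving reward and that the trashcan penalty is mild enough not to discourage interaction altogether.

```python
# Illustrative reward shaping over environment hooks; the hook names and the
# exact numbers are hypothetical, only the relative magnitudes are the point.
HOOK_REWARDS = {
    "meal_served": 1.0,          # final reward, dominates everything else
    "item_cut": 0.05,            # small nudge towards useful preparation steps
    "item_placed_in_pot": 0.05,
    "dish_picked_up": 0.02,
    "trashcan_used": -0.02,      # punish waste gently so the agent keeps exploring interactions
}


def shaped_reward(fired_hooks: list[str]) -> float:
    """Sum the rewards of all hooks fired in the current environment step."""
    return sum(HOOK_REWARDS.get(hook, 0.0) for hook in fired_hooks)
```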