Project topics for DRL on video games summer school 2019
The following is a list of four possible topics for the project work in the second week.
For all challenges, report/return the following (one zip file with everything in it; report in PDF format):
- All related code. You do not have to include trained models or Python libraries.
- Description of the learning algorithm, along with the hyperparameters and network sizes.
- Description of the environment (observation space, action space, reward signal, task)
- Agent's performance in the environment (e.g. a plot that shows how the agent improves over time, and the performance of the final agent)
- Conclusion and how results could be improved / what you would try next.
Return your project submissions via email to , with a title beginning with `[Summer School]`
The deadline is the same as for the learning diary (30.8.2019).
1) Imitation learning in Atari
Apply imitation learning (behavioral cloning) on Atari games:
- Write code to record human gameplay in Atari games (see examples from Monday's practicals; a rough sketch is given after this list).
- Train deep learning models to map images to actions according to how humans played.
- Evaluate how well the agent does in Atari games (subjective and objective evaluation).
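A minimal sketch of the recording and training steps, assuming gym (with atari-py) and PyTorch; the environment id, preprocessing, network shape, and training settings below are illustrative guesses, not the practicals' reference code:

```python
import numpy as np
import gym
from gym.utils.play import play
import torch
import torch.nn as nn

frames, actions = [], []

def record_callback(obs_t, obs_tp1, action, rew, done, info):
    # Keep the frame the human saw and the action they took.
    if obs_t is not None:
        frames.append(obs_t)
        actions.append(action)

env = gym.make("BreakoutNoFrameskip-v4")  # illustrative choice
play(env, callback=record_callback, fps=30)  # play with the keyboard, Esc to quit

def preprocess(obs):
    # RGB (210, 160, 3) -> grayscale, downsampled, shape (1, 105, 80)
    gray = obs.mean(axis=2)[::2, ::2] / 255.0
    return gray[None].astype(np.float32)

x = torch.from_numpy(np.stack([preprocess(f) for f in frames]))
y = torch.tensor(actions, dtype=torch.long)

# Small CNN mapping a frame to action logits (sizes are a guess; tune them).
model = nn.Sequential(
    nn.Conv2d(1, 16, 8, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
    nn.Linear(32 * 8 * 8, env.action_space.n),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Behavioral cloning is plain supervised learning on (frame, action) pairs.
for epoch in range(10):
    for i in range(0, len(x), 32):
        logits = model(x[i:i + 32])
        loss = loss_fn(logits, y[i:i + 32])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

For the objective evaluation, run the trained model greedily (argmax over the logits) for a number of episodes and compare its score to the human gameplay you recorded.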
2) DQN in ViZDoom
Apply Deep Q Learning to ViZDoom environments:
- Write DQN learning code for ViZDoom (you can use Wednesday’s practicals code as a template)
- Write the training loop for ViZDoom. Note that ViZDoom does not offer a Gym API by default, so you do not have a single convenient “step()” function (see the sketch after this list).
- Evaluate the algorithm in a couple of environments. You can find scenarios at the following link (you need both the .cfg and .wad files for each scenario). simpler_basic.cfg is a good starting point for debugging your implementation. Try also health_gathering.cfg and, if you have time, defend_the_center.cfg.
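Since there is no “step()”, the interaction loop has to be written against the ViZDoom Python API directly. A minimal sketch, with a random policy standing in for your DQN's epsilon-greedy action selection (the frame skip of 4 is just a common default):

```python
import itertools
import random
from vizdoom import DoomGame

game = DoomGame()
game.load_config("simpler_basic.cfg")  # the .wad is referenced by the .cfg
game.init()

# In ViZDoom an action is a list of button states; enumerating all
# combinations of the available buttons gives a discrete action space.
n_buttons = game.get_available_buttons_size()
actions = [list(combo) for combo in itertools.product([0, 1], repeat=n_buttons)]

for episode in range(10):
    game.new_episode()
    episodic_reward = 0.0
    while not game.is_episode_finished():
        state = game.get_state()
        frame = state.screen_buffer          # input to your Q-network
        action = random.choice(actions)      # replace with epsilon-greedy Q-values
        reward = game.make_action(action, 4) # repeat the action for 4 tics
        episodic_reward += reward
        # DQN: store (frame, action, reward, next_frame, done) in the replay
        # buffer and do an update step here.
    print("Episode {}: reward {}".format(episode, episodic_reward))

game.close()
```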
Notes:
- Track performance by measuring the episodic reward (the sum of rewards over one game).
- Neural networks do not like values with large magnitudes. If episodic rewards are too large (e.g. above 50), try rescaling the rewards to be smaller (a one-line example is given below).
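For the rescaling, a fixed multiplicative factor applied before the transition is stored is usually enough; the 0.01 below is an illustrative guess to tune per scenario:

```python
# Scale the raw reward before storing the transition in the replay buffer.
# 0.01 is a placeholder; pick a factor that keeps episodic rewards small.
reward = game.make_action(action, 4) * 0.01
```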
3) Self-play in Toribash
In competitive games, one intuitive way to train better agents is to let the agent fight against itself.
Try this in Toribash with Torille environment:
- Create a learning agent (e.g. A2C) in the Toribash mod aikidobigdojo.tbm.
- Train the agent to win the game (i.e. +1 reward if it wins, -1 reward if it loses). The opponent is the same agent, but the agent only learns from the first player's experiences (see the sketch at the end of this topic).
- See if the agent can learn to beat a random agent (an agent that picks random actions), even if it never played against a random agent during training.
Notes:
- Since self-play requires tinkering with hyperparameters, it is recommended to use existing implementations of learning algorithms, e.g.
stable-baselines (https://github.com/hill-a/stable-baselines) or RLLib (https://ray.readthedocs.io/en/latest/rllib.html)
- Learning this way can take millions of samples. You do not have to produce good agents, as long as you can conclude that the learning agent did learn to play against itself.
- Any ideas on how this could be done better? Hint: there has been research on this.
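The core of the self-play data collection can be sketched as below. The two-player `env` interface and `policy.act()` here are hypothetical shorthand for however your wrapper around Torille exposes both players; the point is only that a single policy acts for both sides while just player 1's transitions are kept for learning:

```python
def collect_selfplay_episode(env, policy, buffer):
    """Run one self-play episode and store player 1's transitions.

    `env` is a hypothetical two-player wrapper whose reset()/step()
    return per-player observations/rewards as (player1, player2) pairs,
    and `policy.act(obs)` is a stand-in for your A2C action selection.
    """
    obs1, obs2 = env.reset()
    done = False
    while not done:
        action1 = policy.act(obs1)  # the learner, playing as player 1
        action2 = policy.act(obs2)  # the same policy, acting as the opponent
        (next_obs1, next_obs2), (rew1, rew2), done, info = env.step(
            (action1, action2)
        )
        # Only player 1's experience goes into the training data.
        buffer.append((obs1, action1, rew1, next_obs1, done))
        obs1, obs2 = next_obs1, next_obs2
    return buffer
```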
4) Joker card: Pick environment, pick algorithm and train!
Find an environment where you want to train an agent, find a good implementation of a learning algorithm, and see what happens when you combine the two!
Note: Select a learning algorithm + environment combination that does not have an existing set of hyperparameters available. E.g. stable-baselines has a "Model Zoo" with a bunch of pre-trained models along with their hyperparameters. Pick something new!
While at first glance this may seem trivial, successfully training agents may take environment-specific tuning of hyperparameters. A good starting point is to look at the hyperparameters used with the same learning algorithm in similar environments (see the example below).
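With stable-baselines, for example, the hyperparameters are just constructor arguments, so trying a new combination looks roughly like this (the environment and all values below are placeholder guesses; substitute your own pick):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Placeholder environment; pick one without published hyperparameters.
env = DummyVecEnv([lambda: gym.make("LunarLander-v2")])

model = PPO2(
    "MlpPolicy", env,
    n_steps=256,          # rollout length per update (a guess; tune it)
    learning_rate=2.5e-4,
    ent_coef=0.01,        # entropy bonus to keep exploration going
    verbose=1,
)
model.learn(total_timesteps=1000000)
```

RLLib works similarly, except the hyperparameters are passed as a config dictionary to the trainer.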
One listing of environments: RLEnv.directory
Some reinforcement learning libraries (there are a bunch of them, but only a few are of high quality): stable-baselines, RLLib, pytorch-rl.