Project topics for DRL on video games summer school 2019
The following is a list of four possible topics for the project work in the second week.
For all challenges, report/return the following (one zip file with everything in it; report in PDF format):
- All related code. You do not have to include trained models or Python libraries.
- Description of the learning algorithm, along with the hyperparameters and network sizes.
- Description of the environment (observation space, action space, reward signal, task)
- Agent's performance in the environment (e.g. a plot that shows how the agent improves over time, and the performance of the final agent)
- Conclusion and how results could be improved / what you would try next.
Return your project submissions via email to , with a title beginning with `[Summer School]`
The deadline is the same as for the learning diary (30.8.2019).
1) Imitation learning in Atari
Apply imitation learning (behavioral cloning) on Atari games:
- Write code to record human gameplay in Atari games (see examples from Monday's practicals; a rough sketch is given after this list).
- Train deep learning models to map images to actions according to how humans played.
- Evaluate how well the agent does in Atari games (subjective and objective evaluation).
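A minimal sketch of the recording and training steps, assuming gym (with atari-py) and PyTorch; the environment id, preprocessing, network shape, and training settings below are illustrative guesses, not the practicals' reference code:

```python
import numpy as np
import gym
from gym.utils.play import play
import torch
import torch.nn as nn

frames, actions = [], []

def record_callback(obs_t, obs_tp1, action, rew, done, info):
    # Keep the frame the human saw and the action they took.
    if obs_t is not None:
        frames.append(obs_t)
        actions.append(action)

env = gym.make("BreakoutNoFrameskip-v4")  # illustrative choice
play(env, callback=record_callback, fps=30)  # play with the keyboard, Esc to quit

def preprocess(obs):
    # RGB (210, 160, 3) -> grayscale, downsampled, shape (1, 105, 80)
    gray = obs.mean(axis=2)[::2, ::2] / 255.0
    return gray[None].astype(np.float32)

x = torch.from_numpy(np.stack([preprocess(f) for f in frames]))
y = torch.tensor(actions, dtype=torch.long)

# Small CNN mapping a frame to action logits (sizes are a guess; tune them).
model = nn.Sequential(
    nn.Conv2d(1, 16, 8, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
    nn.Linear(32 * 8 * 8, env.action_space.n),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Behavioral cloning is plain supervised learning on (frame, action) pairs.
for epoch in range(10):
    for i in range(0, len(x), 32):
        logits = model(x[i:i + 32])
        loss = loss_fn(logits, y[i:i + 32])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

For the objective evaluation, run the trained model greedily (argmax over the logits) for a number of episodes and compare its score to the human gameplay you recorded.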
2) DQN in ViZDoom
Apply Deep Q Learning to ViZDoom environments:
- Write DQN learning code for ViZDoom (you can use Wednesday’s practicals code as a template)
- Write the training loop for ViZDoom. Note that ViZDoom does not offer a Gym API by default, so you do not have a single convenient “step()” function (see the sketch after this list).
- Evaluate the algorithm in a couple of environments. You can find scenarios at the following link (you need both the .cfg and .wad files for each scenario). simpler_basic.cfg is a good starting point for debugging your implementation. Try also health_gathering.cfg and, if you have time, defend_the_center.cfg.
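Since there is no “step()”, the interaction loop has to be written against the ViZDoom Python API directly. A minimal sketch, with a random policy standing in for your DQN's epsilon-greedy action selection (the frame skip of 4 is just a common default):

```python
import itertools
import random
from vizdoom import DoomGame

game = DoomGame()
game.load_config("simpler_basic.cfg")  # the .wad is referenced by the .cfg
game.init()

# In ViZDoom an action is a list of button states; enumerating all
# combinations of the available buttons gives a discrete action space.
n_buttons = game.get_available_buttons_size()
actions = [list(combo) for combo in itertools.product([0, 1], repeat=n_buttons)]

for episode in range(10):
    game.new_episode()
    episodic_reward = 0.0
    while not game.is_episode_finished():
        state = game.get_state()
        frame = state.screen_buffer          # input to your Q-network
        action = random.choice(actions)      # replace with epsilon-greedy Q-values
        reward = game.make_action(action, 4) # repeat the action for 4 tics
        episodic_reward += reward
        # DQN: store (frame, action, reward, next_frame, done) in the replay
        # buffer and do an update step here.
    print("Episode {}: reward {}".format(episode, episodic_reward))

game.close()
```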
Notes:
- Track performance by measuring the episodic reward (the sum of rewards over one game).
- Neural networks do not like values with large magnitudes. If episodic rewards are too large (e.g. above 50), try rescaling the rewards to be smaller (a one-line example is given below).
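For the rescaling, a fixed multiplicative factor applied before the transition is stored is usually enough; the 0.01 below is an illustrative guess to tune per scenario:

```python
# Scale the raw reward before storing the transition in the replay buffer.
# 0.01 is a placeholder; pick a factor that keeps episodic rewards small.
reward = game.make_action(action, 4) * 0.01
```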
3) Self-play in Toribash
In competitive games, one intuitive way to train better agents is to let the agent fight against itself.
Try this in Toribash with Torille environment:
- Create a learning agent (e.g. A2C) in the Toribash mod aikidobigdojo.tbm.
- Train the agent to win the game (i.e. +1 reward if it wins, -1 reward if it loses). The opponent is the same agent, but the agent only learns from the first player's experiences (see the sketch at the end of this topic).
- See if the agent can learn to beat a random agent (an agent that picks random actions), even if it never played against a random agent during training.
Notes:
- Since self-play requires tinkering with hyperparameters, it is recommended to use existing implementations of learning algorithms, e.g.
stable-baselines (https://github.com/hill-a/stable-baselines) or RLLib (https://ray.readthedocs.io/en/latest/rllib.html)
- Learning this way can take millions of samples. You do not have to produce good agents, as long as you can conclude that the learning agent did learn to play against itself.
- Any ideas on how this could be done better? Hint: there has been research on this.
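The core of the self-play data collection can be sketched as below. The two-player `env` interface and `policy.act()` here are hypothetical shorthand for however your wrapper around Torille exposes both players; the point is only that a single policy acts for both sides while just player 1's transitions are kept for learning:

```python
def collect_selfplay_episode(env, policy, buffer):
    """Run one self-play episode and store player 1's transitions.

    `env` is a hypothetical two-player wrapper whose reset()/step()
    return per-player observations/rewards as (player1, player2) pairs,
    and `policy.act(obs)` is a stand-in for your A2C action selection.
    """
    obs1, obs2 = env.reset()
    done = False
    while not done:
        action1 = policy.act(obs1)  # the learner, playing as player 1
        action2 = policy.act(obs2)  # the same policy, acting as the opponent
        (next_obs1, next_obs2), (rew1, rew2), done, info = env.step(
            (action1, action2)
        )
        # Only player 1's experience goes into the training data.
        buffer.append((obs1, action1, rew1, next_obs1, done))
        obs1, obs2 = next_obs1, next_obs2
    return buffer
```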
4) Joker card: Pick environment, pick algorithm and train!
Find an environment where you want to train an agent, find a good implementation of a learning algorithm, and see what happens when you combine the two!
Note: Select a learning algorithm + environment combination that does not have an existing set of hyperparameters available. E.g. stable-baselines has a "Model Zoo" with a bunch of pre-trained models along with their hyperparameters. Pick something new!
While at first glance this may seem trivial, successfully training agents may take environment-specific tuning of hyperparameters. A good starting point is to look at the hyperparameters used with the same learning algorithm in similar environments (see the example below).
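With stable-baselines, for example, the hyperparameters are just constructor arguments, so trying a new combination looks roughly like this (the environment and all values below are placeholder guesses; substitute your own pick):

```python
import gym
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

# Placeholder environment; pick one without published hyperparameters.
env = DummyVecEnv([lambda: gym.make("LunarLander-v2")])

model = PPO2(
    "MlpPolicy", env,
    n_steps=256,          # rollout length per update (a guess; tune it)
    learning_rate=2.5e-4,
    ent_coef=0.01,        # entropy bonus to keep exploration going
    verbose=1,
)
model.learn(total_timesteps=1000000)
```

RLLib works similarly, except the hyperparameters are passed as a config dictionary to the trainer.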
One listing of environments: RLEnv.directory
Some reinforcement learning libraries (there are a bunch of them, but only a few are of high quality): stable-baselines, RLLib, pytorch-rl.