At the heart of a successful reinforcement learning algorithm sits a well-coded reward function. For most real-world tasks, however, the reward function is complex and difficult to specify procedurally. In particular, tasks involving human interaction depend on complex, user-dependent preferences. Even so, a popular belief within the RL community is that it is usually easier and more robust to specify a reward function than to specify a policy that maximises it.