Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below:

Reinforcement learning has gotten a lot of attention recently, thanks in large part to systems like AlphaGo and AlphaZero, which have highlighted its immense potential in dramatic ways. And while the RL systems we’ve developed have accomplished some impressive feats, they’ve done so in a fairly naive way. Specifically, they haven’t tended to confront multi-agent problems, which require collaboration and competition. And when multi-agent problems have been tackled, the agents involved have typically treated other agents as an uncontrollable part of the environment, rather than as entities with rich internal structure that can be reasoned about and communicated with.

That’s all finally changing, with new research into multi-agent RL led in part by Jakob Foerster, an OpenAI, Oxford and Google alum and now a research scientist at Facebook AI Research (FAIR). Jakob’s research is aimed specifically at understanding how reinforcement learning agents can learn to collaborate better and navigate complex environments that include other agents, whose behavior they try to model. In essence, Jakob is working on giving RL agents a theory of mind.

Our conversation spanned a mix of fundamental and philosophical topics, but here are some of my favourite take-homes:

  • When I asked Jakob what his fundamental definition of “learning” was, he answered in terms of sample complexity — the number of samples needed to train a machine learning model. The true goal of learning, he argues, is to “learn how to learn”: to find the algorithms and strategies that reduce sample complexity fastest. It’s in that sense that the evolutionary process that gave rise to human beings was a worse “learner” than the cognitive process human beings use to understand the world and make predictions: whereas it takes millions of individuals’ deaths for a species’ genome to “learn” something, a human brain can do so with a minuscule number of data points (and sometimes with none at all).
  • Jakob argues that RL agents can benefit from explicitly recognizing other agents not as parts of the environment over which they have no control, but as, well, agents — complete with thought processes of their own. In a circumstance where all agents are identical, any given agent can construct a fairly accurate model of its fellow agents, and that model can serve as the basis for effective collaboration and coordination among groups of agents.
  • One of the challenges that arises when modeling agents in this way is communication. For agents to communicate, they have to develop a common language, but that’s not as easy as it might seem: one agent may develop a way of expressing itself, but if the other doesn’t happen to have developed the exact same method, communication will be fruitless. So one important constraint, Jakob suggests, is that agents need to learn language together — so that, if they decide to improve their communication method, they do so together and preserve their ability to understand one another (see the sketch after this list).
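To make that last point a bit more concrete, here is a minimal, purely illustrative sketch (not something from the episode): two tabular REINFORCE learners, a speaker and a listener, jointly shaping a signaling convention in a toy referential game. The game setup, names and hyperparameters are my own assumptions; the point is simply that a usable convention emerges because both agents update against the same shared reward, which is what “learning the language together” looks like in miniature.

```python
# Illustrative sketch only: a toy Lewis-style signaling game where a speaker
# and a listener jointly learn a communication protocol via tabular REINFORCE.
# All names and hyperparameters are assumptions, not anything from the episode.
import numpy as np

rng = np.random.default_rng(0)
K = 5            # number of world states == number of possible messages
LR = 0.1         # learning rate shared by both agents
EPISODES = 20000

speaker_logits = np.zeros((K, K))   # state  -> message preferences
listener_logits = np.zeros((K, K))  # message -> guessed-state preferences

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for ep in range(EPISODES):
    state = rng.integers(K)

    # Speaker samples a message conditioned on the state it observes.
    p_msg = softmax(speaker_logits[state])
    msg = rng.choice(K, p=p_msg)

    # Listener samples a guess conditioned only on the message.
    p_guess = softmax(listener_logits[msg])
    guess = rng.choice(K, p=p_guess)

    # Shared reward: both agents win or lose together.
    reward = 1.0 if guess == state else 0.0

    # REINFORCE update for both agents on the same reward signal, so the
    # state -> message -> guess mapping is shaped jointly, not independently.
    grad_s = -p_msg
    grad_s[msg] += 1.0
    speaker_logits[state] += LR * reward * grad_s

    grad_l = -p_guess
    grad_l[guess] += 1.0
    listener_logits[msg] += LR * reward * grad_l

# After training, the pair usually converges on a consistent (if arbitrary) code.
accuracy = np.mean([
    softmax(listener_logits[np.argmax(speaker_logits[s])]).argmax() == s
    for s in range(K)
])
print(f"greedy protocol accuracy: {accuracy:.2f}")
```

The learned code is arbitrary (any one-to-one mapping of states to messages works equally well), which is exactly why two agents that develop their conventions independently have no particular reason to end up mutually intelligible.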

#editors-pick #tds-podcast #reinforcement-learning #deep-learning #artificial-intelligence

Multi-agent reinforcement learning and the future of AI