DeepMind and Columbia University recently proposed Taylor Expansion Policy Optimisation (TayPO), a policy optimisation formalism that generalises methods such as trust region policy optimisation (TRPO) and improves the performance of several state-of-the-art distributed algorithms.
#deepmind #algorithm #reinforcementlearning