Reinforcement Learning with SpaceX Rockets!
The Falcon 9, developed by the aerospace company SpaceX, makes it possible to reuse the rocket's first stage by flying it safely back to Earth.
This achievement once seemed so impossible that it led to multiple “fake SpaceX landing” videos and explanations; today it is widely recognized as an amazing feat of engineering.
I'm not here today to give you a rocket engineering course (nor would I be capable of it), but I wanted to show this quick little diagram from SpaceX to help you understand a bit more.
[You can learn more about it in video form here]
While no AI techniques have been deployed in SpaceX's tech pipeline (they use classic robotics and control-theory path-planning algorithms), it would be nice to see what happens if we try to solve the problem with state-of-the-art Reinforcement Learning algorithms.
If you are new to RL, I urge you to check out other tutorials, books, and beginner resources to learn the fundamentals of RL and the math behind it.
Sven Niederberger (@EmbersArc on GitHub) created this tough Reinforcement Learning environment some time ago. Its main purpose was to be an attractive, Gym-like environment built on ideas from the LunarLander environments, but it quickly became clear that it is much harder to solve with RL than the environment that inspired it.
How LunarLander works (ref: mc.ai)
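LunarLander has two versions. One is designed for a discrete action space with four actions:
0– Do nothing
1– Fire left engine
2– Fire down engine
3– Fire right engine
The other is designed for a continuous action space:
The action is a vector of two real values, each in the range -1 to +1. The first value controls the main (down) engine and the second controls the left/right side engines.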
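The two action spaces can be sketched in plain Python. The enum member names are hypothetical labels, and `decode_continuous` is a hypothetical helper that follows the semantics described in the Gymnasium LunarLanderContinuous documentation: the main engine is off for values of 0 or below and throttles from 50% to 100% above 0, and the side engines have a dead zone between -0.5 and 0.5. Treat these thresholds as an assumption to verify against the version you have installed.

```python
from __future__ import annotations

from enum import IntEnum


class DiscreteAction(IntEnum):
    """Discrete LunarLander actions (names are illustrative labels)."""
    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into [lo, hi]."""
    return max(lo, min(hi, x))


def decode_continuous(action: tuple[float, float]) -> tuple[float, str, float]:
    """Hypothetical decoder: (main, lateral) in [-1, 1]^2 -> engine commands.

    Returns (main_throttle, side_engine, side_throttle), where main_throttle is
    0.0 (off) or in [0.5, 1.0] and side_engine is "none", "left" or "right".
    Thresholds assumed from the Gymnasium LunarLanderContinuous docs.
    """
    main, lateral = (clip(a, -1.0, 1.0) for a in action)
    # Main engine: off for main <= 0, otherwise throttle scales from 50% to 100%.
    main_throttle = (main + 1.0) * 0.5 if main > 0.0 else 0.0
    # Side engines: no firing inside the dead zone [-0.5, 0.5].
    if abs(lateral) <= 0.5:
        return main_throttle, "none", 0.0
    side = "left" if lateral < 0.0 else "right"
    return main_throttle, side, clip(abs(lateral), 0.5, 1.0)


print(decode_continuous((0.5, -2.0)))  # (0.75, 'left', 1.0)
```

A discrete agent outputs one of four integers, while a continuous agent outputs two floats that the environment interprets as throttle levels; that difference drives the choice of algorithm family.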
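While the discrete-action version can easily be solved with value-based methods plus nonlinear function approximation, such as DQN, Rainbow DQN, or Gorila DQN, the continuous-action problem requires some kind of actor-critic algorithm. Value-based methods choose actions by taking a max over a finite set of actions, which is not directly possible when actions are continuous, and the sparse rewards, hard exploration, and instability of vanilla DQN methods make things even harder.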
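To show where the "max over actions" enters value-based methods, here is a minimal tabular Q-learning sketch in plain Python. DQN replaces the table with a neural network (plus a replay buffer and target network), but the target has the same shape. The function names and the toy Q-table are hypothetical.

```python
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning / DQN target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the argmax action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_update(q_table, state, action, target, alpha=0.1):
    """Move Q(s, a) a step of size alpha toward the TD target."""
    q_table[state][action] += alpha * (target - q_table[state][action])


# Toy usage with the four discrete LunarLander actions (values made up).
q = {"s0": [0.0] * 4, "s1": [0.0, 2.0, 1.0, -1.0]}
a = epsilon_greedy(q["s0"], epsilon=0.1)
target = td_target(reward=1.0, next_q_values=q["s1"], done=False, gamma=0.5)  # 2.0
q_update(q, "s0", a, target, alpha=0.5)  # q["s0"][a] becomes 1.0
```

Both `max(next_q_values)` and the argmax in `epsilon_greedy` enumerate every action. With two continuous throttle values there is nothing finite to enumerate, which is why methods like DDPG and SAC train a separate actor network to propose actions.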
PPO + LSTM with parallel training (on-policy, cluster-style training), Soft Actor-Critic (SAC, off-policy), and DDPG and D4PG (off-policy actor-critic methods) can all be used to solve this problem. Especially if you are going for on-policy methods, I urge you to look into parallelization and LSTMs (or attention mechanisms, if you want to be SOTA) in RL in general, but that's a topic for another day.
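The heart of PPO is its clipped surrogate objective, which keeps each policy update close to the policy that collected the data. The sketch below computes only that batch objective in plain Python. It assumes you already have log-probabilities and advantage estimates, and it leaves out the networks, the LSTM, the value loss, and the parallel workers. The function name and the `clip_eps=0.2` default are illustrative choices.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean of PPO's clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log pi(a|s) under the current and the data-collecting policy.
    advantages: advantage estimates A(s, a), e.g. from GAE.
    """
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In a real implementation this would be written with an autodiff library, and the training loss would be the negative of this value. The clipping removes the incentive to push the probability ratio beyond `1 ± clip_eps` when that would only increase the objective.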
Our rocket gym also borrows most of the LunarLander setup and its highly customizable structure. It uses Box2D as the physics backend and OpenGL for lightweight rendering of the environment (the rendered images aren't used in the observations; they are only there so you can check your progress).
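To illustrate the "Gym-like" interface, here is a hypothetical, heavily simplified stand-in written in plain Python: a one-dimensional vertical landing task. It is not the actual rocket environment. It has no Box2D physics, no OpenGL rendering, and made-up constants. It follows the Gymnasium (0.26+) convention where `reset` returns `(obs, info)` and `step` returns `(obs, reward, terminated, truncated, info)`. Older Gym releases, which the original environment may target, return a 4-tuple `(obs, reward, done, info)` from `step` instead.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyRocket1D:
    """Hypothetical 1-D vertical landing task with a Gymnasium-style API.

    Not the real rocket-lander environment: no Box2D, no rendering, and every
    constant below is made up for illustration.
    """
    dt: float = 0.05          # seconds per step
    gravity: float = -9.8     # m/s^2
    max_thrust: float = 20.0  # upward acceleration at full throttle (made up)
    max_steps: int = 400      # episode time limit
    safe_speed: float = 2.0   # max touchdown speed that counts as a landing

    def reset(self, seed=None):
        """Start a new episode at a random height; returns (observation, info)."""
        self._rng = random.Random(seed)
        self.height = self._rng.uniform(50.0, 100.0)
        self.velocity = 0.0
        self.steps = 0
        return (self.height, self.velocity), {}

    def step(self, action):
        """Apply a throttle in [0, 1]; returns (obs, reward, terminated, truncated, info)."""
        throttle = min(max(float(action), 0.0), 1.0)
        # Semi-implicit Euler integration of the vertical dynamics.
        self.velocity += (self.gravity + throttle * self.max_thrust) * self.dt
        self.height += self.velocity * self.dt
        self.steps += 1
        reward = -0.01 * throttle  # small fuel penalty on every step
        terminated = self.height <= 0.0
        if terminated:
            self.height = 0.0
            # Sparse terminal reward: soft touchdown vs. crash.
            reward += 100.0 if abs(self.velocity) <= self.safe_speed else -100.0
        truncated = self.steps >= self.max_steps and not terminated
        return (self.height, self.velocity), reward, terminated, truncated, {}


if __name__ == "__main__":
    # Random-policy rollout, the usual first sanity check for a new environment.
    env = ToyRocket1D()
    obs, info = env.reset(seed=42)
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(random.random())
        done = terminated or truncated
    print("landed" if terminated else "time limit", obs)
```

Even in this toy version, almost all of the reward arrives at touchdown. That is a small-scale picture of the sparse-reward problem that makes the real rocket task hard for RL.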