Using Machine Learning to Find Exoplanets with NASA's Dataset

In this post, we'll learn how to build an algorithm to find planets outside our solar system.

A few weeks ago, I wrote an article about using data science in meaningful ways that could help make our world a better place. Now, let’s talk a little about other worlds. We can train machines to identify exoplanet candidates using real datasets provided by NASA and Caltech. How cool is that? So I decided to go on an adventure through the mysteries of the universe. My idea is to create a machine learning model that can predict whether an observation is a real exoplanet candidate or not. The data was collected by the Kepler mission, which revealed thousands of planets outside our solar system. Unfortunately, the Kepler mission ended in 2018. However, it left us thousands of observations, so we can train our machines to find planets as well.

And how did the Kepler telescope find planets so far from us, when no one can take a clear picture of Pluto from Earth? Well, Kepler found planets by looking for small dips in a star’s brightness when a planet passes (transits) in front of it. The planet’s size can then be estimated from the depth of the transit and the star’s size.
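Here is how the transit-depth idea works in practice. If the planet’s disk lies fully in front of the star, the fraction of starlight it blocks is roughly the ratio of the two disks’ areas: depth ≈ (Rp / Rs)². That gives Rp ≈ Rs × sqrt(depth). The sketch below applies this simple geometric approximation, which ignores limb darkening and grazing transits. It assumes the depth is given in parts per million (ppm) and the stellar radius in solar radii. The function name planet_radius_earths is hypothetical, not part of the original notebook.

```python
import math

SUN_RADIUS_KM = 695_700.0   # IAU nominal solar radius
EARTH_RADIUS_KM = 6_371.0   # mean Earth radius


def planet_radius_earths(depth_ppm: float, star_radius_suns: float) -> float:
    """Estimate a planet's radius (in Earth radii) from its transit depth.

    Hypothetical helper using the approximation depth ~ (Rp / Rs)**2,
    which ignores limb darkening and grazing transits.
    """
    if depth_ppm < 0 or star_radius_suns <= 0:
        raise ValueError("depth must be non-negative and stellar radius positive")
    depth = depth_ppm * 1e-6                 # ppm -> fraction of the star's light blocked
    radius_ratio = math.sqrt(depth)          # Rp / Rs
    star_radius_km = star_radius_suns * SUN_RADIUS_KM
    return radius_ratio * star_radius_km / EARTH_RADIUS_KM
```

As a sanity check, Earth crossing the Sun blocks about (6,371 / 695,700)² ≈ 84 ppm, so planet_radius_earths(84, 1.0) returns roughly 1.0 Earth radius. A 1% (10,000 ppm) dip around a Sun-like star points to a planet of about 11 Earth radii, close to Jupiter’s size.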

For this article, I downloaded the most recent dataset from the Caltech website. However, if you feel adventurous, you can use NASA’s API and fetch the data straight from the source. I want to explore that soon, but for now, let’s keep things a little simpler and use the NASA/Caltech dataset. You can find a similar dataset on Kaggle, but it was uploaded three years ago and is not up to date. The best prediction on Kaggle reached 95% accuracy. To keep things straightforward, I will skip some of the exploratory data analysis, but I shared the complete version of the notebook on my GitHub. You should be familiar with Python and its main packages before running the following code. If you are not, you should still be able to reproduce the same results by running the notebook in full.
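A likely first step is turning the downloaded table into labels for the “real candidate or not” question. The sketch below shows one way to do that. It uses only Python’s standard csv module instead of pandas, so it runs without extra packages. Several details are assumptions about the downloaded file, not details taken from the notebook: the column name koi_disposition, the label mapping, and the idea that header comment lines start with “#”. The helper name load_labeled_rows is hypothetical.

```python
import csv

# Hypothetical label mapping: confirmed planets are positives, false positives
# are negatives. Unresolved rows (e.g. "CANDIDATE") are skipped so a trained
# model can score them later.
LABELS = {"CONFIRMED": 1, "FALSE POSITIVE": 0}


def load_labeled_rows(text, label_column="koi_disposition"):
    """Parse an archive-style CSV export and return (rows, labels).

    Lines starting with '#' are treated as header comments and ignored.
    The default column name is an assumption about the export format.
    """
    data_lines = [line for line in text.splitlines() if not line.startswith("#")]
    reader = csv.DictReader(data_lines)
    rows, labels = [], []
    for row in reader:
        label = LABELS.get(row[label_column].strip().upper())
        if label is None:
            continue  # disposition not resolved yet: keep it out of training
        rows.append(row)
        labels.append(label)
    return rows, labels
```

With pandas, the same filtering could be written with pd.read_csv(path, comment="#") followed by a map over the disposition column. The standard-library version simply keeps the example dependency-free.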