Exploratory data analysis is a key skill in any data scientist’s toolset. The ability to tell stories with data is paramount for inspiring decision makers and sparking interest in a topic, domain or subject. After all, we all love a good story.
To illustrate its power, I’ve conducted an exploratory analysis of seven decades of space travel data, and I want to share some of the fascinating insights, which I hope will surprise you. There are myriad ways to explore data, but to avoid getting lost you should always keep the purpose of your exploration in mind.
Throughout this piece, I use the word “astronaut” to refer to anybody who has travelled to space.
To make the findings easy to digest, I have structured my analysis around insights pulled from the data, expressed as “Top 10s”, records and notable achievements.
To provide some context, I open with a time series of space missions, which paints a picture of how space travel has progressed over seven decades.
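As an illustration of how a “Top 10” table can be built (a minimal pandas sketch, not the code behind this analysis), the snippet below sums a value per astronaut and keeps the largest totals. The column names `name` and `hours_mission` and the toy rows are hypothetical.

```python
import pandas as pd

# Toy astronaut-mission rows for illustration only (not the real data set).
# The column names "name" and "hours_mission" are hypothetical.
rows = pd.DataFrame({
    "name": ["A", "B", "A", "C", "B", "D"],
    "hours_mission": [100.0, 250.0, 300.0, 50.0, 10.0, 450.0],
})


def top_n_by_total(df: pd.DataFrame, group_col: str, value_col: str, n: int = 10) -> pd.Series:
    """Sum value_col per group and return the n largest totals, highest first."""
    return df.groupby(group_col)[value_col].sum().nlargest(n)


top = top_n_by_total(rows, "name", "hours_mission", n=3)
print(top)
```

Swapping `sum` for `count` or `max` gives other record tables (most missions, longest single mission) from the same astronaut-mission rows.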
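The series behind a chart like this is essentially a count of launches per year. A minimal pandas sketch, assuming a hypothetical `launch_date` column and a handful of made-up rows, could look like this; years with no launches are filled with zero so they do not silently vanish from the plot.

```python
import pandas as pd

# Toy mission-level rows for illustration only; "launch_date" is a hypothetical column name.
missions = pd.DataFrame({
    "launch_date": ["1957-10-04", "1961-04-12", "1961-05-05", "1969-07-16", "1969-11-14"],
})


def missions_per_year(df: pd.DataFrame, date_col: str = "launch_date") -> pd.Series:
    """Count missions per calendar year, including years with zero missions."""
    years = pd.to_datetime(df[date_col]).dt.year
    counts = years.value_counts().sort_index()
    # Reindex over the full year range so gaps show up as 0 rather than being skipped.
    full_range = range(int(counts.index.min()), int(counts.index.max()) + 1)
    return counts.reindex(full_range, fill_value=0)


per_year = missions_per_year(missions)
print(per_year.loc[1957:1962])
```

The resulting Series can then be handed to Matplotlib or Seaborn as a line chart.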
Although rockets, satellites and companies are a large part of space exploration, much of my analysis focuses on the achievements of the astronauts themselves. Some of the most fascinating insights from the data concern achievements you might not have thought humanly possible.
People often connect better with stories about people, something any data scientist who wants to tell compelling stories should bear in mind when conducting exploratory data analysis.
Note: All records are accurate at the time of writing.
I have used two data sets for my exploration, both of which are publicly available: one at the space mission level and the other at the astronaut level. No data should be taken at face value; any insight drawn from your data should be sense-checked. Doing this, I spotted a few data quality issues.
Astronauts: All astronauts who participated in space missions before 15 January 2020. The data comes from NASA, Roscosmos and fan-made websites. Each row is an astronaut-mission pair, containing details about the astronaut and the specific mission. It contains 517 unique missions, assuming that a single mission can be identified by concatenating the mission year, mission title and ascent shuttle.
Missions: Mission-level data scraped from the web, covering all space missions since the first launch in 1957. It details aspects of each mission such as cost, rocket, launch time and location. The Missions data contains 4,324 missions, of which 3,879 were successful.
There is a large discrepancy between the number of missions in the Astronauts data (517) and in the Missions data (4,324). One explanation could be that many of the missions captured in the Missions data were unmanned.
I’ve used Python for all of the data exploration, with much of the data wrangling done using the NumPy and Pandas libraries.
For data visualisation, I have used Matplotlib and Seaborn. I won’t be sharing code snippets in this piece, but I will make the full end-to-end code available on GitHub.
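The mission-identification rule can be expressed in a few lines of pandas. This is a sketch, not the code used for this analysis: the column names `year_of_mission`, `mission_title` and `ascend_shuttle` are hypothetical stand-ins for the data set’s fields, and the rows are toy examples with one row per crew member.

```python
import pandas as pd

# Toy astronaut-mission rows: one row per astronaut per mission (illustrative only).
# The column names are hypothetical stand-ins for the data set's fields.
crew = pd.DataFrame({
    "year_of_mission": [1969, 1969, 1969, 1981, 1981],
    "mission_title": ["Apollo 11", "Apollo 11", "Apollo 11", "STS-1", "STS-1"],
    "ascend_shuttle": ["Apollo 11", "Apollo 11", "Apollo 11", "STS-1", "STS-1"],
})


def add_mission_key(df: pd.DataFrame) -> pd.DataFrame:
    """Concatenate year, title and ascent shuttle into a single mission key."""
    # A separator keeps different field combinations from colliding once joined.
    return df.assign(
        mission_key=df["year_of_mission"].astype(str)
        + "|" + df["mission_title"].astype(str)
        + "|" + df["ascend_shuttle"].astype(str)
    )


keyed = add_mission_key(crew)
n_missions = keyed["mission_key"].nunique()
print(n_missions)
```

Calling `drop_duplicates` on the same three columns gives an equivalent count; the explicit key is handy when it also needs to serve as a join or group-by column.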
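A success count like the one above reduces to comparing a status column against a success label. The sketch below assumes a hypothetical `status_mission` column with a `"Success"` label and uses made-up rows. With the article’s own figures, 3,879 of 4,324 missions works out to a success rate of roughly 89.7%.

```python
import pandas as pd

# Toy mission-level rows (illustrative only); "status_mission" and its labels are hypothetical.
missions = pd.DataFrame({
    "mission": ["M1", "M2", "M3", "M4"],
    "status_mission": ["Success", "Failure", "Success", "Partial Failure"],
})


def success_summary(df: pd.DataFrame, status_col: str = "status_mission",
                    success_label: str = "Success") -> tuple[int, int, float]:
    """Return (total missions, successful missions, success rate)."""
    total = len(df)
    successes = int((df[status_col] == success_label).sum())
    return total, successes, successes / total


total, successes, rate = success_summary(missions)
print(f"{successes} of {total} missions succeeded ({rate:.0%})")
```

An exact-match comparison deliberately counts partial failures as unsuccessful; a looser rule would need its own, clearly stated definition.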
Here are the fascinating details, and the stories of the characters, from seven decades of space travel.