A Simple Code to Read 1.2 Billion Rows Just in 1 Second

Nowadays, we are entering many kinds of the era. Some people said that we are in the Disruption Era. To understand it, we can use a term from Schwartz (1999) in his book, Digital Darwinism. The term describes that we are entering the era in which businesses can not adapt to the evolution of technology and science. Digital platforms and globalization change the customers’ paradigm and change their needs.

On the other hand, some people said that we are entering the Big Data Era. Almost all of the disciplines got experienced with data booming. One of them is Astronomy. Astronomers around the world realized that they need to work in a consortium build bigger and bigger telescope or observatory to collect more data. For example, at the beginning of 2000, an all sky survey called 2 Micron All Sky Survey (2MASS), gathered data of about 470 million objects. In the middle of 2016, Gaia, a space-based telescope, released its 2nd data release consisting of about 1.7 billion objects. How the astronomers handle it?

In this article, we will discuss how to deal with big data practically. We will use a galaxy simulation data from Gaia Universe Model Snapshots (GUMS). It has about 3.5 billion objects. You can access it here. For instance, we will just read 1.2 billion rows.

#python #astronomy #big-data #data-science #data-visualization

medium.com

A Simple Code to Read 1.2 Billion Rows Just in 1 Second