How Does Data Storage Matter in Data Science?

Data storage in a data science process means keeping the useful data you will later dig through for actionable insights. Storing data is itself an orderly process, with several things to keep in mind before jumping to more advanced or fancy work.

In data science there are many pieces to consider, and one of them is storage. Why does storage matter? In data science, analytics are used to find information from observations, measurements, and other key indicators that form data. To do this, there needs to be a dataset or reservoir of data to manipulate, and it needs to sit in an accessible and reliable place for use in analysis.

Factors for picking data storage:

  1. Quantity: the amount or size of the data.
  2. Type: whether the data is binary, character, numeric, etc.
  3. Users: who is accessing the data.
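
To make the first two factors concrete, here is a rough sketch of how quantity and type together drive the space a dataset needs. The per-value byte sizes and the function names are assumptions chosen for illustration; real file formats and databases add their own overhead.

```python
# Hypothetical per-value sizes in bytes; actual storage formats add overhead.
BYTES_PER_VALUE = {
    "numeric": 8,     # e.g. one 64-bit float or integer
    "character": 1,   # one ASCII character
}

def estimate_size_bytes(rows, columns, kind, chars_per_value=1):
    """Rough size of a rows x columns table holding a single data type."""
    per_value = BYTES_PER_VALUE[kind]
    if kind == "character":
        # Text values take one byte per character in this simple model.
        per_value *= chars_per_value
    return rows * columns * per_value

def to_megabytes(n_bytes):
    """Convert bytes to megabytes using decimal units (1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000

size = estimate_size_bytes(rows=1_000_000, columns=10, kind="numeric")
print(f"{to_megabytes(size):.0f} MB")  # prints "80 MB"
```

An estimate like this is only a starting point, but it helps decide early whether a dataset belongs on a laptop or somewhere larger.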

Places with Space


Local storage is when data is placed on the machine or drive that the user is working on. This is a physical location, constrained by the amount of space allocated to it. Local data can be fast to access. Because the information sits in a physical location that is not accessible from outside the machine, it needs to be backed up periodically to ensure a copy is preserved.
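
Since local storage is limited by the space on the drive, it can help to check what is free before saving a dataset. The sketch below uses Python's standard `shutil.disk_usage`; the `fits_locally` helper and its 10% safety reserve are assumptions for illustration, not a fixed rule.

```python
import shutil

def fits_locally(dataset_bytes, path=".", reserve_fraction=0.10):
    """Return True if the dataset fits on the drive holding `path`,
    while leaving a fraction of the total space unused as a safety margin."""
    usage = shutil.disk_usage(path)          # named tuple: total, used, free (bytes)
    reserve = usage.total * reserve_fraction
    return dataset_bytes <= usage.free - reserve

usage = shutil.disk_usage(".")
print(f"total: {usage.total / 1e9:.1f} GB, free: {usage.free / 1e9:.1f} GB")
print("3 GB video fits:", fits_locally(3 * 10**9))
```

The output depends on the machine the code runs on, which is exactly the point: local limits differ from one drive to the next.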

Everyday examples: the data stored in a web browser such as Chrome for one page is 2.5 MB, and an average one-hour video in HD is 3 GB.

For Big Data: the human genome for one individual is 725 MB, and one dataset for weather forecasting is 2 GB.
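
These per-item sizes add up quickly. As a small worked example, the sketch below takes the 725 MB genome figure above and asks how many genomes fit on a drive, and how much space a larger study would need. The 500 GB drive and the 1,000-person study are hypothetical numbers, and decimal units (1 GB = 1,000 MB) are assumed.

```python
GENOME_MB = 725      # size of one human genome, as quoted in the text
MB_PER_GB = 1000     # decimal units assumed

def genomes_that_fit(drive_gb):
    """Whole genomes that fit on a drive of the given size in GB."""
    return (drive_gb * MB_PER_GB) // GENOME_MB

def cohort_size_gb(n_genomes):
    """Space in GB needed to hold the given number of genomes."""
    return n_genomes * GENOME_MB / MB_PER_GB

print(genomes_that_fit(500))   # prints 689
print(cohort_size_gb(1000))    # prints 725.0
```

A single 500 GB drive cannot hold 1,000 genomes, which is where storage beyond one machine comes in.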


Servers that are used to store data are called storage servers. Like local storage, they are physical locations constrained by the physical space available to hold data. The difference is that they operate on a network: multiple storage servers combine to provide the total space available, and any user with access to the server can use the data.

