Build Your Data from Scratch in Python

The first and perhaps most important step of any data analytics work is to acquire your raw ingredients: your data.

Depending on the maturity of your project, this stage can be very straightforward (requesting a CSV from a colleague, querying a well-structured database, etc.) or significantly more involved, like building a custom web scraper.

But not all the data you need for your projects will come from external sources. Sometimes you’ll need to cook it up yourself.

In this post, I’ll walk through the process of creating a DataFrame from scratch.

Understanding the “DataFrame” method

The easiest way to create a new DataFrame is to call pd.DataFrame, often referred to as the “DataFrame” method. If you are familiar with object-oriented programming, you’ll notice that this is actually a constructor call, which instantiates a new DataFrame object.

All arguments are optional, which means you can create an empty DataFrame by passing in… nothing:

import pandas as pd
empty_df = pd.DataFrame()
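
To confirm that the result really is empty, a quick check along these lines can help. This is only a small sketch; it reuses the empty_df name from the snippet above and relies on the standard empty and shape attributes.

```python
import pandas as pd

empty_df = pd.DataFrame()

# .empty is True when the DataFrame holds no elements
print(empty_df.empty)   # True

# .shape is a (rows, columns) tuple
print(empty_df.shape)   # (0, 0)
```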

This can be helpful if you want an empty DataFrame to populate later with data. For example, you could store summary results for several machine learning models by starting with an empty DataFrame and then writing a loop to populate rows for each model.
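
As a rough sketch of that idea, the loop below fills a results table one row at a time. The model names and accuracy values are made up purely for illustration, and the column names are an assumption; in a real project the scores would come from your own evaluation code.

```python
import pandas as pd

# Hypothetical model names and scores, invented for this example only
scores = {
    "logistic_regression": 0.81,
    "random_forest": 0.86,
    "gradient_boosting": 0.88,
}

# Start with no rows, but name the columns up front
results_df = pd.DataFrame(columns=["model", "accuracy"])

# Add one row per model at the next free integer label
for name, accuracy in scores.items():
    results_df.loc[len(results_df)] = [name, accuracy]

print(results_df)
```

Growing a DataFrame row by row is fine for a handful of models. For large loops, it is usually faster to collect the rows in a plain list and build the DataFrame once at the end.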

In most cases, however, you’ll want to fill your DataFrame with data from the start. Luckily, Pandas is very flexible, allowing programmers to convert a variety of data structures into DataFrames.
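
To give a feel for that flexibility, here is a minimal sketch that passes three common structures to the constructor: a dictionary of lists, a list of dictionaries, and a NumPy array. The names and values are placeholders chosen for the example.

```python
import numpy as np
import pandas as pd

# Dictionary of lists: each key becomes a column
from_dict = pd.DataFrame({"name": ["Ada", "Grace"], "age": [36, 45]})

# List of dictionaries: each dictionary becomes a row
from_records = pd.DataFrame(
    [{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]
)

# 2-D NumPy array: column labels are supplied separately
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

print(from_dict)
print(from_records)
print(from_array)
```

The first two calls describe the same table from different angles (column-wise versus row-wise), so which one to use mostly depends on how your data is already organized.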
