A quick peek at how Pandas works under the hood

I have been working with Python (primarily Pandas) for a good few years now and I love it: it is simple enough to pick up, yet it offers ample ways to carry out immensely complex data modelling for almost any requirement.

However, it may be a bit of a stretch for me to claim to know Python if all I have ever done is mostly read_csv, filtering, and groupby. So, in order not to look inept around more seasoned Python programmers, I decided to dig into the Pandas source code to find out how things work under the hood, getting down and dirty with the nitty-gritty instead of being a blind user of its various features. I wanted to walk away from this exercise having achieved several goals: (1) gaining some practical experience with OOP in Python, (2) learning how packages work in Python, and (3) understanding how Pandas goes about throwing its funky functions around.

In this post, I will detail some of my findings and go through a simple code-along to internalise the learnings, where I attempt to recreate a (waaaaay) simpler version of one of Pandas' core components: its DataFrame class. If you are kind of new to Python, or just a curious reader, then this is the post for you!

With that out of the way, let's break this post down into several sections:

  • Pre-requisites
  • Setting up our Pandas-replica workspace
  • Creating our DataFrame object
  • Reading data into our DataFrame
  • Implementing some DataFrame methods
  • Using our DataFrame in a notebook environment
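To give a flavour of where the code-along is headed, here is a rough sketch of what a toy DataFrame could look like. Everything in it is an assumption made for illustration: the dict-of-lists storage, the `shape` and `head` members, and the standalone `read_csv` helper are hypothetical choices, not the pandas_replica code and not how Pandas itself is implemented.

```python
import csv
import io


class DataFrame:
    """A toy, column-oriented table (illustrative only, not pandas internals)."""

    def __init__(self, data):
        # Assumed input: a dict mapping column name -> list of values.
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.data = {name: list(values) for name, values in data.items()}

    @property
    def shape(self):
        # (number of rows, number of columns); an empty frame has zero rows.
        n_rows = len(next(iter(self.data.values()), []))
        return (n_rows, len(self.data))

    def head(self, n=5):
        # Return a new DataFrame holding only the first n rows.
        return DataFrame({name: values[:n] for name, values in self.data.items()})

    def __repr__(self):
        return f"DataFrame(columns={list(self.data)}, shape={self.shape})"


def read_csv(file_obj):
    """Hypothetical reader: build a DataFrame from an open CSV file object."""
    reader = csv.DictReader(file_obj)
    columns = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name in columns:
            columns[name].append(row[name])  # values stay as strings
    return DataFrame(columns)


# Usage: an in-memory CSV stands in for a real file.
df = read_csv(io.StringIO("name,age\nAda,36\nLinus,28\n"))
print(df)
```

Storing each column as a plain list keeps the sketch short. Note that this reader leaves every value as a string, so type inference would be a separate step.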

Link to pandas_replica source code
