From Excel To Databases with Python

From Excel To Databases with Python

<strong>Originally published by&nbsp;</strong><a href="https://medium.com/@aThinkingBusinessAnalyst" target="_blank">Costas Andreou</a> <em>at&nbsp;</em><a href="https://medium.com/financeexplained/from-excel-to-databases-with-python-c6f70bdc509b" target="_blank">medium.com</a>

Learn how to use Python for quick data analysis

As a business analyst, not a day goes by that I do not find myself fiddling with some data in an excel spreadsheet. Talk to anyone working in an analytical role, and they will tell you about their love-hate relationship with excel. For all the good things that excel can do; it’s simply a pain to work with when it comes to larger data sets.

Pivots take for ever to load, the machine runs out of memory, and before you know it the whole thing becomes unmanageable. Not to mention that excel can only support up to 1,048,576 rows. Sure, you could consider doing things in VBA, but what’s the point.

If only there were an easy way to transfer data into a SQL Database, do your analysis and then delete it all. Well, this is where Python swoops in to save the day.

SQL In Python

To begin with, let us explore the most popular options when it comes to SQL in Python. The two most popular SQL DBs to work with in Python is MySQL and SQLite.

MySQL has two popular libraries associated with it: PyMySQL and MySQLDb; while SQLite has SQLite3.

SQLite is what is known as an embedded database, which means it runs within our application and hence it is not required to be installed somewhere first (unlike MySQL).

This is an important difference; and pivotal in our quest for quick data analysis. As such, we will go ahead and learn how to use SQLite.

Setting up SQLite in Python

The first thing we need to do is import the library:

import sqlite3

Then we need to determine whether we would like to save this database anywhere or simply hold it in memory while our application in running.

If decided to actually save the database down with any of the data imported, we would then have to give the DB a name, say ‘FinanceExplainedDb’, and have the following command:

dbname = 'FinanceExplainedDb'
conn = sqlite3.connect(dbname + '.sqlite')

On the other hand, if we wanted the whole thing in memory, and for it to vanish when we were done, we could use the following command:

conn = sqlite3.connect(':memory:')

At this point, SQLite is all set up and ready to be used in Python. Assuming we had some data loaded in the DB under Table1, we could execute SQL commands in the following way:

cur = conn.cursor()
cur.execute('SELECT * FROM Table1')
for row in cur:
    print(row)

Let us now explore how we can make our data available through our application using Pandas.

NB: SQLite has a very cool browser should you decide you want to run your SQL commands manually and on the fly after you’ve created your db. You can find it here.

Using Pandas to load data in our application

Assuming that we already have the data, we would like to carry analysis on, we can use the Python Pandas library to do it.

First, we need to import the Pandas library and then we can load the data in a data frame (You can think of data frames as an array of sorts):

import pandas as pd
#if we have a csv file
df = pd.read_csv('ourfile.csv')
#if we have an excel file
df = pd.read_excel('ourfile.xlsx')

Once we have loaded the data, we can put it straight into our SQL Database with a simple command:

df.to_sql(name='Table1', con=conn)

If you are loading multiple files within the same table, you can use the *if_exists *parameter:

df.to_sql(name='Table1', con=conn, if_exists='append')

Memory Considerations

When it comes to dealing with larger data sets, we will not be able to use this one-line commands to load the data. Our application will run out of memory. Instead, we will have to load our data little by little.

For this example, let’s assume we will load 10,000 rows at a time:

chunksize = 10000
for chunk in pd.read_csv('ourfile.csv', chunksize=chunksize):
    chunk.to_sql(name='Table1', con=conn, if_exists='append')

Bringing it all together

To bring everything together, I have decided to give you a Python script that covers most of the things we talked about.

The script will do the following things:

  • Load some sample data from a Python library
  • Write the data out to a CSV
  • Load the data back into our application through the CSV in a data frame chunk by chunk and put in a DB
  • Then execute a *SELECT *statement on the database
import sqlite3, pandas as pd, numpy as np
#####Creating test data for us -- you can ignore
from sklearn import datasets
iris = datasets.load_iris()
df1 = pd.DataFrame(data= np.c_[iris['data'], iris['target']], columns= iris['feature_names'] + ['target'])
df1.to_csv('TestData.csv',index=False)
###########################
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
chunksize = 10
for chunk in pd.read_csv('TestData.csv', chunksize=chunksize):
    chunk.columns = chunk.columns.str.replace(' ', '_') #replacing spaces with underscores for column names
    chunk.to_sql(name='Table1', con=conn, if_exists='append')
cur.execute('SELECT * FROM Table1')
names = list(map(lambda x: x[0], cur.description)) #Returns the column names
print(names)
for row in cur:
    print(row)
cur.close()

There you have it guys. A brief introduction on how you can import your data into a database for quick analysis.

Do you have any other tips or tricks?

python

Bootstrap 5 Complete Course with Examples

Bootstrap 5 Tutorial - Bootstrap 5 Crash Course for Beginners

Nest.JS Tutorial for Beginners

Hello Vue 3: A First Look at Vue 3 and the Composition API

Building a simple Applications with Vue 3

Deno Crash Course: Explore Deno and Create a full REST API with Deno

How to Build a Real-time Chat App with Deno and WebSockets

Convert HTML to Markdown Online

HTML entity encoder decoder Online

Basic Data Types in Python | Python Web Development For Beginners

In the programming world, Data types play an important role. Each Variable is stored in different data types and responsible for various functions. Python had two different objects, and They are mutable and immutable objects.

How To Compare Tesla and Ford Company By Using Magic Methods in Python

Magic Methods are the special methods which gives us the ability to access built in syntactical features such as ‘<’, ‘>’, ‘==’, ‘+’ etc.. You must have worked with such methods without knowing them to be as magic methods. Magic methods can be identified with their names which start with __ and ends with __ like __init__, __call__, __str__ etc. These methods are also called Dunder Methods, because of their name starting and ending with Double Underscore (Dunder).

Python Programming: A Beginner’s Guide

Python is an interpreted, high-level, powerful general-purpose programming language. You may ask, Python’s a snake right? and Why is this programming language named after it?

Hire Python Developers

Are you looking for experienced, reliable, and qualified Python developers? If yes, you have reached the right place. At **[HourlyDeveloper.io](https://hourlydeveloper.io/ "HourlyDeveloper.io")**, our full-stack Python development services...

Python any: How to Check If Element is Iterable or Not

Python any() function returns True if any element of an iterable is True otherwise any() function returns False. The syntax is any().