<strong>Originally published by </strong><a href="https://medium.com/@aThinkingBusinessAnalyst" target="_blank">Costas Andreou</a> <em>at </em><a href="https://medium.com/financeexplained/from-excel-to-databases-with-python-c6f70bdc509b" target="_blank">medium.com</a>
Learn how to use Python for quick data analysis
As a business analyst, not a day goes by that I do not find myself fiddling with some data in an Excel spreadsheet. Talk to anyone working in an analytical role, and they will tell you about their love-hate relationship with Excel. For all the good things Excel can do, it's simply a pain to work with when it comes to larger data sets.
Pivots take forever to load, the machine runs out of memory, and before you know it the whole thing becomes unmanageable. Not to mention that Excel can only support up to 1,048,576 rows. Sure, you could consider doing things in VBA, but what's the point?
If only there were an easy way to transfer data into a SQL Database, do your analysis and then delete it all. Well, this is where Python swoops in to save the day.
To begin with, let us explore the most popular options when it comes to SQL in Python. The two most popular SQL databases to work with in Python are MySQL and SQLite.
MySQL has two popular libraries associated with it: PyMySQL and MySQLdb, while SQLite has the built-in sqlite3 module.
SQLite is what is known as an embedded database, which means it runs within our application and hence does not need to be installed separately first (unlike MySQL).
This is an important difference; and pivotal in our quest for quick data analysis. As such, we will go ahead and learn how to use SQLite.
The first thing we need to do is import the library:
import sqlite3
Then we need to determine whether we would like to save this database anywhere or simply hold it in memory while our application is running.
If we decided to actually save the database down, along with any of the data imported, we would then have to give the DB a name, say 'FinanceExplainedDb', and use the following command:
dbname = 'FinanceExplainedDb'
conn = sqlite3.connect(dbname + '.sqlite')
On the other hand, if we wanted the whole thing in memory, and for it to vanish when we were done, we could use the following command:
conn = sqlite3.connect(':memory:')
At this point, SQLite is all set up and ready to be used in Python. Assuming we had some data loaded in the DB under Table1, we could execute SQL commands in the following way:
cur = conn.cursor()
cur.execute('SELECT * FROM Table1')
for row in cur:
    print(row)
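Of course, before we can query Table1 we need to have created it and put some rows in it. Here is a minimal sketch of that step using plain SQL through the same cursor; the column names and sample rows are made up purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Create a throwaway table -- the name Table1 matches the example above
cur.execute('CREATE TABLE Table1 (ticker TEXT, price REAL)')

# Parameterised inserts (the ? placeholders) let sqlite3 handle quoting for us
rows = [('AAPL', 150.25), ('MSFT', 280.10)]
cur.executemany('INSERT INTO Table1 VALUES (?, ?)', rows)
conn.commit()

cur.execute('SELECT * FROM Table1')
for row in cur:
    print(row)
```

Using `?` placeholders rather than string formatting is the idiomatic sqlite3 way to pass values into a query.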
Let us now explore how we can make our data available through our application using Pandas.
NB: SQLite has a very cool browser should you decide you want to run your SQL commands manually and on the fly after you’ve created your db. You can find it here.
Assuming that we already have the data we would like to carry out analysis on, we can use the Python Pandas library to do it.
First, we need to import the Pandas library, and then we can load the data into a data frame (you can think of a data frame as a two-dimensional array of sorts):
import pandas as pd

# if we have a csv file
df = pd.read_csv('ourfile.csv')

# if we have an excel file
df = pd.read_excel('ourfile.xlsx')
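Before pushing the data into SQL, it is worth a quick sanity check on the data frame. A small sketch, using an illustrative data frame in place of ourfile.csv (the column names here are invented):

```python
import pandas as pd

# An illustrative DataFrame standing in for pd.read_csv('ourfile.csv')
df = pd.DataFrame({'ticker': ['AAPL', 'MSFT', 'GOOG'],
                   'price': [150.25, 280.10, 135.50]})

print(df.shape)    # (rows, columns)
print(df.dtypes)   # column types -- these determine the SQL column types
print(df.head())   # first few rows
```

A minute spent checking shapes and dtypes here saves a lot of head-scratching over odd column types in the database later.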
Once we have loaded the data, we can put it straight into our SQL database with a simple command:
df.to_sql(name='Table1', con=conn)
If you are loading multiple files into the same table, you can use the *if_exists* parameter:
df.to_sql(name='Table1', con=conn, if_exists='append')
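The *if_exists* parameter accepts three values, and it is worth seeing all of them side by side. A quick sketch of their behaviour (the data frame contents here are illustrative):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

df.to_sql(name='Table1', con=conn, if_exists='fail')     # the default: raise an error if Table1 already exists
df.to_sql(name='Table1', con=conn, if_exists='append')   # add the rows to the existing table
df.to_sql(name='Table1', con=conn, if_exists='replace')  # drop Table1 and recreate it from df
```

After the three calls above, Table1 holds just the two rows of df again, because the final 'replace' dropped the four accumulated rows and started over.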
When it comes to dealing with larger data sets, we will not be able to use these one-line commands to load the data. Our application will run out of memory. Instead, we will have to load our data little by little.
For this example, let’s assume we will load 10,000 rows at a time:
chunksize = 10000
for chunk in pd.read_csv('ourfile.csv', chunksize=chunksize):
    chunk.to_sql(name='Table1', con=conn, if_exists='append')
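Once the data is in SQLite, we do not have to drop back to raw cursors to get it out again: pandas can pull a query result straight into a data frame with read_sql_query. A small sketch, using an illustrative in-memory table rather than ourfile.csv:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')

# A small illustrative DataFrame standing in for the chunked CSV load above
pd.DataFrame({'value': range(5)}).to_sql(name='Table1', con=conn, if_exists='append')

# Run SQL and get a DataFrame back -- handy for continuing the analysis in pandas
result = pd.read_sql_query('SELECT COUNT(*) AS row_count FROM Table1', conn)
print(result)
```

This round trip (CSV to SQLite in chunks, then SQL back into pandas) is the core of the quick-analysis workflow the article describes.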
To bring everything together, I have decided to give you a Python script that covers most of the things we talked about.
The script will do the following things: create some test data as a CSV file, load it into an in-memory SQLite database in chunks, and then query the table back out, printing the column names and rows.
import sqlite3
import pandas as pd
import numpy as np

##### Creating test data for us -- you can ignore
from sklearn import datasets
iris = datasets.load_iris()
df1 = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                   columns=iris['feature_names'] + ['target'])
df1.to_csv('TestData.csv', index=False)
###########################

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

chunksize = 10
for chunk in pd.read_csv('TestData.csv', chunksize=chunksize):
    chunk.columns = chunk.columns.str.replace(' ', '_')  # replace spaces with underscores in column names
    chunk.to_sql(name='Table1', con=conn, if_exists='append')

cur.execute('SELECT * FROM Table1')
names = [description[0] for description in cur.description]  # returns the column names
print(names)
for row in cur:
    print(row)
cur.close()
There you have it, guys: a brief introduction to how you can import your data into a database for quick analysis.
Do you have any other tips or tricks?