From Excel to MySQL via Python, and then back to Excel
It may be useful sometimes to convert an Excel sheet into a MySQL database. This conversion will enable querying operations to be undertaken using straightforward SQL queries. Using Python, we can simply convert a limited Excel spreadsheet into a MySQL database.
To demonstrate how this is possible, I will use a Sports example. Having witnessed Liverpool’s triumphant display against Barcelona in the UEFA Champions League (07/05/19), I will choose an example close to home, the Premier League Table.
I first download the Premier League Table and populate it into an Excel file as shown below. Crucially, I then save this Excel file as a ‘comma separated values file (csv)’.
To begin, I first use the open function and the read() method to read this ‘Premier_league.csv’ file as a string, fString. I then create an empty list, and split this string at every new line character. I then split the line further at each specified comma in the string, and append to my fList. To validate, I print out my fList. Here I can see I have a nested list, with the first element of fList representing the columns, the second element representing the team occupying first place and so forth.
f = open('Premier_league.csv')
fString = f.read()
fList = []
for line in fString.split('\n'):
fList.append(line.split(','))
print(fList)
Output:
[['P', 'Team', 'GP', 'W', 'D', 'L', 'F', 'A', 'GD', 'Pts'], ['1', 'Man City', '37', '31', '2', '4', '91', '22', '69', '95']…….
Following this, I need to make a connection to my database management system (DBMS). To achieve this, I install mysql-connector using the pip command, and import this module into the script I am working on. To confirm a connection to MySQL, I print out the MySQLConnection object, which returns an object in memory of my connection, confirming that the connection has been made.
I now create a cursor, and use the cursor’s execute method() to create my database, ‘football_db’. When I switch over to my MySQL Workbench, and refresh the schemas icon, I can see that my database now appears!
import mysql.connector
mydb = mysql.connector.connect(
host='localhost',
user='root',
password='******',
#database='football_db'
)
print(mydb)
cursor = mydb.cursor()
cursor.execute("CREATE DATABASE football_db")
cursor.execute('SHOW DATABASES')
Output:
<mysql.connector.connection.MySQLConnection object at 0x030D98F0>
With the database configuration confirmed, I can now start creating my Table. To begin, I will first need to create the columns for the Table. I can use sub-string notation to assign each of my column names. This will help towards readability too. For example, the ‘Played’ column corresponds to element number 0 in the nested fList mentioned earlier, and within this sub-list, it corresponds to the first element again, [0].
I then create a table using the ‘CREATE TABLE’ command and name it Football as this seems appropriate. As this Table column entry statement is spread over several lines I enclose all in triple strings. For each column, I use replacement field syntax using the .format() method, and assign an appropriate datatype for each of my columns. For example, the ‘Played’ column with have an ‘int’ datatype with a character limit of 2, on the basis that the maximum number of games played by each team over the season is 38. Finally, I then execute my Table generation, and switch across to MySQL Workbench to confirm.
Creating the columns
Played = fList[0][0]
Team = fList[0][1]
GP = fList[0][2]
W = fList[0][3]
D = fList[0][4]
L = fList[0][5]
F = fList[0][6]
A = fList[0][7]
GD = fList[0][8]
Pts = fList[0][9]
queryCreateTable = """CREATE TABLE FOOTBALL(
{} int(2) not null,
{} varchar(255) not null,
{} int(2) not null,
{} int(2) not null,
{} int(2) not null,
{} int(2) not null,
{} int(2) not null,
{} int(2) not null,
{} int(3) not null,
{} int(3) not null
)""".format(Played, Team, GP, W, D, L, F, A, GD, Pts)
cursor.execute(queryCreateTable)
With the columns in the Football Table established, the next objective is to insert the rows of each Team into the Table. However, care must be taken here. It is necessary to delete the first row, fList[0], as it makes no sense to insert the columns into the Table again.
To insert the rows with all of the Team’s data, I create an empty string called ‘rows’. I then use a for loop to iterate over the fList and replace the square brackets with parenthesis. Another if statement is used here to add a comma between all row entries in the String.
To proceed, I use the ‘INSERT INTO’ command with ‘VALUES’ and concatenate my rows to this statement, which I then pass to my cursor’s execute function, which I then commit to my database. Next, I need to confirm everything is working well, by once more switching across to my Workbench and viewing my Table.
del fList[0]
rows = ''
for i in range(len(fList)-1):
rows += "('{}','{}','{}','{}','{}','{}','{}','{}','{}','{}')"\
.format(fList[i][0], fList[i][1], fList[i][2], fList[i][3], fList[i][4],
fList[i][5], fList[i][6], fList[i][7], fList[i][8], fList[i][9])
if i != len(fList)-2:
rows += ','
print(rows)
queryInsert = "INSERT INTO FOOTBALL VALUES" + rows
cursor.execute(queryInsert)
mydb.commit()
Perfect! Now we can start performing queries on the Football Table. To begin, let’s find out who has won less than 15 games this season, but amassed more than 45 points. We simply define our conditional statement in a WHERE clause and the results appear.
We could also use some SQL aggregate functions to dive into the data. For example, lets confirm things are working by checking the number of teams in the League. In addition, lets determine the fewest games won, the highest goals scored and the average number of draws for each team in the Premier League.
To end, it would be nice if we could export this data to Excel in the format displayed above. Whilst this is easy using the export tab shown above, as a bonus, I’ll demonstrate how this can be simply written in Python.
I first decide my filename ‘my_football_stats’ and save this as a CSV file. I then open this CSV file in ‘write’ mode and create some headers which I write to my file. Following this I execute my SQL query, and capture the output in cursor.fetchall() which I assign to the variable result. The **type **of result is alist. This means that I can extract the element from this list by indexing the result list. Row[0] corresponds to the Number_of_teams, row[1], the fewest games won and so forth.
When I run this script, I can check my directory for the file, ‘myfootballstats’ and read the Excel file as shown below.
filename = 'my_football_stats.csv'
f1 = open(filename, 'w')
headers = 'Number_of_Teams, Fewest_games_won, Highest_goals, Average_drawn_games \n'
f1.write(headers)
cursor.execute(''' SELECT COUNT(Team) AS Number_of_Teams,
min(W) AS Fewest_games_won,
max(F) AS Highest_goals,
avg(D) AS Average_drawn_games
FROM football;''')
result = cursor.fetchall()
print(result)
for row in result:
f1.write(str(row[0]) + ',' + str(row[1]) + ',' + str(row[2]) + ',' + str(row[3]) + '\n')
f1.close()
This tutorial has shown how we can go full-circle. Firstly, by starting with an Excel file, we can transform this into a Database table amenable to querying in a database management system. We can then output whatever queries we wish to run back to Excel to complete our transformation. Whilst this is possible using Python’s Pandas library, this example could easily be built upon by creating relational tables. For example, we could use the Team’s names as a unique ID and begin joining to other Tables specific to that team. In this way, we can create powerful relational databases.
#python #mysql