In this tutorial, we will learn about the following topics:
Python provides the Openpyxl module, which is used to deal with Excel files without involving third-party Microsoft application software. By using this module, we can have control over excel without open the application. It is used to perform excel tasks such as read data from excel file, or write data to the excel file, draw some charts, accessing excel sheet, renaming sheet, modification (adding and deleting) in excel sheet, formatting, styling in the sheet, and any other task. Openpyxl is very efficient to perform these tasks for you.
Data scientists often use the Openpyxl to perform different operations such as data copying to data mining as well as data analysis.
The Openpyxl library is used to write or read the data in the excel file and many other tasks. An excel file that we use for operation is called Workbook that contains a minimum of one Sheet and a maximum of tens of sheets.
In the above section, we have discussed the openpyxl briefly and its working process. In order to use Openpyxl, one should have Python 3.7 and openpyxl 2.6.2 installed in the system. Let’s start working with openpyxl by installing openpyxl using the following command:
pip install openpyxl
The xlsx is the extension of the XML spreadsheet file. The xlsx file supports macros. Let’s understand the basic operation related to the excel file. Consider the following code:
from openpyxl import Workbook
import time
wb = Workbook()
sheet = wb.active
sheet['A1'] = 87
sheet['A2'] = "Devansh"
sheet['A3'] = 41.80
sheet['A4'] = 10
now = time.strftime("%x")
sheet['A5'] = now
wb.save("sample_file.xlsx")
Output:
In the above code, we have written data into the five cells A1, A2, A3, A4, and A5. These cells consist of different types of values. We have imported Workbook class from the openpyxl module. A workbook class is a container that contains all parts of the document.
Here we have defined a new workbook. At least one sheet is always made with a workbook.
wb = Workbook()
We get the location of the active sheet.
sheet['A1'] = 87
sheet['A2'] = 'Devansh'
We have saved all data to the sample_file.xlsx file using the save() method.
We can add data to the excel file using the following Python code. First, we will import the load_workbook function from the openpyxl module, then create the object of the file and pass filepath as an argument. Consider the following code:
from openpyxl import load_workbook
wb = load_workbook(r'C:\Users\DEVANSH SHARMA\Desktop\demo.xlsx')
sheet = wb.active
sheet['A1'] = 'Devansh Sharma'
sheet.cell(row=2, column=2).value = 5
wb.save(r'C:\Users\DEVANSH SHARMA\Desktop\demo.xlsx')
Output:
Openpyxl provides an append() method, which is used to append the group of values. We can append any type of value. These values are appended at the bottom of the current working sheet. Consider the following code:
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
data = (
(11, 48, 50),
(81, 30, 82),
(20, 51, 72),
(21, 14, 60),
(28, 41, 49),
(74, 65, 53),
("Peter", 'Andrew',45.63)
)
for i in data:
sheet.append(i)
wb.save('appending_values.xlsx')
Output:
We can read the data that we have previously written in the cell. There are two methods to read a cell, firstly we can access it by cell name, and secondly, we can access it by the cell() function. For example, we are reading the data from the sample_file.xlrs file.
import openpyxl
wb = openpyxl.load_workbook('sample_file.xlsx')
sheet = wb.active
x1 = sheet['A1']
x2 = sheet['A2']
#using cell() function
x3 = sheet.cell(row=3, column=1)
print("The first cell value:",x1.value)
print("The second cell value:",x2.value)
print("The third cell value:",x3.value)
Output:
The first cell value: 87
The second cell value: Devansh
The third cell value: 41.8
We can read the values from the multiple cells. In the following example, we have marks.xlsx named excel file and we will read each cell of file using the range operator. Let’s have a look at the following program:
import openpyxl
wb = openpyxl.load_workbook('marks.xlsx')
sheet = wb.active
#
cells = sheet['A1','B7']
# cells behave like range operator
for i1,i2 in cells:
print("{0:8} {1:8}".format(i1.value,i2.value))
Output:
Student_name Marks
Tony Stark 47
Loki 59
Oddin 73
Nick Fury 62
Samaul 75
Peter Parkar 80
The openpyxl provides the iter_row() function, which is used to read data corresponding to rows. Consider the following example:
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
rows = (
(90, 46, 48, 44),
(81, 30, 32, 16),
(23, 95, 87,27),
(65, 12, 89, 53),
(42, 81, 40, 44),
(34, 51, 76, 42)
)
for row in rows:
sheet.append(row)
for row in sheet.iter_rows(min_row=1, min_col=1, max_row=6, max_col=4):
for cell in row:
print(cell.value, end=" ")
print()
book.save('iter_rows.xlsx')
Output:
90 46 48 44
81 30 32 16
23 95 87 27
65 12 89 53
42 81 40 44
34 51 76 42
The openpyxl provides iter_col() method which return cells from the worksheet as columns. Consider the following example:
from openpyxl import Workbook
book = Workbook()
sheet = book.active
rows = (
(90, 46, 48, 44),
(81, 30, 32, 16),
(23, 95, 87, 27),
(65, 12, 89, 53),
(42, 81, 40, 44),
(34, 51, 76, 42)
)
for row in rows:
sheet.append(row)
for row in sheet.iter_cols(min_row=1, min_col=1, max_row=6, max_col=3):
for cell in row:
print(cell.value, end=" ")
print()
book.save('iterbycols.xlsx')
Output:
90 81 23 65 42 34
46 30 95 12 81 51
48 32 87 89 40 76
As we know that each workbook can have multiple sheets. First, we need to create more than one sheet in a single workbook then we can access those excel sheets using Python. In the following example, we have created a workbook with three sheets:
import openpyxl
wb = openpyxl.load_workbook('dimension_1.xlsx')
#Getting list of all sheet available in workbook
print(wb.get_sheet_names())
# Returning object
active_sheet = wb.active
print(type(active_sheet))
# Title of sheet
sheet = wb.get_sheet_by_name("Monday")
print(sheet.title)
Output:
['Sheet', 'Sunday', 'Monday', 'Tuesday', 'Wednesday']
<class 'openpyxl.worksheet.worksheet.Worksheet'>
Monday
It will look like the following image.
The auto_filter attribute is used to set filtering and sorting conditions. Consider the following code:
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
sheet['A3'] = 40
sheet['B3'] = 26
row_count = [
(93,45),
(23,54),
(80,43),
(21,12),
(63,29),
(34,15),
(80,68),
(20,41)
]
for row in row_count:
sheet.append(row)
print(sheet.dimensions)
for a1,a2 in sheet[sheet.dimensions]:
print(a1.value, a2.value)
sheet.auto_filter.add_sort_condition('B2:B8')
sheet.auto_filter.add_filter_column(1, ['40', '26'])
wb.save('dimension_1.xlsx')
Output:
A3:B11
40 26
93 45
23 54
80 43
21 12
63 29
34 15
80 68
20 41
We can merge the cell using the merge_cells() method. When we merge the cells, the top-left one is removed from the worksheet. The openpyxl also provides the unmerged_cells() method to unmerge the cell. Consider the following code:
from openpyxl.styles import Alignment
wb = Workbook()
sheet = wb.active
sheet.merge_cells('A1:B2')
cell = sheet.cell(row=1, column=1)
cell.value = 'Devansh Sharma'
cell.alignment = Alignment(horizontal='center', vertical='center')
wb.save('merging.xlsx')
Output:
The freezing panes are simply mean to freeze an area of worksheet as visible while scrolling to other parts of the worksheet. It is useful feature where we can display the top row or leftmost column on the screen. We can do this by passing the cell name to the freeze_panes variable. To unfreeze all panes, set freeze_panes to ‘None’. Consider the following code:
from openpyxl import Workbook
from openpyxl.styles import Alignment
wb = Workbook()
sheet = wb.active
sheet.freeze_panes = 'A1'
wb.save('freez_cells.xlsx')
Output:
Run the above code and scroll the worksheet.
We can write formula into the cell. These formulas are used to perform operations in excel file. After writing in the cell execute it from the workbook. Consider the following example:
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
rows_count = (
(14, 27),
(22, 30),
(42, 92),
(51, 32),
(16, 60),
(63, 13)
)
for i in rows_count:
sheet.append(i)
cell = sheet.cell(row=7, column=3)
cell.value = "=SUM(A1:B6)"
cell.font = cell.font.copy(bold=True)
wb.save('formulas_book.xlsx')
Output:
The openpyxl cell inverter is used to invert the row and column of the cell in the spreadsheet. For example, the value at 3 rows will be inverted into 5 columns and row at row 5 will invert to column 3 (vice versa). You can see in the following images:
This program is written with the help of nested for loop. First the data structure writes in the sheetData[x][y] for the cell at column x and row y then the newly created spreadsheet in the spreadData[y][x] for the cell at column y and row x.
Charts are effective way to represent the data. Using the charts, it visualizes the data which can be easily untestable. There are various types of chart: pie chart, line chart, bar chart, and so on. We can draw a chart on a spreadsheet using an openpyxl module.
For building any chart on the spreadsheet, we need to define the chart type like BarChart, LineChart, and so on. We also import reference which represents data that is used for the chart. It is important to define what data we want to represent on the chart. Let’s understand by the following example:
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
wb = Workbook()
sheet = wb.active
# Let's create some sample student data
rows = [
["Serial_no", "Roll no", "Marks"],
[1, "0090011", 75],
[2, "0090012", 60],
[3, "0090013", 43],
[4, "0090014", 97],
[5, "0090015", 63],
[6, "0090016", 54],
[7, "0090017", 86],
]
for i in rows:
sheet.append(i)
chart = BarChart()
values = Reference(worksheet=sheet,
min_row=1,
max_row=8,
min_col=2,
max_col=3)
chart.add_data(values, titles_from_data=True)
sheet.add_chart(chart, "E2")
wb.save("student_chart.xlsx")
Output:
In the above code, we have created the sample data and drawn the bar chart corresponding to sample data.
Now we will create the line chart. Consider the following code:
import random
from openpyxl import Workbook
from openpyxl.chart import LineChart, Reference
wb = Workbook()
spreadsheet = wb.active
# Let's create some sample data
rows = [
["", "January", "February", "March", "April",
"May", "June", "July", "August", "September",
"October", "November", "December"],
[1, ],
[2, ],
[3, ],
]
for row in rows:
spreadsheet.append(row)
for row in spreadsheet.iter_rows(min_row=2,
max_row=4,
min_col=2,
max_col=13):
for cell in row:
cell.value = random.randrange(5, 100)
chart = LineChart()
data = Reference(worksheet=spreadsheet,
min_row=2,
max_row=4,
min_col=1,
max_col=13)
chart.add_data(data, from_rows=True, titles_from_data=True)
spreadsheet.add_chart(chart, "C6")
wb.save("line_chart1.xlsx")
Output:
In the above code, we are used from_rows = True as a parameter, it denotes chart plot row by row instead of the column by column.
Images are not generally used in a spreadsheet but sometimes we can use as per our requirement. We can use an image for the branding purposes or to make the spreadsheet more personal and attractive. For loading an image to spreadsheet, we need to install an additional module called pillow by the following command.
pip install pillow
In the following program, we are importing the image into the excel file.
from openpyxl import load_workbook
from openpyxl.drawing.image import Image
# Let's use the hello_world spreadsheet since it has less data
workbook = load_workbook(filename="student_chart1.xlsx")
sheet = workbook.active
logo = Image(r"C:\Users\DEVANSH SHARMA\Pictures\Screenshots\image.png")
# A bit of resizing to not fill the whole spreadsheet with the logo
logo.height = 150
logo.width = 150
sheet.add_image(logo, "E2")
workbook.save(filename="hello_world_logo1.xlsx")
In this tutorial, we have covered all basic and advance concept of openpyxl.
Thank you for reading! Please share if you liked it!
#python #openpyxl