How to Extract Images From a PDF File Using Python

In this tutorial, we will write Python code to extract images from PDF files and save them to the local disk using the PyMuPDF and Pillow libraries.

📌 With PyMuPDF, you can open PDF, XPS, OpenXPS, EPUB and several other document formats. It runs on all major platforms, including Windows, macOS and Linux.

Installation:

📌 First, open a command prompt (or terminal) and enter the commands below to install the PyMuPDF and Pillow libraries.
● pip install PyMuPDF
● pip install Pillow
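
📌 To confirm the installation worked, a quick check like the one below can help. It is only a sketch: installed_version() is a helper name made up for this example, and it relies on the standard-library importlib.metadata module (Python 3.8 or newer).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for this tutorial, not part of PyMuPDF or Pillow.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the status of the two libraries used in this tutorial.
for name in ("PyMuPDF", "Pillow"):
    print(name, installed_version(name) or "is not installed")
```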

Functions Used:

getImageList()
📌 We use the getImageList() method (named get_images() in current PyMuPDF releases) to list all image objects on a given page as a list of tuples. The first element of each tuple is the image's xref (its cross-reference number inside the PDF), which is what we need to extract the image.
extractImage()
📌 We pass that xref to the extractImage() method (extract_image() in current releases), which returns the image bytes along with additional information such as the image extension.
save()
📌 Finally, we convert the image bytes to a PIL image instance and save it to the local disk using the save() method, which accepts a file name or an open file object. We name each image after its page and image indices.

Conclusion:

📌 Alright, we have successfully extracted the images from the PDF file without losing image quality. For more information on how the library works, take a look at the documentation.

📌 Documentation Link - https://pymupdf.readthedocs.io/en/latest/

Source Code and Link:
📌 Link - https://drive.google.com/drive/folders/1jdlzsieUmmeAej76x1vj2NQDE06BndWk

🔔 Subscribe: https://www.youtube.com/channel/UCNs6a3HlrbYw7dSUEXk9W3A

#python