I needed to translate a PDF file containing text from English to Latvian. It turned out to be slightly more challenging than I initially thought, so I decided to write a tutorial to share what I learned and hopefully save some time for you. I have split my project into two parts.
This article is part one. It focuses on how to read your PDF file, extract the text, and translate it, and it looks at two ways to do the translation: Google Translate and AWS Translate.
Part two will look at how to create, format, and save a new PDF file from the obtained translation. You will find the link to my project on GitHub, with the full code, at the end of this article.
- How to read a PDF file with the PyPDF2 library and extract text from the PDF
- How to translate the text with the googletrans library and AWS Translate
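To give a rough idea of where this is heading, here is a minimal sketch of the read, extract, and translate steps. It is not the full project code. The function names (`extract_pages`, `google_translate`, `aws_translate`, `translate_pages`) are illustrative placeholders, and the sketch assumes PyPDF2 2.0 or later, a googletrans release whose `translate()` call is synchronous, and AWS credentials with a default region already configured for boto3.

```python
from typing import Callable, List


def extract_pages(pdf_path: str) -> List[str]:
    """Return the extracted text of every page in the PDF."""
    # Assumes PyPDF2 >= 2.0; older releases expose PdfFileReader instead.
    from PyPDF2 import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can come back empty for scanned or image-only pages.
    return [page.extract_text() or "" for page in reader.pages]


def google_translate(text: str, src: str = "en", dest: str = "lv") -> str:
    """Translate text with the unofficial googletrans client."""
    # Assumes a googletrans release where translate() is synchronous.
    from googletrans import Translator

    return Translator().translate(text, src=src, dest=dest).text


def aws_translate(text: str, src: str = "en", dest: str = "lv") -> str:
    """Translate text with AWS Translate through boto3."""
    # Assumes AWS credentials and a default region are already configured.
    import boto3

    client = boto3.client("translate")
    response = client.translate_text(
        Text=text, SourceLanguageCode=src, TargetLanguageCode=dest
    )
    return response["TranslatedText"]


def translate_pages(pages: List[str], translate: Callable[[str], str]) -> List[str]:
    """Apply a translation function page by page, leaving blank pages as-is."""
    return [translate(page) if page.strip() else page for page in pages]


# Example usage ("input.pdf" is a hypothetical file name):
# pages = extract_pages("input.pdf")
# latvian_pages = translate_pages(pages, aws_translate)  # or google_translate
```

Passing the translation function in as an argument keeps the page loop identical for both services, so switching between Google Translate and AWS Translate is a one-word change.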
#pdf #python #aws #google-translate #translation