Natural Language Processing (NLP) has been gaining traction in recent years, allowing us to understand unstructured text data in ways that were not possible before. One of the promises of NLP is to detect fraud in companies and shed light on potential violations at an early stage.
I’ve only managed to find two earnings call transcripts online, and only one of them is readable when converted from PDF to text. You can find the original document here.
The earnings call transcript used in this article is from Enron’s conference call held on November 14, 2001. Enron filed for bankruptcy on December 2, 2001.
As you can see from the original earnings call PDF, the document is not a digital original and has numbers scattered in between the conversations.
A snapshot of Enron’s earnings call in PDF format.
To feed the spoken sentences into R for analysis, I use Robotic Process Automation (RPA) to massage the text data into a more structured format. Below is a snapshot of the organized text data in CSV format.
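For readers without an RPA tool, the same kind of cleanup can be sketched in a few lines of Python. This is not the workflow used in this article, only a rough illustration under stated assumptions: that the stray numbers sit at the start of each extracted line, and that each speaker turn begins with an upper-case label followed by a colon. The sample text, the `parse_transcript` and `to_csv` helpers, and the `speaker`/`text` column names are all made up for the example and are not taken from the Enron transcript.

```python
import csv
import io
import re

# Hypothetical sample text, NOT taken from the real transcript.
SAMPLE = """\
1 SPEAKER A: Good morning and thank you
2 for joining the call.
3
4 SPEAKER B: Thank you. I have one
5 question about the quarter.
"""

# Assumption: stray numbers appear at the start of a line.
# Caveat: this also strips a number that genuinely starts a spoken line.
LINE_NUMBER = re.compile(r"^\s*\d+\s*")
# Assumption: a speaker turn starts with an upper-case label and a colon.
SPEAKER = re.compile(r"^([A-Z][A-Z .'-]+):\s*(.*)$")


def parse_transcript(raw):
    """Turn raw extracted text into a list of {speaker, text} rows."""
    rows = []
    for line in raw.splitlines():
        line = LINE_NUMBER.sub("", line).strip()
        if not line:
            continue  # skip lines that held only a number or whitespace
        match = SPEAKER.match(line)
        if match:
            # A new speaker turn begins here.
            rows.append({"speaker": match.group(1).strip(),
                         "text": match.group(2).strip()})
        elif rows:
            # Continuation line: append it to the current speaker's text.
            rows[-1]["text"] = (rows[-1]["text"] + " " + line).strip()
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["speaker", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = parse_transcript(SAMPLE)
print(to_csv(rows))
```

A CSV with one row per speaker turn can then be loaded into R with `read.csv` for the analysis. Real transcripts will likely need a looser speaker pattern and some manual checking of the rows.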