Intro

Natural Language Processing (NLP) has been gaining traction in recent years, letting us analyze unstructured text data in ways that were previously impractical. One promising application of NLP is detecting corporate fraud and flagging potential violations at an early stage.

About the dataset

I’ve only managed to find two of Enron’s earnings call transcripts online, and only one of them is readable when converted from PDF to text. You can find the original document here.

The earnings call transcript used in this article is from Enron’s conference call held on November 14, 2001. Enron filed for bankruptcy less than three weeks later, on December 2, 2001.

Pre-processing the dataset

As the original earnings call PDF shows, the document is not a native digital file, and numbers are scattered between the lines of conversation.

A snapshot of Enron’s earnings call in PDF format.

To load the spoken sentences into R for analysis, I used Robotic Process Automation (RPA) to reshape the text into a more structured format. Below is a snapshot of the organized text data in CSV format.
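The author handled this step with RPA. For readers who would rather script a similar cleanup, here is a minimal Python sketch. It assumes the PDF has already been converted to plain text, that stray numbers appear at the start of lines, and that each speaker turn begins with an upper-case name followed by a colon (for example `OPERATOR:`). These layout rules and the helper names `transcript_to_rows` and `rows_to_csv` are hypothetical, not taken from the Enron document, and would need adjusting to the real transcript.

```python
import csv
import io
import re

# Hypothetical layout assumptions (not confirmed against the Enron PDF):
#  - a physical line may start with a number followed by spaces, e.g. "12   text"
#  - a new speaker turn starts with an upper-case name and a colon, e.g. "OPERATOR: ..."
LINE_NUMBER = re.compile(r"^\s*\d+\s+")
SPEAKER = re.compile(r"^([A-Z][A-Z .'-]+):\s*(.*)$")


def transcript_to_rows(raw_text):
    """Strip leading numbers and merge wrapped lines into [speaker, text] rows."""
    rows = []
    for line in raw_text.splitlines():
        line = LINE_NUMBER.sub("", line).strip()
        if not line or line.isdigit():
            continue  # skip blank lines and lines that held only a number
        match = SPEAKER.match(line)
        if match:
            # a new speaker turn begins
            rows.append([match.group(1).strip(), match.group(2)])
        elif rows:
            # a wrapped continuation of the previous speaker's turn
            rows[-1][1] = (rows[-1][1] + " " + line).strip()
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["speaker", "text"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing the result to a file gives a `speaker,text` CSV that R can load with `read.csv`. The line-number rule is deliberately crude: a sentence that genuinely starts with a number (say, a year) would lose it, so the regular expressions are the first thing to tune against the actual text.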

Analyze Enron’s Accounting Scandal With Natural Language Processing