Natural Language Processing intrigued me from the beginning. I first heard about it at a presentation on an author predictor based on word patterns. Later, someone explained it to me using song lyrics across the decades. So I wanted to apply it to history as well. And where could I get a lot of data? Presidents: they love to talk! Or at least most did. So I created an NLP project on presidential speeches in American history. It includes all 44 presidents, from George Washington’s first inauguration in 1789 to speeches on the coronavirus at the end of April 2020. Yes, I said 44 presidents: Grover Cleveland served two separate terms, so he is counted twice in the usual numbering. Fair warning as you read along: I was an American Studies major and taught history for 13 years. So don’t mind me as I throw in some presidential knowledge along with my data science knowledge. Enjoy!

Speech Organization

I obtained the majority of my speeches from UVA’s Miller Center. Their collection of speeches and other primary sources is considered top-notch and is even referenced by Harvard’s database. As I started cleaning and analyzing my data, I realized some presidents should have had more speeches than appeared in this collection. I could fix this for Truman and Eisenhower by adding their missing State of the Union (SOU) speeches from NLTK’s corpus. Every other president in that corpus already had their SOU speeches in the Miller Center collection. In total, my analysis included 1018 speeches with approximately 23.8 million words. The math side of me had to look at a few numbers and stats. The shortest speech was George Washington’s Second Inaugural Address at 787 words. The longest, on the other hand, was Harry Truman’s 1946 State of the Union address at just shy of 170 thousand. He had to discuss such historic topics as the post-war economy, protection for veterans, the creation of the United Nations, concerns about communism…and that doesn’t even cover the first half!
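For anyone curious how numbers like these can be pulled out of a corpus, here is a minimal sketch in Python. It is not necessarily how I computed my own figures: the `word_count` and `summarize` helpers, the whitespace-based definition of a "word", and the toy speech texts are all assumptions made for illustration.

```python
from typing import Dict


def word_count(text: str) -> int:
    """Count words as whitespace-separated tokens (an assumed, simple definition)."""
    return len(text.split())


def summarize(speeches: Dict[str, str]) -> dict:
    """Return the speech count, total words, and the shortest and longest speeches."""
    counts = {title: word_count(body) for title, body in speeches.items()}
    shortest = min(counts, key=counts.get)
    longest = max(counts, key=counts.get)
    return {
        "speeches": len(counts),
        "total_words": sum(counts.values()),
        "shortest": (shortest, counts[shortest]),
        "longest": (longest, counts[longest]),
    }


# Toy placeholder texts, not real speech data.
toy_speeches = {
    "speech_a": "Fellow citizens of the Senate",
    "speech_b": "We the people",
    "speech_c": "Four score and seven years ago our fathers",
}

stats = summarize(toy_speeches)
print(stats)
```

A different tokenizer (one that splits off punctuation, for example) would give somewhat different totals, so the choice of what counts as a "word" matters when comparing numbers. The same helper could be fed text from NLTK’s State of the Union corpus, for instance via `nltk.corpus.state_union.raw(fileid)`, once that corpus has been downloaded.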


Analysis of Presidential Speeches throughout American History