Introduction

I believe Data Science allows me to express my curiosity in ways I'd never have imagined. The coolest thing about Data Science is that I see data not as numbers but as an opportunity (a business problem), insights (predictive modeling, statistics, and data wrangling), and improvement (metrics). With this thought in mind, I decided to analyze the YouTube comments on the VP and Presidential debates.

After getting mixed verdicts from the news sources, I decided to analyze the Vice Presidential and Presidential debates using Data Science.

The idea is to use YouTube comments as a medium to gauge sentiment about the debates and to draw insights from the data. In this analysis, we plot **common phrases** and **common words**, we analyze sentiment, and, in the end, for all my fellow data science practitioners, I present a **full-fledged dataset containing YouTube comments on the VP and Presidential debates**.
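As a taste of the counting involved in surfacing common words and phrases, n-grams can be tallied with Python's standard library alone. The `top_ngrams` helper and the sample comments below are invented for illustration; the actual analysis works on the full dataset.

```python
from collections import Counter

def top_ngrams(texts, n=1, k=3):
    """Return the k most frequent n-grams across a list of comments."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(k)

comments = [
    "great debate tonight",
    "great debate but no answers",
    "no real answers tonight",
]
print(top_ngrams(comments, n=2, k=1))  # → [(('great', 'debate'), 2)]
```

The same function covers both single words (`n=1`) and two-word phrases (`n=2`), which is all the plots in this analysis need.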

How and Why

Why: After getting mixed verdicts from the news sources about the outcome of the debate, I decided to use data science to help me see the outcome for myself. With the elections around the corner, technology, or to be precise, analytics, plays a key role in shaping our thoughts and supporting our hypotheses.

How: To analyze the YouTube comments, we use Python and several NLP libraries, followed by some data visualization tools. We will lean on the wonders of the awesome data wrangling library known as Pandas, and we hope to find some interesting insights.

Requirements

For this project we require:

  • Python 3.8
  • Pandas
  • Scikit-Learn
  • NumPy
  • Seaborn
  • NLTK
  • Wordcloud
  • TextBlob

Creation of the Dataset

The dataset contains YouTube comments on the most popular and most-watched VP and Presidential debates. We use the YouTube Data API to collect the comments (due to a size limitation, we only fetch 100 comments per video). The videos were selected through careful examination by the author; to be precise, we focused on the videos with the highest view counts and the highest numbers of YouTube comments.
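The API returns each page of comments as a JSON `commentThreads` response, so building the dataset mostly means pulling the text field out of that nested structure. Here is a minimal sketch; the response shape follows the YouTube Data API's `commentThreads.list` resource, but the sample payload itself is invented:

```python
def extract_comments(response):
    """Pull the top-level comment text out of a commentThreads.list response."""
    return [
        item["snippet"]["topLevelComment"]["snippet"]["textDisplay"]
        for item in response.get("items", [])
    ]

# Invented sample payload mimicking the API's response shape.
sample = {
    "items": [
        {"snippet": {"topLevelComment": {"snippet": {"textDisplay": "Who won tonight?"}}}},
        {"snippet": {"topLevelComment": {"snippet": {"textDisplay": "Best debate so far"}}}},
    ]
}
print(extract_comments(sample))  # → ['Who won tonight?', 'Best debate so far']
```

In the real pipeline, each extracted list is appended to a Pandas DataFrame along with the video it came from.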

import re

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

def clean_text(string):
    """Remove punctuation and non-ASCII characters (e.g. emoji)."""
    string = re.sub(r'[^\w\s]', '', string)
    return ''.join(ch for ch in string if ord(ch) < 128)

def remove_stopwords(string):
    """Drop common English stopwords from the text."""
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(string)
    filtered_sentence = [w for w in word_tokens if w not in stop_words]
    return ' '.join(filtered_sentence)

These two functions handle cleaning the text and removing stopwords.
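For example, here is the cleaning applied to a sample comment. The snippet redefines the punctuation/ASCII cleanup in self-contained form and uses a tiny hand-picked stopword set, so it runs without downloading NLTK data; the real pipeline uses NLTK's full English stopword list.

```python
import re

def clean_text(string):
    """Strip punctuation, then drop any non-ASCII characters (e.g. emoji)."""
    string = re.sub(r'[^\w\s]', '', string)
    return ''.join(ch for ch in string if ord(ch) < 128)

# Tiny stand-in for NLTK's English stopword list, for illustration only.
STOP = {'i', 'think', 'the', 'was', 'a'}

comment = "I think the debate was a mess!!!"
cleaned = clean_text(comment)
print(' '.join(w for w in cleaned.split() if w.lower() not in STOP))
# → 'debate mess'
```

Punctuation disappears first, then the stopwords, leaving only the content words that the word clouds and phrase counts are built from.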

#machine-learning #python #politics #data-science #data-analysis

Vice Presidential and Presidential Debate Analysis using Data Science