Exploratory Data Analysis With Movies

An investigation into the metrics that make blockbuster and award-winning films.

As part of the Flatiron School bootcamp, we are required to complete a project at the end of each learning module that demonstrates our ability to apply what we’ve learned.

The prompt for the first project is as follows:

Microsoft wants to enter the movie industry; however, they have no prior knowledge of the industry and they need help so that their movie studio can be successful.

The primary skills required to perform the exploratory data analysis (EDA) of the movie industry were webscraping, storing and cleaning the data in a pandas dataframe, and visualizing the data with seaborn and matplotlib. I’ll describe some of the methodology I used for webscraping and cleaning, and I’ll go through some of the recommendations we made for succeeding as a movie studio.


I was unfamiliar with webscraping prior to the bootcamp, but I can say without a doubt it has been one of the most useful and fun skills that I have learned in the past few weeks. Webscraping is essentially the process of looking at the HTML for a webpage and deconstructing that HTML so that you can extract pertinent information for analysis. By using the requests and Beautiful Soup libraries we can easily get all of the HTML into a Jupyter notebook and start picking apart the pieces. Some of the websites we used to develop recommendations were moviefone.com, imdb.com, and boxofficemojo.com. For example, one Moviefone page listed release dates for movies released in 2019, so I ended up writing code like this:

import requests
from bs4 import BeautifulSoup

movie_dates_page = requests.get("https://www.moviefone.com/movies/2019/?page=1")
soup = BeautifulSoup(movie_dates_page.content, 'lxml')
movie_title = soup.find_all("a", class_="hub-movie-title")
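One way to check a selector like this without hitting the network is to parse a small hand-written HTML string. The sketch below is only an illustration: the markup and the `sample_` names are invented, and only the class name `hub-movie-title` comes from the code above, so the real page structure may well differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name "hub-movie-title" comes from the post.
sample_html = """
<div>
  <a class="hub-movie-title" href="/movie/a">Movie A</a>
  <a class="other-link" href="/about">About</a>
  <a class="hub-movie-title" href="/movie/b">Movie B</a>
</div>
"""

# html.parser ships with Python, so no lxml install is needed for this check.
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Same call as above: every <a> tag whose class is "hub-movie-title".
sample_matches = sample_soup.find_all("a", class_="hub-movie-title")

# Each match is a Tag, so its attributes can be read like dictionary keys.
sample_hrefs = [tag["href"] for tag in sample_matches]
print(len(sample_matches), sample_hrefs)
```

The link with the class `other-link` is skipped, which is a quick way to confirm that the class filter is doing the selecting rather than the tag name alone.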

Then I simply read the .text attribute of each of the elements in the movie_title variable to get every movie title on that webpage into a list. I use a similar approach to get all of the release dates into a list. The two lists can then be put into a dataframe, and the dates column can be manipulated using the datetime library so that we can count the number of movies released in a certain month or on a certain day. The construction of the dataframe would look something like this:

import pandas as pd

# movie_list and dates_list are previously constructed lists from webscraping
movie_dict = {'movies': movie_list, 'release_date': dates_list}
dates_df = pd.DataFrame(data=movie_dict)
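To make the date-counting step concrete, here is a minimal sketch of how the releases could be counted per month and per weekday. It uses pandas' own `to_datetime` and `.dt` accessor rather than the datetime library directly, and the titles, dates, and date format are made-up stand-ins for the scraped lists, not the actual data.

```python
import pandas as pd

# Hypothetical values standing in for the scraped lists.
movie_list = ["Movie A", "Movie B", "Movie C"]
dates_list = ["January 4, 2019", "January 18, 2019", "March 8, 2019"]

dates_df = pd.DataFrame({"movies": movie_list, "release_date": dates_list})

# Convert the date strings to datetime values.
# The format string is an assumption about how the scraped dates look.
dates_df["release_date"] = pd.to_datetime(
    dates_df["release_date"], format="%B %d, %Y"
)

# Count releases per calendar month (1 = January) and per day of the week.
releases_per_month = dates_df["release_date"].dt.month.value_counts().sort_index()
releases_per_weekday = dates_df["release_date"].dt.day_name().value_counts()

print(releases_per_month)
print(releases_per_weekday)
```

Counts like these can be passed straight to a seaborn or matplotlib bar plot to compare months or weekdays.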
