Box Office Revenue Analysis and Visualization

Day 4 and 5 of 100 Days of Data Science.

Welcome back to my 100 Days of Data Science Challenge journey. On days 4 and 5, I worked on the TMDB Box Office Prediction dataset, which is available on Kaggle.

I’ll start by importing the libraries we need for this task.

import pandas as pd

## for visualizations
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
plt.style.use('dark_background')

Data Loading and Exploration

Once you have downloaded the data from Kaggle, you will have three files. Because this is a prediction competition, they are train.csv, test.csv, and sample_submission.csv. For this project, my goal is only to analyze and visualize the data, so I am going to ignore test.csv and sample_submission.csv.

%time train = pd.read_csv('./data/tmdb-box-office-prediction/train.csv')

## output
CPU times: user 258 ms, sys: 132 ms, total: 389 ms
Wall time: 403 ms
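Right after loading, a quick first look at the frame helps before any plotting. The sketch below is only an illustration: it uses a tiny made-up CSV (the rows and the name train_sample are hypothetical, not real TMDB data) so that it runs without the Kaggle download. The same three calls work on the real train DataFrame.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv so the sketch runs without the download.
csv_text = """id,budget,runtime,revenue
1,1000,90,2000
2,0,,3000
"""
train_sample = pd.read_csv(io.StringIO(csv_text))

print(train_sample.shape)         # (number of rows, number of columns)
print(train_sample.dtypes)        # column types as pandas inferred them
print(train_sample.isna().sum())  # count of missing values per column
```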

About the dataset:

id: Integer unique id of each movie

belongs_to_collection: Contains the TMDB Id, Name, Movie Poster, and Backdrop URL of a movie in JSON format.
budget: Budget of a movie in dollars. Some rows contain 0, which means the budget is unknown.
genres: Contains all the Genres Name & TMDB Id in JSON Format.
homepage: Contains the official URL of a movie.
imdb_id: IMDB id of a movie (string).
original_language: Two-letter code of the original language in which the movie was made.
original_title: The original title of a movie in original_language.
overview: Brief description of the movie.
popularity: Popularity of the movie.
poster_path: Poster path of a movie. You can see the full poster image by appending this path to the base URL → https://image.tmdb.org/t/p/original/
production_companies: Name and TMDB id of every production company of a movie, in JSON format.
production_countries: Two-letter code and full name of each production country, in JSON format.
release_date: The release date of a movie in mm/dd/yy format.
runtime: Total runtime of a movie in minutes (Integer).
spoken_languages: Two-letter code and full name of each spoken language.
status: Whether the movie is released or rumored.
tagline: Tagline of a movie
title: English title of a movie
Keywords: TMDB Id and name of all the keywords in JSON format.
cast: All cast TMDB id, name, character name, gender (1 = Female, 2 = Male) in JSON format
crew: Name, TMDB id, and profile path of the crew members, along with their job (Director, Writer, Art, Sound, etc.).
revenue: Total revenue earned by a movie in dollars.
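Several of these column notes translate directly into cleaning steps. Here is a minimal sketch under stated assumptions, not the definitive way to do it. The helper names (parse_records, names_from, poster_url, parse_release_date), the sample rows, and the last_valid_year cutoff are all hypothetical. I also assume that the "JSON" cells may be stored as Python-literal strings with single quotes, so the parser tries json.loads first and falls back to ast.literal_eval.

```python
import ast
import json

import pandas as pd

POSTER_BASE_URL = "https://image.tmdb.org/t/p/original/"


def parse_records(value):
    """Turn one JSON-like cell into a list of dicts; missing cells become []."""
    if not isinstance(value, str) or not value.strip():
        return []
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        # Assumption: non-JSON cells are Python-literal strings (single quotes).
        return ast.literal_eval(value)


def names_from(value):
    """Extract the 'name' field from every record in a JSON-like cell."""
    return [record["name"] for record in parse_records(value)]


def poster_url(path):
    """Join the base URL and a poster path without doubling the slash."""
    if not isinstance(path, str):
        return None
    return POSTER_BASE_URL.rstrip("/") + "/" + path.lstrip("/")


def parse_release_date(dates, last_valid_year=2019):
    """Parse mm/dd/yy strings; move years after last_valid_year back a century."""
    parsed = pd.to_datetime(dates, format="%m/%d/%y", errors="coerce")
    shifted = parsed - pd.DateOffset(years=100)
    return parsed.where(parsed.dt.year <= last_valid_year, shifted)


# Hypothetical rows that mimic the column formats described above.
sample = pd.DataFrame({
    "budget": [14000000, 0],
    "genres": ["[{'id': 35, 'name': 'Comedy'}]", None],
    "release_date": ["02/20/15", "08/06/52"],
    "poster_path": ["/example.jpg", None],
})

# A budget of 0 means "unknown", so treat it as missing rather than as zero dollars.
sample["budget"] = sample["budget"].mask(sample["budget"] == 0)
sample["genre_names"] = sample["genres"].apply(names_from)
sample["release_date"] = parse_release_date(sample["release_date"])
sample["poster_url"] = sample["poster_path"].apply(poster_url)

print(sample[["budget", "genre_names", "release_date", "poster_url"]])
```

The century shift is there because a two-digit year is ambiguous: with the %y directive, "52" is read as 2052, which cannot be right for a movie that has already been released. The cutoff year is my own choice and should match the latest release year you expect in the data.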
