This article visualises the data I collected in my previous post.

For this visualisation, I will be using the pandas and Plotly Express libraries, which are imported into Jupyter Notebook using:

import pandas as pd
import plotly.express as px

Then, using the pd.read_csv function, I read the CSV file containing the required data into a DataFrame.

movies = pd.read_csv('Bollywood_movies.csv')

Note: After saving the data collected with BeautifulSoup to an Excel sheet, I did a little cleaning and sorting before exporting it to a CSV file.

When displayed, the table looked like this:

[Image: the movies table]
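A quick way to preview a table like this in a notebook is with shape and head(). The sketch below is only an illustration: it reads a tiny in-memory stand-in instead of Bollywood_movies.csv so that it runs on its own, and the column names Movie and Year are assumptions rather than the real file's headers.

```python
import io
import pandas as pd

# Tiny stand-in for Bollywood_movies.csv so the sketch runs on its own.
# The column names here are assumptions, not the real file's headers.
csv_text = "Movie,Year\nDangal,2016\nBaahubali 2,2017\n"
movies = pd.read_csv(io.StringIO(csv_text))

print(movies.shape)   # (number of rows, number of columns)
print(movies.head())  # first five rows by default
```

On the full dataset, shape is a handy check that every row and column survived the export to CSV.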

Since the dataset is relatively small (9 columns and 25 entries), the range of visualisations I could make was limited. The questions I explored were:

  • How many top grossing movies were released each year?
  • Were there any directors who had directed more than one top grossing movie?
  • Which languages were preferred?
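Each of these questions comes down to counting values in a single column, which pandas can do with value_counts(). The following is a minimal sketch, not the exact code behind the charts: the rows are made-up placeholders, and the column names Year, Director and Language are assumptions about how the CSV is laid out.

```python
import pandas as pd

# Made-up placeholder rows; the real data lives in Bollywood_movies.csv.
# The column names ("Year", "Director", "Language") are assumptions.
movies = pd.DataFrame({
    "Movie": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "Year": [2015, 2016, 2016, 2017, 2017],
    "Director": ["Director X", "Director Y", "Director X",
                 "Director Z", "Director X"],
    "Language": ["Hindi", "Hindi", "Telugu", "Hindi", "Tamil"],
})

# 1. How many top grossing movies were released each year?
movies_per_year = movies["Year"].value_counts().sort_index()

# 2. Which directors appear more than once?
director_counts = movies["Director"].value_counts()
repeat_directors = director_counts[director_counts > 1]

# 3. Which languages appear most often?
language_counts = movies["Language"].value_counts()

print(movies_per_year)
print(repeat_directors)
print(language_counts)
```

Each result is a pandas Series, so it can be turned into a DataFrame with reset_index() and passed to Plotly Express for plotting.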

1. Comparison of Worldwide Gross of Movies

[Image: bar graph of worldwide gross by movie]

As seen in this bar graph, the highest-grossing movie in Bollywood is Dangal, which earned around ₹2,000 crore (roughly $311 million), closely followed by Baahubali 2 at ₹1,800 crore ($278 million). Dangal earned 50% more than the third-highest-grossing movie, Bajrangi Bhaijaan.

Though the numbers are huge, this doesn’t come as a surprise, as Dangal has a fantastic storyline that inspired everyone who watched it (this comes with no bias whatsoever).

Released in 2016, it was a hit not only in India but also around the world. I was studying in China at the time, and I remember one of my close friends telling me that she had watched the movie with her mom (who had watched it 3 times within a month)!


Data Visualisation of Bollywood: Pt. 2