This article is a visual exploration of the data I collected, as described in my previous post.
For these visualisations, I used the pandas and Plotly Express libraries, which I imported into a Jupyter Notebook with:
import pandas as pd
import plotly.express as px
Next, I used the pd.read_csv function to load the CSV file containing the data into a DataFrame:
movies = pd.read_csv('Bollywood_movies.csv')
Note: after saving the data scraped with BeautifulSoup to an Excel sheet, I did a little cleaning and sorting before exporting it as a CSV file.
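The cleaning and sorting described above were done in Excel, but a comparable step can also be done in pandas. Below is a minimal sketch, assuming the gross figures are stored as comma-grouped numbers such as "2,000" in a column called "Gross (crore)". That column name, the "Movie" column, and the load_and_sort helper are hypothetical, and the in-memory CSV stands in for Bollywood_movies.csv, holding only the two figures quoted later in this post.

```python
import io

import pandas as pd

# Stand-in for 'Bollywood_movies.csv' (hypothetical column names).
# Only the two gross figures quoted in this post are used.
csv_text = """Movie,Gross (crore)
Baahubali 2,"1,800"
Dangal,"2,000"
"""


def load_and_sort(source, gross_col="Gross (crore)"):
    """Read the CSV, parse comma-grouped numbers, and sort by gross, highest first."""
    # thousands="," tells read_csv to parse "2,000" as the integer 2000
    df = pd.read_csv(source, thousands=",")
    # Sort in descending order and renumber rows so row 0 is the top earner
    return df.sort_values(gross_col, ascending=False).reset_index(drop=True)


movies = load_and_sort(io.StringIO(csv_text))
print(movies)
```

With the real file, the same helper could be called as load_and_sort('Bollywood_movies.csv'), provided the column names match.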
When displayed, the table looked like this:
Since the dataset is relatively small (9 columns and 25 entries), it limited the visualisations I could make. These are the ones I explored:
As seen in this bar graph, the highest-grossing movie in Bollywood is Dangal, which earned around ₹2,000 crore (roughly $311 million), closely followed by Baahubali 2 at ₹1,800 crore (about $278 million). Dangal earned 50% more than the third highest-grossing movie, Bajrangi Bhaijaan.
Though the numbers are huge, they don't come as a surprise: Dangal has a fantastic storyline that inspired everyone who watched it (and I say this with no bias whatsoever).
Released in 2016, it was a hit not only in India but around the world. I was studying in China at the time, and I remember a close friend telling me she had watched the movie with her mom (who had watched it 3 times within a month)!