Taking a look at the neighborhoods of London and Paris from a data science perspective.
A Tale of Two Cities, a novel by Charles Dickens, is set in London and Paris during the French Revolution. Both cities were thriving then, and they still are now. A lot has changed over the years, and we now take a look at how the cities have grown.
London and Paris are popular tourist and vacation destinations for people all around the world. They are diverse and multicultural, and they offer a wide variety of sought-after experiences. We group the neighborhoods of London and Paris respectively and draw insights into what they look like now.
The aim is to help tourists choose their destinations based on the experiences each neighborhood has to offer and what they are looking for. It also helps people decide whether to migrate to London or Paris, or whether to move to a different neighborhood within the city. Our findings will help stakeholders make informed decisions and address concerns such as the kinds of cuisine available, provision stores and what the city has to offer.
We require geographical location data for both London and Paris. Postal codes in each city serve as a starting point: using them, we can find the neighborhoods, boroughs and venues, along with their most popular venue categories.
To derive our solution for London, we scrape our data from https://en.wikipedia.org/wiki/List_of_areas_of_London
This Wikipedia page has information about all the neighborhoods, and we limit it to London.
However, the page lacks the geographical locations of the neighborhoods. To solve this problem, we use the ArcGIS API.
ArcGIS Online enables you to connect people, locations, and data using interactive maps. Work with smart, data-driven styles and intuitive analysis tools that deliver location intelligence. Share your insights with the world or specific groups.
More specifically, we use _ArcGIS_ to get the geographical locations (latitude and longitude) of the neighborhoods of London. The following columns are added to our initial data set, which completes our data preparation.
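The geocoding step can be sketched as follows. This is a minimal illustration rather than the article's code. It assumes that the geocoder behaves like `arcgis.geocoding.geocode` from the ArcGIS API for Python, returning a list of candidate dicts with the best match first, where each candidate's `location` entry holds `x` (longitude) and `y` (latitude). The geocoder is passed in as a parameter so the parsing logic can be tried without a network connection. The helper names `get_lat_long` and `add_coordinates`, and the `Location` column name for the area column, are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

import pandas as pd

# A geocoder takes an address string and returns ArcGIS-style candidates:
# a list of dicts (assumed best match first), each with location = {"x": lng, "y": lat}.
Geocoder = Callable[[str], List[Dict]]


def get_lat_long(address: str, geocoder: Geocoder) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) of the top candidate, or None if no match."""
    candidates = geocoder(address)
    if not candidates:
        return None
    location = candidates[0]["location"]
    # ArcGIS reports x as longitude and y as latitude.
    return location["y"], location["x"]


def add_coordinates(
    df: pd.DataFrame,
    geocoder: Geocoder,
    area_col: str = "Location",  # assumed name of the area column
    city: str = "London, United Kingdom",
) -> pd.DataFrame:
    """Return a copy of df with Latitude and Longitude columns added."""
    coords = [get_lat_long(f"{area}, {city}", geocoder) for area in df[area_col]]
    out = df.copy()
    out["Latitude"] = [c[0] if c else None for c in coords]
    out["Longitude"] = [c[1] if c else None for c in coords]
    return out
```

With the real library, you would first create a connection with `from arcgis.gis import GIS` and `gis = GIS()`, then pass `arcgis.geocoding.geocode` as the `geocoder` argument. Depending on usage, an ArcGIS account or API key may be required. Any area that fails to geocode gets a missing value, so it can be dropped before querying venues.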
For Paris, we use the JSON data available at https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e
The JSON file has data about all the neighborhoods in France, which we limit to Paris.
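One way to limit the France-wide data to Paris is by postal code. Paris is département 75, so all of its postal codes start with "75" (75001 to 75020, plus 75116). The sketch below uses that rule and assumes the codes sit in a column named `code_postal`. That column name is a guess, so check it against the flattened data. The helper name `keep_paris` is hypothetical.

```python
import pandas as pd


def keep_paris(df: pd.DataFrame, postal_col: str = "code_postal") -> pd.DataFrame:
    """Keep only rows whose postal code belongs to Paris (département 75)."""
    # Codes may have been parsed as integers, losing leading zeros (e.g. 1000
    # for "01000"), so normalise them to 5-character strings first.
    codes = df[postal_col].astype(str).str.zfill(5)
    return df[codes.str.startswith("75")].reset_index(drop=True)
```

Matching on the string prefix keeps the filter independent of how the JSON loader typed the column.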
We also need data about the venues in each neighborhood of every borough. To obtain it, we use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus and even photos. Because all the venue information we need can be obtained through its API, the Foursquare location platform is our sole source of venue data.
After finding the list of neighborhoods, we connect to the Foursquare API to gather information about the venues in each neighborhood, using a search radius of 500 meters.
The data retrieved from Foursquare contains information about venues within that distance of each postcode's latitude and longitude. The information obtained per venue is as follows:
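The sketch below shows how such a query and its response might be handled. It assumes the legacy Foursquare v2 `venues/explore` endpoint with `client_id`/`client_secret` authentication. Foursquare has since introduced a newer Places API, so the endpoint, parameters and response shape may differ today. The version string `20180605` is an assumption. The function names and the set of output fields are our own illustrative choice, not necessarily the article's list. The network call is left out so the parsing can be tested on a sample payload.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Legacy Foursquare v2 "explore" endpoint (assumed; the newer Places API differs).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat: float, lng: float, client_id: str, client_secret: str,
                      version: str = "20180605", radius: int = 500,
                      limit: int = 100) -> str:
    """Build the request URL for venues within `radius` meters of (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (assumed value)
        "ll": f"{lat},{lng}",  # search centre
        "radius": radius,      # 500 m, as in the text
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(neighborhood: str, lat: float, lng: float, payload: Dict) -> List[Dict]:
    """Flatten an explore response into one row per venue."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0]["items"] if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        # Use the first listed category; None if the venue has no category.
        categories = venue.get("categories") or [{"name": None}]
        rows.append({
            "Neighborhood": neighborhood,
            "Neighborhood Latitude": lat,
            "Neighborhood Longitude": lng,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": categories[0]["name"],
        })
    return rows
```

In a notebook, `payload` would come from `requests.get(url).json()`. The rows from every neighborhood can then be combined with `pd.DataFrame(rows)`.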
Based on all the information collected for both London and Paris, we have sufficient data to build our model. We cluster the neighborhoods based on similar venue categories and then present our observations and findings. Using this data, our stakeholders can make the necessary decisions.
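Clustering by venue categories can be sketched roughly like this. Each venue's category is one-hot encoded, and the indicators are averaged per neighborhood to give a category profile: the share of each category among that neighborhood's venues. Those profiles are then grouped with scikit-learn's `KMeans`. This sketch assumes a venue table with `Neighborhood` and `Venue Category` columns, like the rows built from the Foursquare results. The function name and the default `k=5` are illustrative choices.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_neighborhoods(venues: pd.DataFrame, k: int = 5,
                          random_state: int = 0) -> pd.DataFrame:
    """Group neighborhoods with similar venue-category mixes using k-means."""
    # One indicator column per venue category, one row per venue.
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    # Averaging per neighborhood gives the share of each category there.
    profile = onehot.groupby(venues["Neighborhood"]).mean()
    model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
    labels = model.fit_predict(profile.to_numpy())
    return pd.DataFrame({"Neighborhood": list(profile.index), "Cluster": labels})
```

Averaging, rather than counting, means a neighborhood with many venues is not automatically far from one with few. Clusters reflect the mix of venues rather than their number.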
We build our model in Python, so we start by importing all the required packages. The code is available on GitHub to follow along.
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from sklearn.cluster import KMeans
The approach taken here is to explore each city individually, plot a map showing the neighborhoods being considered, build our model by clustering similar neighborhoods together, and finally plot a new map of the clustered neighborhoods. We then draw insights and compare and discuss our findings.
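When plotting the clustered map, each cluster needs its own color. Below is a minimal sketch using the `matplotlib.cm` and `matplotlib.colors` modules imported above. It spaces `k` colors evenly across the rainbow colormap and turns each neighborhood into a marker description. The column names `Neighborhood`, `Latitude`, `Longitude` and `Cluster`, and both helper names, are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_palette(k: int) -> list:
    """Return k evenly spaced hex colors from the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, k))
    return [colors.rgb2hex(c) for c in rgba]


def marker_specs(df: pd.DataFrame, palette: list) -> list:
    """One marker description per neighborhood, colored by its cluster label."""
    return [
        {
            "location": [row.Latitude, row.Longitude],
            "popup": f"{row.Neighborhood} (cluster {row.Cluster})",
            "color": palette[row.Cluster],
        }
        for row in df.itertuples(index=False)
    ]
```

Each description can then be drawn on a `folium.Map(location=[lat, lng], zoom_start=11)` with `folium.CircleMarker(spec["location"], radius=5, popup=spec["popup"], color=spec["color"], fill=True, fill_color=spec["color"]).add_to(m)`. Keeping the color logic separate from folium makes it easy to reuse the same palette for both cities.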
In the data collection stage, we begin with collecting the required data for the cities of London and Paris. We need data that has the postal codes, neighborhoods and boroughs specific to each of the cities.
To collect data for London, we use pandas to scrape the List of areas of London Wikipedia page and take its 2nd table:
from io import StringIO
url_london = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
wiki_london_url = requests.get(url_london)
# read_html returns a list of every table on the page; keep the 2nd one
wiki_london_data = pd.read_html(StringIO(wiki_london_url.text))[1]
wiki_london_data
The data looks like this:
Data for London scraped from the Wikipedia page.
To collect data for Paris, we download the JSON file containing all the postal codes of France from https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e
Using pandas, we read the JSON file into a table:
# Download the French postal code data (the leading "!" runs a shell command in Jupyter)
!wget -q -O 'france-data.json' https://www.data.gouv.fr/fr/datasets/r/e88c6fda-1d09-42a0-a069-606d3259114e
print("Data Downloaded!")
paris_raw = pd.read_json('france-data.json')
paris_raw.head()
JSON data containing postal codes of France
For London, we replace the spaces in the column headers with underscores. The borough column contains footnote numbers in square brackets, which we remove using:
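wiki_london_data.rename(columns=lambda x: x.strip().replace(" ", "_"), inplace=True)
# Assumes the Wikipedia header "London borough"; no-op if the name differs
wiki_london_data.rename(columns={"London_borough": "borough"}, inplace=True)
# Strip trailing footnote markers such as "[7]" from borough names
wiki_london_data['borough'] = wiki_london_data['borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))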
For Paris, we flatten the nested fields column into the DataFrame we need, with one column per field:
# Turn each record's nested 'fields' dict into one row
# (DataFrame.append was removed in pandas 2.0)
paris_field_data = pd.DataFrame(paris_raw['fields'].tolist())
paris_field_data.head()