This post deals with the Capstone Project in the IBM Applied Data Science Capstone Course hosted on Coursera. The project applies the knowledge gained from the previous courses in the Specialization to solve a real-world, data-driven problem.

1. INTRODUCTION

I am using a hypothetical scenario: an Indian Entrepreneur wants to open an Indian Restaurant in New York City (NYC). This could be a good opportunity for an Indian American who already lives in NYC and is well versed with its Places and Neighborhoods. As Indian cuisine is quite popular with Americans and Indian Americans alike, there are already many restaurants, most of which are franchises or family-owned businesses.

Why New York City?

The New York City region is home to the largest Indian American population among metropolitan areas by a significant margin and represents the second-largest metropolitan Asian national diaspora both outside of Asia and within the New York City metropolitan area (source — Wikipedia).

New York City is home to numerous Ethnic groups

Hospitality Industry

Ambience, menu, hygiene and of course taste are all important factors to be kept in mind before getting into the Hospitality Industry but these are all problems that can be tackled internally by the person(s) in charge. The location of a restaurant is also of utmost importance regardless of the history of a business or the taste of the food. If people don’t come in to eat then none of the preparations matter. That is the problem I am tackling in this project.

2. PROBLEM STATEMENT

The objective is to find one or more suitable locations to open an Indian Restaurant in New York City, USA. This project uses Data Science and Machine Learning methods (k-means Clustering) to provide a Solution to the client. It aims to answer the Question: ‘Where should you consider opening an Indian Restaurant in New York City?’

3. DATA

I have used the following Data for the project:

  • List of Boroughs and Neighborhoods in NYC: gives the coordinates of all the neighborhoods and is used to call the Foursquare API.
  • List of Places and Venues in NYC: contains data about all the nearby venues such as Restaurants, Bars, Gyms etc.
  • Demographics of Indian Americans in New York City: vital to understand the distribution of the target audience in NYC.
  • Latitude and Longitude Data of the neighborhood(s): used to plot and visualize our data.

Boroughs and Neighborhoods in NYC

The Data Sources are linked at the end of the post.

4. METHODOLOGY

A) Boroughs

As described in the Data section above, our NYC data consists of Boroughs (administrative districts of the city) and the Neighborhoods within them. The data contains 5 Boroughs (Queens, Brooklyn, Bronx, Manhattan and Staten Island) and over 300 neighborhoods in total. Before we begin our analysis of the Neighborhoods we select an appropriate Borough, which involves looking into all 5 of them. The data is filtered for each Borough, and the filtered data is used to make the calls to the Foursquare API.
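The filtering and counting step can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the rows below are a tiny stand-in for the full dataset (real neighborhood names with approximate coordinates), and the column names and helper functions are assumptions.

```python
from collections import Counter

# Illustrative rows only: real neighborhood names with approximate coordinates,
# standing in for the full dataset of 300+ neighborhoods.
neighborhoods = [
    {"Borough": "Bronx", "Neighborhood": "Wakefield", "Latitude": 40.895, "Longitude": -73.847},
    {"Borough": "Bronx", "Neighborhood": "Co-op City", "Latitude": 40.874, "Longitude": -73.830},
    {"Borough": "Brooklyn", "Neighborhood": "Bay Ridge", "Latitude": 40.626, "Longitude": -74.031},
    {"Borough": "Queens", "Neighborhood": "Astoria", "Latitude": 40.768, "Longitude": -73.915},
]


def filter_by_borough(rows, borough):
    """Return only the neighborhoods that belong to the given Borough."""
    return [row for row in rows if row["Borough"] == borough]


def count_by_borough(rows):
    """Count how many neighborhoods each Borough contains."""
    return Counter(row["Borough"] for row in rows)


bronx = filter_by_borough(neighborhoods, "Bronx")
counts = count_by_borough(neighborhoods)
print(counts)
```

With the data in a pandas dataframe, the same two steps would be a boolean filter on the Borough column and a `value_counts()` call.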

Count of Neighborhoods in the Boroughs

B) Foursquare API

The data returned by the API for Brooklyn

The central part of this project uses the Foursquare API to get details of nearby venues: the Category (Pizza Place, Monument etc.), the coordinates of the place (Latitude and Longitude) and the Name of the Venue. We need to declare our Foursquare credentials, namely the Client ID and Client Secret. We use a radius of 500 meters, which returns venues within half a kilometer of a neighborhood's coordinates. To prevent too many records being returned by a call, a limit of 100 is set.
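As a rough sketch of how these parameters fit together, the snippet below builds a request URL. It assumes the legacy Foursquare v2 `venues/explore` endpoint and its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` query parameters; the post does not say which endpoint or version date was used, so treat those, the helper name and the placeholder credentials as assumptions.

```python
from urllib.parse import urlencode

# Assumed endpoint: the legacy Foursquare v2 "explore" API.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=100, version="20180605"):
    """Build the request URL for venues around one pair of coordinates.

    radius is in meters (500 = half a kilometer); limit caps the number of
    venues returned. The version date is an arbitrary placeholder.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials; real ones come from a Foursquare developer account.
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 40.626, -74.031)
print(url)
```

Keeping the URL construction in its own function makes it easy to loop over every neighborhood in a Borough and issue one request per pair of coordinates.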

The URL is constructed with our declared credentials and a request is made to the API. The data is returned as a JSON payload, and a pandas dataframe is constructed by reading parts of it. This is repeated per Borough, so 5 data frames are made, one for each Borough.
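Reading the relevant parts of the payload might look like the sketch below. It assumes the response shape of the legacy v2 `venues/explore` endpoint (`response.groups[0].items[*].venue`); the sample payload, the venue names in it and the output column names are made up for illustration.

```python
def venues_from_payload(payload, neighborhood):
    """Flatten an explore-style JSON payload into one row per venue."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append({
            "Neighborhood": neighborhood,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            # Use the first listed category, if any.
            "Venue Category": categories[0]["name"] if categories else None,
        })
    return rows


# Hypothetical payload with invented venues, shaped like the assumed response.
sample_payload = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Curry House",
                           "location": {"lat": 40.6261, "lng": -74.0312},
                           "categories": [{"name": "Indian Restaurant"}]}},
                {"venue": {"name": "Example Slice Shop",
                           "location": {"lat": 40.6258, "lng": -74.0305},
                           "categories": [{"name": "Pizza Place"}]}},
            ]
        }]
    }
}

rows = venues_from_payload(sample_payload, "Bay Ridge")
print(rows)
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame(rows)`, and the rows for all neighborhoods of a Borough can be concatenated into that Borough's dataframe.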

Now that the data has been structured for preprocessing, we need to decide on a Borough for the analysis, so we look into 2 aspects:

  1. Pre-existing Indian Restaurants
  2. Demographics of the Indian American Population

C) Pre-existing Indian Restaurants

Since we wish to open a new Indian Restaurant, it helps to look at the ones that already exist. We get the count of Indian Restaurants (from the Venue Category) in each Borough and merge the counts together to get an idea of their distribution or concentration. Logically, to avoid competition it would make sense to select a Borough with few Indian Restaurants.
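The counting and merging step could be sketched as follows. The venue rows and the resulting counts are invented for illustration only, and the helper name and the `Venue Category` key are assumptions rather than the project's actual code.

```python
def count_category(venues_by_borough, category="Indian Restaurant"):
    """Count the venues of one category in each Borough."""
    return {
        borough: sum(1 for venue in venues if venue["Venue Category"] == category)
        for borough, venues in venues_by_borough.items()
    }


# Invented sample rows; the real input would be the 5 per-Borough data frames.
venues_by_borough = {
    "Queens": [
        {"Venue Category": "Indian Restaurant"},
        {"Venue Category": "Indian Restaurant"},
        {"Venue Category": "Bar"},
    ],
    "Brooklyn": [
        {"Venue Category": "Pizza Place"},
        {"Venue Category": "Indian Restaurant"},
    ],
    "Staten Island": [
        {"Venue Category": "Gym"},
    ],
}

indian_counts = count_category(venues_by_borough)

# Sort ascending so the Boroughs with the least competition come first.
ranked = sorted(indian_counts.items(), key=lambda pair: pair[1])
print(ranked)
```

Sorting in ascending order puts the Boroughs with the fewest existing Indian Restaurants at the top, which is the criterion described above.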
