Instacart Users Segmentation and Market Basket Analysis
Understanding the shopping behaviors of Instacart customers and making efficient recommendations
Covid-19 became a worldwide pandemic in 2020, and New Yorkers followed the quarantine policy and kept a social distance from one another. The most popular mode of transportation in NYC is the subway. However, crowded subway cars are among the riskiest places for Covid-19 transmission, which may increase passengers' risk of infection. As a result, going out for daily necessities became a headache for New Yorkers, and inside grocery stores it is often impractical to keep a social distance. During the Covid-19 outbreak, New York City issued a “stay at home” order, which increased the demand for online grocery shopping. Instacart, a grocery delivery platform, grew rapidly during the Covid-19 crisis. Users can now stay home to help flatten the curve and reduce their own risk of catching the virus.
The primary research goals are to segment users by the time intervals of their orders and to build a recommendation system based on users' product choices. We expect the results to help the supply side optimize inventory allocation and to increase the probability that customers get essential goods without breaking social distancing rules.
The primary data source is Instacart's 2017 dataset of anonymized customer orders over time (Stanley, 2017). It contains an orders file, a products file, an order-products file, an aisles file, and a departments file. Each entity in the dataset has an associated unique id.
The orders dataset contains the user id, the order id, the day of the week the order was placed (order_dow), the hour of the day the order was placed (order_hour_of_day), the days since the user's previous order (days_since_prior_order), and an indicator of which evaluation set the order belongs to (eval_set). For a user's first order, the days since the previous order is NaN. The departments dataset contains a unique department id and the associated department name. The aisles dataset contains the aisle id and the aisle name. The products dataset contains the product id, the product name, the aisle id, and the department id.
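The following minimal Python sketch shows how these files relate through their ids. It builds lookup tables for aisles and departments and attaches their names to each product. It also parses an empty days_since_prior_order field as NaN, matching what pandas produces when it loads a user's first order. The sketch uses only the standard library and a few inline rows. The sample ids, product names, and values are hypothetical, and it assumes the CSV headers match the column names listed above. With pandas, the same joins would typically use DataFrame.merge.

```python
import csv
import io
import math

# Hypothetical sample rows; headers assumed to match the column names described above.
ORDERS_CSV = """order_id,user_id,eval_set,order_dow,order_hour_of_day,days_since_prior_order
1,10,prior,0,9,
2,10,prior,1,14,7.0
3,11,prior,1,10,
"""
PRODUCTS_CSV = """product_id,product_name,aisle_id,department_id
101,Banana,24,4
102,Whole Milk,84,16
"""
AISLES_CSV = """aisle_id,aisle
24,fresh fruits
84,milk
"""
DEPARTMENTS_CSV = """department_id,department
4,produce
16,dairy eggs
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_days_since_prior(value):
    """An empty field (a user's first order) becomes NaN; otherwise parse as float."""
    return float(value) if value else math.nan


orders = read_rows(ORDERS_CSV)
for order in orders:
    order["days_since_prior_order"] = parse_days_since_prior(order["days_since_prior_order"])

# Lookup tables keyed by each entity's unique id.
aisles = {row["aisle_id"]: row["aisle"] for row in read_rows(AISLES_CSV)}
departments = {row["department_id"]: row["department"] for row in read_rows(DEPARTMENTS_CSV)}

# Join each product to its aisle and department names through the shared ids.
products = {}
for row in read_rows(PRODUCTS_CSV):
    products[row["product_id"]] = {
        "name": row["product_name"],
        "aisle": aisles[row["aisle_id"]],
        "department": departments[row["department_id"]],
    }

print(products["101"])
```

Keying the lookups by id mirrors the dataset's design: every entity has a unique id, so each join is a simple dictionary lookup.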
To define the time intervals of user orders, we first grouped orders by day of the week, using the ‘order_dow’ column of orders.csv. As Figure 1 shows, the most popular order days are days 0 and 1. The data documentation does not define which weekdays the values 0 to 6 represent, but we believe the two busiest days, 0 and 1, are Sunday and Monday.
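The day-of-week grouping reduces to counting orders per order_dow value and ranking those counts. The sketch below does this with collections.Counter on a short list of hypothetical order_dow values. In practice the values would come from the order_dow column of orders.csv. The mapping 0 = Sunday is an assumption carried over from the paragraph above, because the dataset does not document it.

```python
from collections import Counter

# Hypothetical order_dow values (0-6); in practice, read them from orders.csv.
order_dow = [0, 1, 0, 3, 1, 0, 6, 1, 2, 0, 5, 4]

# ASSUMPTION: the dataset does not define this mapping; we assume 0 = Sunday.
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def orders_per_day(dows):
    """Return order counts for each day code 0-6, filling in zero for days with no orders."""
    counts = Counter(dows)
    return {day: counts.get(day, 0) for day in range(7)}


def busiest_days(counts, k=2):
    """Return the k day codes with the most orders, busiest first."""
    return sorted(counts, key=counts.get, reverse=True)[:k]


counts = orders_per_day(order_dow)
top = busiest_days(counts)
print([(day, DAY_NAMES[day], counts[day]) for day in top])
# prints [(0, 'Sunday', 4), (1, 'Monday', 3)]
```

The zero-filled counts are the data behind a bar chart like Figure 1. Keeping the numeric codes separate from DAY_NAMES makes the unverified weekday mapping easy to change.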