K-Means Clustering: Identifying Profitable Hotel Customers

In this instance, K-Means is used to analyse market segment clusters for a hotel in Portugal.

This analysis is based on the original study by Antonio, Almeida and Nunes as cited in the References section below.


Given lead time (the period of time from when the customer makes their booking to when they actually stay at the hotel), along with ADR (average daily rate per customer), the k-means clustering algorithm is used to visually identify which market segments are most profitable for the hotel.

A customer with a high ADR and a low lead time is ideal: a high daily rate means a greater profit margin for the hotel, while a low lead time means that the customer pays for their booking sooner, which increases cash flow for the hotel in question.

Data Manipulation

The data is loaded and 100 samples are chosen at random:

import pandas as pd
# Load the hotel data and keep 100 randomly chosen rows
df = pd.read_csv('H1full.csv')
df = df.sample(n=100)
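
Because the sample is drawn at random, the 100 rows differ from run to run. A minimal sketch of making the draw repeatable, assuming a toy DataFrame in place of H1full.csv (the name df_demo, the values and the seed 42 are illustrative and not taken from the original analysis):

```python
import pandas as pd

# Toy frame standing in for H1full.csv; the values are made up.
df_demo = pd.DataFrame({
    "LeadTime": range(1000),
    "ADR": [50.0 + i % 200 for i in range(1000)],
})

# random_state fixes the seed, so the same 100 rows are drawn on every run.
sample_a = df_demo.sample(n=100, random_state=42)
sample_b = df_demo.sample(n=100, random_state=42)

# True when both draws selected identical rows.
same_rows = sample_a.index.equals(sample_b.index)
print(same_rows)
```

Fixing the seed is a design choice: it makes the cluster plots reproducible, at the cost of always analysing the same subset.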

The interval (continuous) variables, lead time and ADR, are defined as follows:

leadtime = df['LeadTime']
adr = df['ADR']

Variables with a categorical component, in this case market segment, are encoded using cat.codes.


The purpose of this is to assign categorical codes to each market segment. For instance, here is a snippet of some of the market segment entries in the dataset:

10871        Online TA
7752         Online TA
35566    Offline TA/TO
1353         Online TA
17532        Online TA
1312         Online TA
10364           Groups
16113           Direct
23633        Online TA
23406           Direct

Applying cat.codes yields the corresponding category codes:

10871    4
7752     4
35566    3
1353     4
17532    4
1312     4
10364    2
16113    1
23633    4
23406    1

The market segment labels are as follows:

  • 0 = Corporate
  • 1 = Direct
  • 2 = Groups
  • 3 = Offline TA/TO
  • 4 = Online TA
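
The encoding step itself can be sketched as follows. This is an illustration under assumptions rather than the original code: the column name MarketSegment and the toy rows are hypothetical, chosen so that the result matches the labels listed above.

```python
import pandas as pd

# Toy data standing in for the hotel dataset; "MarketSegment" is an assumed column name.
df_demo = pd.DataFrame({
    "MarketSegment": ["Online TA", "Offline TA/TO", "Groups",
                      "Direct", "Corporate", "Online TA"]
})

# Convert to a categorical dtype; categories are sorted alphabetically by default.
segment = df_demo["MarketSegment"].astype("category")

# cat.codes gives each row the integer position of its category.
codes = segment.cat.codes

# Recover the code -> label mapping from the category list.
labels = dict(enumerate(segment.cat.categories))
print(codes.tolist())
print(labels)
```

pandas numbers the categories in sorted order, so the codes depend on which segments are present. A sample that happens to lack one segment would shift the numbering.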

The lead time and ADR features are scaled using sklearn:

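import numpy as np
from sklearn.preprocessing import scale
x1 = np.column_stack((leadtime, adr))  # assumed definition of x1: lead time and ADR as two columns
X = scale(x1)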

Here is a sample of X:

array([[ 1.07577693, -1.01441847],
       [-0.75329711,  2.25432473],
       [-0.60321924, -0.80994917],
       [-0.20926483,  0.26328418],
       [ 0.53174465, -0.40967609],
       [-0.82833604,  0.40156369],
       [-0.89399511, -1.01810593],
       [ 0.59740372,  1.40823851],
       [-0.89399511, -1.16560407],
       ...])
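
Scaling matters here because k-means relies on Euclidean distance, so a feature measured on a larger scale would otherwise dominate the clustering. As a rough sketch of what the scaling does, the following standardises each column by hand with NumPy. The input values are made up, and the name x_demo is illustrative:

```python
import numpy as np

# Made-up (lead time, ADR) pairs, purely illustrative.
x_demo = np.array([[10.0, 120.0],
                   [200.0, 60.0],
                   [45.0, 95.0],
                   [3.0, 210.0]])

# Standardise each column: subtract the column mean and divide by the column
# standard deviation (population form, ddof=0, as sklearn's scale does by default).
X_demo = (x_demo - x_demo.mean(axis=0)) / x_demo.std(axis=0)

print(X_demo.mean(axis=0))  # approximately 0 for each column
print(X_demo.std(axis=0))   # approximately 1 for each column
```

After this step both features have zero mean and unit variance, which is why the values in X are small numbers centred on zero.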

K-Means Clustering

When it comes to choosing the number of clusters, one possible solution is to use what is called the elbow method. Here is an example of an elbow curve:

[Image: example elbow curve]

This is a technique whereby the within-cluster variance is calculated for each candidate number of clusters; the lower the variance, the tighter the clusters.

Once the curve starts to flatten out, each additional cluster yields a smaller and smaller reduction in variance. The point at which this happens (the "elbow") suggests a suitable value for k.
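
A minimal sketch of how the values behind such a curve could be computed with scikit-learn, assuming synthetic data in place of the scaled lead time and ADR matrix (X_demo, the range of k and the random seeds are illustrative choices, not part of the original analysis):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the scaled (lead time, ADR) matrix X.
X_demo = rng.normal(size=(100, 2))

ks = range(1, 11)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_demo)
    # inertia_ is the within-cluster sum of squared distances to the nearest centroid.
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 2))
```

Plotting ks against inertias gives the elbow curve. On data without strong cluster structure, such as this random sample, the bend may be hard to see.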

However, this technique is not necessarily suitable for smaller clusters. Moreover, the number of clusters is already fixed at k=5, since that is the number of market segments we wish to analyse.
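
With k fixed at 5, fitting the model might look like the sketch below. It is an illustration under assumptions, not the original code: X_demo and segment_codes are synthetic stand-ins for the scaled features and the cat.codes values. Note that k-means is unsupervised, so its cluster labels are arbitrary integers and need not line up with the market segment codes; a cross-tabulation is one way to compare the two.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 2))            # stand-in for scaled lead time and ADR
segment_codes = rng.integers(0, 5, size=100)  # stand-in for the cat.codes values

# Fit k-means with five clusters, one per market segment under study.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_demo)

# Count how many customers of each segment fall into each cluster.
table = pd.crosstab(pd.Series(segment_codes, name="segment"),
                    pd.Series(kmeans.labels_, name="cluster"))
print(table)
```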

Additionally, while k-means clustering is sometimes combined with PCA (Principal Component Analysis) to reduce the number of features, this is unnecessary in this case, as the only two features being used (apart from market segment) are ADR and lead time.

Elton Bogan


