John David

John David


Introduction to Image Segmentation with K-Means clustering

Image segmentation is an important step in image processing, and it seems everywhere if we want to analyze what’s inside the image. For example, if we seek to find if there is a chair or person inside an indoor image, we may need image segmentation to separate objects and analyze each object individually to check what it is. Image segmentation usually serves as the pre-processing before pattern recognition, feature extraction, and compression of the image.

Image segmentation is the classification of an image into different groups. Many kinds of research have been done in the area of image segmentation using clustering. There are different methods and one of the most popular methods is K-Means clustering algorithm.

So here in this article, we will explore a method to read an image and cluster different regions of the image. But before doing lets first talk about:

  1. Image Segmentation
  2. How Image segmentation works
  3. K-Means clustering ML Algorithm
  4. Merge K-Means clustering Algorithm with Image Segmentation.
  5. Canny Edge detection

Image Segmentation

Introduction to Image Segmentation with K-Means clustering

Image segmentation is the process of partitioning a digital image into multiple distinct regions containing each pixel(sets of pixels, also known as superpixels) with similar attributes.

The goal of Image segmentation is to change the representation of an image into something that is more meaningful and easier to analyze.

Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, Image Segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

Of course, a common question arises:

Why does Image Segmentation even matter?

If we take an example of Autonomous Vehicles, they need sensory input devices like cameras, radar, and lasers to allow the car to perceive the world around it, creating a digital map. Autonomous driving is not even possible without object detection which itself involves image classification/segmentation.

Introduction to Image Segmentation with K-Means clustering
Object detection and Image Classification by an Autonomous Vehicle

Other examples involve Healthcare Industry where if we talk about Cancer, even in today’s age of technological advancements, cancer can be fatal if we don’t identify it at an early stage. Detecting cancerous cell(s) as quickly as possible can potentially save millions of lives. The shape of the cancerous cells plays a vital role in determining the severity of cancer which can be identified using image classification algorithms.

Like this, there were several algorithms and techniques for image segmentation have been developed over the years using domain-specific knowledge to effectively solve segmentation problems in that specific application area which includes medical imaging, object detection, Iris recognition, video surveillance, machine vision and many more….

Let us plot an image in 3D space using python matplotlib library.

Below is the image that we’ll gonna plot in 3D space and we can clearly see 3 different colors which means 3 clusters/groups should be generated.
Introduction to Image Segmentation with K-Means clustering

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import cv2img = cv2.imread("/Users/nageshsinghchauhan/Documents/images10.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
r, g, b = cv2.split(img)
r = r.flatten()
g = g.flatten()
b = b.flatten()#plotting 
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(r, g, b)

Introduction to Image Segmentation with K-Means clustering

From the plot one can easily see that the data points are forming groups — some places in a graph are more dense, which we can think as different colors’ dominance on the image.

How Image Segmentation works

Image Segmentation involves converting an image into a collection of regions of pixels that are represented by a mask or a labeled image. By dividing an image into segments, you can process only the important segments of the image instead of processing the entire image.

A common technique is to look for abrupt discontinuities in pixel values, which typically indicate edges that define a region.

Another common approach is to detect similarities in the regions of an image. Some techniques that follow this approach are region growing, clustering, and thresholding.

A variety of other approaches to perform image segmentation have been developed over the years using domain-specific knowledge to effectively solve segmentation problems in specific application areas.

So let us start with one of the clustering-based approaches in Image Segmentation which is K-Means clustering.

K-Means clustering algorithm

Ok first What are Clustering algorithms in Machine Learning?

Clustering algorithms are unsupervised algorithms but are similar to Classification algorithms but the basis is different.

In Clustering, you don’t know what you are looking for, and you are trying to identify some segments or clusters in your data. When you use clustering algorithms in your dataset, unexpected things can suddenly pop-up like structures, clusters, and groupings you would have never thought otherwise.

K-Means clustering algorithm is an unsupervised algorithm and it is used to segment the interest area from the background. It clusters, or partitions the given data into K-clusters or parts based on the K-centroids.

The algorithm is used when you have unlabeled data(i.e. data without defined categories or groups). The goal is to find certain groups based on some kind of similarity in the data with the number of groups represented by K.
Introduction to Image Segmentation with K-Means clustering

In the above figure, Customers of a shopping mall have been grouped into 5 clusters based on their income and spending score. Yellow dots represent the Centroid of each cluster.

The objective of K-Means clustering is to minimize the sum of squared distances between all points and the cluster center.
Introduction to Image Segmentation with K-Means clustering

Steps in K-Means algorithm: 1. Choose the number of clusters K.
2. Select at random K points, the centroids(not necessarily from your dataset).
3. Assign each data point to the closest centroid → that forms K clusters.
4. Compute and place the new centroid of each cluster.
5. Reassign each data point to the new closest centroid. If any reassignment . took place, go to step 4, otherwise, the model is ready.

How to choose the optimal value of K?

For a certain class of clustering algorithms (in particular K-Means, K-medoids, and expectation-maximization algorithm), there is a parameter commonly referred to as K that specifies the number of clusters to detect. Other algorithms such as DBSCAN and OPTICS algorithm do not require the specification of this parameter; Hierarchical Clustering avoids the problem altogether but that’s beyond the scope of this article.

If we talk about K-Means then the correct choice of K is often ambiguous, with interpretations depending on the shape and scale of the distribution of points in a data set and the desired clustering resolution of the user. In addition, increasing K without penalty will always reduce the amount of error in the resulting clustering, to the extreme case of zero error if each data point is considered its own cluster (i.e., when K equals the number of data points, n). Intuitively then, the optimal choice of K will strike a balance between maximum compression of the data using a single cluster, and maximum accuracy by assigning each data point to its own cluster.

If an appropriate value of K is not apparent from prior knowledge of the properties of the data set, it must be chosen somehow. There are several categories of methods for making this decision and Elbow method is one such method.

Elbow method

The basic idea behind partitioning methods, such as K-Means clustering, is to define clusters such that the total intra-cluster variation or in other words, total within-cluster sum of square (WCSS) is minimized. The total WCSS measures the compactness of the clustering and we want it to be as small as possible.
Introduction to Image Segmentation with K-Means clustering

The Elbow method looks at the total WCSS as a function of the number of clusters: One should choose a number of clusters so that adding another cluster doesn’t improve much better the total WCSS.

Steps to choose the optimal number of clusters K:(Elbow Method) 1. Compute K-Means clustering for different values of K by varying K from 1 to 10 clusters.
2. For each K, calculate the total within-cluster sum of square (WCSS).
3. Plot the curve of WCSS vs the number of clusters K.
4. The location of a bend (knee) in the plot is generally considered as an indicator of the appropriate number of clusters.

There is a catch!!!

In spite of all the advantages K-Means have got but it fails sometimes due to the random choice of centroids which is called The Random Initialization Trap.

To solve this issue we have an initialization procedure for K-Means which is called K-Means++(Algorithm for choosing the initial values for K-Means clustering).

In K-Means++, We pick a point randomly and that’s your first centroid, then we pick the next point based on the probability that depends upon the distance of the first point, the further apart the point is the more probable it is.

Then we have two centroids, repeat the process, the probability of each point is based on its distance to the closest centroid to that point. Now, this introduces an overhead in the initialization of the algorithm, but it reduces the probability of a bad initialization leading to bad clustering result.

Visual Representation of K-Means Clustering: Starting with 4 leftmost points.
Introduction to Image Segmentation with K-Means clustering

Enough of theory lets implement what we have discussed in a real-world scenario.

In this section, we will explore a method to read an image and cluster different regions of the image using the K-Means clustering algorithm and OpenCV.

So basically we will perform Color clustering and Canny Edge detection.

Color Clustering:

Load all the required libraries:

import numpy as np
import cv2
import matplotlib.pyplot as plt

Next step is to load the image in RGB color space

original_image = cv2.imread("/Users/nageshsinghchauhan/Desktop/image1.jpg")

Original Image:
Introduction to Image Segmentation with K-Means clustering
We need to convert our image from RGB Colours Space to HSV to work ahead.

But the question is why ??

According to wikipedia the R, G, and B components of an object’s color in a digital image are all correlated with the amount of light hitting the object, and therefore with each other, image descriptions in terms of those components make object discrimination difficult. Descriptions in terms of hue/lightness/chroma or hue/lightness/saturation are often more relevant.

If you don’t convert your image to HSV, your image will look something like this:
Introduction to Image Segmentation with K-Means clustering


Next, converts the MxNx3 image into a Kx3 matrix where K=MxN and each row is now a vector in the 3-D space of RGB.

vectorized = img.reshape((-1,3))

We convert the unit8 values to float as it is a requirement of the k-means method of OpenCV.

vectorized = np.float32(vectorized)

We are going to cluster with k = 3 because if you look at the image above it has 3 colors, green-colored grass and forest, blue sea and the greenish-blue seashore.

Define criteria, number of clusters(K) and apply k-means()

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)

OpenCV provides cv2.kmeans(samples, nclusters(K), criteria, attempts, flags)function for color clustering.

  1. samples: It should be of np.float32 data type, and each feature should be put in a single column.

2. nclusters(K): Number of clusters required at the end

3. criteria: It is the iteration termination criteria. When this criterion is satisfied, the algorithm iteration stops. Actually, it should be a tuple of 3 parameters. They are ( type, max_iter, epsilon ):

Type of termination criteria. It has 3 flags as below:

  • cv.TERM_CRITERIA_EPS — stop the algorithm iteration if specified accuracy, epsilon, is reached.
  • cv.TERM_CRITERIA_MAX_ITER — stop the algorithm after the specified number of iterations, max_iter.
  • cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER — stop the iteration when any of the above condition is met.

4. attempts: Flag to specify the number of times the algorithm is executed using different initial labelings. The algorithm returns the labels that yield the best compactness. This compactness is returned as output.

5. flags: This flag is used to specify how initial centers are taken. Normally two flags are used for this: cv.KMEANS_PP_CENTERS and cv.KMEANS_RANDOM_CENTERS.

K = 3

Now convert back into uint8.

center = np.uint8(center)

Next, we have to access the labels to regenerate the clustered image

res = center[label.flatten()]
result_image = res.reshape((img.shape))

result_image is the result of the frame which has undergone k-means clustering.

Now let us visualize the output result with K=3

figure_size = 15
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.title('Segmented Image when K = %i' % K), plt.xticks([]), plt.yticks([])

Introduction to Image Segmentation with K-Means clustering

So the algorithm has categorized our original image into three dominant colors.

Let’s see what happens when we change the value of K=5:
Introduction to Image Segmentation with K-Means clustering

Change the value of K=7:
Introduction to Image Segmentation with K-Means clustering

As you can see with an increase in the value of K, the image becomes clearer because the K-means algorithm can classify more classes/cluster of colors.

We can try our code for different images:
Introduction to Image Segmentation with K-Means clustering

Introduction to Image Segmentation with K-Means clustering

Let’s move to our next part which is Canny Edge detection.

Canny Edge detection: It is an image processing method used to detect edges in an image while suppressing noise.

The Canny Edge detection algorithm is composed of 5 steps: 1.Noise reduction
2. Gradient calculation
3. Non-maximum suppression
4. Double threshold
5. Edge Tracking by Hysteresis

OpenCV provides cv2.Canny(image, threshold1,threshold2) function for edge detection.

The first argument is our input image. Second and third arguments are our min and max threshold respectively.

The function finds edges in the input image(8-bit input image) and marks them in the output map edges using the Canny algorithm. The smallest value between threshold1 and threshold2 is used for edge linking. The largest value is used to find initial segments of strong edges.

edges = cv2.Canny(img,150,200)
plt.title('Original Image'), plt.xticks([]), plt.yticks([])
plt.subplot(1,2,2),plt.imshow(edges,cmap = 'gray')
plt.title('Edge Image'), plt.xticks([]), plt.yticks([])

Introduction to Image Segmentation with K-Means clustering

Result-1: Edge detection using the Canny algorithm ![Introduction to Image Segmentation with K-Means clustering](*3tObJxCcj8NZkjplOjgeiA.png "Introduction to Image Segmentation with K-Means clustering")

Result-2: Edge detection using the Canny algorithm

Conclusion: What the future holds

Due to advancements in Image processing, Machine learning, AI and related technologies, there will be millions of robots in the world in a few decades time, transforming the way we live our daily lives. These advancements will involve spoken commands, anticipating the information requirements of governments, translating languages, recognizing and tracking people and things, diagnosing medical conditions, performing surgery, reprogramming defects in human DNA, driverless cars and many more applications, the count of real-life applications is endless.

Well, this comes to the end of this article. I hope you guys have enjoyed reading this article. Share your thoughts/comments/doubts in the comment section.

Thanks for reading !!!

#machine-learning #data-science

What is GEEK

Buddha Community

Introduction to Image Segmentation with K-Means clustering
Queenie  Davis

Queenie Davis


EasyMDE: Simple, Beautiful and Embeddable JavaScript Markdown Editor

EasyMDE - Markdown Editor 

This repository is a fork of SimpleMDE, made by Sparksuite. Go to the dedicated section for more information.

A drop-in JavaScript text area replacement for writing beautiful and understandable Markdown. EasyMDE allows users who may be less experienced with Markdown to use familiar toolbar buttons and shortcuts.

In addition, the syntax is rendered while editing to clearly show the expected result. Headings are larger, emphasized words are italicized, links are underlined, etc.

EasyMDE also features both built-in auto saving and spell checking. The editor is entirely customizable, from theming to toolbar buttons and javascript hooks.

Try the demo


Quick access

Install EasyMDE

Via npm:

npm install easymde

Via the UNPKG CDN:

<link rel="stylesheet" href="">
<script src=""></script>

Or jsDelivr:

<link rel="stylesheet" href="">
<script src=""></script>

How to use

Loading the editor

After installing and/or importing the module, you can load EasyMDE onto the first textarea element on the web page:

const easyMDE = new EasyMDE();

Alternatively you can select a specific textarea, via JavaScript:

<textarea id="my-text-area"></textarea>
const easyMDE = new EasyMDE({element: document.getElementById('my-text-area')});

Editor functions

Use easyMDE.value() to get the content of the editor:


Use easyMDE.value(val) to set the content of the editor:

easyMDE.value('New input for **EasyMDE**');


Options list

  • autoDownloadFontAwesome: If set to true, force downloads Font Awesome (used for icons). If set to false, prevents downloading. Defaults to undefined, which will intelligently check whether Font Awesome has already been included, then download accordingly.
  • autofocus: If set to true, focuses the editor automatically. Defaults to false.
  • autosave: Saves the text that's being written and will load it back in the future. It will forget the text when the form it's contained in is submitted.
    • enabled: If set to true, saves the text automatically. Defaults to false.
    • delay: Delay between saves, in milliseconds. Defaults to 10000 (10 seconds).
    • submit_delay: Delay before assuming that submit of the form failed and saving the text, in milliseconds. Defaults to autosave.delay or 10000 (10 seconds).
    • uniqueId: You must set a unique string identifier so that EasyMDE can autosave. Something that separates this from other instances of EasyMDE elsewhere on your website.
    • timeFormat: Set DateTimeFormat. More information see DateTimeFormat instances. Default locale: en-US, format: hour:minute.
    • text: Set text for autosave.
  • autoRefresh: Useful, when initializing the editor in a hidden DOM node. If set to { delay: 300 }, it will check every 300 ms if the editor is visible and if positive, call CodeMirror's refresh().
  • blockStyles: Customize how certain buttons that style blocks of text behave.
    • bold: Can be set to ** or __. Defaults to **.
    • code: Can be set to ``` or ~~~. Defaults to ```.
    • italic: Can be set to * or _. Defaults to *.
  • unorderedListStyle: can be *, - or +. Defaults to *.
  • scrollbarStyle: Chooses a scrollbar implementation. The default is "native", showing native scrollbars. The core library also provides the "null" style, which completely hides the scrollbars. Addons can implement additional scrollbar models.
  • element: The DOM element for the textarea element to use. Defaults to the first textarea element on the page.
  • forceSync: If set to true, force text changes made in EasyMDE to be immediately stored in original text area. Defaults to false.
  • hideIcons: An array of icon names to hide. Can be used to hide specific icons shown by default without completely customizing the toolbar.
  • indentWithTabs: If set to false, indent using spaces instead of tabs. Defaults to true.
  • initialValue: If set, will customize the initial value of the editor.
  • previewImagesInEditor: - EasyMDE will show preview of images, false by default, preview for images will appear only for images on separate lines.
  • imagesPreviewHandler: - A custom function for handling the preview of images. Takes the parsed string between the parantheses of the image markdown ![]( ) as argument and returns a string that serves as the src attribute of the <img> tag in the preview. Enables dynamic previewing of images in the frontend without having to upload them to a server, allows copy-pasting of images to the editor with preview.
  • insertTexts: Customize how certain buttons that insert text behave. Takes an array with two elements. The first element will be the text inserted before the cursor or highlight, and the second element will be inserted after. For example, this is the default link value: ["[", "](http://)"].
    • horizontalRule
    • image
    • link
    • table
  • lineNumbers: If set to true, enables line numbers in the editor.
  • lineWrapping: If set to false, disable line wrapping. Defaults to true.
  • minHeight: Sets the minimum height for the composition area, before it starts auto-growing. Should be a string containing a valid CSS value like "500px". Defaults to "300px".
  • maxHeight: Sets fixed height for the composition area. minHeight option will be ignored. Should be a string containing a valid CSS value like "500px". Defaults to undefined.
  • onToggleFullScreen: A function that gets called when the editor's full screen mode is toggled. The function will be passed a boolean as parameter, true when the editor is currently going into full screen mode, or false.
  • parsingConfig: Adjust settings for parsing the Markdown during editing (not previewing).
    • allowAtxHeaderWithoutSpace: If set to true, will render headers without a space after the #. Defaults to false.
    • strikethrough: If set to false, will not process GFM strikethrough syntax. Defaults to true.
    • underscoresBreakWords: If set to true, let underscores be a delimiter for separating words. Defaults to false.
  • overlayMode: Pass a custom codemirror overlay mode to parse and style the Markdown during editing.
    • mode: A codemirror mode object.
    • combine: If set to false, will replace CSS classes returned by the default Markdown mode. Otherwise the classes returned by the custom mode will be combined with the classes returned by the default mode. Defaults to true.
  • placeholder: If set, displays a custom placeholder message.
  • previewClass: A string or array of strings that will be applied to the preview screen when activated. Defaults to "editor-preview".
  • previewRender: Custom function for parsing the plaintext Markdown and returning HTML. Used when user previews.
  • promptURLs: If set to true, a JS alert window appears asking for the link or image URL. Defaults to false.
  • promptTexts: Customize the text used to prompt for URLs.
    • image: The text to use when prompting for an image's URL. Defaults to URL of the image:.
    • link: The text to use when prompting for a link's URL. Defaults to URL for the link:.
  • uploadImage: If set to true, enables the image upload functionality, which can be triggered by drag and drop, copy-paste and through the browse-file window (opened when the user click on the upload-image icon). Defaults to false.
  • imageMaxSize: Maximum image size in bytes, checked before upload (note: never trust client, always check the image size at server-side). Defaults to 1024 * 1024 * 2 (2 MB).
  • imageAccept: A comma-separated list of mime-types used to check image type before upload (note: never trust client, always check file types at server-side). Defaults to image/png, image/jpeg.
  • imageUploadFunction: A custom function for handling the image upload. Using this function will render the options imageMaxSize, imageAccept, imageUploadEndpoint and imageCSRFToken ineffective.
    • The function gets a file and onSuccess and onError callback functions as parameters. onSuccess(imageUrl: string) and onError(errorMessage: string)
  • imageUploadEndpoint: The endpoint where the images data will be sent, via an asynchronous POST request. The server is supposed to save this image, and return a JSON response.
    • if the request was successfully processed (HTTP 200 OK): {"data": {"filePath": "<filePath>"}} where filePath is the path of the image (absolute if imagePathAbsolute is set to true, relative if otherwise);
    • otherwise: {"error": "<errorCode>"}, where errorCode can be noFileGiven (HTTP 400 Bad Request), typeNotAllowed (HTTP 415 Unsupported Media Type), fileTooLarge (HTTP 413 Payload Too Large) or importError (see errorMessages below). If errorCode is not one of the errorMessages, it is alerted unchanged to the user. This allows for server-side error messages. No default value.
  • imagePathAbsolute: If set to true, will treat imageUrl from imageUploadFunction and filePath returned from imageUploadEndpoint as an absolute rather than relative path, i.e. not prepend window.location.origin to it.
  • imageCSRFToken: CSRF token to include with AJAX call to upload image. For various instances like Django, Spring and Laravel.
  • imageCSRFName: CSRF token filed name to include with AJAX call to upload image, applied when imageCSRFToken has value, defaults to csrfmiddlewaretoken.
  • imageCSRFHeader: If set to true, passing CSRF token via header. Defaults to false, which pass CSRF through request body.
  • imageTexts: Texts displayed to the user (mainly on the status bar) for the import image feature, where #image_name#, #image_size# and #image_max_size# will replaced by their respective values, that can be used for customization or internationalization:
    • sbInit: Status message displayed initially if uploadImage is set to true. Defaults to Attach files by drag and dropping or pasting from clipboard..
    • sbOnDragEnter: Status message displayed when the user drags a file to the text area. Defaults to Drop image to upload it..
    • sbOnDrop: Status message displayed when the user drops a file in the text area. Defaults to Uploading images #images_names#.
    • sbProgress: Status message displayed to show uploading progress. Defaults to Uploading #file_name#: #progress#%.
    • sbOnUploaded: Status message displayed when the image has been uploaded. Defaults to Uploaded #image_name#.
    • sizeUnits: A comma-separated list of units used to display messages with human-readable file sizes. Defaults to B, KB, MB (example: 218 KB). You can use B,KB,MB instead if you prefer without whitespaces (218KB).
  • errorMessages: Errors displayed to the user, using the errorCallback option, where #image_name#, #image_size# and #image_max_size# will replaced by their respective values, that can be used for customization or internationalization:
    • noFileGiven: The server did not receive any file from the user. Defaults to You must select a file..
    • typeNotAllowed: The user send a file type which doesn't match the imageAccept list, or the server returned this error code. Defaults to This image type is not allowed..
    • fileTooLarge: The size of the image being imported is bigger than the imageMaxSize, or if the server returned this error code. Defaults to Image #image_name# is too big (#image_size#).\nMaximum file size is #image_max_size#..
    • importError: An unexpected error occurred when uploading the image. Defaults to Something went wrong when uploading the image #image_name#..
  • errorCallback: A callback function used to define how to display an error message. Defaults to (errorMessage) => alert(errorMessage).
  • renderingConfig: Adjust settings for parsing the Markdown during previewing (not editing).
    • codeSyntaxHighlighting: If set to true, will highlight using highlight.js. Defaults to false. To use this feature you must include highlight.js on your page or pass in using the hljs option. For example, include the script and the CSS files like:
      <script src=""></script>
      <link rel="stylesheet" href="">
    • hljs: An injectible instance of highlight.js. If you don't want to rely on the global namespace (window.hljs), you can provide an instance here. Defaults to undefined.
    • markedOptions: Set the internal Markdown renderer's options. Other renderingConfig options will take precedence.
    • singleLineBreaks: If set to false, disable parsing GitHub Flavored Markdown (GFM) single line breaks. Defaults to true.
    • sanitizerFunction: Custom function for sanitizing the HTML output of Markdown renderer.
  • shortcuts: Keyboard shortcuts associated with this instance. Defaults to the array of shortcuts.
  • showIcons: An array of icon names to show. Can be used to show specific icons hidden by default without completely customizing the toolbar.
  • spellChecker: If set to false, disable the spell checker. Defaults to true. Optionally pass a CodeMirrorSpellChecker-compliant function.
  • inputStyle: textarea or contenteditable. Defaults to textarea for desktop and contenteditable for mobile. contenteditable option is necessary to enable nativeSpellcheck.
  • nativeSpellcheck: If set to false, disable native spell checker. Defaults to true.
  • sideBySideFullscreen: If set to false, allows side-by-side editing without going into fullscreen. Defaults to true.
  • status: If set to false, hide the status bar. Defaults to the array of built-in status bar items.
    • Optionally, you can set an array of status bar items to include, and in what order. You can even define your own custom status bar items.
  • styleSelectedText: If set to false, remove the CodeMirror-selectedtext class from selected lines. Defaults to true.
  • syncSideBySidePreviewScroll: If set to false, disable syncing scroll in side by side mode. Defaults to true.
  • tabSize: If set, customize the tab size. Defaults to 2.
  • theme: Override the theme. Defaults to easymde.
  • toolbar: If set to false, hide the toolbar. Defaults to the array of icons.
  • toolbarTips: If set to false, disable toolbar button tips. Defaults to true.
  • direction: rtl or ltr. Changes text direction to support right-to-left languages. Defaults to ltr.

Options example

Most options demonstrate the non-default behavior:

const editor = new EasyMDE({
    autofocus: true,
    autosave: {
        enabled: true,
        uniqueId: "MyUniqueID",
        delay: 1000,
        submit_delay: 5000,
        timeFormat: {
            locale: 'en-US',
            format: {
                year: 'numeric',
                month: 'long',
                day: '2-digit',
                hour: '2-digit',
                minute: '2-digit',
        text: "Autosaved: "
    blockStyles: {
        bold: "__",
        italic: "_",
    unorderedListStyle: "-",
    element: document.getElementById("MyID"),
    forceSync: true,
    hideIcons: ["guide", "heading"],
    indentWithTabs: false,
    initialValue: "Hello world!",
    insertTexts: {
        horizontalRule: ["", "\n\n-----\n\n"],
        image: ["![](http://", ")"],
        link: ["[", "](https://)"],
        table: ["", "\n\n| Column 1 | Column 2 | Column 3 |\n| -------- | -------- | -------- |\n| Text     | Text      | Text     |\n\n"],
    lineWrapping: false,
    minHeight: "500px",
    parsingConfig: {
        allowAtxHeaderWithoutSpace: true,
        strikethrough: false,
        underscoresBreakWords: true,
    placeholder: "Type here...",

    previewClass: "my-custom-styling",
    previewClass: ["my-custom-styling", "more-custom-styling"],

    previewRender: (plainText) => customMarkdownParser(plainText), // Returns HTML from a custom parser
    previewRender: (plainText, preview) => { // Async method
        setTimeout(() => {
            preview.innerHTML = customMarkdownParser(plainText);
        }, 250);

        return "Loading...";
    promptURLs: true,
    promptTexts: {
        image: "Custom prompt for URL:",
        link: "Custom prompt for URL:",
    renderingConfig: {
        singleLineBreaks: false,
        codeSyntaxHighlighting: true,
        sanitizerFunction: (renderedHTML) => {
            // Using DOMPurify and only allowing <b> tags
            return DOMPurify.sanitize(renderedHTML, {ALLOWED_TAGS: ['b']})
    shortcuts: {
        drawTable: "Cmd-Alt-T"
    showIcons: ["code", "table"],
    spellChecker: false,
    status: false,
    status: ["autosave", "lines", "words", "cursor"], // Optional usage
    status: ["autosave", "lines", "words", "cursor", {
        className: "keystrokes",
        defaultValue: (el) => {
            el.setAttribute('data-keystrokes', 0);
        onUpdate: (el) => {
            const keystrokes = Number(el.getAttribute('data-keystrokes')) + 1;
            el.innerHTML = `${keystrokes} Keystrokes`;
            el.setAttribute('data-keystrokes', keystrokes);
    }], // Another optional usage, with a custom status bar item that counts keystrokes
    styleSelectedText: false,
    sideBySideFullscreen: false,
    syncSideBySidePreviewScroll: false,
    tabSize: 4,
    toolbar: false,
    toolbarTips: false,

Toolbar icons

Below are the built-in toolbar icons (only some of which are enabled by default), which can be reorganized however you like. "Name" is the name of the icon, referenced in the JavaScript. "Action" is either a function or a URL to open. "Class" is the class given to the icon. "Tooltip" is the small tooltip that appears via the title="" attribute. Note that shortcut hints are added automatically and reflect the specified action if it has a key bind assigned to it (i.e. with the value of action set to bold and that of tooltip set to Bold, the final text the user will see would be "Bold (Ctrl-B)").

Additionally, you can add a separator between any icons by adding "|" to the toolbar array.

fa fa-bold
fa fa-italic
fa fa-strikethrough
fa fa-header
heading-smallertoggleHeadingSmallerSmaller Heading
fa fa-header
heading-biggertoggleHeadingBiggerBigger Heading
fa fa-lg fa-header
heading-1toggleHeading1Big Heading
fa fa-header header-1
heading-2toggleHeading2Medium Heading
fa fa-header header-2
heading-3toggleHeading3Small Heading
fa fa-header header-3
fa fa-code
fa fa-quote-left
unordered-listtoggleUnorderedListGeneric List
fa fa-list-ul
ordered-listtoggleOrderedListNumbered List
fa fa-list-ol
clean-blockcleanBlockClean block
fa fa-eraser
linkdrawLinkCreate Link
fa fa-link
imagedrawImageInsert Image
fa fa-picture-o
tabledrawTableInsert Table
fa fa-table
horizontal-ruledrawHorizontalRuleInsert Horizontal Line
fa fa-minus
previewtogglePreviewToggle Preview
fa fa-eye no-disable
side-by-sidetoggleSideBySideToggle Side by Side
fa fa-columns no-disable no-mobile
fullscreentoggleFullScreenToggle Fullscreen
fa fa-arrows-alt no-disable no-mobile
guideThis linkMarkdown Guide
fa fa-question-circle
fa fa-undo
fa fa-redo

Toolbar customization

Customize the toolbar using the toolbar option.

Only the order of existing buttons:

const easyMDE = new EasyMDE({
    toolbar: ["bold", "italic", "heading", "|", "quote"]

All information and/or add your own icons

const easyMDE = new EasyMDE({
    toolbar: [
            name: "bold",
            action: EasyMDE.toggleBold,
            className: "fa fa-bold",
            title: "Bold",
        "italics", // shortcut to pre-made button
            name: "custom",
            action: (editor) => {
                // Add your own code
            className: "fa fa-star",
            title: "Custom Button",
            attributes: { // for custom attributes
                id: "custom-id",
                "data-value": "custom value" // HTML5 data-* attributes need to be enclosed in quotation marks ("") because of the dash (-) in its name.
        "|" // Separator
        // [, ...]

Put some buttons on dropdown menu

const easyMDE = new EasyMDE({
    toolbar: [{
                name: "heading",
                action: EasyMDE.toggleHeadingSmaller,
                className: "fa fa-header",
                title: "Headers",
                name: "others",
                className: "fa fa-blind",
                title: "others buttons",
                children: [
                        name: "image",
                        action: EasyMDE.drawImage,
                        className: "fa fa-picture-o",
                        title: "Image",
                        name: "quote",
                        action: EasyMDE.toggleBlockquote,
                        className: "fa fa-percent",
                        title: "Quote",
                        name: "link",
                        action: EasyMDE.drawLink,
                        className: "fa fa-link",
                        title: "Link",
        // [, ...]

Keyboard shortcuts

EasyMDE comes with an array of predefined keyboard shortcuts, but they can be altered with a configuration option. The list of default ones is as follows:

Shortcut (Windows / Linux)Shortcut (macOS)Action

Here is how you can change a few, while leaving others untouched:

const editor = new EasyMDE({
    shortcuts: {
        "toggleOrderedList": "Ctrl-Alt-K", // alter the shortcut for toggleOrderedList
        "toggleCodeBlock": null, // unbind Ctrl-Alt-C
        "drawTable": "Cmd-Alt-T", // bind Cmd-Alt-T to drawTable action, which doesn't come with a default shortcut

Shortcuts are automatically converted between platforms. If you define a shortcut as "Cmd-B", on PC that shortcut will be changed to "Ctrl-B". Conversely, a shortcut defined as "Ctrl-B" will become "Cmd-B" for Mac users.

The list of actions that can be bound is the same as the list of built-in actions available for toolbar buttons.

Advanced use

Event handling

You can catch the following list of events:

const easyMDE = new EasyMDE();
easyMDE.codemirror.on("change", () => {

Removing EasyMDE from text area

You can revert to the initial text area by calling the toTextArea method. Note that this clears up the autosave (if enabled) associated with it. The text area will retain any text from the destroyed EasyMDE instance.

const easyMDE = new EasyMDE();
// ...
easyMDE = null;

If you need to remove registered event listeners (when the editor is not needed anymore), call easyMDE.cleanup().

Useful methods

The following self-explanatory methods may be of use while developing with EasyMDE.

const easyMDE = new EasyMDE();
easyMDE.isPreviewActive(); // returns boolean
easyMDE.isSideBySideActive(); // returns boolean
easyMDE.isFullscreenActive(); // returns boolean
easyMDE.clearAutosavedValue(); // no returned value

How it works

EasyMDE is a continuation of SimpleMDE.

SimpleMDE began as an improvement of lepture's Editor project, but has now taken on an identity of its own. It is bundled with CodeMirror and depends on Font Awesome.

CodeMirror is the backbone of the project and parses much of the Markdown syntax as it's being written. This allows us to add styles to the Markdown that's being written. Additionally, a toolbar and status bar have been added to the top and bottom, respectively. Previews are rendered by Marked using GitHub Flavored Markdown (GFM).

SimpleMDE fork

I originally made this fork to implement FontAwesome 5 compatibility into SimpleMDE. When that was done I submitted a pull request, which has not been accepted yet. This, and the project being inactive since May 2017, triggered me to make more changes and try to put new life into the project.

Changes include:

  • FontAwesome 5 compatibility
  • Guide button works when editor is in preview mode
  • Links are now https:// by default
  • Small styling changes
  • Support for Node 8 and beyond
  • Lots of refactored code
  • Links in preview will open in a new tab by default
  • TypeScript support

My intention is to continue development on this project, improving it and keeping it alive.

Hacking EasyMDE

You may want to edit this library to adapt its behavior to your needs. This can be done in some quick steps:

  1. Follow the prerequisites and installation instructions in the contribution guide;
  2. Do your changes;
  3. Run gulp command, which will generate files: dist/easymde.min.css and dist/easymde.min.js;
  4. Copy-paste those files to your code base, and you are done.


Want to contribute to EasyMDE? Thank you! We have a contribution guide just for you!

Author: Ionaru
Source Code:
License: MIT license

#react-native #react 

Customer Segmentation: K-Means Clustering & A/B Testing


I have been working in Advertising, specifically Digital Media and Performance, for nearly 3 years and customer behaviour analysis is one of the core concentrations in my day-to-day job. With the help of different analytics platforms (e.g. Google Analytics, Adobe Analytics), my life has been made easier than before since these platforms come with the built-in function of segmentation that analyses user behaviours across dimensions and metrics.

However, despite the convenience provided, I was hoping to leverage Machine Learning to do customer segmentation that can be scalable and applicable to other optimizations in Data Science (e.g. A/B Testing). Then, I came across the dataset provided by Google Analytics for a Kaggle competition and decided to use it for this project.

Feel free to check out the dataset here if you’re keen! Beware that the dataset has several sub-datasets and each has more than 900k rows!

A. Explanatory Data Analysis (EDA)

This always remain an essential step in every Data Science project to ensure the dataset is clean and properly pre-processed to be used for modelling.

First of all, let’s import all the necessary libraries and read the csv file:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
df_raw = pd.read_csv("google-analytics.csv")

Image for post

1. Flatten JSON Fields

As you can see, the raw dataset above is a bit “messy” and not digestible at all since some variables are formatted as JSON fields which compress different values of different sub-variables into one field. For example, for geoNetwork variable, we can tell that there are several sub-variables such as continent, subContinent, etc. that are grouped together.

Thanks to the help of a Kaggler, I was able to convert these variables to a more digestible ones by flattening those JSON fields:

import os
import json
from pandas import json_normalize

def load_df(csv_path="google-analytics.csv", nrows=None):
    json_columns = ['device', 'geoNetwork', 'totals', 'trafficSource']
    df = pd.read_csv(csv_path, converters={column: json.loads for column in json_columns},dtype={'fullVisitorID':'str'}, nrows=nrows)
    for column in json_columns:
        column_converted = json_normalize(df[column])
        column_converted.columns = [f"{column}_{subcolumn}" for subcolumn in column_converted.columns]
        df = df.drop(column, axis=1).merge(column_converted, right_index=True, left_index=True)
    return df

Image for post

After flattening those JSON fields, we are able to see a much cleaner dataset, especially those JSON variables split into sub-variables (e.g. device split into device_browser, device_browserVersion, etc.).

2. Data Re-formatting & Grouping

For this project, I have chosen the variables that I believe have better impact or correlation to the user behaviours:

df = df.loc[:,['channelGrouping', 'date', 'fullVisitorId', 'sessionId', 'visitId', 'visitNumber', 'device_browser', 'device_operatingSystem', 'device_isMobile', 'geoNetwork_country', 'trafficSource_source', 'totals_visits', 'totals_hits', 'totals_pageviews', 'totals_bounces', 'totals_transactionRevenue']]
df = df.fillna(value=0)

Image for post

Moving on, as the new dataset has fewer variables which, however, vary in terms of data type, I took some time to analyze each and every variable to ensure the data is “clean enough” prior to modelling. Below are some quick examples of un-clean data to be cleaned:

#Format the values
df.channelGrouping = df.channelGrouping.replace("(Other)", "Others")

#Convert boolean type to string 
df.device_isMobile = df.device_isMobile.astype(str)
df.loc[df.device_isMobile == "False", "device"] = "Desktop"
df.loc[df.device_isMobile == "True", "device"] = "Mobile"
#Categorize similar values
df['traffic_source'] = df.trafficSource_source
main_traffic_source = ["google","baidu","bing","yahoo",...., "pinterest","yandex"]
df.traffic_source[df.traffic_source.str.contains("google")] = "google"
df.traffic_source[df.traffic_source.str.contains("baidu")] = "baidu"
df.traffic_source[df.traffic_source.str.contains("bing")] = "bing"
df.traffic_source[df.traffic_source.str.contains("yahoo")] = "yahoo"
df.traffic_source[~df.traffic_source.isin(main_traffic_source)] = "Others"

After re-formatting, I found that fullVisitorID’s unique values are fewer than the total rows of the dataset, meaning there are multiple fullVisitorIDs that were recorded. Hence, I proceeded to group the variables by fullVisitorID and sort by Revenue:

df_groupby = df.groupby(['fullVisitorId', 'channelGrouping', 'geoNetwork_country', 'traffic_source', 'device', 'deviceBrowser', 'device_operatingSystem'])
               .agg({'totals_hits':'sum', 'totals_pageviews':'sum', 'totals_bounces':'sum','totals_transactionRevenue':'sum'})
df_groupby = df_groupby.sort_values(by='totals_transactionRevenue', ascending=False).reset_index(drop=True)

Image for post

#machine-learning #k-means #segmentation #data-science #clustering

Elton  Bogan

Elton Bogan


SciPy Cluster - K-Means Clustering and Hierarchical Clustering

SciPy is the most efficient open-source library in python. The main purpose is to compute mathematical and scientific problems. There are many sub-packages in SciPy which further increases its functionality. This is a very important package for data interpretation. We can segregate clusters from the data set. We can perform clustering using a single or multi-cluster. Initially, we generate the data set. Then we perform clustering on the data set. Let us learn more SciPy Clusters.

K-means Clustering

It is a method that can employ to determine clusters and their center. We can use this process on the raw data set. We can define a cluster when the points inside the cluster have the minimum distance when we compare it to points outside the cluster. The k-means method operates in two steps, given an initial set of k-centers,

  • We define the cluster data points for the given cluster center. The points are such that they are closer to the cluster center than any other center.
  • We then calculate the mean for all the data points. The mean value then becomes the new cluster center.

The process iterates until the center value becomes constant. We then fix and assign the center value. The implementation of this process is very accurate using the SciPy library.

#numpy tutorials #clustering in scipy #k-means clustering in scipy #scipy clusters #numpy

Alec  Nikolaus

Alec Nikolaus


Introduction to k-Means Clustering

Cluster is a group of objects which have similar properties and belong to the same class.

What is Clustering?

Clustering is an unsupervised learning technique which is used to make clusters of objects i.e. it is a technique to group objects of similar kind in a group. In clustering, we first partition the set of data into groups based on the similarity and then assign the labels to those groups. Also, it helps us to find out various useful features that can help in distinguishing between different groups.

Types of Clustering

Most common categories of clustering are:-

  • Partitioning Method
  • Hierarchical Method
  • Density-based Method
  • Grid-based Method
  • Model-based Method

Partitioning Method

Partitioning method classifies the group of n objects into groups based on the features and similarity of data.

The general problem would be like that we will have ‘n’ objects and we need to construct ‘k’ partitions among the data objects where each partition represents a cluster and will contain at least one object. Also, there is an additional condition that says each object can belong to only one group.

The partitioning method starts by creating an initial random partitioning. Then it iterates to improve the partitioning by moving the objects from one partition to another.

k-Means clustering follows the partitioning approach to classify the data.

Hierarchical Method

The hierarchical method performs a hierarchical decomposition of the given set of data objects. It starts by considering every data point as a separate cluster and then iteratively identifies two clusters which can be closest together and then merge these two clusters into one. We continue this until all the clusters are merged together into a single big cluster. A diagram called **Dendrogram **is used torepresent this hierarchy.

There are two approaches depending on how we create the hierarchy −

  • Agglomerative Approach
  • Divisive Approach

Agglomerative Approach

Agglomerative approach is a type of hierarchical method which uses bottom-up strategy. We start with each object considering as a separate cluster and keeps on merging the objects that are close to one another. It keep on doing so until all of the groups are merged into one or until the termination condition holds.

#k-means-clustering #machine-learning #clustering #python #code

Gerhard  Brink

Gerhard Brink


Understanding Core Data Science Algorithms: K-Means and K-Medoids Clustering

This article provides an overview of core data science algorithms used in statistical data analysis, specifically k-means and k-medoids clustering.

Clustering is one of the major techniques used for statistical data analysis.

As the term suggests, “clustering” is defined as the process of gathering similar objects into different groups or distribution of datasets into subsets with a defined distance measure.

K-means clustering is touted as a foundational algorithm every data scientist ought to have in their toolbox. The popularity of the algorithm in the data science industry is due to its extraordinary features:

  • Simplicity
  • Speed
  • Efficiency

#big data #big data analytics #k-means clustering #big data algorithms #k-means #data science algorithms