An Introduction to Classification Using Mislabeled Data

In this article we will focus on noise, in particular label noise: the scenario in which each sample has exactly one label (or class) and a subset of the samples in the dataset is mislabeled.

Geostatistics in practice using R

How to make estimates using geolocation data with R. In this article, you will learn what geostatistics is and how to use kriging, an interpolation method, to make estimates from geolocation data.

Bias, Variance and How they are related to Underfitting, Overfitting

I came across the terms bias, variance, underfitting and overfitting while taking a course. The terms seemed daunting, and articles online didn’t help either. Although the concepts related to them are complex, the terms themselves are pretty simple.

Humans Are a Data Problem and Elon is Trying to Solve Us

Here’s Why Elon Musk’s Neuralink Should Change Your View on Humanity

Data Lake - Comparing Performance of Known Big Data Formats

Performance comparison of well-known Big Data formats: CSV, JSON, AVRO, PARQUET & ORC

Everything You Should Know About Gradient Descent

Gradient descent is one of the most popular optimisation algorithms in data science. Do you know how it works?

The Beginners’ Guide to Elasticsearch 

In this article, we will go through key Elasticsearch concepts: nodes, indices, sharding, documents, routing, and replication. This will give you a good understanding of how Elasticsearch works.

The Beginners’ Guide to Elasticsearch

In this article, we will go through the installation steps for Elasticsearch and Kibana on Windows. The steps are similar on Linux, macOS, and other systems as well.

ML Programming Hacks that every Data Engineer should know 

I present some important programming takeaways to keep in mind when practicing machine learning, to make your implementation faster and more effective.

Understanding Signals. It’s not that complicated.

An introduction to signals. Generate signals for your next machine learning project. This article will explore what a signal is and how we can generate and store signals in NumPy for machine learning.

NLP Trends and Use Cases in 2020

Industry favorite NLP techniques, the biggest trends, challenges and use cases. We talked to thought leaders applying NLP in different industries about their favorite NLP techniques, the biggest trends, as well as opportunities and challenges of NLP in 2020.

Death By Bias. How Algorithms Systemize Discriminatory Practices.

Empowering Data Literacy within the Black community

The Importance of a Proper Data Culture

AI, machine learning, and any other type of analytics start with a data-driven organization.

The cost of poor data quality

Bad data makes data scientists work harder, not smarter! Data is one of the most important key factors (besides all the technical depth around an ML solution) that dictate whether.

How to Incorporate Bias in Your Predictive Models

Don’t do these things unless you want a biased model in production, making inaccurate and, at times, costly predictions. Despite the abundance of top quality machine learning (ML) practitioners and technological advancements, there is no dearth of real-life ML failures.

ML Programming Hacks that every Data Engineer should know

A wider cheat sheet for data scientists and machine learning practitioners.

What Is Data Visualization?

Exploratory data analysis is an essential part of the data science and machine learning pipeline.

The tale of Ultra Modern Visualizations - Sankey chart

Let’s explore the use case of Sankey charts in this series on Advanced Visualisation Techniques for Data Science. In this article, I am going to discuss an essential part of data science: data visualization.

Public speaking — how to win the crowd?

Transitioning from attendee to presenter looks like a mountain to climb. But is it really?

Do you have a Data Lake, or a Data Pool?

In the cloud, there are two terms for data storage: Data Pool and Data Lake. They are different, and neither uses a physical space.