In this article we will focus on noise, in particular label noise: the scenario in which each sample has exactly one label (or class) and a subset of the samples in the dataset is mislabeled.
How to make estimates using geolocation data with R. In this article, you will learn what geostatistics is and how to use kriging, an interpolation method, to make estimates from geolocation data.
I came across the terms bias, variance, underfitting and overfitting while taking a course. The terms seemed daunting, and the articles online didn’t help either. Although the concepts behind them are complex, the terms themselves are pretty simple.
Humans Are a Data Problem and Elon is Trying to Solve Us. Here’s Why Elon Musk’s Neuralink Should Change Your View on Humanity
Data Lake - Comparing Performance of Known Big Data Formats. Performance comparison of well-known big data formats: CSV, JSON, Avro, Parquet, and ORC
Everything You Should Know About Gradient Descent. This is one of the most popular optimisation algorithms in Data Science. Do you know how it works?
In this article, we will go through key Elasticsearch concepts: nodes, indices, shards, documents, routing, and replication. This will give you a good understanding of how Elasticsearch works.
In this article, we will go through the installation steps for Elasticsearch and Kibana on Windows. The steps are similar on Linux, macOS, and other systems as well.
I present some important programming takeaways to keep in mind when doing machine learning, to make your implementation faster and more effective.
An introduction to signals. Generate signals for your next machine learning project. This article will explore what a signal is and how we can generate and store signals in NumPy for machine learning.
Industry favorite NLP techniques, the biggest trends, challenges and use cases. We talked to thought leaders applying NLP in different industries about their favorite NLP techniques, the biggest trends, as well as opportunities and challenges of NLP in 2020.
Death By Bias. How Algorithms Systemize Discriminatory Practices. Empowering Data Literacy within the Black community
The Importance of a Proper Data Culture. AI, machine learning, or any type of analytics starts with a data-driven organization.
Bad data makes data scientists work harder, not smarter! Data is one of the most important factors (besides all the technical depth around an ML solution) that dictate whether.
Don’t do these things unless you want a biased model in production, making inaccurate and, at times, costly predictions. Despite the abundance of top-quality machine learning (ML) practitioners and technological advancements, there is no dearth of real-life ML failures.
ML Programming Hacks that every Data Engineer should know. A broader cheat sheet for data scientists and machine learning practitioners.
What Is Data Visualization? Exploratory data analysis is an essential part of the data science and machine learning pipeline.
Let’s explore the use case of Sankey charts in this series on Advanced Visualisation Techniques for Data Science. In this article, I am going to discuss an essential part of data science: data visualization.
Public speaking: how to win the crowd? Transitioning from attendee to presenter looks like a mountain to climb. But is it really?
Do you have a Data Lake, or a Data Pool? In the cloud, there are two terms for data storage: data pool and data lake. They are different, and neither uses physical space.