Data Science Tools Illustrated Study Guides

These illustrated data science tools guides are divided into four categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of the guides are available.

100 Essentials from Semantics and Pragmatics - KDnuggets

Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms.

A Deep Dive Into the Transformer Architecture - KDnuggets

Even though transformers for NLP were introduced only a few years ago, they have delivered major impacts to a variety of fields from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures to give you the intuition you need to effectively work with these powerful tools.

Microsoft’s DoWhy is a Cool Framework for Causal Inference

Inspired by Judea Pearl’s do-calculus for causal inference, the open source framework provides a programmatic interface for popular causal inference methods.

Data Versioning: Does it mean what you think it means?

Does data versioning mean what you think it means? Read this overview with use cases to see what data versioning really is, and the tools that can help you manage it.

Breaking Privacy in Federated Learning

Despite the benefits of federated learning, there are still ways of breaching a user’s privacy, even without sharing private data. In this article, we’ll review some research papers that discuss how federated learning includes this vulnerability.

Explainable and Reproducible Machine Learning Model Development

With ML models serving real people, misclassified cases (which are a natural consequence of using ML) are affecting people's lives and sometimes treating them very unfairly. This makes the ability to explain your models' predictions a requirement rather than just a nice-to-have. Machine learning model development is hard, especially in the real world. Typically, you need to:

4 ways to improve your TensorFlow model

Regularization techniques are crucial for preventing your models from overfitting and enable them to perform better on your validation and test sets. This guide provides a thorough overview, with code, of four key approaches you can use for regularization in TensorFlow. In mathematics, statistics, and computer science, particularly in machine learning and inverse problems, regularization is the process of adding information in order to solve an ill-posed problem or to prevent overfitting.
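As a rough illustration of "adding information to prevent overfitting", the sketch below fits a single slope with an optional L2 penalty. It uses plain Python rather than TensorFlow, and the function name `fit_slope` and the toy data are assumptions made for this example, not taken from the guide.

```python
def fit_slope(xs, ys, l2=0.0):
    """Slope w minimizing mean((w*x - y)**2) + l2 * w**2 (hypothetical helper)."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative of the penalized loss to zero gives this closed form.
    return sxy / (sxx + l2 * n)


# Toy data lying exactly on y = 2x (an assumption for illustration only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_slope(xs, ys)        # no penalty
w_reg = fit_slope(xs, ys, l2=1.0)  # the penalty shrinks the slope toward zero
```

The penalized slope is smaller than the unpenalized one: the penalty trades some fit on the training data for a simpler model, which is the general idea behind weight regularization.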

Getting Started with Feature Selection

This beginners' guide with code examples for selecting the most useful features from your data will jump start you toward developing the most effective and efficient learning models.

Introduction to Federated Learning

Federated learning means enabling on-device training, model personalization, and more. Read more about it in this article.

Performance Testing on Big Data Applications

You can use performance testing in any application you’re working on, but it’s especially useful for big data applications. Let’s see why.

Must-read NLP and Deep Learning articles for Data Scientists

Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.

Visualizing the Mobility Trends in European Countries Affected by COVID-19

This post highlights the movement of people from the 10 most-affected European countries based on the way they stay at home, work, and visit places, using Google's anonymized location tracking dataset.

How Do Neural Networks Learn?

Although neural networks are very popular in AI and machine learning development today, they can still look like a black box in terms of how they learn to make predictions. To understand what is going on deep inside these networks, we must consider how neural networks perform optimization. Neural networks are arguably the most popular machine learning technique in use today, so it is worth understanding how they actually learn.
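To make "learning as optimization" concrete, here is a minimal sketch of gradient descent on a one-weight model. It is not taken from the article; the function name `train`, the learning rate, the step count, and the toy data are all assumptions chosen for illustration.

```python
def train(xs, ys, lr=0.05, steps=200):
    """Fit y ≈ w*x by gradient descent on mean squared error (illustrative only)."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of mean((w*x - y)**2) with respect to w.
        grad = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        # Step against the gradient to reduce the error.
        w -= lr * grad
    return w


# Toy data generated from y = 2x, so the learned weight should approach 2.
learned_w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A real network repeats the same idea over many weights at once, with backpropagation supplying the gradients.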

3D Human Pose Estimation Experiments and Analysis - KDnuggets

In this article, we explore how 3D human pose estimation works based on our research and experiments, which were part of the analysis of applying human pose estimation.

The List of Top 10 Lists in Data Science - KDnuggets

The list of Top 10 lists that Data Scientists -- from enthusiasts to those who want to jump start a career -- must know to smoothly navigate a path through this field.

Data Scientist Job Market 2020

With an analysis of over a thousand Data Scientist job descriptions in the USA, check out the trends for 2020 and current expectations on new positions in the field, including credentials.

Hypothesis Test for Real Problems

Hypothesis tests are significant for evaluating answers to questions concerning samples of data.

Containerization of PySpark Using Kubernetes

This article demonstrates how to use Spark on Kubernetes. It also includes a brief comparison of the cluster managers available for Spark. Spark is a general-purpose distributed data processing engine designed for fast computation. Its main feature is in-memory cluster computing, which increases the processing speed of an application. It supports workloads such as batch applications, iterative algorithms, interactive queries, and streaming. During execution, it creates the following components:

5 Different Ways to Load Data in Python - KDnuggets

Data is the bread and butter of a Data Scientist, so knowing many approaches to loading data for analysis is crucial. Here, five Python techniques to bring in your data are reviewed with code examples for you to follow.
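The blurb does not name the five techniques, so the sketch below should not be read as the article's list. It only shows two common standard-library ways to load tabular data, using in-memory strings in place of files; the sample data and the helper names `load_csv` and `load_json` are assumptions made for this example.

```python
import csv
import io
import json

# Stand-ins for file contents (hypothetical sample data).
CSV_TEXT = "name,score\nada,90\nlin,85\n"
JSON_TEXT = '[{"name": "ada", "score": 90}]'


def load_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse JSON text into Python lists and dicts."""
    return json.loads(text)


csv_rows = load_csv(CSV_TEXT)
json_rows = load_json(JSON_TEXT)
```

Note that `csv.DictReader` returns every value as a string, while `json.loads` preserves numeric types, which is one practical difference to keep in mind when choosing a loader.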