These illustrated guides to data science tools are divided into four categories: data retrieval, data manipulation, data visualization, and engineering tips. Both online and PDF versions of the guides are available.
Algorithms for text analytics must model how language works to incorporate meaning in language—and so do the people deploying these algorithms.
Even though transformers for NLP were introduced only a few years ago, they have delivered major impacts to a variety of fields from reinforcement learning to chemistry. Now is the time to better understand the inner workings of transformer architectures to give you the intuition you need to effectively work with these powerful tools.
Inspired by Judea Pearl’s do-calculus for causal inference, the open source framework provides a programmatic interface for popular causal inference methods.
Does data versioning mean what you think it means? Read this overview with use cases to see what data versioning really is, and the tools that can help you manage it.
Despite the benefits of federated learning, there are still ways of breaching a user’s privacy, even without sharing private data. In this article, we’ll review some research papers that discuss this vulnerability in federated learning.
With ML models serving real people, misclassified cases (which are a natural consequence of using ML) are affecting people’s lives and sometimes treating them very unfairly. That makes the ability to explain your models’ predictions a requirement rather than just a nice-to-have. Machine learning model development is hard, especially in the real world. Typically, you need to:
Regularization techniques are crucial for preventing your models from overfitting and enable them to perform better on your validation and test sets. This guide provides a thorough overview, with code, of four key approaches you can use for regularization in TensorFlow. In mathematics, statistics, and computer science, particularly in machine learning and inverse problems, regularization is the process of adding information in order to solve an ill-posed problem or to prevent overfitting.
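To make "adding information" concrete, here is a minimal sketch of an L2 (ridge) penalty on a one-weight linear model, written in plain Python rather than TensorFlow. The function name fit_ridge_1d, the penalty strength lam, and the sample data are all assumptions made for illustration; they do not come from the guide itself.

```python
def fit_ridge_1d(xs, ys, lam=0.0):
    """Fit y ~ w * x by minimizing sum((y - w*x)**2) + lam * w**2.

    Setting the derivative to zero gives the closed form
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Invented toy data, roughly y = 2x with noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_plain = fit_ridge_1d(xs, ys, lam=0.0)   # ordinary least squares
w_ridge = fit_ridge_1d(xs, ys, lam=10.0)  # penalized fit, weight shrinks toward 0

print(w_plain, w_ridge)
```

The penalty term is the extra information: it expresses a preference for small weights, which is what keeps a flexible model from chasing noise in the training set.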
This beginners' guide with code examples for selecting the most useful features from your data will jump start you toward developing the most effective and efficient learning models.
Introduction to Federated Learning. Federated learning means enabling on-device training, model personalization, and more. Read more about it in this article.
You can use performance testing in any application you’re working on, but it’s especially useful for big data applications. Let’s see why.
Check out these recent must-read guides, feature articles, and other resources to keep you on top of the latest advancements and ahead of the curve.
This post highlights the movement of people in the 10 most-affected European countries, based on how they stay at home, work, and visit places, using Google's anonymized location-tracking dataset.
Although neural networks are so popular today in AI and machine learning development, they can still look like a black box in terms of how they learn to make predictions. To understand what is going on deep in these networks, we must consider how neural networks perform optimization. Neural networks are, without a doubt, the most popular machine learning technique in use today, so I think it is worth understanding how they actually learn.
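The optimization at the heart of that learning can be sketched with plain gradient descent on a single parameter. This is a toy illustration, not the article's own code: the function name gradient_descent, the learning rate, and the quadratic loss (w - 3)**2 are assumptions chosen so the answer is easy to check by hand.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


# Toy loss: L(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
# Its minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)

print(w_min)
```

A real network does the same thing over millions of weights at once, with the gradient supplied by backpropagation instead of a hand-written formula.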
In this article, we explore how 3D human pose estimation works based on our research and experiments, which were part of the analysis of applying human pose estimation.
The list of Top 10 lists that Data Scientists -- from enthusiasts to those who want to jump start a career -- must know to smoothly navigate a path through this field.
With an analysis of over a thousand Data Scientist job descriptions in the USA, check out the trends for 2020 and current expectations on new positions in the field, including credentials.
Hypothesis tests are significant for evaluating answers to questions concerning samples of data.
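As one small example of such a test, the sketch below runs an exact two-sided permutation test for a difference in means, using only the standard library. The function name permutation_test and the two samples are invented for illustration, and a permutation test is just one of many possible choices of hypothesis test.

```python
from itertools import combinations


def permutation_test(a, b):
    """Exact two-sided permutation test for a difference in means.

    Returns the fraction of ways to split the pooled data into groups of
    the original sizes whose mean difference is at least as large as the
    observed one.
    """
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n_a - sum(b) / n_b)

    extreme = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point round-off.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Invented samples: every value in group_a exceeds every value in group_b.
group_a = [12.1, 11.8, 12.5, 12.9, 12.3]
group_b = [10.2, 10.9, 10.5, 11.1, 10.7]

p_value = permutation_test(group_a, group_b)
print(p_value)
```

Because the two samples do not overlap at all, only 2 of the 252 possible splits are as extreme as the observed one, so the p-value is small and the null hypothesis of equal means looks implausible.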
This article demonstrates how to use Spark on Kubernetes. It also includes a brief comparison of the various cluster managers available for Spark. Spark is a general-purpose distributed data processing engine designed for fast computation. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application. It supports workloads such as batch applications, iterative algorithms, interactive queries, and streaming. During execution, it creates the following components:
Data is the bread and butter of a Data Scientist, so knowing many approaches to loading data for analysis is crucial. Here, five Python techniques to bring in your data are reviewed with code examples for you to follow.
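As a taste of what such a technique looks like, here is a minimal sketch that reads CSV data with Python's built-in csv module. It is not necessarily one of the five techniques the article covers, and the column names and values are invented; an in-memory string stands in for a file on disk so the snippet runs as-is.

```python
import csv
import io

# Stand-in for open("some_file.csv"); the contents are invented for illustration.
raw = io.StringIO("name,score\nada,91\ngrace,88\n")

# DictReader uses the header row as keys, yielding one dict per data row.
rows = list(csv.DictReader(raw))

# Every value arrives as a string, so numeric columns need explicit conversion.
scores = [int(row["score"]) for row in rows]

print(rows[0]["name"], scores)
```

The explicit int conversion is the main thing to remember with the csv module: unlike higher-level loaders, it performs no type inference.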