Dimensionality Reduction Techniques in Machine Learning. What are Dimensionality Reduction Techniques? Basically, dimensionality reduction refers to the process of converting a set of data with many dimensions into data with fewer dimensions.
In Part II, we will go through a case study to demonstrate how to use surrogate models in practice. The roadmap for this case study is shown below.
Regression analysis is about predicting the value of one variable based on one or more other variables. Simple linear regression is the case where a single variable is predicted from a single other variable through a straight-line relationship.
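As a minimal sketch of the single-predictor case, the least-squares line can be fitted with plain Python. The function name `fit_simple_linear_regression` and the toy data are assumptions made for illustration; they do not come from the article.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data that lies exactly on the line y = 2x + 1
hours = [1, 2, 3, 4, 5]
score = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_linear_regression(hours, score)
print(slope, intercept)
```

With more than one predictor the same idea generalizes to multiple linear regression, which is usually solved with a numerical library rather than by hand.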
Example of Chi-Square Test in Python. We will provide a practical example of how we can run a Chi-Square Test in Python.
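One possible way to run such a test is sketched below with `scipy.stats.chi2_contingency`, which tests independence in a contingency table. The table of counts is hypothetical, and the article itself may use a different function or data set.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table of observed counts:
# rows = two groups, columns = three answer categories
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

# Returns the test statistic, the p-value, the degrees of freedom,
# and the table of counts expected under independence.
chi2, p_value, dof, expected = chi2_contingency(observed)

print("chi2 statistic:", chi2)
print("p-value:", p_value)
print("degrees of freedom:", dof)
print("expected counts:", expected)
```

A small p-value would suggest that the row and column variables are not independent; the threshold to use (often 0.05) is a choice made by the analyst.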
This post will serve as a tutorial for querying data with SQL (Structured Query Language). For the purposes of this tutorial, I will be using the SQLite3 library, which provides a relational database management system. For the examples, I will be using the Chinook Database, a sample database that represents a digital media store, including tables for artists, albums, etc. SQL in Data Science
Intro to Pandas and Three Easy Ways to Select Data. Exploring data with Python can be confusing. Here are three simple techniques for selecting data in Pandas that make it easy.
In this article, we will use graphs to demonstrate their relationships with the sample size. Specifically, we will discuss different scenarios with one-tailed hypothesis testing.
This article will introduce the seven most important statistical distributions, show how to simulate them in Python with either NumPy's built-in functions or a random variable generator, and discuss the relationships among the distributions and their applications in data science.
Statistical concepts with examples, formulas, and Python code. The describe() function computes summary statistics for the DataFrame columns. It gives the count, mean, std, min, max and quartile values, from which the IQR can be derived. By default, the function excludes character columns and summarizes only the numeric columns.
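A short sketch of this behaviour, assuming pandas is installed; the DataFrame below is hypothetical toy data, not the article's data set.

```python
import pandas as pd

# Hypothetical toy DataFrame with two numeric columns and one text column
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29],
    "income": [28000, 52000, 61000, 75000, 40000],
    "city": ["Lyon", "Oslo", "Lima", "Pune", "Kyiv"],
})

# By default only the numeric columns are summarized; "city" is left out.
summary = df.describe()
print(summary)

# The IQR is not a row of the summary, so derive it from the quartile rows.
iqr = summary.loc["75%"] - summary.loc["25%"]
print(iqr)

# include="all" also summarizes non-numeric columns (count, unique, top, freq).
print(df.describe(include="all"))
```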
A New Way to BOW Analysis & Feature Engineering, Part 1. Compare the frequency distributions across labels without building an ML model.
Here are some super useful libraries so you can get the most out of R for analytics. These five libraries will make your analytics experience a lot more enjoyable and are essential for mastering R or reaching the next step of your data science career.
Statistical test for MCAR in Python… In this article, I will walk you through code that analyzes missing values in categorical data.
We will look at some Python code for the implementation. I present only the core idea here; the full version is provided at the end of this article.
In this series of blog posts, I will try to deconstruct ideas in mathematical statistics as I have come to understand them intuitively. This is not a tutorial on regression analysis but an attempt to free concepts in statistics from sophisticated mathematical language and bring them within reach of data scientists.
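Because covariance depends on the units of the variables, it is limited as a measure of how strong a relationship is. The Pearson correlation coefficient (PCC) overcomes this limitation by dividing the covariance by the standard deviations of both variables, which yields a unit-free value between -1 and 1.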
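The sketch below illustrates the point with hypothetical toy data: changing the unit of one variable rescales the covariance but leaves the Pearson coefficient unchanged. The helper names `covariance` and `pearson` are assumptions for illustration.

```python
from math import sqrt
from statistics import mean


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)


def pearson(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical toy data: heights in metres, weights in kilograms
height_m = [1.60, 1.65, 1.70, 1.75, 1.80]
weight_kg = [55.0, 61.0, 64.0, 72.0, 78.0]
height_cm = [h * 100 for h in height_m]  # same data, different unit

# Covariance grows by a factor of 100 when metres become centimetres...
print(covariance(height_m, weight_kg), covariance(height_cm, weight_kg))
# ...while the Pearson coefficient stays the same.
print(pearson(height_m, weight_kg), pearson(height_cm, weight_kg))
```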
Aligning research design and statistical analyses. From the first day I sat in my undergraduate “Research Methods” course staring at SPSS output, I knew I had found my calling.
What is data science for? I am always astounded by one frequently asked question: what is data science for?
5 Widely Used Evaluation Metrics for Regression Models in Machine Learning. Today we are going to discuss performance metrics, and this time it will be regression metrics.
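The teaser does not say which five metrics the article covers. As an assumption, the sketch below computes four that are commonly used for regression (MAE, MSE, RMSE and R²) on hypothetical toy values; the function name `regression_metrics` is also illustrative.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = sqrt(mse)                        # root mean squared error

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical true values and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

metrics = regression_metrics(y_true, y_pred)
print(metrics)
```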
The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results.
Analyzing the test results using Python. In theory, there is no difference between theory and practice. My previous blog post gives a basic idea of what exactly A/B testing is.