Dimensionality Reduction Techniques in Machine Learning

What are dimensionality reduction techniques? Dimension reduction refers to the process of converting a data set with many dimensions into one with fewer dimensions while retaining as much of the information as possible.

An introduction to Surrogate modeling, Part II: case study

In Part II, we will go through a case study to demonstrate how to use surrogate models in practice. The roadmap for this case study is shown below.

The Linear Regression Equation in a Nutshell

Regression analysis is about predicting the value of one variable from one or more other variables. Simple linear regression is the case where a single variable is predicted from one other variable using a straight line.
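As a rough illustration of the single-predictor case, the sketch below fits the line y = slope * x + intercept by ordinary least squares. The function name `fit_simple_linear_regression` and the data are made up for this example and are not taken from the article.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

With real, noisy data the points will not sit exactly on the line, and the fitted slope and intercept are the values that minimize the sum of squared vertical distances.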

How to Run the Chi-Square Test in Python

We will provide a practical example of how to run a Chi-Square test in Python.

SQL in Data Science

This post will serve as a tutorial for querying data with SQL (Structured Query Language). For the purposes of this tutorial, I will be using the SQLite3 library, which provides a relational database management system. For the examples, I will be using the Chinook Database, a sample database that represents a digital media store, including tables for artists, albums, etc.
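To give a flavor of what querying with SQLite3 looks like from Python, here is a minimal sketch using the standard-library `sqlite3` module. It does not load the real Chinook Database: the `artists` and `albums` tables, their columns, and the rows are simplified stand-ins invented for this example.

```python
import sqlite3

# In-memory database so the example needs no files on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified tables (not the actual Chinook schema).
conn.executescript(
    """
    CREATE TABLE artists (
        artist_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE albums (
        album_id  INTEGER PRIMARY KEY,
        title     TEXT NOT NULL,
        artist_id INTEGER REFERENCES artists(artist_id)
    );
    INSERT INTO artists VALUES (1, 'Artist A'), (2, 'Artist B');
    INSERT INTO albums VALUES
        (1, 'First Album', 1),
        (2, 'Second Album', 1),
        (3, 'Debut', 2);
    """
)

# Count albums per artist with a JOIN and GROUP BY.
rows = conn.execute(
    """
    SELECT ar.name, COUNT(al.album_id) AS album_count
    FROM artists AS ar
    JOIN albums AS al ON al.artist_id = ar.artist_id
    GROUP BY ar.artist_id, ar.name
    ORDER BY album_count DESC, ar.name
    """
).fetchall()

for name, album_count in rows:
    print(name, album_count)

conn.close()
```

The same `SELECT` pattern should carry over to a real database file by passing its path to `sqlite3.connect` and adjusting the table and column names to match its schema.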

Intro to Pandas and Three Easy Ways to Select Data

Exploring data with Python can be confusing. Here are three simple techniques for selecting data in Pandas that make it easy.

How is Sample Size Related to Standard Error, Power, Confidence Level, and Effect Size?

In this article, we will use graphs to demonstrate how standard error, power, confidence level, and effect size each relate to sample size. Specifically, we will discuss different scenarios with one-tailed hypothesis testing.

Seven Must-Know Statistical Distributions and Their Simulations for Data Science

This article will introduce the seven most important statistical distributions, show how to simulate them in Python with either NumPy's built-in functions or a random variable generator, and discuss the relationships among the distributions and their applications in data science.

Descriptive Statistics with Pandas

Statistical concepts with examples, formulas, and Python code. The describe() function computes summary statistics for the DataFrame columns: the count, mean, standard deviation, minimum, quartiles, and maximum. The IQR can be derived from its output as the difference between the 75% and 25% quartiles. By default, the function excludes character columns and summarizes only the numeric columns.
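A small sketch of this behavior, assuming pandas is installed. The DataFrame contents and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical data: one text column and two numeric columns.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune", "Kyiv"],
        "age": [23, 35, 41, 29],
        "income": [41000.0, 52000.0, 61000.0, 48000.0],
    }
)

# With default arguments, describe() summarizes only the numeric columns.
summary = df.describe()
print(summary)

# The IQR is not a row of the summary; compute it from the quartile rows.
iqr = summary.loc["75%"] - summary.loc["25%"]
print(iqr)
```

Passing `include="all"` to `describe()` also summarizes the non-numeric columns, reporting statistics such as the number of unique values and the most frequent value for them.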

A new way to BOW Analysis & Feature Engineering — Part1

Compare the frequency distributions across labels without building an ML model.

5 essential R libraries for Data Science

Here are five useful libraries that help you get the most out of R for analytics. They will make your analytics experience a lot more enjoyable and are essential for mastering data science or reaching the next step of your career.

Statistical test for MCAR in python…

In this article, I will walk you through code that analyzes missing values in categorical data.

Data Visualization: How to choose the right chart

We will look at some Python code for the implementation. I present only the main principles here; the full version is provided at the end of this article.

Demystifying Model Variance in Linear Regression-1

In this series of blog posts, I will try to deconstruct ideas in mathematical statistics as I have come to understand them intuitively. This is not a tutorial on regression analysis but an attempt to free statistical concepts from sophisticated mathematical language and bring them within reach of data scientists.

Pearson correlation coefficient

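Covariance is limited as a measure of the relationship between two variables because its value depends on the units in which they are measured. The Pearson correlation coefficient (PCC) overcomes this limitation by dividing the covariance by the product of the two standard deviations.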
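The sketch below illustrates the point with made-up height and weight values: changing the units of one variable changes the covariance but leaves the Pearson correlation coefficient untouched. The helper names `covariance` and `pearson` are chosen for this example only.

```python
import math


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical measurements; the same heights expressed in two units.
heights_m = [1.50, 1.60, 1.70, 1.80]
heights_cm = [h * 100 for h in heights_m]
weights_kg = [50.0, 58.0, 63.0, 72.0]

print("covariance (m): ", covariance(heights_m, weights_kg))
print("covariance (cm):", covariance(heights_cm, weights_kg))
print("pearson (m):    ", pearson(heights_m, weights_kg))
print("pearson (cm):   ", pearson(heights_cm, weights_kg))
```

Because the coefficient is unit-free and always falls between -1 and 1, values from different pairs of variables can be compared directly, which is not true of raw covariances.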

Research Design + Statistics Tests

Aligning research design and statistical analyses. From the first day I sat in my undergraduate “Research Methods” course staring at SPSS output, I knew I had found my calling.

What is data science for?

I am always astounded by one frequently asked question: what is data science for?

Performance Metrics: Regression Model

Five widely used evaluation metrics for regression models in machine learning. Today we are going to discuss performance metrics, and this time the focus is on regression metrics.

Top 10 Statistics Concepts to know prior

The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results.

A/B Testing: The Case Study!

Analyzing the test results using Python. In theory, there is no difference between theory and practice. My previous blog post gives a basic idea of what exactly A/B testing is.