1629179401

In this video we will learn about saturation modeling and deviation statistics.

#statquest #statistics #machine-learning

1628319000

This video on probability distributions will explain the concept of the probability density function with examples. Learn what the probability density function is and implement it yourself in Python by following along with this probability and statistics tutorial. By the end of this video, you will be able to find the probability density function of any sample with ease.

The topics covered in this video are :

00:00 What is Probability Distribution Function?

08:13 Steps to find Probability Distribution Function
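
As a sketch of what the tutorial builds, here is a minimal probability density function in plain Python, assuming a normally distributed sample (the video's exact code and data may differ):

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at point x."""
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def sample_pdf(sample):
    """Fit a normal PDF to a sample by estimating its mean and std."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in sample) / n)
    return lambda x: normal_pdf(x, mu, sigma)

# Illustrative sample: a handful of heights in cm
heights = [160, 165, 170, 170, 175, 180]
pdf = sample_pdf(heights)
print(pdf(170))  # density near the sample mean
```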

#python #probability #statistics #developer

1625866380

Loops are quite an important part of learning how to code in Python, and this is particularly true when it comes to implementing calculations across a large array of numbers.

All too often, the temptation for statisticians and data scientists is to skip over the more mundane aspects of coding such as this — we assume that software engineers can simply reformat the code in the proper way.

However, there are many situations where the person writing the code needs to understand both the statistics underlying the model as well as how to iterate the model output through loops — these two processes simply cannot be developed independently.

Here is one example of how the use of **for** loops in Python can greatly enhance statistical analysis.
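
As a concrete sketch of that point, here is a **for** loop computing a running mean and variance over an array of numbers (Welford's online method), without materializing intermediate lists; the article's own example may differ:

```python
# Running mean and sample variance in a single pass with a for loop.
values = [4.0, 7.0, 13.0, 16.0]

n = 0
mean = 0.0
m2 = 0.0  # accumulated sum of squared deviations from the running mean
for x in values:
    n += 1
    delta = x - mean
    mean += delta / n        # update the running mean
    m2 += delta * (x - mean)  # update the squared-deviation accumulator

variance = m2 / (n - 1)  # sample (n-1) variance
print(mean, variance)
```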

#python #loops-in-python #for-loop #python-list-comprehension #statistics

1625729040

Probability and likelihood are often mixed up in statistics and mathematics. In everyday situations, most of us (I guess) do not notice the difference between the two. Neither did I, until I analyzed the couple of examples this explanatory video is based on.

So:

0:27 #Probability: the area under the curve, given by selected criteria, for a distribution of samples with a fixed mean and standard deviation

3:05 #Likelihood: the y-axis values for fixed data points, with distributions that can be moved.

In this video I use randomly selected human heights as a statistical example.

Statistics is easy and fun - enjoy learning!
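
The two timestamps above can be sketched in a few lines of Python, using the heights example (the numbers here are illustrative, not the video's):

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def normal_pdf(x, mu, sigma):
    """Density (y-axis value) of a normal distribution at x."""
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Probability: distribution fixed (mean 170 cm, std 10 cm); what is the
# area under the curve between 160 cm and 180 cm?
p = normal_cdf(180, 170, 10) - normal_cdf(160, 170, 10)

# Likelihood: data point fixed at 172 cm; how plausible is each
# candidate distribution as we move the curve?
like_a = normal_pdf(172, 170, 10)  # curve centred at 170 cm
like_b = normal_pdf(172, 150, 10)  # curve centred at 150 cm
print(p, like_a > like_b)
```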

#statistics #datasciencetips #probability

1625052209

India celebrated the 15th National Statistics Day on the birth anniversary of Prasanta Chandra Mahalanobis. He is best known for introducing ‘The Mahalanobis Distance,’ a statistical measure used in many software programs today, and for founding the Indian Statistical Institute (ISI).

Read more: https://analyticsindiamag.com/8-most-popular-statistics-institutes-in-india/
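
For readers curious about the measure itself, here is a minimal sketch of the Mahalanobis distance using NumPy (the example data is made up for illustration):

```python
import numpy as np

def mahalanobis(x, data):
    """Mahalanobis distance from point x to the distribution of `data`
    (rows are observations, columns are variables)."""
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)       # covariance of the variables
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

data = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 6.0], [5.0, 8.0]])
print(mahalanobis(np.array([3.0, 4.8]), data))  # the mean itself: distance 0
```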

#nationalstatisticsday #statistics

1625051400

Collinearity is a very common problem in machine learning projects. It is the correlation between the features of a dataset, and it can reduce the performance of our models because it increases variance and the number of dimensions. It becomes worse when you have to work with unsupervised models.

In order to solve this problem, I’ve created a Python library that removes the collinear features.
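
The library itself is not shown here, but the core idea can be sketched in a few lines, assuming a simple greedy correlation-threshold rule (the actual library may use a different strategy):

```python
import numpy as np
import pandas as pd

def drop_collinear(df, threshold=0.9):
    """Keep each feature only if its absolute correlation with every
    previously kept feature stays at or below `threshold`."""
    corr = df.corr().abs()
    keep = []
    for col in df.columns:
        if all(corr.loc[col, kept] <= threshold for kept in keep):
            keep.append(col)
    return df[keep]

rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({
    "a": a,
    "b": a * 2 + 0.01 * rng.normal(size=100),  # nearly collinear with a
    "c": rng.normal(size=100),                 # independent feature
})
reduced = drop_collinear(df, threshold=0.9)
print(list(reduced.columns))
```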

#python #data-science #artificial-intelligence #machine-learning #statistics

1624687320

Long before I took any statistical class, I’ve heard that A/B testing is almost a must for data analyst interviews. So I Googled it and thought: Hmm…isn’t it just like the control and experiment studies we conducted in high school biology class? Simple as it may sound, it actually has a rigorous statistical process and involves various business concerns.

A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective [2].

For a better understanding, I will use this example throughout this article to give a more concrete explanation:

*Suppose an online learning platform wants you to test whether they should change the web page’s button size to increase the number of users.*

**Single variable:** *"Join Now" button size on a webpage*

**Variant A:** *4 x 3 button size*

**Variant B:** *16 x 9 button size*

**Subject's response:** *click-through probability changes*

**Goal:** *find out which option has a higher click-through probability*

Sounds like if we change the variables and responses to any other attributes, A/B testing can still apply, huh? Indeed, A/B testing has many **Use Cases**, including:

*UI changes, recommendations, ranking changes, implicit changes such as loading time, etc.*

However, there are also cases it is **Not So Useful**:

- *Missing items*: for our online course website example, if there are courses we didn't offer but users are looking for, A/B testing cannot tell.
- *New experiences*: introducing new experiences such as VIP services can be troublesome because:

a) The baseline of comparison is not clear

b) The time needed for users to adapt to the new experience can be quite costly, as there might be some **psychological influences** on users:

**Change aversion**: when faced with a new interface or changed functionality, users often experience anxiety and confusion, resulting in a short-term negative effect.

**Novelty effect**: when new tech comes out, users often show increased interest, so performance improves initially, but not because of any actual improvement.
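
Back to the button-size example: comparing two click-through probabilities is typically done with a two-proportion z-test. A minimal sketch, with made-up click counts:

```python
import math

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test for a difference in click-through probability.
    Returns the z statistic and a two-sided p-value."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Variant A (4 x 3 button): 200 clicks out of 5000 impressions
# Variant B (16 x 9 button): 260 clicks out of 5000 impressions
z, p = two_proportion_z(200, 5000, 260, 5000)
print(round(z, 2), round(p, 4))
```

With these illustrative numbers, variant B's higher click-through rate is statistically significant at the usual 0.05 level.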

#statistics #data-science #data-analytics #data-analysis

1624597948

In a series of weekly articles, I will cover some important statistics topics with a twist.

The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.

At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.

Articles published so far:

- Bernoulli and Binomial Random Variables with Python
- From Binomial to Geometric and Poisson Random Variables with Python
- Sampling Distribution with Python
- Confidence Intervals with Python

As usual, the code is available on my GitHub.
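
As a taste of the latest article in the list, here is a minimal confidence-interval sketch in Python (the article's own implementation may differ):

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean, using the
    normal critical value (reasonable for large-ish samples)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error
    return mean - z * se, mean + z * se

sample = [12, 15, 14, 10, 13, 15, 11, 14, 13, 12]
low, high = mean_confidence_interval(sample)
print(low, high)
```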

#python #statistics #data-science #machine-learning

1624422180

Probability theory is a branch of mathematics that focuses on interpreting the likelihood of certain outcomes. The branch uses axioms to formally measure probability in probable space. Probable space is a name given to the sample space that is used to determine likelihood. Given the definition, it is easy to see why probability theory takes a strong hold in experiments and in predicting the outcomes of those experiments. Using data and probable space, we can make informed decisions based on prior results. Just as well, we can determine analytical correlations that can provide valuable information about the world we live in.

Needless to say, this makes this branch of mathematics pretty powerful. Not only can we use it to make informed decisions about the world around us and how to improve it, but we can also use it to predict the impact of those decisions. This in turn makes us even more informed about the potential outcomes of the decisions made from these analytics. Now that we understand the power of probability theory, let's dive into probable space, distributions, and how we can use them in the Python programming language.
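
To make the idea of a sample space concrete before diving in, here is a minimal sketch that enumerates a probability space exactly in Python (an illustrative example, not taken from the article):

```python
from fractions import Fraction
from itertools import product

# Enumerate the sample space of two fair dice: 36 equally likely outcomes.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a predicate over outcomes)."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

p_seven = prob(lambda d: d[0] + d[1] == 7)
print(p_seven)  # 1/6
```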

#mathematics #python #programming #statistics

1624298520

In a series of weekly articles, I will be covering some important topics of statistics with a twist.

The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.

At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.

Articles published so far:

- Bernoulli and Binomial Random Variables with Python
- From Binomial to Geometric and Poisson Random Variables with Python
- Sampling Distributions with Python

As usual, the code is available on my GitHub.
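
As a taste of the latest article in the list, here is a minimal simulation of the sampling distribution of the mean (an illustrative sketch, not the article's code):

```python
import random
import statistics

random.seed(42)

# A skewed "population"; by the central limit theorem, the sampling
# distribution of the mean still approaches a normal shape as n grows.
population = [random.expovariate(1.0) for _ in range(10_000)]

# Draw many samples of size 50 and record each sample's mean.
sample_means = [
    statistics.fmean(random.sample(population, 50)) for _ in range(1_000)
]

# The sample means centre on the population mean, with far less spread.
print(statistics.fmean(population), statistics.fmean(sample_means))
```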

#statistics #distribution #python #machine-learning

1623911880

Over the past decade, powerful new tools have empowered organizations to gather and analyze data at much lower cost—enabling previously unimaginable power when it comes to predicting outcomes and behaviors. Given the availability of granular data and raw computing power accessible to us, data professionals are able to create powerful models that can predict patterns of consumption, travel, and behavior with ever-increasing accuracy. Ethical implications of this are many—far more complicated and nuanced than the scope of this article can cover. This article seeks to address the subject of honest engagement with data analysis.

Thanks to the nexus of availability and computability of data, institutions are turning to analysis and regression to solve a larger share of problems and answer a larger share of questions. This places responsibility on the shoulders of those who work with data to provide meaningful, but more importantly, accurate and unbiased analysis to stakeholders.

To meet this standard, a data professional needs to avoid unrigorous statistical methods and inaccurate modeling, yes—but they also need to be statistically and intellectually honest. There are many avenues through which a data professional can end up being dishonest, either intentionally or unintentionally, and being aware of them can help avoid drawing fallacious conclusions.

#analytics #data #information-technology #statistics

1623744281

**Step 1: write a scheduled query in BigQuery**

First, you need to create a scheduled query in BigQuery that periodically gets the significance level of your test. I make the query run every day. The only thing to modify in the following code is how you get the metrics of your A/B test (I get them through SQL queries, as I'm already logging them in BigQuery) and the name of your destination table.

**Step 2: add the data to DataStudio**

Create a new DataStudio dashboard (or go to an existing one) and choose your favorite chart to show the data.

**Step 3: get a coffee**

I prefer matcha latte, honestly.

The important thing is that it's done! You now have live updates of the A/B test's significance level in DataStudio. Of course, this alone is not enough to know when the test has ended.

#ab-testing #statistics #google-data-studio #testing

1623428520

Why do we need to learn statistics for machine learning? Statistics helps us analyze data and draw inferences from it, which in turn helps us understand the data. For example, with the help of statistics, we can understand whether our data is skewed or normally distributed, or whether it contains outliers. It helps us detect the mean/median/mode of our data and lets us see the range within which most data points lie. In short, it helps in the EDA part of machine learning, which requires lots of data cleaning, and it also helps in feature engineering.

Statistics can be divided into two parts:

**a) Descriptive Statistics:** allows us to analyze and summarize the data with the help of different plots/graphs and tables.

Graphs:

- Box plot
- Histogram

Tabular representation:

- Central tendency (mean/median/mode)
- Standard deviation
- Variance
- Range of the data

🎯 What is Population

**Population:** the large volume of data points which we intend to analyze.

**Ex:** If we want to find the average height of all the people of a country, then the heights of all the people in the country represent the population.

🎯 What is Sample

**Sample:** a small collection of data points picked from the population data. A good sample is a close representation of the population, and it always contains fewer data points than the population.

**Ex:** Suppose I choose 1,000 people from a country, analyze their average height, and then draw conclusions about the average height of all the people in the country.

🎯 Why is Sampling Required

The population contains a huge volume of data, and it is practically impossible to collect all of it. Even when it is possible, it is time-consuming. Sampling makes the work easier, faster, and practically feasible: instead of taking the whole population, we pick a decent number of elements from it that can summarize the population.

**Note: The sample should be a close representation of the population.**

🎯 How does sampling affect the analysis if it is not done properly, or the right number of elements is not chosen from the population?

As we saw, we cannot analyze the whole country's data, so we choose a small group of people who more or less represent the country's overall population. But we need to be sure the chosen sample is not biased and correctly represents the population; otherwise, it will produce incorrect results. Sample size (the number of data points within the sample) also plays a vital role in overall sampling performance.
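
The population/sample discussion above can be sketched in a few lines of Python, with a made-up population of heights standing in for the country:

```python
import random
import statistics

random.seed(1)

# Hypothetical "population": 100,000 heights (cm), mean 170, std 10.
population = [random.gauss(170, 10) for _ in range(100_000)]

# A sample of 1,000 people chosen at random, as in the example above.
sample = random.sample(population, 1000)

# Descriptive statistics of the sample.
print("mean:", statistics.fmean(sample))
print("median:", statistics.median(sample))
print("stdev:", statistics.stdev(sample))
print("variance:", statistics.variance(sample))
print("range:", max(sample) - min(sample))
```

With a well-drawn random sample, the sample mean lands very close to the population mean, which is exactly why sampling works.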

#data-analytics #data-science-interview #statistics #data-science

1623422100

In a series of weekly articles, I will be covering some important topics of statistics with a twist.

The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.

At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.

Articles published so far:

- Bernoulli and Binomial Random Variables with Python
- Geometric and Poisson Random Variables with Python

As usual, the code is available on my GitHub.
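
As a taste of the newest article in the list, here are the geometric and Poisson probability mass functions from scratch (an illustrative sketch, not the article's code):

```python
import math

def geometric_pmf(k, p):
    """P(first success on trial k) for success probability p, k >= 1."""
    return (1 - p) ** (k - 1) * p

def poisson_pmf(k, lam):
    """P(exactly k events) when events arrive at average rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

print(geometric_pmf(3, 0.5))  # first success on the third fair coin flip: 0.125
print(poisson_pmf(2, 3.0))    # exactly 2 arrivals at rate 3
```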

#math #machine-learning #python #statistics

1623407580

Pandas is a useful Python library that can be used for a variety of data tasks, including statistical analysis, data imputation, data wrangling, and much more. In this post, we will go over three useful custom functions that allow us to generate statistics from data.

Let’s get started!

For our purposes, we will be working with the *Wine Reviews* data set, which can be found here.

To start, let’s import the pandas and numpy packages:

```
import pandas as pd
import numpy as np
```
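
As a preview of the kind of custom function the post builds, here is one possible statistics helper. The real Wine Reviews set has numeric columns, but the column names and data below are illustrative stand-ins, not the post's actual code:

```python
import numpy as np
import pandas as pd

def get_statistics(df, column):
    """Return a one-row summary table of statistics for a numeric column."""
    s = df[column].dropna()  # ignore missing values
    return pd.DataFrame({
        "mean": [s.mean()],
        "median": [s.median()],
        "std": [s.std()],
        "min": [s.min()],
        "max": [s.max()],
    }, index=[column])

# Tiny stand-in for the Wine Reviews data (column names assumed).
df = pd.DataFrame({
    "points": [85, 90, 88, 92, 87],
    "price": [15.0, 40.0, 22.0, np.nan, 18.0],
})
print(get_statistics(df, "points"))
```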

#programming #python #data-science #statistics #software-development