1629179401
In this video we will learn about saturation modeling and deviation statistics.
#statquest #statistics #machine-learning
1628319000
This video on probability distributions explains the concept of the probability density function with examples. Learn what the probability density function is and implement it yourself in Python by following along with this probability and statistics tutorial; a minimal sketch follows the topic list below. At the end of this video, you will be able to find the probability density function of any sample with ease.
The topics covered in this video are :
00:00 What is Probability Distribution Function?
08:13 Steps to find Probability Distribution Function
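As a small taste of what the tutorial covers, here is a minimal sketch (not the video's exact code) of a normal probability density function in Python; the sample values are made up for illustration.

import numpy as np

def normal_pdf(x, mu, sigma):
    # Density of a normal distribution with mean mu and std sigma at x.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

sample = np.array([4.2, 5.1, 5.8, 6.0, 7.3])   # hypothetical sample
mu, sigma = sample.mean(), sample.std(ddof=1)  # plug-in estimates
print(normal_pdf(sample, mu, sigma))           # density at each sample point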
#python #probability #statistics #developer
1625866380
Loops are quite an important part of learning how to code in Python, and this is particularly true when it comes to implementing calculations across a large array of numbers.
All too often, the temptation for statisticians and data scientists is to skip over the more mundane aspects of coding such as this — we assume that software engineers can simply reformat the code in the proper way.
However, there are many situations where the person writing the code needs to understand both the statistics underlying the model and how to iterate the model output through loops; these two processes simply cannot be developed independently.
Here is one example of how the use of for loops in Python can greatly enhance statistical analysis.
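As a minimal illustration of the pattern (with simulated data, not the article's own example): apply the same statistic to a collection of arrays with a plain for loop, then with the equivalent list comprehension.

import numpy as np

rng = np.random.default_rng(0)
samples = [rng.normal(loc=mu, size=100) for mu in range(5)]  # simulated arrays

# A plain for loop applying the same statistic to every array.
means = []
for s in samples:
    means.append(s.mean())

# The idiomatic list-comprehension equivalent.
means = [s.mean() for s in samples]
print(means)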
#python #loops-in-python #for-loop #python-list-comprehension #statistics
1625729040
Probability and likelihood are often mixed up in statistics and mathematics. In everyday situations, most of us (I would guess) see no difference between the two domains. Neither did I, until I analyzed the examples I based this explanatory video on for you.
So:
0:27 #Probability: the area under the curve for a selected criterion, with a fixed distribution of samples (constant mean and standard deviation).
3:05 #Likelihood: the y-axis value at fixed data points, under distributions that can be moved.
In this video I use randomly selected human heights as a statistical example.
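A quick sketch of both quantities with scipy.stats, using hypothetical height parameters (mean 170 cm, standard deviation 10 cm):

from scipy.stats import norm

# Probability: area under the curve for a criterion, e.g. height <= 160 cm,
# with the distribution (mean and standard deviation) held fixed.
print(norm(loc=170, scale=10).cdf(160))  # P(height <= 160)

# Likelihood: the y-axis (density) value at a fixed data point, evaluated
# under distributions that can be moved.
print(norm(170, 10).pdf(165))  # likelihood of mu = 170 given x = 165
print(norm(165, 10).pdf(165))  # higher: mu = 165 fits this point better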
Statistics is easy and fun - enjoy learning!
#statistics #datasciencetips #probability
1625052209
India celebrated the 15th National Statistics Day on the birth anniversary of Prasanta Chandra Mahalanobis. He is best known for introducing ‘The Mahalanobis Distance,’ a statistical measure used in many software programs today, and for founding the Indian Statistical Institute (ISI).
Read more: https://analyticsindiamag.com/8-most-popular-statistics-institutes-in-india/
#nationalstatisticsday #statistics
1625051400
Collinearity is a very common problem in machine learning projects. It is correlation among the features of a dataset, and it can reduce the performance of our models because it increases variance and the number of dimensions. It becomes worse when you have to work with unsupervised models.
In order to solve this problem, I’ve created a Python library that removes the collinear features.
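The library itself is not shown here, but a minimal sketch of the underlying idea, dropping one feature from each highly correlated pair, could look like this (the 0.9 threshold is an arbitrary choice):

import numpy as np
import pandas as pd

def drop_collinear(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    # Absolute pairwise correlations between features.
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop any column highly correlated with an earlier one.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)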
#python #data-science #artificial-intelligence #machine-learning #statistics
1624687320
Long before I took any statistics class, I had heard that A/B testing is almost a must for data analyst interviews. So I Googled it and thought: Hmm…isn't it just like the control and experiment studies we conducted in high school biology class? Simple as it may sound, it actually follows a rigorous statistical process and involves various business concerns.
A/B testing is a way to compare two versions of a single variable, typically by testing a subject's response to variant A against variant B, and determining which of the two variants is more effective [2].
For a better understanding, I will use this example throughout this article to give a more concrete explanation:
Suppose an online learning platform wants you to test whether they should change the web page’s button size to increase the number of users.
Single variable: "Join Now" button size on a webpage
Variant A: 4 x 3 button size
Variant B: 16 x 9 button size
Subject's response: click-through probability changes
Goal: find out which option has a higher click-through probability
Sounds like A/B testing could still apply if we changed the variable and response to any other attributes, huh? Indeed, A/B testing has many use cases, including:
UI changes, recommendations, ranking changes, implicit changes such as loading time, etc.
However, there are also cases where it is not so useful:
a) The baseline of comparison is not clear
b) The time needed for users to adapt to new experiments can be quite costly, as there may be psychological influences on users:
Change Aversion: when faced with a new interface or changed functionality, users often experience anxiety and confusion, resulting in a short-term negative effect.
Novelty Effect: when new technology comes out, users often show increased interest, so performance improves at first, but not because of any actual improvement.
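For the button-size example above, the comparison often comes down to a two-proportion z-test on click-through rates; here is a minimal sketch with made-up counts (a common approach, not necessarily the article's exact method).

import math
from scipy.stats import norm

clicks_a, users_a = 120, 2000  # variant A: 4 x 3 button (hypothetical counts)
clicks_b, users_b = 165, 2000  # variant B: 16 x 9 button (hypothetical counts)

p_a, p_b = clicks_a / users_a, clicks_b / users_b
p_pool = (clicks_a + clicks_b) / (users_a + users_b)  # pooled rate under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / users_a + 1 / users_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided test
print(f"z = {z:.2f}, p = {p_value:.4f}")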
#statistics #data-science #data-analytics #data-analysis
1624597948
In a series of weekly articles, I will cover some important statistics topics with a twist.
The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.
At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.
Articles published so far:
As usual, the code is available on my GitHub.
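As a flavor of the series (a sketch, not the article's exact code), here is a 95% t-based confidence interval for a mean, on simulated data:

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=5, size=30)  # simulated sample

mean = sample.mean()
se = stats.sem(sample)  # standard error of the mean
ci = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=se)
print(ci)  # covers the true mean in ~95% of repeated samples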
#python #statistics #data-science #machine-learning
1624422180
Probability theory is a branch of mathematics that focuses on interpreting the likelihood of certain outcomes. It uses axioms to formally measure probability on a probability space, which is the sample space over which likelihoods are determined. Given this definition, it is easy to see why probability theory takes a strong hold in experiments and in predicting their outcomes. Using data and a probability space, we can make informed decisions based on prior results. Just as well, we can identify analytical correlations that provide valuable information about the world we live in.
Needless to say, this makes this branch of mathematics quite powerful. Not only can we use it to make informed decisions about the world around us and how to improve it, but we can also use it to predict the impact of those decisions. That, in turn, makes us even more informed about the potential outcomes of the decisions drawn from these analytics. Now that we understand the power of probability theory, let's dive into probability spaces, distributions, and how we can use them in the Python programming language.
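As a first taste, here is a minimal sketch of a probability space in Python: the sample space of a fair die and its empirical distribution (the die example is illustrative, not from the article).

import numpy as np

rng = np.random.default_rng(7)
outcomes = np.arange(1, 7)                 # sample space of a fair die
rolls = rng.choice(outcomes, size=10_000)  # simulate repeated experiments

# Empirical distribution: the relative frequency of each outcome.
values, counts = np.unique(rolls, return_counts=True)
print(dict(zip(values, counts / len(rolls))))  # each close to 1/6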
#mathematics #python #programming #statistics
1624298520
In a series of weekly articles, I will be covering some important topics of statistics with a twist.
The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.
At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.
Articles published so far:
As usual, the code is available on my GitHub.
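In the spirit of this installment's topic (a sketch, not the article's exact code), here is a simulation of the sampling distribution of the mean from a skewed population:

import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # skewed toy population

# Draw many samples and record each sample's mean.
sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

# By the central limit theorem, the means cluster around the population mean.
print(np.mean(sample_means), population.mean())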
#statistics #distribution #python #machine-learning
1623911880
Over the past decade, powerful new tools have enabled organizations to gather and analyze data at much lower cost, unlocking previously unimaginable power to predict outcomes and behaviors. Given the granular data and raw computing power accessible to us, data professionals can create models that predict patterns of consumption, travel, and behavior with ever-increasing accuracy. The ethical implications of this are many, and far more complicated and nuanced than the scope of this article can cover. This article addresses the subject of honest engagement with data analysis.
Thanks to the nexus of availability and computability of data, institutions are turning to analysis and regression to solve a larger share of problems and answer a larger share of questions. This places responsibility on the shoulders of those who work with data to provide meaningful, but more importantly, accurate and unbiased analysis to stakeholders.
To meet this standard, a data professional needs to avoid unrigorous statistical methods and inaccurate modeling, yes, but they also need to be statistically and intellectually honest. There are many avenues through which a data professional can be dishonest, intentionally or unintentionally, and being aware of them can help avoid drawing fallacious conclusions.
#analytics #data #information-technology #statistics
1623744281
Step 1: write a scheduled query in BigQuery
First, you need to create a scheduled query in BigQuery that periodically computes the significance level of your test. I make the query run every day. The only things to modify in the following code are how you get the metrics of your A/B test (I get them through SQL queries, as I'm already logging them in BigQuery) and the name of your destination table.
Step 2: add the data to DataStudio
Create a new DataStudio dashboard (or go to an existing one) and choose your favorite chart to show the data.
Step 3: get a coffee
I prefer matcha latte, honestly.
The important thing is that it's done! You now have live updates of the A/B significance level in DataStudio. Of course, this alone is not enough to know when the test has ended.
#ab-testing #statistics #google-data-studio #testing
1623428520
Why do we need to learn statistics for machine learning? Statistics helps us analyze data and draw inferences from it, which in turn helps us understand the data. For example, with the help of statistics we can tell whether our data is skewed or normally distributed, or whether it contains outliers. It helps us find the mean/median/mode of our data and see the range within which most data points lie. In short, it supports the EDA part of machine learning, which requires lots of data cleaning, and it also helps in feature engineering.
Statistics can be divided into two parts:
a) Descriptive Statistics: allows us to analyze and summarize the data with the help of different plots/graphs and tables.
Graphs:
· Box plot
· Histogram
Tabular representation:
· Central tendency (mean/median/mode)
· Standard deviation
· Variance
· Range of data
b) Inferential Statistics: helps us infer conclusions about the population from sample data, after performing descriptive statistical analysis on that sample. It helps us identify whether the sample correctly represents the whole population, and how confident we can be in claiming so, with the help of a confidence interval. It is also useful for choosing, among multiple samples from the same population, the one that most accurately describes the population. We have multiple hypothesis testing methods that help us draw such conclusions about a population from sample data:
· Null and alternate hypotheses
· Z-test
· T-test
· Chi-square test
· ANOVA and ANCOVA tests
🎯 What is a Population?
Population: a population represents the large volume of entity data points which we intend to analyze. Ex: if we want to find the average height of all the people in a country, then the heights of all the people in the country represent the population.
🎯 What is a Sample?
Sample: a sample is a small collection of data points picked from the population data. A good sample is a close representation of the population, and a sample always contains fewer data points than the population. Ex: suppose I choose 1000 people from a country, compute their average height, and use it to draw a conclusion about the average height of all the people in the country.
🎯 Why is Sampling Required?
The population contains a huge volume of data, and it is practically impossible to collect all of it; even where it is possible, it is time-consuming. Sampling makes the work easier, faster, and practically feasible: instead of taking the whole population, we pick a decent number of elements from it that can summarize the population. Note: the sample should be a close representation of the population.
🎯 How does sampling affect the analysis if it is not done properly, or the right number of elements is not chosen from the population?
As we saw, we cannot analyze the whole country's data, so we choose a small group of people within the country that more or less represents the country's overall population. But we need to be sure that the chosen sample is unbiased and correctly represents the population; otherwise, it will produce incorrect results. Sample size (the number of data points within the sample) also plays a vital role in the overall sampling performance.
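A minimal sketch of the height example in Python, with simulated data standing in for a real survey:

import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(loc=165, scale=8, size=1_000_000)  # everyone's height

sample = rng.choice(population, size=1000, replace=False)  # 1000 people
print("population mean:", population.mean())
print("sample mean:    ", sample.mean())  # close, if the sample is unbiased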
#data-analytics #data-science-interview #statistics #data-science
1623422100
In a series of weekly articles, I will be covering some important topics of statistics with a twist.
The goal is to use Python to help us get intuition on complex concepts, empirically test theoretical proofs, or build algorithms from scratch. In this series, you will find articles covering topics such as random variables, sampling distributions, confidence intervals, significance tests, and more.
At the end of each article, you can find exercises to test your knowledge. The solutions will be shared in the article of the following week.
Articles published so far:
As usual, the code is available on my GitHub.
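As a taste of this installment's topic (a sketch, not the article's exact code): geometric and Poisson random variables via scipy.stats.

from scipy.stats import geom, poisson

# Geometric: number of trials until the first success, success probability 0.2.
print(geom.pmf(3, p=0.2))  # P(first success on trial 3)
print(geom.mean(p=0.2))    # expected number of trials = 1/p = 5

# Poisson: count of events in a fixed interval, with rate mu = 4.
print(poisson.pmf(2, mu=4))  # P(exactly 2 events)
print(poisson.cdf(2, mu=4))  # P(at most 2 events)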
#math #machine-learning #python #statistics
1623407580
Pandas is a useful Python library that can be used for a variety of data tasks, including statistical analysis, data imputation, data wrangling, and much more. In this post, we will go over three useful custom functions that allow us to generate statistics from data.
Let’s get started!
For our purposes, we will be working with the Wines Reviews data set, which can be found here.
To start, let’s import the pandas and numpy packages:
import pandas as pd
import numpy as np
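Building on those imports, here is a hypothetical example of such a function (illustrative, not one of the post's three):

def describe_column(df, column):
    # Return basic statistics for a numeric column as a dictionary.
    col = df[column]
    return {
        "mean": col.mean(),
        "median": col.median(),
        "std": col.std(),
        "min": col.min(),
        "max": col.max(),
    }

# Hypothetical usage once the data set is loaded into a dataframe:
# df = pd.read_csv("wine_reviews.csv")  # assumed file name and column below
# print(describe_column(df, "points"))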
#programming #python #data-science #statistics #software-development