10 Principles of Practical Statistical Reasoning

There are 2 core aspects to fruitful application of statistics (data science):

  1. Domain knowledge.
  2. Statistical methodology.

Due to the highly specific nature of this field, it is difficult for any book or article to convey both a detailed and accurate description of the interplay between the two. In general, one can read material of two types:

  1. Broad info on statistical methods with conclusions that generalise but are not specific.
  2. Detailed statistical methods with conclusions that are useful only in a specific domain.

After 3 years working on my own data science projects and 3.5 years manipulating data on the trading floor, I have found an additional category of learnings. It is fundamentally just as useful as the above, and I take it into every project/side hustle/consulting gig…

Practical Statistical Reasoning

I made that term up because I don’t really know what to call this category. However, it covers:

  • The nature and objective of applied statistics/data science.
  • Principles common to all applications.
  • Practical steps/questions for better conclusions.

If you have experience of the application of statistical methods, I encourage you to use your experience to illuminate and criticise the following principles. If you have never tried implementing a statistical model, have a go and then return. Don’t see the following as a list to memorise. You’ll get peak synthesis of information if you can relate to your own experience.

The following principles have helped me become more efficient with my analyses and clearer in my conclusions. I hope you can find value in them too.

#machine-learning #data-science #statistics #programming #data

Factors That Can Contribute to the Faulty Statistical Inference

Hypothesis testing is a procedure in which researchers state a precise claim and then collect evidence that could falsify it. This precise claim is called the null hypothesis. If the evidence against the null hypothesis is strong enough, we reject it and adopt the alternative hypothesis. This is the basic idea of hypothesis testing.

Error Types in Statistical Testing

There are two distinct types of errors that can occur in formal hypothesis testing. They are:

Type I: A Type I error occurs when the null hypothesis is true but the test rejects it. This is called a false positive.

Type II: A Type II error occurs when the null hypothesis is false but the test fails to reject it. This is called a false negative.

Most hypothesis testing procedures control the Type I error rate well (at 5%) under ideal conditions. That may give the false impression that there is only a 5% probability that the reported findings are wrong. But it’s not that simple. The probability can be much higher than 5%.
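To make the 5% figure concrete, here is a minimal simulation sketch using only Python's standard library. It assumes a two-sided z-test with a known standard deviation, which is my choice for illustration and not something the text above specifies, and the function names are hypothetical. The second function shows one possible reason (running many independent tests) why the chance of at least one false positive can sit well above 5%.

```python
import random
from statistics import NormalDist, fmean


def false_positive_rate(trials=4000, n=30, alpha=0.05, seed=0):
    """Fraction of tests that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # The data really come from N(0, 1), so H0: mean = 0 is true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # z-statistic, assuming known sigma = 1
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials


def chance_of_any_false_positive(num_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


print("Simulated Type I error rate:", false_positive_rate())
print("Chance of a false positive in 10 tests:", chance_of_any_false_positive(10))
```

A single well-behaved test rejects a true null hypothesis about 5% of the time, but ten independent tests at the same level give roughly a 40% chance of at least one false positive.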

Normality of the Data

Non-normal data is an issue that can break a statistical test. If the dataset is small, normality of the data is very important for some statistical procedures, such as confidence intervals or significance tests. But if the dataset is large enough, departures from normality do not have a significant impact.
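The effect of sample size can be checked with a small simulation. This sketch assumes skewed (exponential) data with a true mean of 1 and a normal-theory 95% interval (mean ± 1.96 standard errors); both choices, and the function name, are mine for illustration.

```python
import random
from statistics import fmean, stdev


def coverage(n, trials=2000, seed=1):
    """Share of nominal 95% intervals that contain the true mean (1.0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Exponential data are strongly skewed, so clearly not normal.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        centre = fmean(sample)
        half_width = 1.96 * stdev(sample) / n ** 0.5
        if centre - half_width <= 1.0 <= centre + half_width:
            hits += 1
    return hits / trials


small = coverage(n=5)    # small sample: the interval misses far too often
large = coverage(n=200)  # large sample: coverage is close to the nominal 95%
print("Coverage with n=5:  ", small)
print("Coverage with n=200:", large)
```

With only five observations the interval covers the true mean noticeably less often than advertised, while with 200 observations the same recipe behaves much closer to its nominal level.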

Correlation

If the variables in the dataset are correlated with each other, that may result in poor statistical inference.

[Figure: plot of two variables that appear strongly correlated]

In the graph, the two variables seem to have a strong correlation. Similarly, if the data are observed as a sequence, each value may be correlated with its neighbors, and there may be some clustering or autocorrelation in the data. This kind of behavior in the dataset can adversely impact statistical tests.
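One way to check a sequence for this kind of dependence is to compute its lag-1 autocorrelation. The sketch below is only an illustration: it generates the data with a simple AR(1) process (my assumption, not something taken from the article), and the helper names are hypothetical.

```python
import random
from statistics import fmean


def lag1_autocorrelation(series):
    """Correlation between each value and the value that follows it."""
    m = fmean(series)
    num = sum((a - m) * (b - m) for a, b in zip(series, series[1:]))
    den = sum((x - m) ** 2 for x in series)
    return num / den


def ar1_series(phi, n, seed=2):
    """Simulate x[t] = phi * x[t-1] + noise; phi = 0 gives independent values."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


independent = ar1_series(phi=0.0, n=2000)
dependent = ar1_series(phi=0.8, n=2000)
print("Independent series:", lag1_autocorrelation(independent))
print("Autocorrelated series:", lag1_autocorrelation(dependent))
```

A value near zero suggests neighbouring observations are roughly independent; a value far from zero is a warning that tests assuming independence may be unreliable.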

Correlation and Causation

This is especially important when interpreting the result of a statistical test. “Correlation does not mean causation”. Here is an example. Suppose you have study data showing that people without a college education are more likely to believe that women should get paid less than men in the workplace. You may have conducted a sound hypothesis test that supports this. But care must be taken over what conclusion is drawn from it. There is probably a correlation between college education and the belief that ‘women should get paid less’. But it is not fair to say that not having a college degree is the cause of that belief. This is a correlation, not a direct cause-and-effect relationship.

A clearer example can be provided from medical data. Studies showed that people with fewer cavities are less likely to get heart disease. You may have enough data to demonstrate that statistically, but you cannot conclude that dental cavities cause heart disease. There is no medical theory like that.
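A small simulation can show how two variables end up correlated without either one causing the other. Everything here is made up for illustration: a hypothetical hidden factor drives both x and y, and the data are synthetic, not taken from any study.

```python
import random
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(3)
# Hypothetical hidden factor that influences both measured variables.
hidden = [rng.gauss(0.0, 1.0) for _ in range(5000)]
x = [h + rng.gauss(0.0, 0.5) for h in hidden]  # x never depends on y
y = [h + rng.gauss(0.0, 0.5) for h in hidden]  # y never depends on x

r_xy = pearson(x, y)
# Remove the hidden factor's contribution and correlate what is left.
r_residual = pearson(
    [xi - h for xi, h in zip(x, hidden)],
    [yi - h for yi, h in zip(y, hidden)],
)
print("Correlation of x and y:", r_xy)
print("Correlation after removing the hidden factor:", r_residual)
```

x and y are strongly correlated even though neither appears in the other's formula. Once the shared factor is accounted for, the correlation essentially disappears.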

#statistical-analysis #statistics #statistical-inference #math #data analysis

Top 10 Statistics Concepts to know prior

The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results. Statistics is a crucial process behind how we make discoveries in science, make decisions based on data, and make predictions. Statistics allows you to understand a subject much more deeply.

Statistics is known to be the top prerequisite for a data science job. I personally understood a few concepts when reading about linear regression, but if someone had randomly asked me about standard deviation, I would certainly have been confused.

So in this article, I have tried to build up a friendly approach towards some frequently asked Statistics questions. I am sure this will be beneficial to many.

Common Terms:

  1. Mean
  2. Mode
  3. Median
  4. Variance
  5. Standard Deviation
  6. Z-score
  7. Correlation
  8. Normal Distribution
  9. Empirical Rule
  10. Sampling
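Several of the terms in the list above can be tried out with Python's standard library. The sketch below uses synthetic, normally distributed data (the mean of 50 and standard deviation of 10 are arbitrary values I picked) to compute the variance, standard deviation, and z-scores, and then checks the empirical rule.

```python
import random
from statistics import fmean, pstdev, pvariance

rng = random.Random(4)
# Synthetic sample from a normal distribution (illustrative parameters).
data = [rng.gauss(50.0, 10.0) for _ in range(10000)]

mu = fmean(data)
variance = pvariance(data, mu)  # average squared distance from the mean
sigma = pstdev(data, mu)        # standard deviation = square root of variance

# A z-score says how many standard deviations a value lies from the mean.
z_scores = [(x - mu) / sigma for x in data]

# Empirical rule: about 68%, 95% and 99.7% of normally distributed
# values fall within 1, 2 and 3 standard deviations of the mean.
within_1 = sum(abs(z) <= 1 for z in z_scores) / len(z_scores)
within_2 = sum(abs(z) <= 2 for z in z_scores) / len(z_scores)
within_3 = sum(abs(z) <= 3 for z in z_scores) / len(z_scores)

print("mean:", mu, "variance:", variance, "std dev:", sigma)
print("within 1, 2, 3 std devs:", within_1, within_2, within_3)
```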

Also, let’s keep in mind the pandas method .describe(), which gives some hands-on practice before we start building our understanding.

Figure 1

I will refer to this figure later in the article.

Let’s get started!

1. Mean

The mean is one of the measures of central tendency. It is simply the average of all the data points present for a feature.

But what is central tendency?

Central tendency indicates where the middle or center of the distribution of our data lies.

Question: Which of these measures are used to analyze the central tendency of data?

a) Mean and Normal Distribution.

b) Mean, Median and Mode.

c) Mode, Alpha & Range.

d) Standard Deviation, Range and Mean

e) Median, Range and Normal Distribution.

Solution (b): The mean, median, mode are the three statistical measures which help us to analyze the central tendency of data. We use these measures to find the central value of the data to summarize the entire data set.
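The three measures from the solution above can be computed with Python's built-in statistics module. The list of ages is made up for illustration; it includes one extreme value to show how differently the measures react.

```python
from statistics import mean, median, mode

# Hypothetical ages; 95 is a deliberate outlier.
ages = [23, 25, 25, 27, 30, 31, 95]

average = mean(ages)         # sum of the values divided by their count
middle = median(ages)        # middle value once the data are sorted
most_common = mode(ages)     # value that occurs most often

print("mean:", average)
print("median:", middle)
print("mode:", most_common)
```

The outlier pulls the mean well above most of the ages, while the median and mode stay close to where the bulk of the data sits.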

Calculation:

mean = (x1 + x2 + ... + xn) / n, that is, the sum of all values divided by the number of values.

#statistical-analysis #data-science #machine-learning #statistics #interview #data analysis

Gerhard Brink

1624696643

Top 10 Big Data Statistics You Must Know in 2021

Analytics Insight Presents the Top 10 Big Data Statistics for You to Know in 2021.

The future is bright for companies that use Big Data and analytics in this cut-throat competitive market. People are generating more than 2.5 quintillion bytes of real-time data due to globalization and digital transformation in the tech-driven era. IoT is also providing data through multiple smart devices, social media accounts, and search engines. The scope of Big Data is growing at an accelerating rate, which leads to more job opportunities in Data Science and other disruptive technology fields. Ample Big Data software tools are available to beginners as well as professionals for effective data management and for generating interactive reports with meaningful, in-depth business insights. As a result, reputed companies and start-ups have started adopting Big Data by investing millions of dollars. Let’s look at the top 10 Big Data statistics to predict the near future of this data-driven world.

#big data #latest news #top list #big data statistics #top 10 big data statistics you must know in 2021

Mery tris

1624388400

10 COINS TO $10 MILLION! Top coins to GET RICH in April 2021. DO NOT MISS!!!

0:00 Intro
0:15 Patreon
0:43 Coin #10
2:03 Coin #9
3:33 Coin #8
5:20 Coin #7
6:14 Coin #6
7:49 Coin #5
9:19 Coin #4
11:22 Coin #3
12:19 Coin #2
14:51 Coin #1
16:17 Join The Patreon!

📺 The video in this post was made by K Crypto
The origin of the article: https://www.youtube.com/watch?v=u0Cm8KqjDU4
🔺 DISCLAIMER: The article is for information sharing. The content of this video is solely the opinions of the speaker who is not a licensed financial advisor or registered investment advisor. Not investment advice or legal advice.
Cryptocurrency trading is VERY risky. Make sure you understand these risks and that you are responsible for what you do with your money

#bitcoin #blockchain #10 coins to $10 million #top coins #rich #10 coins to $10 million! top coins to get rich in april 2021