Data Transformation: Normalisation and Standardisation using Python Scikit-Learn
Usually, when I tell you a student got 90 marks, you would think this is a very good student. If I say instead that the marks are 75, the student is probably average. However, as Data Scientists/Analysts, we need to ask at least two questions immediately:
The first question is obviously important, and perhaps everyone would ask it, because 90/150 is definitely not better than 75/100. The second question is a little more subtle, and possibly only a “data person” would have this sensitivity.
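The 90/150 versus 75/100 comparison can be made concrete by putting both marks on a common scale. The snippet below is a minimal sketch; the helper name `percentage` is hypothetical and not part of the original article.

```python
def percentage(marks: float, full_marks: float) -> float:
    """Express a raw mark as a percentage of the maximum achievable mark."""
    return 100.0 * marks / full_marks


# The two students from the text: 90 out of 150 versus 75 out of 100.
student_a = percentage(90, 150)
student_b = percentage(75, 100)

print(f"Student A: {student_a:.1f}%")
print(f"Student B: {student_b:.1f}%")
print("Student B scored higher:", student_b > student_a)
```

Once the marks share a scale, the raw 90 turns out to be the weaker result.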
In fact, to make sure that an exam's results are normally distributed across the class, it is quite common to select exam questions as follows:
What if we have 100% easy questions or 100% difficult questions? If so, we’re very likely to have results that are not normally distributed in a class.
This brings us to our main topic. I have been a tutor at a university for 5 years. It cannot always be guaranteed that the exam questions precisely follow the above proportions. To be fair to all the students, in other words, to avoid having too many students fail or too many students get A grades, we sometimes need to normalise the marks so that they follow the normal distribution.
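As a rough illustration of rescaling marks with scikit-learn, the sketch below applies `StandardScaler` (standardisation to z-scores) and `MinMaxScaler` (normalisation to the range 0 to 1) to one class's results. The marks are made-up values chosen for illustration, not data from the article, and this is only one possible way to set it up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical exam marks for one class (illustrative values only).
marks = np.array([35, 48, 52, 60, 61, 67, 70, 75, 82, 90], dtype=float)

# scikit-learn scalers expect a 2-D array: one row per student, one column per feature.
X = marks.reshape(-1, 1)

# Standardisation: subtract the class mean and divide by the class standard deviation.
z_scores = StandardScaler().fit_transform(X).ravel()

# Min-max normalisation: the lowest mark maps to 0 and the highest to 1.
scaled = MinMaxScaler().fit_transform(X).ravel()

for m, z, s in zip(marks, z_scores, scaled):
    print(f"mark={m:5.1f}  z={z:+.2f}  minmax={s:.2f}")
```

Both transforms are linear, so they change the scale of the marks but not the shape of their distribution: a skewed set of results stays skewed after either one.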