This post explains how the most common measures of central tendency (mean, median, and mode) work and how they can help when dealing with data. Working with data involves a number of steps, such as data extraction, data cleaning, handling missing data, and exploratory data analysis, and statistics plays a very important role in many of them.
Central tendency is a very basic but very useful statistical measure that represents the central point or typical value of a dataset. It indicates where most of the values in the distribution fall, that is, the central location of the distribution. The most common measures of central tendency used in the analysis of numerical data are the mean, median, and mode.
The mean is the most common and well-known measure of central tendency and can be used with both discrete and continuous data. It is calculated as the sum of all the values in the dataset divided by the number of values in the dataset, and is denoted ‘µ’.
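Written as a formula, the definition above looks as follows. The symbols N (the number of values) and x_i (the i-th value) are notation assumed here for illustration and are not used elsewhere in this post.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```

For example, for the three values 2, 4, and 9, the mean is (2 + 4 + 9) / 3 = 5.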
The mean is often not one of the actual values observed in your dataset, but it is one of its most important properties because it minimizes the error when a single number is used to predict the values in the dataset: no other number gives a smaller total squared deviation. It achieves this because every value in the dataset is included in the calculation. In addition, the mean is the only measure of central tendency for which the sum of the deviations of each value from the mean is always zero.
In the image below we can see the histogram for an array of values; the mean is calculated by summing all the values on the x-axis and dividing by the number of values, i.e. 12.
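Both properties can be checked numerically. The following is a minimal sketch using made-up sample values; the data and the helper name squared_error are assumptions for illustration, and "error" is taken here to mean the sum of squared deviations.

```python
from statistics import mean

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 5, 7, 8]
mu = mean(values)

# Property 1: deviations from the mean sum to zero.
deviations = [x - mu for x in values]
total_deviation = sum(deviations)

# Property 2: the mean gives the smallest sum of squared deviations.
def squared_error(center, data):
    """Sum of squared distances between each value and a candidate center."""
    return sum((x - center) ** 2 for x in data)

error_at_mean = squared_error(mu, values)
# Try a few candidate centers slightly away from the mean.
error_nearby = [squared_error(mu + shift, values) for shift in (-1, -0.5, 0.5, 1)]

print("mean:", mu)
print("sum of deviations:", total_deviation)
print("squared error at mean:", error_at_mean)
print("squared error at nearby centers:", error_nearby)
```

Every candidate center other than the mean produces a larger squared error, which is the sense in which the mean is the best single-number prediction.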
However, the disadvantage of the mean is that it is particularly susceptible to the influence of outliers. Outliers are values that are very unusual compared to the rest of the data, being either very small or very large. When the data is perfectly normal, the mean, median, and mode are identical. When the data is skewed, however, the mean loses its ability to provide the best central location for the data because the skewed values drag it away from the typical value.
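The effect of a single outlier can be seen in a small experiment. This is only a sketch: the numbers below are invented for illustration, and the median is used as the comparison because it depends only on the middle of the ordered data.

```python
from statistics import mean, median

# Hypothetical values (for example, salaries in thousands).
salaries = [30, 32, 35, 36, 38, 40]
# The same data with one very large, unusual value added.
with_outlier = salaries + [400]

mean_before = mean(salaries)
median_before = median(salaries)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"median: {median_before:.2f} -> {median_after:.2f}")
```

The mean more than doubles when the outlier is added, while the median moves by only half a unit.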
The histogram below shows a skewed dataset, in which the mean, median, and mode are no longer approximately equal to each other.
The median is the middle value of your observations when the values in the dataset are ordered from smallest to largest. If the number of values in the dataset is odd, the middle value is the median. If the number of values is even, the median is the average of the two middle values.
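The procedure can be sketched in a few lines: sort the values, take the middle one when the count is odd, and average the two middle ones when the count is even. The function name median_of and the sample lists are assumptions made for this example; Python's standard library also provides statistics.median for the same purpose.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # order from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median_of([7, 1, 5, 3, 9])   # five values
even_result = median_of([7, 1, 5, 3])     # four values

print(odd_result, even_result)
```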
The histogram below shows the relationship between the mean and mode for symmetric data.
In this blog we will learn about Mean, Median, Mode. We will understand these three topics through this data.
Data science is present wherever advanced statistical and machine learning methods are used. As long as there is data to analyse, the need to investigate it is obvious.
Our latest survey report suggests that as the overall Data Science and Analytics market evolves to adapt to constantly changing economic and business environments, data scientists and AI practitioners should be aware of the skills and tools that the broader community is working on. A good grip on these skills will further help data science enthusiasts get the best jobs that various industries are offering in their data science functions.
Mean, median, mode, variance, and standard deviation are all very basic but very important statistical concepts used in data science.