What is Central Tendency?

Central tendency is a basic but very useful statistical measure that represents a central point or typical value of a dataset. It indicates where most of the values in the distribution fall, that is, the central location of the distribution. The most common measures of central tendency used for analyzing numerical data are the mean, median, and mode.

Mean

The mean is the most common and well-known measure of central tendency and can be used with both discrete and continuous data. It is calculated as the sum of all the values in the dataset divided by the number of values in the dataset, and is denoted by ‘µ’.
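As a minimal sketch of that calculation, the snippet below sums the values and divides by their count. The function name `mean` and the `sample` list are hypothetical, chosen only for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical sample data, not taken from the article's figures.
sample = [2, 4, 4, 5, 7, 8]
mu = mean(sample)
print(mu)  # 5.0
```

Note that 5.0 happens to be one of the observed values here; with other data the mean often is not.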

The mean is often not one of the actual values that you have observed in your dataset, but it is one of its most important properties: it is the single value that minimizes the prediction error across the dataset, measured as the sum of squared differences between each value and the prediction. It also includes every value in your dataset as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.
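Both properties can be checked numerically. The sketch below takes "error" to mean the sum of squared deviations (an assumption about what is being minimized) and uses a hypothetical `sample` list and helper `squared_error`.

```python
import math

# Hypothetical sample data.
sample = [2, 4, 4, 5, 7, 8]
mu = sum(sample) / len(sample)

# Property 1: the deviations from the mean cancel out.
deviation_sum = sum(x - mu for x in sample)
print(math.isclose(deviation_sum, 0.0, abs_tol=1e-9))  # True


def squared_error(center, values):
    """Sum of squared differences between each value and a candidate center."""
    return sum((x - center) ** 2 for x in values)


# Property 2: moving the center away from the mean increases the squared error.
error_at_mean = squared_error(mu, sample)
errors_nearby = [squared_error(mu + shift, sample) for shift in (-1.0, -0.1, 0.1, 1.0)]
print(all(err > error_at_mean for err in errors_nearby))  # True
```

If the error were measured with absolute differences instead, the median rather than the mean would be the minimizer.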

The image below shows the histogram for an array of values. The mean is calculated by summing all the values on the x-axis and dividing by the number of values, i.e. 12.

However, the disadvantage of using the mean is that it is particularly susceptible to the influence of outliers. Outliers are values that are very unusual compared to the rest of the data, being either very small or very large. Skewed data causes a similar problem. When the data is perfectly normal, the mean, median, and mode are identical. When the data is skewed, the mean loses its ability to provide the best central location for the data because the skew drags it away from the typical value.
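The effect of a single outlier can be illustrated with Python's standard `statistics` module. The values below are hypothetical; the point is only that the mean moves a lot while the median barely changes.

```python
from statistics import mean, median

# Hypothetical values clustered in the 30s and low 40s.
typical = [30, 32, 35, 36, 38, 40, 41]
# The same data with one unusually large value added.
with_outlier = typical + [400]

print(mean(typical), median(typical))            # 36 36
print(mean(with_outlier), median(with_outlier))  # 81.5 37.0
```

One extreme value more than doubles the mean, while the median shifts by only one unit.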

The histogram below shows a skewed dataset, where the mean, median, and mode are no longer equal to each other because the mean is pulled toward the tail.

Median

The median is the middle value of your observations when the values in the dataset are ordered from smallest to largest. If the number of values in the dataset is odd, the middle value is the median. If the number of values is even, the median is the average of the two middle values.
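The odd and even cases can be sketched as follows. The function name `median` and the example lists are hypothetical; in practice the standard library's `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```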

The histogram below shows the relationship between the mean and the median when the data is symmetric.


Statistics for Data Science Part 1: Use of Central Tendency for Data Analysis.