Statistics is among the most widely used and important disciplines of study, and it has proved indispensable in numerous domains such as Engineering, Psychology, Operational Research, and Chemometrics. Data Science is among the fields that depend most heavily on statistics, which is why an in-depth understanding of Data Science requires a detailed understanding of Statistics.
The term statistics is often misunderstood, which is why we first need a very clear understanding of it. To understand basic statistics for data science, we first have to become familiar with a few basic terms.
A population is the complete set of individual humans, other organisms, or any other objects that together make up a whole. Under this definition, the conditions we specify determine which objects or items form the population. For example, the Apple laptops manufactured in September 2013 in a particular factory in China may number far fewer than all the computers currently active in the world. Thus, a population may or may not be large; its size depends on the conditions that define what is to be considered part of it.
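The idea that conditions decide membership can be sketched in code. The records, the `Device` class, the factory names, and the `population` helper below are all hypothetical and exist only to show that adding conditions narrows the population while removing them widens it.

```python
from dataclasses import dataclass


@dataclass
class Device:
    maker: str
    kind: str
    factory: str
    made: str  # manufacturing month as "YYYY-MM"


# Hypothetical production records (illustrative only, not real data).
records = [
    Device("Apple", "laptop", "Factory A", "2013-09"),
    Device("Apple", "laptop", "Factory A", "2013-09"),
    Device("Apple", "laptop", "Factory B", "2013-09"),
    Device("Apple", "laptop", "Factory A", "2013-10"),
    Device("Other", "desktop", "Factory C", "2013-09"),
]


def population(recs, **conditions):
    """Return the records whose attributes match every given condition."""
    return [r for r in recs if all(getattr(r, k) == v for k, v in conditions.items())]


# Strict conditions define a small population; no conditions define the widest one.
narrow = population(records, maker="Apple", kind="laptop", factory="Factory A", made="2013-09")
broad = population(records)
print(len(narrow), len(broad))  # 2 5
```

The same records yield populations of different sizes depending only on the conditions passed in.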
Numerous mathematical calculations can be performed on a population, such as finding the most common value or the average. Any such numerical summary that describes the population is called a parameter. For example, suppose we want to know the average age of all the people living in a village. If the village has 200 people and we successfully record the age of every one of them, the resulting average age is a parameter, because its value has been calculated using information from the complete population.
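A minimal sketch of computing parameters, assuming a hypothetical village of only 10 residents (instead of 200) so the numbers are easy to check by hand. The ages and the helper name `population_parameters` are illustrative; the calculations use Python's standard `statistics` module.

```python
import statistics

# Hypothetical ages of EVERY resident of a small village.
# Because no one is left out, this list is the whole population.
population_ages = [23, 35, 35, 41, 58, 62, 35, 19, 47, 30]


def population_parameters(values):
    """Return (mean, mode) computed over the complete population."""
    return statistics.mean(values), statistics.mode(values)


mean_age, most_common_age = population_parameters(population_ages)
print("Average age (parameter):", mean_age)             # 38.5
print("Most common age (parameter):", most_common_age)  # 35
```

Both values are parameters because they summarize every member of the population rather than a subset.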
In simplest terms, a sample is a subset of the population that ideally represents it. Samples can be created using various sampling methods.
Keep in mind that no method is intrinsically better or worse than another; they are simply different ways of creating a sample to suit different requirements.
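To make the idea of a sample concrete, here is a minimal sketch assuming simple random sampling (each member is equally likely to be picked, drawn without replacement), which is only one of many possible methods. The generated population of 200 ages and the helper `simple_random_sample` are hypothetical; `random.Random.sample` is from Python's standard library.

```python
import random
import statistics

# Hypothetical population: ages of all 200 villagers, generated with a
# fixed seed so the example is reproducible (not real data).
rng = random.Random(7)
population = [rng.randint(1, 90) for _ in range(200)]


def simple_random_sample(pop, n, seed=None):
    """Draw n members without replacement, each equally likely to be chosen."""
    return random.Random(seed).sample(pop, n)


sample = simple_random_sample(population, 20, seed=42)

# The sample mean is only an estimate of the population mean (the parameter).
print("Population mean:", statistics.mean(population))
print("Sample mean:", statistics.mean(sample))
```

Passing a seed keeps the draw reproducible; omitting it gives a different sample on each run, and therefore a slightly different estimate.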
The next logical question is why we need to create a sample in the first place when we already have the population. This question has a few obvious answers.