When dealing with a large dataset, how many samples do we need to look at before we can be justifiably confident in our answer? That depends on the variance of the data.
Variance measures how widely the values in a sample are spread around their mean: the larger the variance, the more the values differ from one another. In this Python article, we build a function that computes it.
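Mathematically, for $n$ values $x_1, \dots, x_n$ with mean $\mu$, the (population) variance is defined as:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2, \qquad \mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$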
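As a quick worked example of the formula, take the five values 1, 2, 3, 4, 5. First compute the mean, then average the squared deviations from it:

$$\mu = \frac{1+2+3+4+5}{5} = 3, \qquad \sigma^2 = \frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5} = \frac{4+1+0+1+4}{5} = 2$$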
The following function implements this formula directly. It is handy whenever a program needs to measure how spread out a dataset is.
Here is the code:
def variance(X):
    mean = sum(X) / len(X)
    tot = 0.0
    for x in X:
        tot = tot + (x - mean) ** 2
    return tot / len(X)
# main code
# a simple data-set
sample = [1, 2, 3, 4, 5]
print("variance of the sample is: ", variance(sample))
sample = [1, 2, 3, -4, -5]
print("variance of the sample is: ", variance(sample))
sample = [10, -20, 30, -40, 50]
print("variance of the sample is: ", variance(sample))
Output
variance of the sample is: 2.0
variance of the sample is: 10.64
variance of the sample is: 1064.0
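To sanity-check the hand-written function, one option is to compare it with `statistics.pvariance` from Python's standard library, which also divides by n (population variance). The sketch below repeats the function so the snippet runs on its own, and uses `math.isclose` because floating-point results can differ in the last digits.

```python
import math
import statistics


def variance(X):
    """Population variance: mean squared deviation from the mean (divides by n)."""
    mean = sum(X) / len(X)
    tot = 0.0
    for x in X:
        tot = tot + (x - mean) ** 2
    return tot / len(X)


samples = [[1, 2, 3, 4, 5], [1, 2, 3, -4, -5], [10, -20, 30, -40, 50]]
for s in samples:
    ours = variance(s)
    reference = statistics.pvariance(s)  # standard-library population variance
    # Print both values and whether they agree up to floating-point tolerance
    print(ours, reference, math.isclose(ours, reference))
```

If the two columns ever disagree, the usual culprit is dividing by n - 1 in one place and by n in the other.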
NumPy provides the same calculation through np.var(), which also divides by n by default:
import numpy as np
dataset = [21, 11, 19, 18, 29, 46, 20]
result = np.var(dataset)
print(result)
Output
108.81632653061224
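`np.var` divides by n unless told otherwise. Its `ddof` ("delta degrees of freedom") parameter changes the divisor to n - ddof, so `ddof=1` gives the sample variance. A minimal sketch on the same dataset:

```python
import numpy as np

dataset = [21, 11, 19, 18, 29, 46, 20]
population_var = np.var(dataset)      # divides by n (ddof=0 is the default)
sample_var = np.var(dataset, ddof=1)  # divides by n - 1 (sample variance)
print(population_var, sample_var)
```

With n = 7, the sample variance is the population variance scaled by 7/6, so it is always slightly larger.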
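Note: Python's standard-library statistics module provides statistics.variance(), which computes the sample variance (the data is treated as a subset of a larger population, so it divides by n - 1 instead of n). The function written above divides by n, so it matches statistics.pvariance() (population variance) and the default behaviour of np.var(). The statistics module also offers related tools such as mean(), median(), and stdev().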
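The difference between the two standard-library functions is easiest to see on a small list. For 1, 2, 3, 4, 5 the sum of squared deviations is 10, so the results differ only in the divisor:

```python
import statistics

data = [1, 2, 3, 4, 5]
print(statistics.variance(data))   # sample variance: 10 / (5 - 1)
print(statistics.pvariance(data))  # population variance: 10 / 5
```

Use `pvariance` when the data is the whole population and `variance` when it is a sample drawn from a larger population.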
# define a function to calculate variance
def variance(X):
    mean = sum(X) / len(X)
    tot = 0.0
    for x in X:
        tot = tot + (x - mean) ** 2
    return tot / len(X)
# call the function with data set
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print("variance is: ", variance(x))
y = [1, 2, 3, -4, -5, -6, -7, -8]
print("variance is: ", variance(y))
z = [10, -20, 30, -40, 50, -60, 70, -80]
print("variance is: ", variance(z))
Output
variance is: 6.666666666666667
variance is: 16.5
variance is: 2525.0
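The hand-written function always divides by n. One possible extension, sketched here as a hypothetical variant rather than part of the original code, adds a `ddof` parameter (the name is borrowed from NumPy's `np.var` for illustration) so the same function can return either the population or the sample variance:

```python
def variance(X, ddof=0):
    """Variance of the numbers in X (hypothetical extended version).

    ddof=0 divides by n (population variance, like the function above);
    ddof=1 divides by n - 1 (sample variance).
    """
    n = len(X)
    if n - ddof <= 0:
        # Avoid dividing by zero or a negative count
        raise ValueError("need more data points than ddof")
    mean = sum(X) / n
    tot = sum((x - mean) ** 2 for x in X)  # sum of squared deviations
    return tot / (n - ddof)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(variance(x))          # population variance
print(variance(x, ddof=1))  # sample variance
```

Keeping `ddof=0` as the default preserves the behaviour of the original function for existing callers.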