Statistics involves gathering data, analyzing it, and drawing conclusions based on the information collected.
NumPy provides a range of functions for statistical analysis of array data.
Here are some of the statistical functions provided by NumPy:
Functions | Descriptions |
---|---|
median() | return the median of an array |
mean() | return the mean of an array |
std() | return the standard deviation of an array |
percentile() | return the nth percentile of elements in an array |
min() | return the minimum element of an array |
max() | return the maximum element of an array |
Next, we will see examples using these functions.
The median value of a NumPy array is the middle value in a sorted array.
In other words, it is the value that separates the higher half from the lower half of the data.
Suppose we have the following list of numbers:
1, 5, 7, 8, 9, 12, 14
Then the median is simply the middle number, which in this case is 8.
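It is important to note that if the number of elements is odd, the median is the middle element; if it is even, the median is the average of the two middle elements.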
Now, we will learn how to calculate the median using NumPy for arrays with an odd and an even number of elements.
import numpy as np
# create a 1D array with 5 elements
array1 = np.array([1, 2, 3, 4, 5])
# calculate the median
median = np.median(array1)
print(median)
# Output: 3.0
In the above example, the array named array1 contains an odd number of elements (5 elements).
So, np.median(array1) returns the median of array1 as 3.0, which is the middle value of the sorted array.
import numpy as np
# create a 1D array with 6 elements
array1 = np.array([1, 2, 3, 4, 5, 7])
# calculate the median
median = np.median(array1)
print(median)
# Output: 3.5
Here, since array1 has an even number of elements (6 elements), the median is calculated as the average of the two middle elements (3 and 4), which is 3.5.
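The odd and even cases described above can be sketched by hand. The following is a minimal illustration, not how NumPy implements np.median() internally; manual_median is a hypothetical helper name introduced only for this example, and the input arrays are deliberately unsorted to show that sorting comes first.

```python
import numpy as np

def manual_median(values):
    """Hypothetical helper: median via sorting, for illustration only."""
    ordered = np.sort(values)      # the median is defined on sorted data
    n = ordered.size
    mid = n // 2
    if n % 2 == 1:
        # odd count: the single middle element
        return float(ordered[mid])
    # even count: average of the two middle elements
    return float((ordered[mid - 1] + ordered[mid]) / 2)

odd_array = np.array([5, 1, 4, 2, 3])
even_array = np.array([7, 1, 4, 2, 5, 3])

print(manual_median(odd_array), np.median(odd_array))    # 3.0 3.0
print(manual_median(even_array), np.median(even_array))  # 3.5 3.5
```

In both cases the hand-written result agrees with np.median().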
Calculating the median is not limited to 1D arrays. We can also calculate the median of a 2D array.
In a 2D array, the median can be calculated along the horizontal axis, along the vertical axis, or across the entire array.
When computing the median of a 2D array, we use the axis parameter inside np.median() to specify the axis along which to compute the median.
If we specify,
- axis = 0, the median is calculated along the vertical axis
- axis = 1, the median is calculated along the horizontal axis
If we don't use the axis parameter, the median is computed over the entire array.
import numpy as np
# create a 2D array
array1 = np.array([[2, 4, 6],
                   [8, 10, 12],
                   [14, 16, 18]])
# compute median along horizontal axis
result1 = np.median(array1, axis=1)
print("Median along horizontal axis :", result1)
# compute median along vertical axis
result2 = np.median(array1, axis=0)
print("Median along vertical axis:", result2)
# compute median of entire array
result3 = np.median(array1)
print("Median of entire array:", result3)
Output
Median along horizontal axis : [ 4. 10. 16.]
Median along vertical axis: [ 8. 10. 12.]
Median of entire array: 10.0
In this example, we have created a 2D array named array1.
We then computed the median along the horizontal and vertical axis individually and then computed the median of the entire array.
np.median(array1, axis=1) - median along the horizontal axis, which gives [4. 10. 16.]
np.median(array1, axis=0) - median along the vertical axis, which gives [8. 10. 12.]
np.median(array1) - median over the entire array, which gives 10.0
To calculate the median over the entire 2D array, we first flatten the array to [2, 4, 6, 8, 10, 12, 14, 16, 18] and then find the middle value of the flattened array, which in our case is 10.
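The flattening step can be made explicit. This short sketch reuses the array from the example above and uses the ndarray.flatten() method to show that the median of the flattened array matches the median of the 2D array computed without an axis argument; the variable name flattened is chosen here for illustration.

```python
import numpy as np

array1 = np.array([[2, 4, 6],
                   [8, 10, 12],
                   [14, 16, 18]])

# collapse the 2D array into a 1D array of all nine elements
flattened = array1.flatten()
print(flattened.tolist())      # [2, 4, 6, 8, 10, 12, 14, 16, 18]

# both calls give the same result
print(np.median(flattened))    # 10.0
print(np.median(array1))       # 10.0
```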
The mean value of a NumPy array is the average value of all the elements in the array.
It is calculated by adding all elements in the array and then dividing the result by the total number of elements in the array.
We use the np.mean() function to calculate the mean value. For example,
import numpy as np
# create a numpy array
marks = np.array([76, 78, 81, 66, 85])
# compute the mean of marks
mean_marks = np.mean(marks)
print(mean_marks)
# Output: 77.2
In this example, the mean value is 77.2, which is calculated by adding the elements (76, 78, 81, 66, 85) and dividing the result by 5 (total number of array elements).
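The same arithmetic can be checked step by step. This is a small sketch using the marks array from above; the names total, count, and manual_mean are introduced here only for illustration.

```python
import numpy as np

marks = np.array([76, 78, 81, 66, 85])

total = marks.sum()          # 76 + 78 + 81 + 66 + 85 = 386
count = marks.size           # 5 elements
manual_mean = total / count  # 386 / 5

print(manual_mean)                               # 77.2
print(np.isclose(manual_mean, np.mean(marks)))   # True
```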
import numpy as np
# create a 2D array
array1 = np.array([[1, 3],
                   [5, 7]])
# calculate the mean of the entire array
result1 = np.mean(array1)
print("Entire Array:",result1) # 4.0
# calculate the mean along vertical axis (axis=0)
result2 = np.mean(array1, axis=0)
print("Along Vertical Axis:",result2) # [3. 5.]
# calculate the mean along horizontal axis (axis=1)
result3 = np.mean(array1, axis=1)
print("Along Horizontal Axis :",result3) # [2. 6.]
Output
Entire Array: 4.0
Along Vertical Axis: [3. 5.]
Along Horizontal Axis : [2. 6.]
Here, we first created the 2D array named array1. We then calculated the mean using np.mean().
np.mean(array1) - calculates the mean over the entire array
np.mean(array1, axis=0) - calculates the mean along the vertical axis
np.mean(array1, axis=1) - calculates the mean along the horizontal axis
The standard deviation is a measure of the spread of the data in the array. It gives us the degree to which the data points in an array deviate from the mean.
In NumPy, we use the np.std() function to calculate the standard deviation of an array.
import numpy as np
# create a numpy array
marks = np.array([76, 78, 81, 66, 85])
# compute the standard deviation of marks
std_marks = np.std(marks)
print(std_marks)
# Output: approximately 6.3687
In the above example, we have used the np.std() function to calculate the standard deviation of the marks array.
Here, approximately 6.3687 is the standard deviation of marks. It tells us how much the values in the marks array deviate from the mean value of the array. By default, np.std() computes the population standard deviation, dividing the sum of squared deviations by the number of elements.
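To make the definition concrete, the standard deviation of marks can be rebuilt from its parts: subtract the mean, square the deviations, average them, and take the square root. This is a minimal sketch; the intermediate names are illustrative, and the results are rounded for display. The last line uses the ddof parameter of np.std() to get the sample standard deviation, which divides by n - 1 instead of n.

```python
import numpy as np

marks = np.array([76, 78, 81, 66, 85])

# deviation of each mark from the mean (77.2)
deviations = marks - marks.mean()

# population variance: average of the squared deviations
variance = np.mean(deviations ** 2)

# standard deviation: square root of the variance
manual_std = np.sqrt(variance)

print(round(float(variance), 2))                # 40.56
print(round(float(manual_std), 4))              # 6.3687
print(np.isclose(manual_std, np.std(marks)))    # True

# sample standard deviation (divides by n - 1)
print(round(float(np.std(marks, ddof=1)), 4))   # 7.1204
```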
In a 2D array, standard deviation can be calculated either along the horizontal or the vertical axis individually, or across the entire array.
Similar to mean and median, when computing the standard deviation of a 2D array, we use the axis parameter inside np.std() to specify the axis along which to compute the standard deviation.
import numpy as np
# create a 2D array
array1 = np.array([[2, 5, 9],
                   [3, 8, 11],
                   [4, 6, 7]])
# compute standard deviation along horizontal axis
result1 = np.std(array1, axis=1)
print("Standard deviation along horizontal axis:", result1)
# compute standard deviation along vertical axis
result2 = np.std(array1, axis=0)
print("Standard deviation along vertical axis:", result2)
# compute standard deviation of entire array
result3 = np.std(array1)
print("Standard deviation of entire array:", result3)
Output
Standard deviation along horizontal axis: [2.86744176 3.29983165 1.24721913]
Standard deviation along vertical axis: [0.81649658 1.24721913 1.63299316]
Standard deviation of entire array: 2.7666443551086073
Here, we have created a 2D array named array1.
We then computed the standard deviation along horizontal and vertical axis individually and then computed the standard deviation of the entire array.
In NumPy, we use the percentile() function to compute the nth percentile of a given array.
Let's see an example.
import numpy as np
# create an array
array1 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
# compute the 25th percentile of the array
result1 = np.percentile(array1, 25)
print("25th percentile:",result1)
# compute the 75th percentile of the array
result2 = np.percentile(array1, 75)
print("75th percentile:",result2)
Output
25th percentile: 5.5
75th percentile: 14.5
Here, the 25th percentile of array1 is 5.5 and the 75th percentile is 14.5.
Note: To learn more about percentile, visit NumPy Percentile.
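The values 5.5 and 14.5 do not appear in array1 because np.percentile() interpolates linearly between neighboring elements by default. The sketch below requests several percentiles in one call, checks that the 50th percentile equals the median, and reproduces the 25th percentile by hand. The names rank, lower, fraction, and manual_p25 are illustrative, and the manual step assumes the array is already sorted.

```python
import numpy as np

array1 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])

# several percentiles can be requested in one call
quartiles = np.percentile(array1, [25, 50, 75])
print(quartiles.tolist())                               # [5.5, 10.0, 14.5]

# the 50th percentile coincides with the median
print(np.percentile(array1, 50) == np.median(array1))   # True

# manual linear interpolation for the 25th percentile
rank = 0.25 * (array1.size - 1)   # fractional index: 2.25
lower = int(np.floor(rank))       # index of the element just below: 2
fraction = rank - lower           # distance past that element: 0.25
manual_p25 = array1[lower] + fraction * (array1[lower + 1] - array1[lower])
print(manual_p25)                 # 5.5
```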
We use the min() and max() functions in NumPy to find the minimum and maximum values in a given array.
Let's see an example.
import numpy as np
# create an array
array1 = np.array([2,6,9,15,17,22,65,1,62])
# find the minimum value of the array
min_val = np.min(array1)
# find the maximum value of the array
max_val = np.max(array1)
# print the results
print("Minimum value:", min_val)
print("Maximum value:", max_val)
Output
Minimum value: 1
Maximum value: 65
As we can see, min() and max() return the minimum and maximum values of array1, which are 1 and 65 respectively.
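Like the other functions covered here, np.min() and np.max() accept an axis parameter for 2D arrays. The following sketch arranges the same nine values into a hypothetical 3x3 array named array2d to show the per-column minimum, the per-row maximum, and the overall range.

```python
import numpy as np

# hypothetical 2D arrangement of the values used above
array2d = np.array([[2, 6, 9],
                    [15, 17, 22],
                    [65, 1, 62]])

# minimum of each column (along the vertical axis)
print(np.min(array2d, axis=0).tolist())      # [2, 1, 9]

# maximum of each row (along the horizontal axis)
print(np.max(array2d, axis=1).tolist())      # [9, 22, 65]

# range of the entire array: maximum minus minimum
print(np.max(array2d) - np.min(array2d))     # 64
```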
Note: To learn more about min() and max(), visit NumPy min() and NumPy max().