Statistics involves gathering data, analyzing it, and drawing conclusions based on the information collected.

NumPy provides various functions for statistical data analysis.

Here are some of the statistical functions provided by NumPy:

Functions | Descriptions |
---|---|
`median()` | return the median of an array |
`mean()` | return the mean of an array |
`std()` | return the standard deviation of an array |
`percentile()` | return the nth percentile of elements in an array |
`min()` | return the minimum element of an array |
`max()` | return the maximum element of an array |

Next, we will see examples using these functions.

The median value of a numpy array is the middle value in a **sorted array**.

In other words, it is the value that separates the higher half from the lower half of the data.

Suppose we have the following list of numbers:

`1, 5, 7, 8, 9, 12, 14`

Then, the median is simply the middle number, which in this case is **8**.

It is important to note that if the number of elements is

- **Odd**, the median is the middle element.
- **Even**, the median is the average of the two middle elements.

Now, we will learn how to calculate the median using NumPy for arrays with odd and even number of elements.

```
import numpy as np
# create a 1D array with 5 elements
array1 = np.array([1, 2, 3, 4, 5])
# calculate the median
median = np.median(array1)
print(median)
# Output: 3.0
```

In the above example, the array named array1 contains an odd number of elements (**5** elements).

So, `np.median(array1)` returns the median of `array1` as **3**, which is the middle value of the sorted array.

```
import numpy as np
# create a 1D array with 6 elements
array1 = np.array([1, 2, 3, 4, 5, 7])
# calculate the median
median = np.median(array1)
print(median)
# Output: 3.5
```

Here, since `array1` has an even number of elements (**6** elements), the median is calculated as the average of the two middle elements (**3** and **4**), i.e. **3.5**.
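The odd/even rule described above can also be checked by hand. The following is a minimal sketch; the helper name `manual_median` is hypothetical and is not part of NumPy.

```python
import numpy as np

def manual_median(values):
    """Sort the values and pick the middle (illustrative helper, not a NumPy function)."""
    ordered = np.sort(np.asarray(values))
    n = ordered.size
    mid = n // 2
    if n % 2 == 1:
        # odd count: the single middle element
        return float(ordered[mid])
    # even count: average of the two middle elements
    return float((ordered[mid - 1] + ordered[mid]) / 2)

odd_result = manual_median([1, 2, 3, 4, 5])
even_result = manual_median([1, 2, 3, 4, 5, 7])
print(odd_result)   # 3.0
print(even_result)  # 3.5
```

Both results match what `np.median()` returns for the same inputs.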

Calculating the median is not limited to 1D arrays. We can also calculate the median of a 2D array.

In a 2D array, the median can be calculated along the horizontal axis, along the vertical axis, or across the entire array.

When computing the median of a 2D array, we use the `axis` parameter inside `np.median()` to specify the axis along which to compute the median.

If we specify

- `axis = 0`, the median is calculated along the vertical axis (one value per column)
- `axis = 1`, the median is calculated along the horizontal axis (one value per row)

If we don't use the `axis` parameter, the median is computed over the entire array.

```
import numpy as np
# create a 2D array
array1 = np.array([[2, 4, 6],
                   [8, 10, 12],
                   [14, 16, 18]])
# compute median along horizontal axis
result1 = np.median(array1, axis=1)
print("Median along horizontal axis :", result1)
# compute median along vertical axis
result2 = np.median(array1, axis=0)
print("Median along vertical axis:", result2)
# compute median of entire array
result3 = np.median(array1)
print("Median of entire array:", result3)
```

**Output**

```
Median along horizontal axis : [ 4. 10. 16.]
Median along vertical axis: [ 8. 10. 12.]
Median of entire array: 10.0
```

In this example, we have created a 2D array named array1.

We then computed the median along the horizontal and vertical axis individually and then computed the median of the entire array.

- `np.median(array1, axis=1)` - median along horizontal axis, which gives `[4. 10. 16.]`
- `np.median(array1, axis=0)` - median along vertical axis, which gives `[8. 10. 12.]`
- `np.median(array1)` - median over the entire array, which gives `10.0`

To calculate the median over the entire 2D array, the array is first flattened to `[2, 4, 6, 8, 10, 12, 14, 16, 18]`, and then the middle value of the flattened, sorted array is taken, which in our case is **10**.
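As a rough illustration of that flatten-then-pick-the-middle idea (a conceptual sketch only; NumPy's internal implementation may differ), we can reproduce the result step by step:

```python
import numpy as np

array1 = np.array([[2, 4, 6],
                   [8, 10, 12],
                   [14, 16, 18]])

# flatten the 2D array into a 1D copy
flat = array1.flatten()

# sort, then take the middle element (9 elements, so index 4)
middle = np.sort(flat)[flat.size // 2]

print(flat.tolist())       # [2, 4, 6, 8, 10, 12, 14, 16, 18]
print(middle)              # 10
print(np.median(array1))   # 10.0
```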

The mean value of a NumPy array is the average value of all the elements in the array.

It is calculated by adding all elements in the array and then dividing the result by the total number of elements in the array.

We use the `np.mean()` function to calculate the mean value. For example,

```
import numpy as np
# create a numpy array
marks = np.array([76, 78, 81, 66, 85])
# compute the mean of marks
mean_marks = np.mean(marks)
print(mean_marks)
# Output: 77.2
```

In this example, the mean value is **77.2**, which is calculated by adding the elements (**76, 78, 81, 66, 85**) and dividing the result by **5** (total number of array elements).
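The same arithmetic can be written out explicitly. This is a small sketch that uses only the array's `sum()` method and `size` attribute, and compares the result with `np.mean()`:

```python
import numpy as np

marks = np.array([76, 78, 81, 66, 85])

total = marks.sum()    # 76 + 78 + 81 + 66 + 85 = 386
count = marks.size     # 5 elements
manual_mean = total / count

print(manual_mean)                               # 77.2
print(np.isclose(manual_mean, np.mean(marks)))   # True
```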

```
import numpy as np
# create a 2D array
array1 = np.array([[1, 3],
                   [5, 7]])
# calculate the mean of the entire array
result1 = np.mean(array1)
print("Entire Array:", result1)  # 4.0
# calculate the mean along vertical axis (axis=0)
result2 = np.mean(array1, axis=0)
print("Along Vertical Axis:", result2)  # [3. 5.]
# calculate the mean along horizontal axis (axis=1)
result3 = np.mean(array1, axis=1)
print("Along Horizontal Axis :", result3)  # [2. 6.]
```

**Output**

```
Entire Array: 4.0
Along Vertical Axis: [3. 5.]
Along Horizontal Axis : [2. 6.]
```

Here, we first created the 2D array named `array1`. We then calculated the mean using `np.mean()`.

- `np.mean(array1)` - calculates the mean over the entire array
- `np.mean(array1, axis=0)` - calculates the mean along vertical axis
- `np.mean(array1, axis=1)` - calculates the mean along horizontal axis

The standard deviation is a measure of the spread of the data in the array. It gives us the degree to which the data points in an array deviate from the mean.

- A smaller standard deviation indicates that the data points are closer to the mean.
- A larger standard deviation indicates that the data points are more spread out.

In NumPy, we use the `np.std()` function to calculate the standard deviation of an array.

```
import numpy as np
# create a numpy array
marks = np.array([76, 78, 81, 66, 85])
# compute the standard deviation of marks
std_marks = np.std(marks)
print(std_marks)
# Output: approximately 6.3687
```

In the above example, we have used the `np.std()` function to calculate the standard deviation of the `marks` array.

Here, the result is approximately `6.3687` (the square root of the variance, `40.56`). It tells us how much the values in the `marks` array deviate from the mean value of the array.
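To see where this number comes from, here is a minimal sketch that computes the standard deviation by hand. It relies on the fact that `np.std()` computes the population standard deviation by default (`ddof=0`); passing `ddof=1` gives the sample standard deviation instead.

```python
import numpy as np

marks = np.array([76, 78, 81, 66, 85])

# deviation of each mark from the mean
deviations = marks - marks.mean()

# variance: the mean of the squared deviations
variance = np.mean(deviations ** 2)

# standard deviation: the square root of the variance
manual_std = np.sqrt(variance)

print(np.isclose(manual_std, np.std(marks)))   # True

# sample standard deviation divides by (n - 1) instead of n
sample_std = np.std(marks, ddof=1)
print(sample_std > manual_std)                 # True
```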

In a 2D array, standard deviation can be calculated either along the horizontal or the vertical axis individually, or across the entire array.

Similar to mean and median, when computing the standard deviation of a 2D array, we use the `axis` parameter inside `np.std()` to specify the axis along which to compute the standard deviation.

```
import numpy as np
# create a 2D array
array1 = np.array([[2, 5, 9],
                   [3, 8, 11],
                   [4, 6, 7]])
# compute standard deviation along horizontal axis
result1 = np.std(array1, axis=1)
print("Standard deviation along horizontal axis:", result1)
# compute standard deviation along vertical axis
result2 = np.std(array1, axis=0)
print("Standard deviation along vertical axis:", result2)
# compute standard deviation of entire array
result3 = np.std(array1)
print("Standard deviation of entire array:", result3)
```

**Output**

```
Standard deviation along horizontal axis: [2.86744176 3.29983165 1.24721913]
Standard deviation along vertical axis: [0.81649658 1.24721913 1.63299316]
Standard deviation of entire array: 2.7666443551086073
```

Here, we have created a 2D array named array1.

We then computed the standard deviation along horizontal and vertical axis individually and then computed the standard deviation of the entire array.

In NumPy, we use the `percentile()` function to compute the nth percentile of a given array.

Let's see an example.

```
import numpy as np
# create an array
array1 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
# compute the 25th percentile of the array
result1 = np.percentile(array1, 25)
print("25th percentile:",result1)
# compute the 75th percentile of the array
result2 = np.percentile(array1, 75)
print("75th percentile:",result2)
```

**Output**

```
25th percentile: 5.5
75th percentile: 14.5
```

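Here,

- **5.5** is the 25th percentile: roughly **25%** of the values in `array1` lie at or below it.
- **14.5** is the 75th percentile: roughly **75%** of the values in `array1` lie at or below it.

Neither value appears in `array1` because, by default, `np.percentile()` interpolates linearly between the two nearest elements.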
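The sketch below shows two related points, assuming the default linear interpolation: the 50th percentile coincides with the median, and the 25th percentile can be reproduced by hand from a fractional position in the sorted array. The variable names are illustrative only.

```python
import numpy as np

array1 = np.array([1, 3, 5, 7, 9, 11, 13, 15, 17, 19])

# several percentiles can be requested in one call
quartiles = np.percentile(array1, [25, 50, 75])
print(quartiles.tolist())   # [5.5, 10.0, 14.5]
print(np.median(array1))    # 10.0, same as the 50th percentile

# manual 25th percentile (array1 is already sorted)
rank = 0.25 * (array1.size - 1)   # fractional position 2.25
lower = int(np.floor(rank))       # index 2, value 5
fraction = rank - lower           # 0.25
manual_q1 = array1[lower] + fraction * (array1[lower + 1] - array1[lower])
print(manual_q1)                  # 5.5
```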

**Note**: To learn more about percentile, visit *NumPy Percentile*.

We use the `min()` and `max()` functions in NumPy to find the minimum and maximum values in a given array.

Let's see an example.

```
import numpy as np
# create an array
array1 = np.array([2,6,9,15,17,22,65,1,62])
# find the minimum value of the array
min_val = np.min(array1)
# find the maximum value of the array
max_val = np.max(array1)
# print the results
print("Minimum value:", min_val)
print("Maximum value:", max_val)
```

**Output**

```
Minimum value: 1
Maximum value: 65
```

As we can see, `min()` and `max()` return the minimum and maximum values of `array1`, which are **1** and **65** respectively.
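Like the other functions in this article, `np.min()` and `np.max()` also accept an `axis` argument. The following sketch uses a small 2D array made up for illustration (it is not from the example above), and also shows `np.argmin()` and `np.argmax()`, which return the position of the extreme value rather than the value itself.

```python
import numpy as np

# illustrative 2D array (hypothetical data)
scores = np.array([[2, 6, 9],
                   [15, 1, 22]])

# minimum of each column (vertical axis)
col_min = np.min(scores, axis=0)
# maximum of each row (horizontal axis)
row_max = np.max(scores, axis=1)

print(col_min.tolist())   # [2, 1, 9]
print(row_max.tolist())   # [9, 22]

# positions of the smallest and largest values in the flattened array
min_index = np.argmin(scores)
max_index = np.argmax(scores)
print(min_index, max_index)   # 4 5
```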

**Note**: To learn more about `min()` and `max()`, visit *NumPy min()* and *NumPy max()*.

- This blog post was originally published at: https://www.programiz.com/
