Data science and big data are closely linked: the greatest data science insights are only possible because of the huge amount of data available.

When learning Python for data science, the first two libraries we are introduced to are Pandas and NumPy.

Why is it so important to use NumPy?

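NumPy is faster than pure-Python approaches because its core routines are implemented in C, a lower-level compiled language. Operations on a whole array run in compiled loops instead of being executed element by element by the Python interpreter.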
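To make this concrete, here is a minimal sketch (the variable names are illustrative and not from the original article). A single expression on a NumPy array is applied to every element without an explicit Python loop:

```python
import numpy as np

# Illustrative values; every element of a NumPy array shares one dtype
values = np.array([1, 2, 3, 4])

# Elementwise multiplication: the loop over the elements runs inside NumPy,
# not in the Python interpreter
squared = values * values

print(squared.tolist())  # [1, 4, 9, 16]
```

The array stores its elements in a typed buffer, which is what allows the multiplication to be carried out in compiled code.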

Examples

  • Create a square function (n*n): 4.5 s ± 28.5 ms
  • Create a square function (n**2): 7.2 s ± 72.1 ms
  • With list comprehension: 3.93 s ± 32.1 ms
  • With map(): 4.84 s ± 46.8 ms
  • With NumPy: 63.5 ms ± 392 µs
  • With NumPy and converting to a list: 1.68 s ± 16.1 ms

First of all, create a sample:

import numpy as np

# Create a sample of 20 million random integers from 0 to 99 (the upper bound, 100, is exclusive)
sample = np.random.randint(0, 100, 20000000)

  • Create a square function (n*n):

# Square each number with an explicit loop, using num * num
def square_f(numbers):
    square = []
    for num in numbers:
        square.append(num * num)
    return square

# %timeit is an IPython/Jupyter magic command
%timeit square_f(sample)

4.5 s ± 28.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  • Create a square function (n**2):

# Square each number with an explicit loop, using num ** 2
def square_f2(numbers):
    square = []
    for num in numbers:
        square.append(num ** 2)
    return square

# %timeit is an IPython/Jupyter magic command
%timeit square_f2(sample)

7.2 s ± 72.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As seen above, n*n is 1.6 times faster than n**2.
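The list of examples also mentions a list comprehension, map(), NumPy, and NumPy with conversion to a list, but the code for those variants is not shown here. The following is a minimal sketch of what they might look like. The function names are hypothetical, and a small sample is used so that it runs quickly, so it makes no claim about timings:

```python
import numpy as np

# Small sample for a quick check; the article uses 20 million values
small_sample = np.random.randint(0, 100, 1000)

def square_comprehension(numbers):
    # List comprehension: still a Python-level loop, but without append() calls
    return [num * num for num in numbers]

def square_map(numbers):
    # map() applies the function lazily; list() materializes the result
    return list(map(lambda num: num * num, numbers))

def square_numpy(numbers):
    # Vectorized: the multiplication is applied to the whole array at once
    return numbers * numbers

def square_numpy_to_list(numbers):
    # Vectorized, then converted back to a plain Python list
    return (numbers * numbers).tolist()

# All variants should produce the same values
expected = square_numpy_to_list(small_sample)
assert [int(x) for x in square_comprehension(small_sample)] == expected
assert [int(x) for x in square_map(small_sample)] == expected
assert square_numpy(small_sample).tolist() == expected
```

In IPython or Jupyter, each function can be timed the same way as above, for example with %timeit square_numpy(sample).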
