Data science and big data are closely linked: the greatest data science insights are only possible because of the huge amounts of data available.
When learning Python for data science, the first two libraries we are introduced to are pandas and NumPy.
NumPy is faster than the pure-Python approaches because its core routines are implemented in C, a lower-level compiled language that processes arrays much more efficiently than the Python interpreter can.
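As a minimal sketch of what this means in practice (the array `values` and the helper `square_loop` are illustrative names, not part of the original example), a single vectorized expression hands the whole element-wise loop to NumPy's compiled code, while the pure-Python version runs every iteration in the interpreter:

```python
import numpy as np

# Illustrative data; the names used here are assumptions for this sketch
values = np.arange(10)


def square_loop(numbers):
    """Square each element one at a time in the Python interpreter."""
    return [num * num for num in numbers]


# Vectorized: the element-wise loop runs inside NumPy's compiled code
squared_vectorized = values * values

# Check that both approaches produce the same numbers
print(np.array_equal(squared_vectorized, square_loop(values)))
```

Both versions compute the same values; the difference is only in where the loop is executed.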
First of all, create a sample:
import numpy as np
# Create a sample of 20 million random integers from 0 to 99 (the upper bound, 100, is exclusive)
sample = np.random.randint(0, 100, 20000000)
# Square each element in a loop using multiplication (n * n)
def square_f(numbers):
    square = []
    for num in numbers:
        square.append(num * num)
    return square
%timeit square_f(sample)
4.5 s ± 28.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Square each element in a loop using the power operator (n ** 2)
def square_f2(numbers):
    square = []
    for num in numbers:
        square.append(num ** 2)
    return square
%timeit square_f2(sample)
7.2 s ± 72.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
As seen above, n * n is 1.6 times faster than n ** 2 (4.5 s versus 7.2 s).
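The `%timeit` magic only works inside IPython or Jupyter. To try the same comparison from a plain Python script, something like the following sketch could be used. It relies on the standard-library `timeit` module; the names `small_sample`, `square_mul`, and `square_pow` are illustrative, and the sample is deliberately much smaller than the 20 million elements above so the script finishes quickly. A vectorized NumPy expression is timed alongside the two loops for reference. Timings depend on the machine and on the Python and NumPy versions, so no expected numbers are given here.

```python
import timeit

import numpy as np

# Smaller sample than in the article so the sketch runs quickly (assumption)
small_sample = np.random.randint(0, 100, 100_000)


def square_mul(numbers):
    """Square each element with multiplication (n * n)."""
    square = []
    for num in numbers:
        square.append(num * num)
    return square


def square_pow(numbers):
    """Square each element with the power operator (n ** 2)."""
    square = []
    for num in numbers:
        square.append(num ** 2)
    return square


# Take the best of 3 repeats, calling each function once per repeat
t_mul = min(timeit.repeat(lambda: square_mul(small_sample), number=1, repeat=3))
t_pow = min(timeit.repeat(lambda: square_pow(small_sample), number=1, repeat=3))
t_vec = min(timeit.repeat(lambda: small_sample * small_sample, number=1, repeat=3))

print(f"loop, num * num:  {t_mul:.4f} s")
print(f"loop, num ** 2:   {t_pow:.4f} s")
print(f"vectorized NumPy: {t_vec:.4f} s")
```

Taking the minimum over several repeats reduces the influence of background activity on the measurement.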