Rahul Gandhi


NumPy Optimization: Vectorization and Broadcasting

Libraries that speed up linear algebra calculations are a staple if you work in fields like machine learning, data science or deep learning. NumPy, short for Numerical Python, is perhaps the most famous of the lot, and chances are you’ve already used it. However, merely using NumPy arrays in place of vanilla Python lists hardly does justice to the capabilities that NumPy has to offer.

In this series I will cover best practices on how to speed up your code using NumPy, how to make use of features like vectorization and broadcasting, when to ditch specialized features in favor of vanilla Python offerings, and a case study where we will use NumPy to write a fast implementation of the K-Means clustering algorithm.

As far as this part is concerned, I will be covering:

  1. How to properly time your code to compare vanilla Python to optimized NumPy code.
  2. Why loops are slow in Python.
  3. What vectorization is, and how to vectorize your code.
  4. What broadcasting is, with examples demonstrating its applications.

NOTE: While this tutorial covers NumPy, a lot of these techniques can be extended to some of the other linear algebra libraries like PyTorch and TensorFlow as well. I’d also like to point out that this post is in no way an introduction to NumPy, and assumes basic familiarity with the library.

Timing your code

In order to really appreciate the speed boosts NumPy provides, we must come up with a way to measure the running time of a piece of code.

We can use Python’s time module for this.

import time 

tic = time.time()

## code goes here

toc = time.time()

print("Time Elapsed: ", toc - tic)

The problem with this method is that measuring a piece of code only once does not give us a robust estimate of its running time. The code may run slower or faster for a particular iteration due to various processes in the background, for instance. It is therefore prudent to compute the average running time over many runs to get a robust estimate. To accomplish this, we use Python’s timeit module.

import timeit

setup = '''
import numpy as np
'''

snippet = 'arr = np.arange(100)'

num_runs = 10000

time_elapsed = timeit.timeit(setup = setup, stmt = snippet, number = num_runs)

print("Time Elapsed: ", time_elapsed / num_runs)
## Output -> Time Elapsed:  5.496922000020277e-07

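We pass three arguments to the timeit.timeit function here:

  1. setup is a string that contains the code needed to run our snippet, such as imports. It is executed once and is not included in the timing.
  2. stmt is the string describing our code snippet.
  3. number is the number of times the snippet is executed. The function returns the total time over all these runs, which is why we divide by num_runs above.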

timeit can also be used to measure the running time of functions, but only functions that can be called without arguments. For this, we pass the function itself (the name, not the function call) to timeit.timeit. The function has to be defined in our own code rather than inside the setup string, otherwise its name would not exist when we pass it.

import timeit
import numpy as np

def fn():
    return np.arange(100)

num_runs = 10000

time_elapsed = timeit.timeit(stmt = fn, number = num_runs)

print("Time Elapsed: ", time_elapsed / num_runs)
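If the function you want to time does take arguments, there are a few common workarounds. The sketch below is only an illustration: make_range is a hypothetical helper that is not part of the original example, and the run count is kept small so that it finishes quickly.

```python
import timeit
from functools import partial

import numpy as np


def make_range(n):
    # Hypothetical helper, used only for this illustration.
    return np.arange(n)


num_runs = 1000

# Option 1: wrap the call in a zero-argument lambda.
t_lambda = timeit.timeit(lambda: make_range(100), number=num_runs)

# Option 2: bind the argument ahead of time with functools.partial.
t_partial = timeit.timeit(partial(make_range, 100), number=num_runs)

# Option 3: pass a statement string and hand timeit the names it needs
# through the globals argument.
t_string = timeit.timeit(
    "make_range(100)", globals={"make_range": make_range}, number=num_runs
)

print("lambda :", t_lambda / num_runs)
print("partial:", t_partial / num_runs)
print("string :", t_string / num_runs)
```

The lambda and partial wrappers add a small call overhead of their own, so for very fast snippets the string form is likely to give the cleanest number.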

If you are using an IPython console or Jupyter Notebook, you can use the %timeit magic command. The output is much more detailed than for the normal timeit.timeit call.

%timeit arr = np.arange(100)

## output -> 472 ns ± 7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

A word about loops

Whenever one is looking for bottlenecks in code, especially Python code, loops are a usual suspect. Compared to languages like C/C++, Python loops are relatively slow. While there are quite a few reasons why that is the case, I want to focus on one particular reason: the dynamically typed nature of Python.

Python first goes through the code and compiles it into bytecode, which is then executed to run the program. Let’s say the code contains a section where we loop over a list. Python is dynamically typed, which means it has no idea what type of objects are present in the list (whether it’s an integer, a string or a float). In fact, this information is stored in every object itself, and Python cannot know it in advance before actually going through the list. Therefore, at each iteration Python has to perform a bunch of checks, like determining the type of the variable, resolving its scope, checking for any invalid operations, etc.

Contrast this with C, where an array can hold only one data type, which the compiler knows well ahead of time. This opens up the possibility of many optimizations which are not possible in Python. For this reason, loops in Python are often much slower than in C, and nested loops are where things can really get slow.


OK! So loops can slow your code. What can we do about it? What if we could restrict our lists to a single data type that Python knows in advance? Could we then skip some of the per-iteration type checking Python does and speed up our code? NumPy does something similar. NumPy allows arrays to have only a single data type and stores the data internally in a contiguous block of memory. Taking advantage of this fact, NumPy delegates most of the operations on such arrays to optimized, pre-compiled C code under the hood.

In fact, most of the functions you call using NumPy in your Python code are merely wrappers for underlying code in C, where most of the heavy lifting happens. In this way, NumPy can move the execution of loops to C, which is much more efficient than Python when it comes to looping. Notice that this is only possible because the array enforces that all of its elements are of the same kind. Otherwise, it would not be possible to convert the Python data types to native C ones to be executed under the hood.
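To make the single-data-type point concrete, here is a small sketch (the sample values are my own) that inspects the dtype NumPy picks for an array and checks that the data sits in one contiguous block. Note the last case: an array with dtype=object falls back to storing references to Python objects, which gives up most of the benefits described above.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])                 # int and float are upcast to one type
objs = np.array([1, "a", 2.5], dtype=object)  # references to Python objects

print(ints.dtype.kind)              # "i", an integer type (exact width is platform-dependent)
print(mixed.dtype)                  # float64
print(objs.dtype)                   # object
print(ints.flags["C_CONTIGUOUS"])   # True
```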

Let’s take an example. Let’s write a short piece of code that takes two arrays and performs element-wise multiplication. We put the code in a function just so that we can conveniently time our code later.

def multiply_lists(li_a, li_b):
    # multiply corresponding elements; the result is deliberately discarded
    for a, b in zip(li_a, li_b):
        a * b

Don’t worry about not storing the value each iteration. The point of this exercise is merely to see the performance of certain operations, not to bother about the results. We just want to see how long a particular number of multiplication operations takes.

However, if we were using NumPy arrays, we would not need to write a loop. We can simply do it as shown below.

import numpy as np

arr_a = np.array(li_a)
arr_b = np.array(li_b)

def multiply_arrays(arr_a, arr_b):
    # element-wise product; the loop runs inside NumPy's compiled code
    arr_a * arr_b

How does this happen? Internally, NumPy delegates the loop to pre-compiled, optimized C code. This process is called vectorization of the multiplication operator. Technically, vectorizing a function means that the function is applied to many values at once instead of to a single value, which is how it looks from the Python code (the loop is nonetheless executed, but in C).

Now that we have used a vectorized function in place of the loop, does it provide us with a boost in speed? We repeat the experiment 5 times (-r flag), with the code being executed 10000 times (-n flag) in each run.

%timeit -n 10000 -r 5 multiply_lists(li_a, li_b)
%timeit -n 10000 -r 5 multiply_arrays(arr_a, arr_b)

The following is my output.

Times on your machine may differ depending upon processing power and other tasks running in the background. But you will nevertheless notice considerable speedups, to the tune of about 20-30x, when using NumPy’s vectorized solution.

Note that I’m using the %timeit magic here because I am running the experiments in a Jupyter cell. If you are using plain Python code, then you would have to use the timeit.timeit function. The output of timeit.timeit is merely the total time, which you will have to divide by the number of iterations.

import timeit
total_time = timeit.timeit("multiply_lists(li_a, li_b)", "from __main__ import multiply_lists, li_a, li_b", number = 10000)

time_per_run = total_time / 10000
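For readers outside Jupyter, here is a self-contained sketch of the whole comparison as a plain script. The input data is an assumption on my part (two lists of 1000 integers), and the run count is lower than above to keep it quick, so treat the numbers it prints as indicative only.

```python
import timeit

import numpy as np

# Assumed test data; any two equal-length numeric lists would do.
li_a = list(range(1000))
li_b = list(range(1000))
arr_a = np.array(li_a)
arr_b = np.array(li_b)


def multiply_lists(li_a, li_b):
    # Pure-Python loop; results are discarded on purpose.
    for a, b in zip(li_a, li_b):
        a * b


def multiply_arrays(arr_a, arr_b):
    # Vectorized element-wise product.
    arr_a * arr_b


num_runs = 1000
namespace = {
    "multiply_lists": multiply_lists,
    "multiply_arrays": multiply_arrays,
    "li_a": li_a,
    "li_b": li_b,
    "arr_a": arr_a,
    "arr_b": arr_b,
}

t_lists = timeit.timeit("multiply_lists(li_a, li_b)", globals=namespace, number=num_runs)
t_arrays = timeit.timeit("multiply_arrays(arr_a, arr_b)", globals=namespace, number=num_runs)

print("list loop per run :", t_lists / num_runs)
print("vectorized per run:", t_arrays / num_runs)
print("speedup           :", t_lists / t_arrays)
```

Passing an explicit dictionary through globals plays the same role as the "from __main__ import ..." setup string, but it also works when the code is not running as the main module.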


Also, from now on, when I mention the phrase vectorizing a loop, what I mean is taking a loop and implementing the same functionality using one of NumPy’s vectorized functions.

In addition to vectorizing a loop which performs operations on two arrays of equal size, we can also vectorize a loop which performs operations between an array and a scalar. For example, the loop:

prod = 0
for x in li_a:
    prod += x * 5

can be vectorized as:

arr_a = np.array(li_a) * 5
prod = arr_a.sum()
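As a quick sanity check, the following sketch runs both versions on a small, made-up list and confirms that they agree. The variable names prod_loop and prod_vec are mine.

```python
import numpy as np

li_a = [1, 2, 3, 4]  # assumed sample data

# Loop version: multiply each element by the scalar and accumulate.
prod_loop = 0
for x in li_a:
    prod_loop += x * 5

# Vectorized version: one array-scalar multiplication, then one reduction.
prod_vec = (np.array(li_a) * 5).sum()

print(prod_loop, prod_vec)  # 50 50
```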

Bailee Streich


Linear Algebra for Data Scientists with NumPy - Analytics India Magazine


NumPy is an essential Python library to perform mathematical and scientific computations. NumPy offers Python’s array-like data structures with exclusive operations and methods. Many data science libraries and frameworks, including Pandas, Scikit-Learn, Statsmodels, Matplotlib and SciPy, are built on top of NumPy, with NumPy arrays as their building blocks. Some frameworks, including TensorFlow and PyTorch, introduce NumPy arrays or NumPy-like arrays as their fundamental data structure under the name of tensors.

[Figure: NumPy in data science. How NumPy becomes the base of the data science computing system.]

Data Science relies heavily on Linear Algebra. NumPy is famous for its Linear Algebra operations. This article discusses methods available in the NumPy library to perform various Linear Algebra operations with examples. These examples assume that the readers have a basic understanding of NumPy arrays. Check out the following articles to have a better understanding of NumPy fundamentals:

  1. Fundamental Concepts of NumPy
  2. Basic Programming with NumPy
  3. Top Resources to Learn NumPy


NumPy Features - Why we should use Numpy?

Welcome to DataFlair!!! In this tutorial, we will learn Numpy Features and its importance.

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

NumPy (Numerical Python) is an open-source core Python library for scientific computations. It is a general-purpose array and matrix processing package. Python is slower than Fortran and other compiled languages when it comes to looping. To overcome this, we use NumPy, whose array operations run in compiled code.

NumPy Features

These are the important features of NumPy:

1. High-performance N-dimensional array object

This is the most important feature of the NumPy library. It is the homogeneous array object. We perform all the operations on the array elements. The arrays in NumPy can be one dimensional or multidimensional.

a. One dimensional array

The one-dimensional array is an array consisting of a single row or column. The elements of the array are of homogeneous nature.

b. Multidimensional array

In this case, the array has more than one dimension (axis). A two-dimensional array, for example, has rows and columns, and its structure is similar to an Excel sheet. The elements are homogeneous.
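The difference between one-dimensional and multidimensional arrays can be seen from the ndim and shape attributes. The arrays below are illustrative values chosen for this sketch, not taken from the tutorial.

```python
import numpy as np

one_d = np.array([1, 2, 3])                # a single axis
two_d = np.array([[1, 2, 3], [4, 5, 6]])   # rows and columns
three_d = np.zeros((2, 3, 4))              # three axes

for name, arr in [("one_d", one_d), ("two_d", two_d), ("three_d", three_d)]:
    # ndim is the number of axes, shape is the length along each axis
    print(name, arr.ndim, arr.shape, arr.dtype)
```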

2. It contains tools for integrating code from C/C++ and Fortran

We can use the functions in NumPy to work with code written in other languages. We can hence integrate the functionalities available in various programming languages. This helps implement inter-platform functions.


NumPy Applications - Uses of Numpy

In this Numpy tutorial, we will learn Numpy applications.

NumPy is a basic level external library in Python used for complex mathematical operations. NumPy overcomes slower executions with the use of multi-dimensional array objects. It has built-in functions for manipulating arrays. We can express different algorithms as functions that are applied to arrays. NumPy’s applications are not limited to the library itself. It is a very diverse library and has a wide range of applications in other sectors. NumPy can be put to use along with Data Science, Data Analysis and Machine Learning. It is also a base for other Python libraries. These libraries use the functionalities in NumPy to increase their capabilities.

NumPy Applications

1. An alternative for lists and arrays in Python

Arrays in NumPy play a role similar to lists in Python. Unlike Python lists, however, NumPy arrays are homogeneous sets of elements, and this is their most important feature. It maintains uniformity for mathematical operations that would not be possible with heterogeneous elements. Another benefit of using NumPy arrays is that a large number of functions are applicable to them. These functions could not be applied to Python lists because of their heterogeneous nature.

2. NumPy maintains minimal memory

Arrays in NumPy are objects that store their elements in a single typed data buffer rather than as separate Python objects. Hence, they need less memory than Python lists holding the same values. NumPy also has features to avoid memory wastage in the data buffer, such as copies, views, and indexing, which help save a lot of memory. Basic indexing (slicing) returns a view of the original array, which reuses the data instead of duplicating it. Each array also specifies the data type of its elements, which leads to code optimization.

3. Using NumPy for multi-dimensional arrays

We can also create multi-dimensional arrays in NumPy. These arrays have more than one dimension, for example multiple rows and columns. Multi-dimensional arrays let us create matrices, which are easy to work with. With the use of matrices, the code also becomes memory efficient. We have a matrix module to perform various operations on these matrices.

4. Mathematical operations with NumPy

Working with NumPy also includes easy to use functions for mathematical computations on the array data set. We have many modules for performing basic and special mathematical functions in NumPy. There are functions for Linear Algebra, bitwise operations, Fourier transform, arithmetic operations, string operations, etc.
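A few of the function families mentioned above can be sketched as follows. The inputs are small made-up values chosen so that the results are easy to verify by hand.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))  # True

# Bitwise operations, applied element-wise: 12 & 10 and 10 & 6.
bits = np.bitwise_and(np.array([12, 10]), np.array([10, 6]))
print(bits.tolist())  # [8, 2]

# Fourier transform: the spectrum of a unit impulse is flat.
spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))
print(np.allclose(spectrum, 1.0))  # True
```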


NumPy Broadcasting Computation on Arrays

Broadcasting in NumPy is a very useful concept. Array shape matters when performing arithmetic operations on arrays, because the operations are applied to corresponding elements, yet arrays are not always of the same shape. Hence we broadcast the smaller array across the larger array so that both end up with compatible shapes.

NumPy Broadcasting

This functionality helps to perform arithmetic operations with ease. We use it to broadcast the arrays to convert them into similar shapes.

The following rules are applicable on the arrays for broadcasting:

1. If the arrays do not have the same number of dimensions, we prepend 1s to the shape of the lower-rank array until both shapes have the same length.

2. Two arrays are compatible in a dimension if they have the same length in that dimension or if one of them has length 1 in it.

3. Broadcasting is applicable if the arrays are compatible in all the dimensions.

4. After broadcasting, each array behaves as if its shape were the element-wise maximum of the two input shapes.

5. In any dimension where one array has length 1 and the other is longer, the smaller array behaves as if it were copied along that dimension.

6. Consequently, in each dimension the two lengths are either the same or one of them is 1.

a. Broadcasting array with the same shape

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
a * b


array([2., 4., 6.])

b. Broadcasting an array along with a scalar value

a = np.array([1.0, 2.0, 3.0])
b = 2.0
a * b


array([2., 4., 6.])

In both of the above examples, we have done broadcasting keeping in mind the broadcasting rules.
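The rules become more visible when the two shapes actually differ. The following sketch, with shapes picked purely for illustration, combines a (3, 1) array with a (4,) array, uses np.broadcast_to to show the stretched version of the smaller array, and then triggers the error raised for incompatible shapes.

```python
import numpy as np

col = np.arange(3).reshape(3, 1)  # shape (3, 1)
row = np.arange(4)                # shape (4,), treated as (1, 4)

# Each dimension is either equal or 1, so the result has shape (3, 4).
result = col + row
print(result.shape)  # (3, 4)

# broadcast_to returns a read-only view that looks like row repeated 3 times.
stretched = np.broadcast_to(row, (3, 4))
print(stretched.shape)  # (3, 4)

# Shapes (3, 2) and (3,) are not compatible: the trailing lengths are 2 and 3.
try:
    np.ones((3, 2)) + np.ones((3,))
    raised = False
except ValueError as err:
    raised = True
    print("incompatible shapes:", err)
```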


NumPy Copies and Views - Copy Vs View in NumPy

NumPy consists of different methods to duplicate an original array. The two main functions for this duplication are copy and view. The duplication of the array means an array assignment. When we duplicate the original array, the changes made in the new array may or may not reflect. The duplicate array may use the same location or may be at a new memory location.

Copy or Deep copy in NumPy

It returns a copy of the original array stored at a new location. The copy doesn’t share data or memory with the original array. The modifications are not reflected. The copy function is also known as deep copy.

import numpy as np
arr = np.array([20,30,50,70])
a = arr.copy()
# changing a value in the original array
arr[0] = 100
print(arr)
print(a)



[100 30 50 70]

[20 30 50 70]

Changes made in the original array are not reflected in the copy.

import numpy as np
arr = np.array([20,30,50,70])
a = arr.copy()
# changing a value in the copy
a[0] = 5
print(arr)
print(a)



[20 30 50 70]

[ 5 30 50 70]

Changes made in the copy are not reflected in the original array.
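For contrast with copy, here is a minimal sketch of the view side, reusing the same sample array. A view is a new array object that shares the original's data, so a change made through either one is visible in the other. The variable names v and s are chosen for this illustration.

```python
import numpy as np

arr = np.array([20, 30, 50, 70])
v = arr.view()   # new array object, same underlying data
s = arr[1:3]     # basic slicing also returns a view

arr[0] = 100     # visible through v
s[0] = -1        # writes through to arr[1]

print(arr)
print(v)
print(np.shares_memory(arr, v))           # True
print(np.shares_memory(arr, arr.copy()))  # False
```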
