Rahul  Gandhi

Rahul Gandhi


NumPy Internals, Strides, Reshape and Transpose

We cover basic mistakes that can lead to unnecessary copying of data and memory allocation in NumPy. We further cover NumPy internals, strides, reshaping, and transpose in detail.

In the first two parts of our series on NumPy optimization, we have primarily covered how to speed up your code by trying to substitute loops for vectorized code. We covered the basics of vectorization and broadcasting, and then used them to optimize an implementation of the K-Means algorithm, speeding it up by 70x compared to the loop-based implementation.

Following the format of Parts 1 and 2, Part 3 (this one) will focus on introducing a bunch of NumPy features with some theory–namely NumPy internals, strides, reshape and transpose. Part 4 will cover the application of these tools to a practical problem.

In the earlier posts we covered how to deal with loops. In this post we will focus on yet another bottleneck that can often slow down NumPy code: unnecessary copying and memory allocation. The ability to minimize both issues not only speeds up the code, but may also reduce the memory a program takes up.

We will begin with some basic mistakes that can lead to unnecessary copying of data and memory allocation. Then we’ll take a deep dive into how NumPy internally stores its arrays, how operations like reshape and transpose are performed, and detail a visualization method to compute the results of such operations without typing a single line of code.

In Part 4, we’ll use the things we’ve learned in this part to optimize the output pipeline of an object detector. But let’s leave that for later.

So, let’s get started.

Preallocate Preallocate Preallocate!

A mistake that I made myself in the early days of moving to NumPy, and also something that I see many people make, is to use the loop-and-append paradigm. So, what exactly do I mean by this?

Consider the following piece of code. It adds an element to a list during each iteration of the loop.

li = []
import random

for i in range(10000):
	## Something important goes here
    x = random.randint(1,10)

The script above merely creates a list containing random integers from zero to nine. However, instead of a random number, the thing we are adding to the list could be the result of some involved operation happening every iteration of the loop.

append is an amortized O(1) operation in Python. In simple words, on average, and regardless of how large your list is, append will take a constant amount of time. This is the reason you’ll often spot this method being used to add to lists in Python. Heck, this method is so popular that you will even find it deployed in production-grade code. I call this the loop-and-append paradigm. While it works well in Python, the same can’t be said for NumPy.

When people switch to NumPy and they have to do something similar, this is what they sometimes do.

## Do the operation for first step, as you can't concatenate an empty array later
arr = np.random.randn(1,10)

## Loop
for i in range(10000 - 1):
        arr = np.concatenate((arr, np.random.rand(1,10)))

Alternatively, you could also use the np.append operation in place of np.concatenate. In fact, np.append internally uses np.concatenate, so its performance is upper-bounded by the performance of np.concatenate.

Nevertheless, this is not really a good way to go about such operations. Because np.concatenate, unlike append, is not a constant-time function. In fact, it is a linear-time function as it includes creating a new array in memory, and then copying the contents of the two arrays to be concatenated to the newly allocated memory.

But why can’t NumPy implement a constant time concatenate, along the lines of how append works? The answer to this lies in how lists and NumPy arrays are stored.

The Difference Between How Lists And Arrays Are Stored

A Python list is made up references that point to objects. While the references are stored in a contiguous manner, the objects they point to can be anywhere in the memory.

A python list is made of reference to the objects, which are stored elsewhere in the memory.

Whenever we create a Python List, a certain amount of contagious space is allocated for the references that make up the list. Suppose a list has n elements. When we call append on a list, python simply inserts a reference to the object (being appended) at the n+1th

slot in contagious space.

An append operation merely involves adding a reference to wherever the appended object is stored in the memory. No copying is involved.

Once this contagious space fills up, a new, larger memory block is allocated to the list, with space for new insertions. The elements of the list are copied to the new memory location. While the time for copying elements to the new location is not constant (it would increase with size of the array), copying operations are often very rare. Therefore, on an average, append takes constant time independent of the size of the array

However, when it comes to NumPy, arrays are basically stored as contagious blocks of objects that make up the array. Unlike Python lists, where we merely have references, actual objects are stored in NumPy arrays.

Numpy Arrays are stored as objects (32-bit Integers here) in the memory lined up in a contiguous manner

All the space for a NumPy array is allocated before hand once the the array is initialised.

a = np.zeros((10,20)) ## allocate space for 10 x 20 floats

There is no dynamic resizing going on the way it happens for Python lists. When you call np.concatenate on two arrays, a completely new array is allocated, and the data of the two arrays is copied over to the new memory location. This makes np.concatenate slower than append even if it’s being executed in C.

To circumvent this issue, you should preallocate the memory for arrays whenever you can. Preallocate the array before the body of the loop and simply use slicing to set the values of the array during the loop. Below is such a variant of the above code.

arr = np.zeros((10000,10))

for i in range(10000):
    arr[i] = np.random.rand(1,10)

Here, we allocate the memory only once. The only copying involved is copying random numbers to the allocated space and not moving around array in memory every iteration.

Timing the code

In order to see the speed benefits of preallocating arrays, we time the two snippets using timeit.

%%timeit -n 100

arr = np.random.randn(1,10)
for i in range(10000 - 1):
        arr = np.concatenate((arr, np.random.rand(1,10)))

The output is

Whereas for the code with pre-allocation.

%%timeit -n 10

arr = np.zeros((10000,10))

for i in range(10000):
    arr[i] = np.random.rand(1,10)

We get a speed up of about 25x.

#numpy #python #machine-learning #big-data #developer

What is GEEK

Buddha Community

NumPy Internals, Strides, Reshape and Transpose

NumPy Features - Why we should use Numpy?

Welcome to DataFlair!!! In this tutorial, we will learn Numpy Features and its importance.

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays

NumPy (Numerical Python) is an open-source core Python library for scientific computations. It is a general-purpose array and matrices processing package. Python is slower as compared to Fortran and other languages to perform looping. To overcome this we use NumPy that converts monotonous code into the compiled form.

numpy features

NumPy Features

These are the important features of NumPy:

1. High-performance N-dimensional array object

This is the most important feature of the NumPy library. It is the homogeneous array object. We perform all the operations on the array elements. The arrays in NumPy can be one dimensional or multidimensional.

a. One dimensional array

The one-dimensional array is an array consisting of a single row or column. The elements of the array are of homogeneous nature.

b. Multidimensional array

In this case, we have various rows and columns. We consider each column as a dimension. The structure is similar to an excel sheet. The elements are homogenous.

2. It contains tools for integrating code from C/C++ and Fortran

We can use the functions in NumPy to work with code written in other languages. We can hence integrate the functionalities available in various programming languages. This helps implement inter-platform functions.

#numpy tutorials #features of numpy #numpy features #why use numpy #numpy

NumPy Applications - Uses of Numpy

In this Numpy tutorial, we will learn Numpy applications.

NumPy is a basic level external library in Python used for complex mathematical operations. NumPy overcomes slower executions with the use of multi-dimensional array objects. It has built-in functions for manipulating arrays. We can convert different algorithms to can into functions for applying on arrays.NumPy has applications that are not only limited to itself. It is a very diverse library and has a wide range of applications in other sectors. Numpy can be put to use along with Data Science, Data Analysis and Machine Learning. It is also a base for other python libraries. These libraries use the functionalities in NumPy to increase their capabilities.

numpy applications

Numpy Applications

1. An alternative for lists and arrays in Python

Arrays in Numpy are equivalent to lists in python. Like lists in python, the Numpy arrays are homogenous sets of elements. The most important feature of NumPy arrays is they are homogenous in nature. This differentiates them from python arrays. It maintains uniformity for mathematical operations that would not be possible with heterogeneous elements. Another benefit of using NumPy arrays is there are a large number of functions that are applicable to these arrays. These functions could not be performed when applied to python arrays due to their heterogeneous nature.

2. NumPy maintains minimal memory

Arrays in NumPy are objects. Python deletes and creates these objects continually, as per the requirements. Hence, the memory allocation is less as compared to Python lists. NumPy has features to avoid memory wastage in the data buffer. It consists of functions like copies, view, and indexing that helps in saving a lot of memory. Indexing helps to return the view of the original array, that implements reuse of the data. It also specifies the data type of the elements which leads to code optimization.

3. Using NumPy for multi-dimensional arrays

We can also create multi-dimensional arrays in NumPy.These arrays have multiple rows and columns. These arrays have more than one column that makes these multi-dimensional. Multi-dimensional array implements the creation of matrices. These matrices are easy to work with. With the use of matrices the code also becomes memory efficient. We have a matrix module to perform various operations on these matrices.

4. Mathematical operations with NumPy

Working with NumPy also includes easy to use functions for mathematical computations on the array data set. We have many modules for performing basic and special mathematical functions in NumPy. There are functions for Linear Algebra, bitwise operations, Fourier transform, arithmetic operations, string operations, etc.

#numpy tutorials #applications of numpy #numpy applications #uses of numpy #numpy

Bailee  Streich

Bailee Streich


Linear Algebra for Data Scientists with NumPy - Analytics India Magazine

A geek in Machine Learning with a Master’s degree in…

####### READ NEXT

Delhivery Promises To Fly Charters With Oxygen Concentrators In India

NumPy is an essential Python library to perform mathematical and scientific computations. NumPy offers Python’s array-like data structures with exclusive operations and methods. Many data science libraries and frameworks, including PandasScikit-Learn, Statsmodels, Matplotlib and SciPy, are built on top of NumPy with Numpy arrays in their building blocks. Some frameworks, including TensorFlow and PyTorch, introduce NumPy arrays or NumPy-alike arrays as their fundamental data structure in the name of tensors.

NumPy in data scienceHow NumPy becomes the base of Data Science computing system (source)

Data Science relies heavily on Linear Algebra. NumPy is famous for its Linear Algebra operations. This article discusses methods available in the NumPy library to perform various Linear Algebra operations with examples. These examples assume that the readers have a basic understanding of NumPy arrays. Check out the following articles to have a better understanding of NumPy fundamentals:

  1. Fundamental Concepts of NumPy
  2. Basic Programming with NumPy
  3. Top Resources to Learn NumPy

#developers corner #linear algebra #matrices #numpy #numpy array #numpy dot product #numpy matrix multiplication #numpy tutorial #svd #vectors

NumPy Copies and Views - Copy Vs View in NumPy

NumPy consists of different methods to duplicate an original array. The two main functions for this duplication are copy and view. The duplication of the array means an array assignment. When we duplicate the original array, the changes made in the new array may or may not reflect. The duplicate array may use the same location or may be at a new memory location.

NumPy copy and View

Copy or Deep copy in NumPy

It returns a copy of the original array stored at a new location. The copy doesn’t share data or memory with the original array. The modifications are not reflected. The copy function is also known as deep copy.

import numpy as np
arr = np.array([20,30,50,70])
a= arr.copy()
#changing a value in original array
arr[0] = 100



[100 30 50 70]

[20 30 50 70]

Changes made in the original array are not reflected in the copy.

import numpy as np
arr = np.array([20,30,50,70])
a= arr.copy()
#changing a value in copy array
a[0] = 5



[20 30 50 70]

[ 5 30 50 70]

Changes made in copy are not reflected in the original array

#numpy tutorials #numpy copy #numpy views #numpy

Brad  Hintz

Brad Hintz


NumPy Installation - How to Install Numpy in Python

Python is an open-source object-oriented language. It has many features of which one is the wide range of external packages. There are a lot of packages for installation and use for expanding functionalities. These packages are a repository of functions in python script. NumPy is one such package to ease array computations. To install all these python packages we use the pip- package installer. Pip is automatically installed along with Python. We can then use pip in the command line to install packages from PyPI.

_Keeping you updated with latest technology trends, _Join DataFlair on Telegram

Install Numpy in Mac OS

Python comes pre-installed on Mac OS. However, it has an old system version the newer versions can be downloaded alongside.

1. Open the terminal in your MacBook.

2. In the terminal, we use the pip command to install the package

  1. pip install numpy

3. If you use Python3, enter the pip3 command.

  1. pip3 install numpy

#numpy tutorials #install numpy #installing numpy #numpy installation