We cover basic mistakes that can lead to unnecessary copying of data and memory allocation in NumPy. We further cover NumPy internals, strides, reshaping, and transpose in detail.
In the first two parts of our series on NumPy optimization, we have primarily covered how to speed up your code by trying to substitute loops for vectorized code. We covered the basics of vectorization and broadcasting, and then used them to optimize an implementation of the K-Means algorithm, speeding it up by 70x compared to the loop-based implementation.
Following the format of Parts 1 and 2, Part 3 (this one) will focus on introducing a bunch of NumPy features with some theory–namely NumPy internals, strides, reshape and transpose. Part 4 will cover the application of these tools to a practical problem.
In the earlier posts we covered how to deal with loops. In this post we will focus on yet another bottleneck that can often slow down NumPy code: unnecessary copying and memory allocation. The ability to minimize both issues not only speeds up the code, but may also reduce the memory a program takes up.
We will begin with some basic mistakes that can lead to unnecessary copying of data and memory allocation. Then we’ll take a deep dive into how NumPy internally stores its arrays, how operations like reshape and transpose are performed, and detail a visualization method to compute the results of such operations without typing a single line of code.
In Part 4, we’ll use the things we’ve learned in this part to optimize the output pipeline of an object detector. But let’s leave that for later.
So, let’s get started.
A mistake that I made myself in the early days of moving to NumPy, and also something that I see many people make, is to use the loop-and-append paradigm. So, what exactly do I mean by this?
Consider the following piece of code. It adds an element to a list during each iteration of the loop.
li =  import random for i in range(10000): ## Something important goes here x = random.randint(1,10) li.append(x)
The script above merely creates a list containing random integers from zero to nine. However, instead of a random number, the thing we are adding to the list could be the result of some involved operation happening every iteration of the loop.
append is an amortized
O(1) operation in Python. In simple words, on average, and regardless of how large your list is,
append will take a constant amount of time. This is the reason you’ll often spot this method being used to add to lists in Python. Heck, this method is so popular that you will even find it deployed in production-grade code. I call this the loop-and-append paradigm. While it works well in Python, the same can’t be said for NumPy.
When people switch to NumPy and they have to do something similar, this is what they sometimes do.
## Do the operation for first step, as you can't concatenate an empty array later arr = np.random.randn(1,10) ## Loop for i in range(10000 - 1): arr = np.concatenate((arr, np.random.rand(1,10)))
Alternatively, you could also use the
np.append operation in place of
np.concatenate. In fact,
np.append internally uses
np.concatenate, so its performance is upper-bounded by the performance of
Nevertheless, this is not really a good way to go about such operations. Because
append, is not a constant-time function. In fact, it is a linear-time function as it includes creating a new array in memory, and then copying the contents of the two arrays to be concatenated to the newly allocated memory.
But why can’t NumPy implement a constant time
concatenate, along the lines of how
append works? The answer to this lies in how lists and NumPy arrays are stored.
list is made up references that point to objects. While the references are stored in a contiguous manner, the objects they point to can be anywhere in the memory.
A python list is made of reference to the objects, which are stored elsewhere in the memory.
Whenever we create a Python List, a certain amount of contagious space is allocated for the references that make up the list. Suppose a list has
n elements. When we call
append on a list, python simply inserts a reference to the object (being appended) at the n+1th
slot in contagious space.
An append operation merely involves adding a reference to wherever the appended object is stored in the memory. No copying is involved.
Once this contagious space fills up, a new, larger memory block is allocated to the list, with space for new insertions. The elements of the list are copied to the new memory location. While the time for copying elements to the new location is not constant (it would increase with size of the array), copying operations are often very rare. Therefore, on an average, append takes constant time independent of the size of the array
However, when it comes to NumPy, arrays are basically stored as contagious blocks of objects that make up the array. Unlike Python lists, where we merely have references, actual objects are stored in NumPy arrays.
Numpy Arrays are stored as objects (32-bit Integers here) in the memory lined up in a contiguous manner
All the space for a NumPy array is allocated before hand once the the array is initialised.
a = np.zeros((10,20)) ## allocate space for 10 x 20 floats
There is no dynamic resizing going on the way it happens for Python lists. When you call
np.concatenate on two arrays, a completely new array is allocated, and the data of the two arrays is copied over to the new memory location. This makes
np.concatenate slower than append even if it’s being executed in C.
To circumvent this issue, you should preallocate the memory for arrays whenever you can. Preallocate the array before the body of the loop and simply use slicing to set the values of the array during the loop. Below is such a variant of the above code.
arr = np.zeros((10000,10)) for i in range(10000): arr[i] = np.random.rand(1,10)
Here, we allocate the memory only once. The only copying involved is copying random numbers to the allocated space and not moving around array in memory every iteration.
In order to see the speed benefits of preallocating arrays, we time the two snippets using
%%timeit -n 100 arr = np.random.randn(1,10) for i in range(10000 - 1): arr = np.concatenate((arr, np.random.rand(1,10)))
The output is
Whereas for the code with pre-allocation.
%%timeit -n 10 arr = np.zeros((10000,10)) for i in range(10000): arr[i] = np.random.rand(1,10)
We get a speed up of about 25x.
#numpy #python #machine-learning #big-data #developer
Welcome to DataFlair!!! In this tutorial, we will learn Numpy Features and its importance.
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays
NumPy (Numerical Python) is an open-source core Python library for scientific computations. It is a general-purpose array and matrices processing package. Python is slower as compared to Fortran and other languages to perform looping. To overcome this we use NumPy that converts monotonous code into the compiled form.
These are the important features of NumPy:
This is the most important feature of the NumPy library. It is the homogeneous array object. We perform all the operations on the array elements. The arrays in NumPy can be one dimensional or multidimensional.
The one-dimensional array is an array consisting of a single row or column. The elements of the array are of homogeneous nature.
In this case, we have various rows and columns. We consider each column as a dimension. The structure is similar to an excel sheet. The elements are homogenous.
We can use the functions in NumPy to work with code written in other languages. We can hence integrate the functionalities available in various programming languages. This helps implement inter-platform functions.
#numpy tutorials #features of numpy #numpy features #why use numpy #numpy
In this Numpy tutorial, we will learn Numpy applications.
NumPy is a basic level external library in Python used for complex mathematical operations. NumPy overcomes slower executions with the use of multi-dimensional array objects. It has built-in functions for manipulating arrays. We can convert different algorithms to can into functions for applying on arrays.NumPy has applications that are not only limited to itself. It is a very diverse library and has a wide range of applications in other sectors. Numpy can be put to use along with Data Science, Data Analysis and Machine Learning. It is also a base for other python libraries. These libraries use the functionalities in NumPy to increase their capabilities.
Arrays in Numpy are equivalent to lists in python. Like lists in python, the Numpy arrays are homogenous sets of elements. The most important feature of NumPy arrays is they are homogenous in nature. This differentiates them from python arrays. It maintains uniformity for mathematical operations that would not be possible with heterogeneous elements. Another benefit of using NumPy arrays is there are a large number of functions that are applicable to these arrays. These functions could not be performed when applied to python arrays due to their heterogeneous nature.
Arrays in NumPy are objects. Python deletes and creates these objects continually, as per the requirements. Hence, the memory allocation is less as compared to Python lists. NumPy has features to avoid memory wastage in the data buffer. It consists of functions like copies, view, and indexing that helps in saving a lot of memory. Indexing helps to return the view of the original array, that implements reuse of the data. It also specifies the data type of the elements which leads to code optimization.
We can also create multi-dimensional arrays in NumPy.These arrays have multiple rows and columns. These arrays have more than one column that makes these multi-dimensional. Multi-dimensional array implements the creation of matrices. These matrices are easy to work with. With the use of matrices the code also becomes memory efficient. We have a matrix module to perform various operations on these matrices.
Working with NumPy also includes easy to use functions for mathematical computations on the array data set. We have many modules for performing basic and special mathematical functions in NumPy. There are functions for Linear Algebra, bitwise operations, Fourier transform, arithmetic operations, string operations, etc.
#numpy tutorials #applications of numpy #numpy applications #uses of numpy #numpy
A geek in Machine Learning with a Master’s degree in…
####### READ NEXT
NumPy is an essential Python library to perform mathematical and scientific computations. NumPy offers Python’s array-like data structures with exclusive operations and methods. Many data science libraries and frameworks, including Pandas, Scikit-Learn, Statsmodels, Matplotlib and SciPy, are built on top of NumPy with Numpy arrays in their building blocks. Some frameworks, including TensorFlow and PyTorch, introduce NumPy arrays or NumPy-alike arrays as their fundamental data structure in the name of tensors.
How NumPy becomes the base of Data Science computing system (source)
Data Science relies heavily on Linear Algebra. NumPy is famous for its Linear Algebra operations. This article discusses methods available in the NumPy library to perform various Linear Algebra operations with examples. These examples assume that the readers have a basic understanding of NumPy arrays. Check out the following articles to have a better understanding of NumPy fundamentals:
#developers corner #linear algebra #matrices #numpy #numpy array #numpy dot product #numpy matrix multiplication #numpy tutorial #svd #vectors
NumPy consists of different methods to duplicate an original array. The two main functions for this duplication are copy and view. The duplication of the array means an array assignment. When we duplicate the original array, the changes made in the new array may or may not reflect. The duplicate array may use the same location or may be at a new memory location.
It returns a copy of the original array stored at a new location. The copy doesn’t share data or memory with the original array. The modifications are not reflected. The copy function is also known as deep copy.
import numpy as np arr = np.array([20,30,50,70]) a= arr.copy() #changing a value in original array arr = 100 print(arr) print(a)
[100 30 50 70]
[20 30 50 70]
Changes made in the original array are not reflected in the copy.
import numpy as np arr = np.array([20,30,50,70]) a= arr.copy() #changing a value in copy array a = 5 print(arr) print(a)
[20 30 50 70]
[ 5 30 50 70]
Changes made in copy are not reflected in the original array
#numpy tutorials #numpy copy #numpy views #numpy
Python is an open-source object-oriented language. It has many features of which one is the wide range of external packages. There are a lot of packages for installation and use for expanding functionalities. These packages are a repository of functions in python script. NumPy is one such package to ease array computations. To install all these python packages we use the pip- package installer. Pip is automatically installed along with Python. We can then use pip in the command line to install packages from PyPI.
_Keeping you updated with latest technology trends, _Join DataFlair on Telegram
Python comes pre-installed on Mac OS. However, it has an old system version the newer versions can be downloaded alongside.
1. Open the terminal in your MacBook.
2. In the terminal, we use the pip command to install the package
3. If you use Python3, enter the pip3 command.
#numpy tutorials #install numpy #installing numpy #numpy installation