This post is special to me because it covers what I learned while preparing for Google Summer of Code 2020 (I was not selected).

Motivation

CuPy provides an API called ElementwiseKernel for parallelizing elementwise operations on a GPU.

Below is a **_before and after_** section that compares the run time of the same task (computing the elementwise squared difference of two arrays) with and without elementwise kernels, on arrays of the same size. It's okay if it doesn't make a lot of sense yet. I will explain it in detail as we go on.

Before (Without using elementwise kernels)
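As a rough sketch of what a baseline without elementwise kernels can look like, assuming a plain Python loop that visits one element at a time (the name `squared_diff_loop` is just a placeholder for illustration):

```python
def squared_diff_loop(x, y):
    """Elementwise squared difference, computed one element at a time on the CPU."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    out = [0.0] * len(x)
    for i in range(len(x)):
        d = x[i] - y[i]
        out[i] = d * d  # (x[i] - y[i]) squared
    return out
```

Every iteration runs in the interpreter, one after another, which is why this kind of loop gets slow once the arrays reach a million elements.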

After (Using elementwise kernels)
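A minimal sketch of the same task written with `cupy.ElementwiseKernel`, assuming `float32` inputs. The wrapper `squared_diff`, the `HAS_GPU` flag, and the pure-Python fallback are illustrative additions so the snippet still runs on a machine without CuPy or a GPU; they are not part of CuPy.

```python
try:
    import cupy as cp
    # May raise if no CUDA driver or device is usable.
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable GPU
    cp = None
    HAS_GPU = False


def build_squared_diff_kernel():
    """Define the kernel; the body string runs once per element on the GPU."""
    return cp.ElementwiseKernel(
        "float32 x, float32 y",    # input arguments
        "float32 z",               # output argument
        "z = (x - y) * (x - y)",   # per-element operation
        "squared_diff",            # kernel name
    )


def squared_diff(x, y):
    """Elementwise squared difference; uses the GPU kernel when one is available."""
    if not HAS_GPU:
        # Hypothetical fallback so the example runs anywhere.
        return [float((a - b) ** 2) for a, b in zip(x, y)]
    kernel = build_squared_diff_kernel()
    x_gpu = cp.asarray(x, dtype=cp.float32)  # copy inputs to the GPU
    y_gpu = cp.asarray(y, dtype=cp.float32)
    z_gpu = kernel(x_gpu, y_gpu)             # one GPU thread per element
    return cp.asnumpy(z_gpu).tolist()        # copy the result back to the host
```

The kernel body contains no loop: you describe what happens to a single element, and CuPy applies it across the whole array in parallel.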

Note that the array size is 1 million elements in both cases. Below is the speed-up comparison.

**~40304 times faster.** Wait, but the title said only 10x. That's because this was a very straightforward task. In a real-world scenario the operations will not be this simple, and some of that speed-up will be lost. You can still expect a speed-up on the order of 10x–100x over the normal approach. The only condition is that your task must be parallelizable.
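A speed-up figure like this is simply the ratio of two run times. Here is a small sketch of how such a ratio can be measured, assuming the best of a few repeated runs is a fair estimate (`best_time` and `speedup` are hypothetical helper names, not from any library):

```python
import time


def best_time(fn, *args, repeats=5):
    """Return the shortest wall-clock time (in seconds) over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(slow_fn, fast_fn, *args, repeats=5):
    """How many times faster fast_fn is than slow_fn on the same arguments."""
    slow = best_time(slow_fn, *args, repeats=repeats)
    fast = best_time(fast_fn, *args, repeats=repeats)
    return slow / max(fast, 1e-12)  # guard against a zero reading
```

When timing CuPy code, keep in mind that kernel launches are asynchronous: the call returns before the GPU has finished. Synchronizing inside the timed function (for example with `cp.cuda.Stream.null.synchronize()`) keeps the measurement honest.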

