This post is a very special one for me because it contains things I learned while preparing for Google Summer of Code 2020 (I was not selected).
CuPy provides an API called ElementwiseKernel to parallelize operations on the GPU.
Below is the **_before and after_** section, which compares the run-time of the same task (computing the elementwise squared difference of two arrays) without and with elementwise kernels, on arrays of the same size. It’s okay if it doesn’t make a lot of sense yet. I will explain it in detail as we go on.
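A minimal sketch of such a comparison might look like the following. It is an illustration rather than the exact benchmark code: the float32 dtypes, the kernel name `squared_diff`, the helper `squared_diff_cpu`, and the choice of vectorized NumPy as the "without kernels" baseline are all assumptions. The GPU part only runs if CuPy and a CUDA device are available.

```python
import time

import numpy as np

try:
    import cupy as cp
    HAS_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy not installed, or no CUDA driver/device available
    cp = None
    HAS_GPU = False

N = 1_000_000  # array size used for both versions


def squared_diff_cpu(x, y):
    """CPU reference (assumed baseline): NumPy computes (x - y)^2 per element."""
    return (x - y) ** 2


if HAS_GPU:
    # Arguments: input params, output params, per-element body, kernel name.
    squared_diff_gpu = cp.ElementwiseKernel(
        'float32 x, float32 y',
        'float32 z',
        'z = (x - y) * (x - y)',
        'squared_diff',
    )

rng = np.random.default_rng(0)
x_cpu = rng.random(N, dtype=np.float32)
y_cpu = rng.random(N, dtype=np.float32)

start = time.perf_counter()
z_cpu = squared_diff_cpu(x_cpu, y_cpu)
cpu_seconds = time.perf_counter() - start
print(f"NumPy:             {cpu_seconds:.6f} s")

if HAS_GPU:
    x_gpu, y_gpu = cp.asarray(x_cpu), cp.asarray(y_cpu)

    # Warm-up call so that kernel compilation is not included in the timing.
    squared_diff_gpu(x_gpu, y_gpu)
    cp.cuda.Stream.null.synchronize()

    start = time.perf_counter()
    z_gpu = squared_diff_gpu(x_gpu, y_gpu)
    # Kernel launches are asynchronous, so wait for the GPU before stopping the clock.
    cp.cuda.Stream.null.synchronize()
    gpu_seconds = time.perf_counter() - start
    print(f"ElementwiseKernel: {gpu_seconds:.6f} s")

    # Both versions should agree up to float32 rounding.
    assert np.allclose(cp.asnumpy(z_gpu), z_cpu)
```

Timings depend heavily on the hardware and on what the baseline is, so this sketch is not expected to reproduce any particular speed-up figure.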
You see, the array size is 1 million in both cases. Below is the speed-up comparison.
**~40304 times faster.** Wait, but the title said only 10x. Actually, this was a very straightforward task. In a real-world scenario, the operations will not be this simple, and that will cost some of the speed-up. So you can expect it to run roughly 10x–100x faster than the normal approach; the only condition is that your task is parallelizable.