Yvette  Bell


Alpaka: Abstraction Library for Parallel Kernel Acceleration For C++

alpaka - Abstraction Library for Parallel Kernel Acceleration

The alpaka library is a header-only C++17 abstraction library for accelerator development.

Its aim is to provide performance portability across accelerators through the abstraction (not hiding!) of the underlying levels of parallelism.

It is platform independent and supports the concurrent and cooperative use of multiple devices, such as the host's CPU and attached accelerators, for instance CUDA GPUs and Xeon Phis (currently native execution only). A multitude of accelerator back-end variants using CUDA, OpenMP (2.0/4.0), Boost.Fiber, std::thread and serial execution is provided and can be selected depending on the device. Only one implementation of a user kernel is required, because kernels are represented as function objects with a special interface. There is no need to write special CUDA, OpenMP or custom threading code. Accelerator back-ends can be mixed within a device queue. The decision which accelerator back-end executes which kernel can be made at runtime.
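The idea of writing a kernel once as a function object and choosing the executing back-end at runtime can be sketched in plain C++17. This is only an illustration under stated assumptions: MockAcc, Backend, launch and ScaleKernel are hypothetical names invented for this sketch and are not alpaka's API. The two mock back-ends differ only in the order in which they visit the work items.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an accelerator handle; NOT alpaka's API.
// It only tells the kernel which work item the current invocation handles.
struct MockAcc
{
    std::size_t idx;
};

// The kernel is written once, as a function object templated on the
// accelerator type, so the same source can be run by any back-end.
struct ScaleKernel
{
    template<typename TAcc>
    void operator()(TAcc const& acc, std::vector<double>& data, double factor) const
    {
        if(acc.idx < data.size()) // guard against surplus work items
            data[acc.idx] *= factor;
    }
};

// Hypothetical back-end identifiers; the value is chosen at runtime.
enum class Backend
{
    Ascending,
    Descending
};

// Runs `kernel` once per work item on the back-end selected at runtime.
template<typename TKernel, typename... TArgs>
void launch(Backend backend, std::size_t n, TKernel const& kernel, TArgs&&... args)
{
    switch(backend)
    {
    case Backend::Ascending:
        for(std::size_t i = 0; i < n; ++i)
            kernel(MockAcc{i}, args...);
        break;
    case Backend::Descending:
        for(std::size_t i = n; i-- > 0;)
            kernel(MockAcc{i}, args...);
        break;
    default:
        assert(false && "unknown back-end");
    }
}
```

Calling launch(Backend::Ascending, data.size(), ScaleKernel{}, data, 2.0) and the same call with Backend::Descending leave data in the same state, because the kernel source does not depend on which back-end runs it.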

The abstraction used is very similar to the CUDA grid-blocks-threads division strategy. Algorithms that should be parallelized have to be divided into a multi-dimensional grid consisting of small uniform work items. The functions applied to these work items are called kernels and are executed in parallel threads. The threads in the grid are organized in blocks. All threads in a block are executed in parallel and can interact via fast shared memory. Blocks are executed independently and cannot interact in any way. The block execution order is unspecified and depends on the accelerator in use. By using this abstraction the execution can be optimally adapted to the available hardware.
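The grid-blocks-threads division described above can be emulated serially in plain C++17. The following is a minimal sketch, assuming a one-dimensional problem; WorkDiv, makeWorkDiv, runBlock and runGrid are hypothetical names for this illustration and are not part of alpaka. Each emulated block fills block-local scratch storage (standing in for shared memory) and produces one partial sum. Since blocks never communicate, visiting them in a different order gives the same per-block results.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical 1D work division (illustrative only, not alpaka's API).
struct WorkDiv
{
    std::size_t blocksPerGrid;
    std::size_t threadsPerBlock;
};

// Smallest grid of uniform blocks that covers `n` work items.
inline WorkDiv makeWorkDiv(std::size_t n, std::size_t threadsPerBlock)
{
    assert(threadsPerBlock > 0);
    return {(n + threadsPerBlock - 1) / threadsPerBlock, threadsPerBlock};
}

// Emulates one block: every thread of the block stores its element in
// block-local scratch storage, then the block sums that storage.
inline double runBlock(WorkDiv const& wd, std::size_t blockIdx, std::vector<double> const& in)
{
    std::vector<double> shared(wd.threadsPerBlock, 0.0); // visible to this block only
    for(std::size_t threadIdx = 0; threadIdx < wd.threadsPerBlock; ++threadIdx)
    {
        std::size_t const globalIdx = blockIdx * wd.threadsPerBlock + threadIdx;
        if(globalIdx < in.size()) // the last block may be only partly filled
            shared[threadIdx] = in[globalIdx];
    }
    return std::accumulate(shared.begin(), shared.end(), 0.0);
}

// Visits all blocks, optionally in reverse order. Blocks are independent,
// so the per-block results do not depend on the visiting order.
inline std::vector<double> runGrid(WorkDiv const& wd, std::vector<double> const& in, bool reverseBlockOrder)
{
    std::vector<double> partial(wd.blocksPerGrid, 0.0);
    for(std::size_t i = 0; i < wd.blocksPerGrid; ++i)
    {
        std::size_t const blockIdx = reverseBlockOrder ? wd.blocksPerGrid - 1 - i : i;
        partial[blockIdx] = runBlock(wd, blockIdx, in);
    }
    return partial;
}
```

For ten input values and four threads per block, makeWorkDiv yields three blocks, the last of which is only half filled. On a real accelerator the threads of a block would run in parallel; the serial loops here only show how indices and block-local storage relate.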

Software License

alpaka is licensed under MPL-2.0.


Documentation

The alpaka documentation can be found in the online manual. The documentation files in .rst (reStructuredText) format are located in the docs subfolder of this repository. The source code documentation is generated with doxygen.

Accelerator Back-ends

| Accelerator Back-end   | Lib/API              | Devices                | Execution strategy grid-blocks     | Execution strategy block-threads      |
|------------------------|----------------------|------------------------|------------------------------------|---------------------------------------|
| Serial                 | n/a                  | Host CPU (single core) | sequential                         | sequential (only 1 thread per block)  |
| OpenMP 2.0+ blocks     | OpenMP 2.0+          | Host CPU (multi core)  | parallel (preemptive multitasking) | sequential (only 1 thread per block)  |
| OpenMP 2.0+ threads    | OpenMP 2.0+          | Host CPU (multi core)  | sequential                         | parallel (preemptive multitasking)    |
| OpenMP 5.0+            | OpenMP 5.0+          | Host CPU (multi core)  | parallel (undefined)               | parallel (preemptive multitasking)    |
|                        |                      | GPU                    | parallel (undefined)               | parallel (lock-step within warps)     |
| OpenACC (experimental) | OpenACC 2.0+         | Host CPU (multi core)  | parallel (undefined)               | parallel (preemptive multitasking)    |
|                        |                      | GPU                    | parallel (undefined)               | parallel (lock-step within warps)     |
| std::thread            | std::thread          | Host CPU (multi core)  | sequential                         | parallel (preemptive multitasking)    |
| Boost.Fiber            | boost::fibers::fiber | Host CPU (single core) | sequential                         | parallel (cooperative multitasking)   |
| TBB                    | TBB 2.2+             | Host CPU (multi core)  | parallel (preemptive multitasking) | sequential (only 1 thread per block)  |
| CUDA                   | CUDA 9.0+            | NVIDIA GPUs            | parallel (undefined)               | parallel (lock-step within warps)     |
| HIP(clang)             | HIP 4.0+             | AMD GPUs               | parallel (undefined)               | parallel (lock-step within warps)     |

Supported Compilers

This library uses C++17 (or newer when available).

Compilers listed in the support table (columns): gcc 7.5, gcc 8.5, gcc 9.4, gcc 10.3, gcc 11.1, clang 5-7, clang 8-9, clang 10, clang 11, clang 12, clang 13, Apple LLVM 12.4.0/13.2.1, Visual Studio 2019, Visual Studio 2022.

Accelerator back-ends listed in the support table (rows), with the version entries given for them:

  • OpenMP 2.0+ blocks
  • OpenMP 2.0+ threads
  • OpenMP 5.0 (CPU): one entry marked "-"
  • CUDA (nvcc): (CUDA 11.0-11.6), (CUDA 11.0-11.6), (CUDA 11.0-11.6), (CUDA 11.6), (CUDA 11.0-11.2; 11.6), (CUDA 11.1, 11.2, 11.6), (CUDA 11.6), (CUDA 11.6), (CUDA 11.6), (CUDA 11.2-11.6), (CUDA 11.6)
  • CUDA (clang): seven entries marked "-", then (CUDA 9.2-10.1), (CUDA 10.0-10.2)
  • HIP (clang): ✅ (HIP 4.2), ✅ (HIP 4.3 - 5.0), then three entries marked "-"

Other compilers or combinations marked with ❌ in the table above may work but are not tested in CI and are therefore not explicitly supported.


Dependencies

Boost 1.74.0+ is the only mandatory external dependency. The alpaka library itself just requires header-only libraries. However, some of the accelerator back-end implementations require different Boost libraries to be built.

When an accelerator back-end using Boost.Fiber is enabled, boost-fiber and all of its dependencies are required to be built in C++17 mode: ./b2 cxxflags="-std=c++17"

When an accelerator back-end using CUDA is enabled, version 11.0 (with nvcc as CUDA compiler) or version 9.2 (with clang as CUDA compiler) of the CUDA SDK is the minimum requirement.

NOTE: When using nvcc as CUDA compiler, the CUDA accelerator back-end cannot be enabled together with the Boost.Fiber accelerator back-end due to bugs in the nvcc compiler.

NOTE: When using clang as a native CUDA compiler, the CUDA accelerator back-end cannot be enabled together with the Boost.Fiber accelerator back-end or any OpenMP accelerator back-end because this combination is currently unsupported.

NOTE: Separable compilation is disabled by default and can be enabled via the CMake flag CMAKE_CUDA_SEPARABLE_COMPILATION.

When an accelerator back-end using OpenMP is enabled, the compiler and the platform have to support the corresponding minimum OpenMP version.

When an accelerator back-end using TBB is enabled, the compiler and the platform have to support the corresponding minimum TBB version.


Usage

The library is header-only, so nothing has to be built. CMake 3.18+ is required to provide the correct defines and include paths. Just call alpaka_add_executable instead of add_executable and the difficulties of the CUDA nvcc compiler in handling .cu and .cpp files are taken care of automatically. Source files do not need any special file ending. Examples of how to use alpaka within CMake can be found in the example folder.

The whole alpaka library can be included with:

#include <alpaka/alpaka.hpp>

Code that is not intended to be utilized by the user is hidden in the detail namespace.

Furthermore, for a CUDA-like experience when adopting alpaka we provide the library cupla. It enables a simple and straightforward way of porting existing CUDA applications to alpaka and thus to a variety of accelerators.


Introduction

For a quick introduction, feel free to play back the recording of our presentation at GTC 2016.

Citing alpaka

Currently all authors of alpaka are scientists or connected with research. For us to justify the importance and impact of our work, please consider citing us accordingly in your derived work and publications:

% Peer-Reviewed Publication %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Peer reviewed and accepted publication in
%   "2nd International Workshop on Performance Portable
%    Programming Models for Accelerators (P^3MA)"
% colocated with the
%   "2017 ISC High Performance Conference"
%   in Frankfurt, Germany
  author    = {{Matthes}, A. and {Widera}, R. and {Zenker}, E. and {Worpitz}, B. and
               {Huebl}, A. and {Bussmann}, M.},
  title     = {Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code
               using the Alpaka library},
  archivePrefix = "arXiv",
  eprint    = {1706.10086},
  keywords  = {Computer Science - Distributed, Parallel, and Cluster Computing},
  day       = {30},
  month     = {Jun},
  year      = {2017},
  url       = {https://arxiv.org/abs/1706.10086},

% Peer-Reviewed Publication %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Peer reviewed and accepted publication in
%   "The Sixth International Workshop on
%    Accelerators and Hybrid Exascale Systems (AsHES)"
% at the
%   "30th IEEE International Parallel and Distributed
%    Processing Symposium" in Chicago, IL, USA
  author    = {Erik Zenker and Benjamin Worpitz and Ren{\'{e}} Widera
               and Axel Huebl and Guido Juckeland and
               Andreas Kn{\"{u}}pfer and Wolfgang E. Nagel and Michael Bussmann},
  title     = {Alpaka - An Abstraction Library for Parallel Kernel Acceleration},
  archivePrefix = "arXiv",
  eprint    = {1602.08477},
  keywords  = {Computer science;CUDA;Mathematical Software;nVidia;OpenMP;Package;
               performance portability;Portability;Tesla K20;Tesla K80},
  day       = {23},
  month     = {May},
  year      = {2016},
  publisher = {IEEE Computer Society},
  url       = {http://arxiv.org/abs/1602.08477},

% Original Work: Benjamin Worpitz' Master Thesis %%%%%%%%%%
  author = {Benjamin Worpitz},
  title  = {Investigating performance portability of a highly scalable
            particle-in-cell simulation code on various multi-core
            architectures},
  school = {{Technische Universit{\"{a}}t Dresden}},
  month  = {Sep},
  year   = {2015},
  type   = {Master Thesis},
  doi    = {10.5281/zenodo.49768},
  url    = {http://dx.doi.org/10.5281/zenodo.49768}


Contributing

Rules for contributions can be found in CONTRIBUTING.md.


Maintainers* and Core Developers

  • Benjamin Worpitz* (original author)
  • Dr. Sergei Bastrakov*
  • Dr. Andrea Bocci
  • Dr. Antonio Di Pilato
  • Simeon Ehrig
  • Bernhard Manfred Gruber*
  • Dr. Jeffrey Kelling
  • Dr. Felice Pantaleo
  • Jan Stephan*
  • Dr. Jiří Vyskočil
  • René Widera*

Former Members, Contributions and Thanks

  • Dr. Michael Bussmann
  • Mat Colgrove
  • Valentin Gehrke
  • Dr. Axel Huebl
  • Maximilian Knespel
  • Jakob Krude
  • Alexander Matthes
  • Hauke Mewes
  • Phil Nash
  • Dr. David M. Rogers
  • Mutsuo Saito
  • Jonas Schenke
  • Daniel Vollmer
  • Matthias Werner
  • Bert Wesarg
  • Malte Zacharias
  • Erik Zenker

Author: alpaka-group
Source Code: https://github.com/alpaka-group/alpaka
License: MPL-2.0 License

