Getting 4x Speedup With .NET Core 3.0 SIMD Intrinsics

A few weeks ago Den Raskovalov and I had an interesting conversation about C# performance, which turned into a tiny but fun coding exercise. The statement to prove or disprove was:

“The C++ code listed below, being translated to C#, will never be close to C++ in terms of speed.”

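#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_BINARY
#define O_BINARY 0  // O_BINARY is Windows-only; elsewhere it is a no-op
#endif

// Sums a stream of 7-bit variable-length integers read from a file.
// A byte < 128 adds 7 more bits to the current number; a byte >= 128
// carries the number's last 7 bits and terminates it.
auto computeSum(const char* fileName) {
    auto fIn = open(fileName, O_RDONLY | O_BINARY);

    static constexpr size_t BUFFER_SIZE = 1 << 16;
    uint8_t buffer[BUFFER_SIZE];

    int64_t sum = 0;
    int n = 0;
    ssize_t bufferLen;
    // read() returns 0 at end of file and -1 on error; stop on either.
    while ((bufferLen = read(fIn, buffer, BUFFER_SIZE)) > 0) {
        const uint8_t* pBuffer = buffer;
        const uint8_t* const pBufferEnd = buffer + bufferLen;
        while (pBuffer != pBufferEnd) {
            if (*pBuffer < 128) {
                n = (n << 7) + *pBuffer;          // continuation byte
            } else {
                n = (n << 7) + *pBuffer - 128;    // final byte: strip the high bit
                sum += n;
                n = 0;
            }
            ++pBuffer;
        }
    }
    close(fIn);
    return sum;
}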
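The loop decodes a stream of variable-length integers: each byte carries 7 bits of payload, most significant group first, and a byte with the high bit set (value 128 or more) carries the last 7 bits of a number and terminates it. As a minimal sketch for checking that reading, here is the same decoding loop over an in-memory buffer, plus a matching encoder. Both `encode` and `decodeSum` are hypothetical helpers written for illustration, not part of the original exercise. Values are limited to `INT32_MAX` because the decoder accumulates into an `int`, just as `computeSum` does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes a non-negative value in the format the
// loop above expects. 7-bit groups are emitted most significant first,
// and the last group gets its high bit (128) set to end the number.
std::vector<uint8_t> encode(uint32_t value) {
    // The decoder accumulates into an int, so stay within int range.
    assert(value <= static_cast<uint32_t>(INT32_MAX));
    std::vector<uint8_t> groups;  // least significant group first
    do {
        groups.push_back(static_cast<uint8_t>(value & 0x7F));
        value >>= 7;
    } while (value != 0);
    // Reverse into wire order (most significant group first).
    std::vector<uint8_t> out(groups.rbegin(), groups.rend());
    out.back() |= 0x80;  // mark the final byte
    return out;
}

// Hypothetical helper: the same decoding loop as computeSum,
// run over an in-memory buffer instead of a file.
int64_t decodeSum(const uint8_t* p, size_t len) {
    int64_t sum = 0;
    int n = 0;
    for (const uint8_t* end = p + len; p != end; ++p) {
        if (*p < 128) {
            n = (n << 7) + *p;          // continuation byte
        } else {
            n = (n << 7) + *p - 128;    // final byte
            sum += n;
            n = 0;
        }
    }
    return sum;
}
```

For example, `encode(133)` yields the bytes `0x01 0x85` (1 × 128 + 5), and `decodeSum` on those two bytes returns 133.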

