Matrix multiplication in C and the impact of cache locality on performance

Matrix multiplication is a piece of cake for anybody in the field of Computer Science. _How difficult can it be?_ It is just a matter of creating 2D arrays, populating them with data, and finally running a nested loop. You might be surprised to hear that how you implement the matrix multiplication has a significant impact on the elapsed time.

C Code for MatrixMultiplication

	#include <stdlib.h>
	#include <stdio.h>
	#include <time.h>

	#define n 2048

	double A[n][n];
	double B[n][n];
	double C[n][n];

	int main() {

	    //populate the matrices with random values between 0.0 and 1.0
	    for (int i = 0; i < n; i++) {
	        for (int j = 0; j < n; j++) {

	            A[i][j] = (double) rand() / (double) RAND_MAX;
	            B[i][j] = (double) rand() / (double) RAND_MAX;
	            C[i][j] = 0;
	        }
	    }

	    struct timespec start, end;
	    double time_spent;

	    //matrix multiplication, timed with CLOCK_MONOTONIC so that
	    //wall-clock adjustments cannot distort the measurement
	    clock_gettime(CLOCK_MONOTONIC, &start);
	    for (int i = 0; i < n; i++) {
	        for (int j = 0; j < n; j++) {
	            for (int k = 0; k < n; k++) {
	                C[i][j] += A[i][k] * B[k][j];
	            }
	        }
	    }
	    clock_gettime(CLOCK_MONOTONIC, &end);
	    time_spent = (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1000000000.0;
	    printf("Elapsed time in seconds: %f \n", time_spent);
	    return 0;
	}

You can compile and run it using the following commands.

gcc -o matrix MatrixMultiplication.c
./matrix

This is how the majority of us implement matrix multiplication. _What changes can we make?_ Can we change the order of the nested loops? Of course, we can! There is no rule saying that the loops should be in the order i → j → k (even though that’s what we do most of the time). You can write the loops as follows and still get the correct output: each C[i][j] accumulates the same n products A[i][k] * B[k][j], so the order of the loops does not affect the result.

for (int k = 0; k < n; k++) {
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            C[i][j] += A[i][k] * B[k][j];
        }
    }
}
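To make the claim about correctness concrete, here is a minimal sketch that runs both loop orders on a small matrix and compares the results. The size `DIM`, the result arrays `C_ijk` and `C_kji`, and all the helper names are made up for this illustration; they are not part of the program above. The comparison uses a small tolerance rather than exact equality, because compilers and floating-point hardware may round intermediate results slightly differently.

```c
#include <assert.h>
#include <stdlib.h>

#define DIM 64 /* hypothetical small size so the check runs quickly */

double A[DIM][DIM], B[DIM][DIM];
double C_ijk[DIM][DIM], C_kji[DIM][DIM];

/* Fill A and B with values in [0.0, 1.0] and zero both result matrices. */
void init_matrices(void) {
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            A[i][j] = (double) rand() / (double) RAND_MAX;
            B[i][j] = (double) rand() / (double) RAND_MAX;
            C_ijk[i][j] = 0.0;
            C_kji[i][j] = 0.0;
        }
    }
}

/* The usual i -> j -> k loop order. */
void multiply_ijk(void) {
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            for (int k = 0; k < DIM; k++) {
                C_ijk[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

/* The reordered k -> j -> i loops; the loop body is unchanged. */
void multiply_kji(void) {
    for (int k = 0; k < DIM; k++) {
        for (int j = 0; j < DIM; j++) {
            for (int i = 0; i < DIM; i++) {
                C_kji[i][j] += A[i][k] * B[k][j];
            }
        }
    }
}

/* Largest absolute element-wise difference between the two results. */
double max_abs_diff(void) {
    double max = 0.0;
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            double d = C_ijk[i][j] - C_kji[i][j];
            if (d < 0.0) d = -d;
            if (d > max) max = d;
        }
    }
    return max;
}

/* Runs both orderings and aborts if they disagree by more than tol. */
void check_loop_orders(double tol) {
    init_matrices();
    multiply_ijk();
    multiply_kji();
    assert(max_abs_diff() <= tol);
}
```

Calling `check_loop_orders(1e-9)` from `main` should return silently when the two orders agree. Keep in mind that `assert` is compiled out when `NDEBUG` is defined, so the check only fires in builds without that macro.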

The interesting question is: will it make a difference in performance?

Let’s find out!
