Can You Implement a Column-Major Matrix Storage Data Structure in C?

wle · Dec 14, 2014

voko said:

This is not the result that I get on my Windows laptop. I get 31 - 46 milliseconds for either version consistently (wall clock time; 15 ms is the time quantum on Windows). The laptop runs on Intel Core i7, some two-three years old version, 64 bit mode. What is your CPU?

A Core i7-2620M on a Lenovo T420 laptop that I bought a little over 3 years ago (and upgraded to 8 GB RAM). Also I ran the code under Linux (specifically, Fedora 21) and compiled with version 4.9.2 of GCC.

Did you use a different compiler than GCC?

[EDIT: I've checked that compiling with -march=sandybridge doesn't make any obvious difference.]

rcgldr · Dec 14, 2014

Another way to look at this is to consider matrix multiplcation. C matrices are oriented for the left matrix in a matrix multiply, while Fortran matrices are oriented for the right matrix in a matrix multiply (the inner products are done left matrix across, right matrix down). Still I've always thought that Fortrans orientation is awkward. APL is another programming language that dates back to the 1960's and it's orientation for multi-dimensional arrays is the same as C.

wle · Dec 14, 2014

ellipsis said:

Are you sure you aren't comparing the uncommented code with the commented code?

I don't know if this is likely, but something that just might be worth checking is that voko's compiler actually generates the loop code. One possible issue with having the second loop commented out is that the resulting program stuffs a bunch of values into an array but then never uses them, so a very thorough compiler might identify the loop as dead code and remove it entirely. (GCC does this with -O3 optimisation if you make the array local to the main function.)

voko · Dec 15, 2014

wle said:

Did you use a different compliler than GCC?

I have now access to the dev environment that I normally use. Using cygwin+gcc, I get results similar to yours.

Using Micosoft's Visual Studio 2013, I get very different results. I have modified your code to exclude perturbations due to process creation and VM loading, and I have also converted it to C++, chiefly because MS VS does not have portable high resolution timer routines for C. Here is the code:

Code:

#include <chrono>
#include <stdio.h>

#define ASIZE 5000

int arr[ASIZE][ASIZE];

int main()
{
    int i, j, res = 0;

    /* first loop - touch all pages */
    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[i][j] = 2 * i + j;
        }
    }

    static auto const now = []() {return std::chrono::high_resolution_clock::now();};
    static auto const us = [](decltype(now() - now()) dur) { return std::chrono::nanoseconds(dur).count() / 1000.0; };

    auto const a = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[i][j] = 2 * i + j;
        }
    }

    auto const b = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[j][i] = 2 * i + j;
        }
    }

    auto const c = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            res += arr[i][j];
        }
    }

    auto const d = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            res += arr[j][i];
        }
    }

    auto const e = now();

    printf("%f, %f, %f, %f\n", us(b - a), us(c - b), us(d - c), us(e - d));

    return res;
}

Typical result: 14000.900000, 14003.300000, 10001.700000, 544087.500000, even though I observe about 5 ms variance in each of these.

I have verified that the compiler did not optimize anything away. I can post the listing if anyone is interested.

Of interest. If I change the array type to a 64-bit integer, then the typical result is 34004.000000, 23003.900000, 15002.000000, 14008.000000.

SixNein · Dec 16, 2014

mafagafo said:

See attached image.

I have never heard that before. Why wouldn't one be able to implement a data structure that stores a matrix by its columns using C? Is that true?

The author seems to be referring to the differences between compilers.

SixNein · Dec 16, 2014

mafagafo said:

And even when the author says that "the most widely used algorithms (...) Fortran.", from what I have seen MatLab is pretty widely used both in industry and research and its implemented mostly on C.

An algorithm is an algorithm, and a language is a language. I have no idea why the author mixed the two together.

Yes, C and C++ are popular choices.

Can You Implement a Column-Major Matrix Storage Data Structure in C?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

How to increase phone signal strength by lying about it

Who is responsible for the software when AI takes over programming?

Use of AI (ML/DL) in Science

Could the reason why I can't select any kernels in VS Code be this error?

How useful is this if I want to begin programming?

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight