Can You Implement a Column-Major Matrix Storage Data Structure in C?

Click For Summary

Discussion Overview

The discussion revolves around the implementation of a column-major matrix storage data structure in C, contrasting it with the row-major order used by C and the column-major order used by Fortran. Participants explore the implications of these storage orders on performance and memory layout, as well as the conventions in programming languages.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • Some participants question the feasibility of implementing a column-major data structure in C, suggesting that C inherently uses row-major order for array storage.
  • Others argue that it is possible to achieve a column-major layout by declaring arrays in a specific way, although this may seem counterintuitive regarding performance.
  • There is a discussion about the differences in memory layout between C and Fortran, with examples illustrating how arrays are stored in memory in both languages.
  • Some participants highlight that the choice of array declaration can affect performance, suggesting that iterating over columns may yield better performance in certain contexts.
  • Concerns are raised about the assumptions made regarding the efficiency of matrix calculations in C, with some suggesting that the advice in textbooks may be debatable.
  • Participants note that conventions in programming languages can influence how natural it feels to reference matrix elements, with some advocating for flexibility in these conventions.

Areas of Agreement / Disagreement

Participants express differing views on the implementation of column-major storage in C, with no consensus reached on whether this is practical or beneficial. The discussion remains unresolved regarding the implications of different storage orders on performance and usability.

Contextual Notes

Some participants mention limitations in understanding the implications of memory layout and performance optimizations, indicating that the discussion may depend on specific use cases and definitions of efficiency.

  • #31
voko said:
This is not the result that I get on my Windows laptop. I get 31 - 46 milliseconds for either version consistently (wall clock time; 15 ms is the time quantum on Windows). The laptop runs on Intel Core i7, some two-three years old version, 64 bit mode. What is your CPU?

A Core i7-2620M on a Lenovo T420 laptop that I bought a little over 3 years ago (and upgraded to 8 GB RAM). Also I ran the code under Linux (specifically, Fedora 21) and compiled with version 4.9.2 of GCC.

Did you use a different compiler than GCC?

[EDIT: I've checked that compiling with -march=sandybridge doesn't make any obvious difference.]
 
Last edited:
Technology news on Phys.org
  • #32
Another way to look at this is to consider matrix multiplcation. C matrices are oriented for the left matrix in a matrix multiply, while Fortran matrices are oriented for the right matrix in a matrix multiply (the inner products are done left matrix across, right matrix down). Still I've always thought that Fortrans orientation is awkward. APL is another programming language that dates back to the 1960's and it's orientation for multi-dimensional arrays is the same as C.
 
  • #33
ellipsis said:
Are you sure you aren't comparing the uncommented code with the commented code?

I don't know if this is likely, but something that just might be worth checking is that voko's compiler actually generates the loop code. One possible issue with having the second loop commented out is that the resulting program stuffs a bunch of values into an array but then never uses them, so a very thorough compiler might identify the loop as dead code and remove it entirely. (GCC does this with -O3 optimisation if you make the array local to the main function.)
 
  • Like
Likes   Reactions: mafagafo
  • #34
wle said:
Did you use a different compliler than GCC?

I have now access to the dev environment that I normally use. Using cygwin+gcc, I get results similar to yours.

Using Micosoft's Visual Studio 2013, I get very different results. I have modified your code to exclude perturbations due to process creation and VM loading, and I have also converted it to C++, chiefly because MS VS does not have portable high resolution timer routines for C. Here is the code:

Code:
#include <chrono>
#include <stdio.h>

#define ASIZE 5000

int arr[ASIZE][ASIZE];

int main()
{
    int i, j, res = 0;

    /* first loop - touch all pages */
    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[i][j] = 2 * i + j;
        }
    }

    static auto const now = []() {return std::chrono::high_resolution_clock::now();};
    static auto const us = [](decltype(now() - now()) dur) { return std::chrono::nanoseconds(dur).count() / 1000.0; };

    auto const a = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[i][j] = 2 * i + j;
        }
    }

    auto const b = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[j][i] = 2 * i + j;
        }
    }

    auto const c = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            res += arr[i][j];
        }
    }

    auto const d = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            res += arr[j][i];
        }
    }

    auto const e = now();

    printf("%f, %f, %f, %f\n", us(b - a), us(c - b), us(d - c), us(e - d));

    return res;
}

Typical result: 14000.900000, 14003.300000, 10001.700000, 544087.500000, even though I observe about 5 ms variance in each of these.

I have verified that the compiler did not optimize anything away. I can post the listing if anyone is interested.

Of interest. If I change the array type to a 64-bit integer, then the typical result is 34004.000000, 23003.900000, 15002.000000, 14008.000000.
 
  • #35
mafagafo said:
See attached image.

I have never heard that before. Why wouldn't one be able to implement a data structure that stores a matrix by its columns using C? Is that true?

The author seems to be referring to the differences between compilers.
 
  • #36
mafagafo said:
And even when the author says that "the most widely used algorithms (...) Fortran.", from what I have seen MatLab is pretty widely used both in industry and research and its implemented mostly on C.

An algorithm is an algorithm, and a language is a language. I have no idea why the author mixed the two together.

Yes, C and C++ are popular choices.
 

Similar threads

  • · Replies 11 ·
Replies
11
Views
2K
  • · Replies 9 ·
Replies
9
Views
4K
  • · Replies 12 ·
Replies
12
Views
2K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 2 ·
Replies
2
Views
4K
  • · Replies 1 ·
Replies
1
Views
3K
  • · Replies 10 ·
Replies
10
Views
26K
Replies
3
Views
2K
  • · Replies 2 ·
Replies
2
Views
5K
  • · Replies 8 ·
Replies
8
Views
6K