# Textbook wrong?

• Fortran
See attached image.

I have never heard that before. Why wouldn't one be able to implement a data structure that stores a matrix by its columns using C? Is that true?

#### Attachments

• note.png
19.2 KB · Views: 388

Mark44
Mentor
See attached image.

I have never heard that before. Why wouldn't one be able to implement a data structure that stores a matrix by its columns using C? Is that true?
What this is saying is that for the definition below, C stores the elements in what is called row-major order.
Code:
int arr[3][3] = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}};
In the memory of a C program, the numbers are stored as 1, 1, 1, 2, 2, 2, 3, 3, 3, in consecutive memory locations. Fortran, in contrast, stores array elements in column-major order, so that the same array would appear in memory as 1, 2, 3, 1, 2, 3, 1, 2, 3, in consecutive memory locations.

So, no, the textbook is correct.

Last edited:
I get it now. Thanks.

But there are two problems still:
Mark44: Fixed the first array index, which is now 3.

Second, why wouldn't you use {{1, 2, 3}, {1, 2, 3}, {1, 2, 3}} for this same matrix? If this was properly documented, I don't see a reason why not to (supposing the performance gain would be noticeable).

Last edited by a moderator:
Mark44
Mentor
I get it now. Thanks.

But there are two problems still:

Second, why wouldn't you use {{1, 2, 3}, {1, 2, 3}, {1, 2, 3}} for this same matrix? If this was properly documented, I don't see a reason why not to (supposing the performance gain would be noticeable).
What same matrix? The matrix (the terminology is "array" in programming) I was talking about is the one in my example.
Neither of the points you raised is a problem, except possibly in your understanding. My declaration isn't "wrong." I get to declare my own array any way that I want, as long as I follow the syntactic rules of C. What I posted is a proper declaration for a two-dimensional array in C.

If I had wanted the array to be as you wrote it, with 1, 2, and 3 in each row, I would have declared it that way. As I wrote my declaration, the first row is 1, 1, 1, and the second row is 2, 2, 2, and the third row is 3, 3, 3. As I said before, memory would look like this: 1 1 1 2 2 2 3 3 3, with the numbers appearing in consecutive memory locations.

What I was saying is that if I had declared the array as I did in post #2, memory would look as I described it.

" My declaration isn't "wrong." "
int arr[2][3] = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}};
Is your compiler okay with this?
Mark44 edit: Changed to int arr[3][3] = ...

"If I had wanted the array to be as you wrote it, with 1, 2, and 3 in each row, I would have declared it that way. As I wrote my declaration, the first row is 1, 1, 1, and the second row is 2, 2, 2, and the third row is 3, 3, 3. As I said before, memory would look like this: 1 1 1 2 2 2 3 3 3, with the numbers appearing in consecutive memory locations."

Yes, I get it. And pretty much every living creature also does. What I was telling you is that it is perfectly possible to get the same memory layout you would get in Fortran by just declaring each column in what would be each row. Then operate on that if it offers better performance.

Last edited by a moderator:
And even though the author says that "the most widely used algorithms (...) Fortran.", from what I have seen MATLAB is pretty widely used both in industry and research, and it's implemented mostly in C.

Mark44
Mentor
Yes, that 2 should be a 3 in the declaration. I wish you had been more specific. I thought you meant the numbers I put in my array, not that the first array index was too small.

Mark44
Mentor
And even though the author says that "the most widely used algorithms (...) Fortran.", from what I have seen MATLAB is pretty widely used both in industry and research, and it's implemented mostly in C.
I don't know this for certain, but I would bet that a lot of the back-end code is implemented in Fortran. There are lots of math libraries, such as LINPACK and many others that do the heavy lifting for matrix inversion, eigenvalue/eigenvector calculation, solving differential equations, and so on.

Yes, I've seen some stuff of numerical linear algebra packages after Googling about it and coming here.

But the most important part, which you did not refer to in your last answers is

(...) What I was telling you is that it is perfectly possible to get the same memory layout you would get in Fortran by just declaring each column in what would be each row. Then operate on that if it offers better performance.

Am I correct? You just need to write your matrices differently to get what would be the same layout in memory.

C:
int a[b][c];
It seems counterintuitive that iterating over all values of b for each value of c (for the array above) should be faster, but if it is, then I have learned something new.

And even though the author says that "the most widely used algorithms (...) Fortran.", from what I have seen MATLAB is pretty widely used both in industry and research, and it's implemented mostly in C.

Interestingly, some of the scripting functionality compiles to Java (it even allows Swing access). It used to be Fortran, but they switched it to C because at the time Fortran didn't have dynamic memory allocation.

FactChecker
Gold Member
It's not a question of what you can do in C; it is a question of what the compiler does for you by default. M[row][column] is the most natural way to reference elements of a matrix M. In C, if you want a matrix M to have its element in row r and column c referenced by M[r][c], then the compiler fixes the order in which those elements are stored: consecutive memory locations hold
M[1][1], M[1][2], M[1][3], ..., M[2][1], M[2][2], ...

The reason no one makes a structure to change the C order is that matrix calculations are easy to do efficiently the way it is now. It's easy to define loop calculations that use the matrix numbers from memory in sequence.

P.S. The advice in the book regarding the speed of calculations is very debatable. There are many steps and optimizations between your code and the speed of the resulting execution. There are look-ahead data fetches and guesses about which paths your code execution will take that make it virtually impossible for you to anticipate what will be fast.

mafagafo
M[row][column] is the most natural way to reference elements of a matrix M.

Nope. This is a matter of convention. A program consistently using the other convention, especially if well documented, would read just as naturally.

The C language API for LAPACK supports both conventions: http://www.netlib.org/lapack/lapacke.html#_array_arguments.

Yes, I get it. And pretty much every living creature also does. What I was telling you is that it is perfectly possible to get the same memory layout you would get in Fortran by just declaring each column in what would be each row. Then operate on that if it offers better performance.

The issue isn't Fortran's way of storing arrays being better than C's or vice versa. The languages simply follow different conventions, which you should keep in mind when you write code that loops over arrays.

In C, the array element arr[1][1] is always directly followed by the array element arr[1][2] in memory. Technically, in terms of the C language definition and for the 3x3 example array defined above, if p is a pointer to the array element arr[1][1], then p + 1 is guaranteed to point to the array element arr[1][2], and p + 3 points to the array element arr[2][1].

Fortran is a bit less straightforward. First of all, Fortran has no concept of pointer arithmetic, so there's no language-level way to define where array elements are going to be placed in memory. As far as I know, that means it's simply a convention that compiler writers follow to store arrays by column first in memory rather than by row. Second, even that convention isn't necessarily always true. I actually experimented several years ago with gfortran and passing arrays between C and Fortran to figure out what gfortran was doing with arrays. Here's what I remember for a two-dimensional array:
• Internally, gfortran creates two "multiplier" integer variables to keep track of where array elements are located. Calling the variables m and n, it would compute something like m * i + n * j to figure out where the array element arr(i,j) is located in memory.
• For a newly created array, m = 1 and n is set to the number of rows (the size of the first dimension), so array elements in the same column are stored sequentially in memory.
• Certain array manipulation functions, like the transpose and array slicing functions, work by changing these values. For example, transpose(arr) works by interchanging the values of m and n rather than moving all the array elements around in memory.

mafagafo
FactChecker
Gold Member
Nope. This is a matter of convention. A program consistently using the other convention, especially if well documented, would read just as naturally.
Perhaps I should have been more specific. In mathematics, matrix indices or subscripts are normally in row,column order. So the C syntax is much more natural for a mathematician.

Perhaps I should have been more specific. In mathematics, matrix indices or subscripts are normally in row,column order. So the C syntax is much more natural for a mathematician.
I think you got it wrong. It is not about C syntax. C does not have any idea of what a matrix is. Yes, if we talk about $A_{m,n}$, I would agree with you that, undoubtedly, $A$ has $m$ rows and $n$ columns.

But what I suggested, and voko also talked about, is using the "reversed" convention when writing it using C in order to get the same memory disposition that Fortran would generate by following the "normal" convention.

I also find array_identifier[row][col] to be simpler, but nonetheless suggested using it the other way around to get better performance. voko not only says that it is acceptable to do it the other way around, but also that "a program consistently using the other convention, especially if well documented, would read just as naturally".

FactChecker
Gold Member
I think you got it wrong. It is not about C syntax. C does not have any idea of what a matrix is. Yes, if we talk about $A_{m,n}$, I would agree with you that, undoubtedly, $A$ has $m$ rows and $n$ columns.
I stand by my statement. What voko complained about was my statement: "M[row][column] is the most natural way to reference elements of a matrix M.". I then admitted that I should have referred to the math convention.

$M_{row,column}$ <=> M[row][column] is a straightforward mapping to the long-standing convention in math.

In this case, you are right.

But still, "the C syntax is much more natural for a mathematician." does not make much sense to me, as C does not even imply what the "outer" array represents. If instead you had made a direct reference to a specific example of C code that used [row][column], it would make sense.

FactChecker
Gold Member
In this case, you are right.

But still, "the C syntax is much more natural for a mathematician." does not make much sense to me, as C does not even imply what the "outer" array represents. If instead you had made a direct reference to a specific example of C code that used [row][column], it would make sense.
I think I see your point. In C, M[i1][i2] could be referring to anything from a simple matrix of numbers to a set of complicated structure pointers. So the math matrix interpretation is not as paramount as I was assuming. I will concede the point.

Having been thinking on and off about this for half a day, I now believe that the quoted passage is fundamentally wrong on two counts.

1. While not stating it explicitly, it suggests very heavily that Fortran's internal representation is superior. It has been pointed out in this thread that modern hardware makes this rather dubious. But more fundamentally, the statement is based on the (assumed) performance of matrix multiplication. That may not be a meaningful metric; a program solving a linear system or finding eigenvalues may not contain a single matrix multiplication. Would Fortran's representation be superior in that case?

2. C has no intrinsic knowledge of "rows" or "columns". A multi-dimensional array in C is a block of memory with a particular relationship between indices and addresses in the block. The programmer is free to interpret any index as a row or as a column. The situation is similar in Fortran, except that the relationship is not part of the language but an implementation detail. The programmer can assign "row" or "column" meaning to Fortran's indices regardless. In fact, so can a mathematician, provided the notation is consistent and well-defined.

Having been thinking on and off about this for half a day, I now believe that the quoted passage is fundamentally wrong on two counts.

1. While not stating it explicitly, it suggests very heavily that Fortran's internal representation is superior. It has been pointed out in this thread that modern hardware makes this rather dubious. But more fundamentally, the statement is based on the (assumed) performance of matrix multiplication. That may not be a meaningful metric; a program solving a linear system or finding eigenvalues may not contain a single matrix multiplication. Would Fortran's representation be superior in that case?

2. C has no intrinsic knowledge of "rows" or "columns". A multi-dimensional array in C is a block of memory with a particular relationship between indices and addresses in the block. The programmer is free to interpret any index as a row or as a column. The situation is similar in Fortran, except that the relationship is not part of the language but an implementation detail. The programmer can assign "row" or "column" meaning to Fortran's indices regardless. In fact, so can a mathematician, provided the notation is consistent and well-defined.

As a side note, nothing has been mentioned here to support the claim that "(...) the most widely used professional algorithms for matrix computations are written in fortran (...)".

1. While not explicitly stating that, it suggests very heavily that Fortran's internal representation is superior. It has been pointed out in this thread that modern hardware makes this rather dubious. But more fundamentally, the statement is based on (assumed) performance of matrix multiplication. But that may not be a meaningful metric; a program solving a linear system or finding eigenvalues may not have a single matrix multiplication. Would Fortran's representation be superior in this case?

The point isn't that one convention is intrinsically better than the other. It's that there are often different ways of looping over arrays that will produce the same end result, and where that's the case you should know which convention your favourite programming language/library/environment is using and try to loop in a way that will minimise how much you jump around in memory. As an example, suppose you have a large 5000x5000 element array and you want to do something with each element of the array. In C, you should iterate over the second index in the inner loop:
C:
for (i = 0; i < 5000; ++i) {
    for (j = 0; j < 5000; ++j) {
        /* Do something with arr[i][j] */
    }
}
In Fortran, you should instead iterate over the first index in the inner loop:
Fortran:
do j = 1, 5000
    do i = 1, 5000
        ! Do something with arr(i,j)
    end do
end do
(or better still in this case, use Fortran's forall loop and let the compiler figure out the best way to do it.)

If you're wondering why this matters, by the way, it's because there's more than one type of memory on a computer. Specifically, modern computers have one or more levels of processor cache memory, main memory itself, and virtual memory (i.e. a page file or swap partition on the hard disk), and there are big differences in how quickly the computer accesses each of these. If you write code that jumps around needlessly over a very large array, your computer could waste a lot of time copying stuff between cache and main memory or (worse) main and virtual memory.

The situation is similar in Fortran, except that the relationship is not part of the language, but is an implementation detail. The programmer can assign "row" or "column" meaning to Fortran's indices regardless.

That's not completely true. Fortran has a few built-in functions for matrix manipulation and these follow the standard mathematical (first index is row, second index is column) convention. So if you're storing matrices in Fortran rank-2 arrays, you should follow that convention if you want code like C = matmul(A, B) to work as expected.

JorisL
The point isn't that

My impression, upon reading and re-reading the passage several times, was that the author was of the opinion that Fortran's representation is superior to C's. Which is false.

I did not think that a trivial statement on the benefits of memory locality was worthy of any additional discussion.

If you're wondering why this matters, the reason is that there are different levels of memory on a computer.

I would like to stay out of analyzing this with respect to modern hardware. Otherwise I would have to point out that in the examples given by you, you would find it very difficult to measure any impact of the loop ordering on performance, if compiled with a modern optimizing compiler and run on a modern CPU.

My impression, upon reading and re-reading the passage several times, was that the author was of the opinion that Fortran's representation is superior to C's.

I didn't get that impression. Just that C and Fortran use different conventions and that one should be aware of them.

I would like to stay out of analyzing this with respect to modern hardware. Otherwise I would have to point out that in the examples given by you, you would find it very difficult to measure any impact of the loop ordering on performance, if compiled with a modern optimizing compiler and run on a modern CPU.

Have you actually tried this? I just tested it out with GCC and the difference can be quite measurable.

Specifically, I compiled the code below (saved in a file called "arrloop.c") with gcc -O3 -o arrloop arrloop.c. I increased the array size to 20000x20000 to make the code run longer. Timing the resulting executable produces a result like this:
Code:
$ time ./arrloop

real    0m0.752s
user    0m0.223s
sys     0m0.526s

Same code except with the loop orders interchanged ("arrloop2.c") and same compilation option:
Code:
$ time ./arrloop2

real    0m5.853s
user    0m5.248s
sys     0m0.598s

There'll be some variation each time you run the code, but the difference is quite substantial: a few hundred milliseconds of user time vs. over five seconds.

Code for arrloop.c:

C:
#define ASIZE 20000

int arr[ASIZE][ASIZE];

int main(void)
{
    int i, j, res = 0;

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[i][j] = 2 * i + j;
        }
    }

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            res += arr[i][j];
        }
    }

    return res;
}

arrloop2.c is identical except that the two inner/outer loops are interchanged.

DrClaude, Mark44 and ellipsis
I increased the array size to 20000x20000 to make the code run longer.

That makes a huge difference. My comment was about the 5K version that you originally posted.

The original version also did not contain a dedicated single "read" loop. That loop is where the majority of time will be spent in the 20K version. Take it out, and you will get very different results.

That makes a huge difference. My comment was about the 5K version that you originally posted.

That still makes a difference. On my machine, 23 milliseconds vs. 189 milliseconds of user time (best times out of two or three tries).

That said, if you're counting on me using a specific array size, I think you're missing the point.

The original version also did not contain a dedicated single "read" loop. That loop is where the majority of time will be spent in the 20K version. Take it out, and you will get very different results.

Commenting out the first ("arr[i][j] = 2 * i + j;") loop entirely and still with 5000x5000 arrays: 10 ms vs. 101 ms of user time (best time of three tries).

That said, if you're counting on me using a specific array size, I think you're missing the point.

Quite the opposite. You brought up the topic of hardware. Then you should understand that caches, translation look-aside buffers, write buffers, prefetchers, etc., all contribute to performance. Size matters a lot when it comes to this.

Commenting out the first loop entirely

The "read" loop is the second loop :)

Quite the opposite. You brought up the topic of hardware. Then you should understand that caches, translation look-aside buffers, write buffers, prefetchers, etc., all contribute to performance. Size matters a lot when it comes to this.

Well there's still a measurable difference for 5000x5000 arrays.

The "read" loop is the second loop :)

What I originally put in my post was /* Do something with arr[i][j] */.

But fine: comment out the first loop, leave in the second loop, and keep the array size to 5000x5000, so the code (for "arrloop.c") looks like this:
C:
#define ASIZE 5000

int arr[ASIZE][ASIZE];

int main(void)
{
    int i, j, res = 0;

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[i][j] = 2 * i + j;
        }
    }

    /*
    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            res += arr[i][j];
        }
    }
    */

    return res;
}

"arrloop2.c" is, again, identical to this except with the loops over i and j swapped in order.

Best of four tries: 15 ms vs. 85 ms user time. (Slowest times: 22 ms vs. 119 ms.)

Last edited:
Best of four tries: 15 ms vs. 85 ms user time. (Slowest times: 22 ms vs. 119 ms.)

This is not the result that I get on my Windows laptop. I get 31 - 46 milliseconds for either version consistently (wall clock time; 15 ms is the time quantum on Windows). The laptop runs on Intel Core i7, some two-three years old version, 64 bit mode. What is your CPU?

I saw an astounding difference on Windows Cygwin, i5 quadcore CPU 2.4ghz:
Code:
-@hatsComp ~/Projects
$ gcc array.c

-@hatsComp ~/Projects
$ ./a

-@hatsComp ~/Projects
$ time ./a

real    0m0.165s
user    0m0.062s
sys     0m0.062s

-@hatsComp ~/Projects
$ nano array2.c

-@hatsComp ~/Projects
$ gcc array2.c

-@hatsComp ~/Projects
$ time ./a

real    0m0.596s
user    0m0.514s
sys     0m0.015s

-@hatsComp ~/Projects
$

In the second run, I merely switched [i] out for [j] in the inner loop. This is using the read/write code as shown in wle's post (5000x5000 array).

This is not the result that I get on my Windows laptop. I get 31 - 46 milliseconds for either version consistently (wall clock time; 15 ms is the time quantum on Windows). The laptop runs on Intel Core i7, some two-three years old version, 64 bit mode. What is your CPU?

Are you sure you aren't comparing the uncommented code with the commented code?

This is not the result that I get on my Windows laptop. I get 31 - 46 milliseconds for either version consistently (wall clock time; 15 ms is the time quantum on Windows). The laptop runs on Intel Core i7, some two-three years old version, 64 bit mode. What is your CPU?

A Core i7-2620M on a Lenovo T420 laptop that I bought a little over 3 years ago (and upgraded to 8 GB RAM). Also I ran the code under Linux (specifically, Fedora 21) and compiled with version 4.9.2 of GCC.

Did you use a different compiler than GCC?

[EDIT: I've checked that compiling with -march=sandybridge doesn't make any obvious difference.]

Last edited:
rcgldr
Homework Helper
Another way to look at this is to consider matrix multiplication. C matrices are oriented for the left matrix in a matrix multiply, while Fortran matrices are oriented for the right matrix (the inner products are done left matrix across, right matrix down). Still, I've always thought that Fortran's orientation is awkward. APL is another programming language that dates back to the 1960s, and its orientation for multi-dimensional arrays is the same as C's.

Are you sure you aren't comparing the uncommented code with the commented code?

I don't know if this is likely, but something that just might be worth checking is that voko's compiler actually generates the loop code. One possible issue with having the second loop commented out is that the resulting program stuffs a bunch of values into an array but then never uses them, so a very thorough compiler might identify the loop as dead code and remove it entirely. (GCC does this with -O3 optimisation if you make the array local to the main function.)

mafagafo
Did you use a different compiler than GCC?

I now have access to the dev environment that I normally use. Using Cygwin + GCC, I get results similar to yours.

Using Microsoft's Visual Studio 2013, I get very different results. I have modified your code to exclude perturbations due to process creation and VM loading, and I have also converted it to C++, chiefly because MS VS does not have portable high-resolution timer routines for C. Here is the code:

Code:
#include <chrono>
#include <stdio.h>

#define ASIZE 5000

int arr[ASIZE][ASIZE];

int main()
{
    int i, j, res = 0;

    /* first loop - touch all pages */
    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[i][j] = 2 * i + j;
        }
    }

    static auto const now = []() { return std::chrono::high_resolution_clock::now(); };
    static auto const us = [](decltype(now() - now()) dur) { return std::chrono::nanoseconds(dur).count() / 1000.0; };

    auto const a = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[i][j] = 2 * i + j;
        }
    }

    auto const b = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            arr[j][i] = 2 * i + j;
        }
    }

    auto const c = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            res += arr[i][j];
        }
    }

    auto const d = now();

    for (i = 0; i < ASIZE; ++i) {
        for (j = 0; j < ASIZE; ++j) {
            res += arr[j][i];
        }
    }

    auto const e = now();

    printf("%f, %f, %f, %f\n", us(b - a), us(c - b), us(d - c), us(e - d));

    return res;
}

Typical result: 14000.900000, 14003.300000, 10001.700000, 544087.500000, even though I observe about 5 ms variance in each of these.

I have verified that the compiler did not optimize anything away. I can post the listing if anyone is interested.

Of interest. If I change the array type to a 64-bit integer, then the typical result is 34004.000000, 23003.900000, 15002.000000, 14008.000000.

SixNein
Gold Member
See attached image.

I have never heard that before. Why wouldn't one be able to implement a data structure that stores a matrix by its columns using C? Is that true?

The author seems to be referring to the differences between compilers.