- #1
- 188
- 12
What this is saying is that for the definition below, C stores the elements in what is called row-major order.See attached image.
I have never heard that before. Why wouldn't one be able to implement a data structure that stores a matrix by its columns using C? Is that true?
int arr[3][3] = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}};
What same matrix? The matrix (the terminology is "array" in programming) I was talking about is the one in my example.I get it now. Thanks.
But there are two problems still:
First, your declaration seems wrong.
Second, why wouldn't you use {{1, 2, 3}, {1, 2, 3}, {1, 2, 3}} for this same matrix? If this was properly documented, I don't see a reason why not to (supposing the performance gain would be noticeable).
I don't know this for certain, but I would bet that a lot of the back-end code is implemented in Fortran. There are lots of math libraries, such as LINPACK and many others that do the heavy lifting for matrix inversion, eigenvalue/eigenvector calculation, solving differential equations, and so on.And even when the author says that "the most widely used algorithms (...) Fortran.", from what I have seen MatLab is pretty widely used both in industry and research and its implemented mostly on C.
(...) What I was telling you is that it is perfectly possible to get the same disposition in memory you would get in Fortran by just declaring each column in what would be each row. Then operate on that if it offers better performance.
int a[b][c];
And even when the author says that "the most widely used algorithms (...) Fortran.", from what I have seen MatLab is pretty widely used both in industry and research and its implemented mostly on C.
M[row][column] is the most natural way to reference elements of a matrix M.
Yes, I get it. And pretty much every living creature also does. What I was telling you is that it is perfectly possible to get the same disposition in memory you would get in Fortran by just declaring each column in what would be each row. Then operate on that if it offers better performance.
Perhaps I should have been more specific. In mathematics, matrix indices or subscripts are normally in row,column order. So the C syntax is much more natural for a mathematician.Nope. This is matter of convention. A program consistently using the other convention, especially if well document, would read just as natural.
I think you got it wrong. It is not about C syntax. C does not have any idea of what a matrix is. Yes, if we talk about [itex]A_{m,n}[/itex], I would agree with you that, undoubtedly, [itex]A[/itex] has [itex]m[/itex] rows and [itex]n[/itex] columns.Perhaps I should have been more specific. In mathematics, matrix indices or subscripts are normally in row,column order. So the C syntax is much more natural for a mathematician.
I stand by my statement. What voko complained about was my statement: "M[row][column] is the most natural way to reference elements of a matrix M.". I then admitted that I should have referred to the math convention.I think you got it wrong. It is not about C syntax. C does not have any idea of what a matrix is. Yes, if we talk about [itex]A_{m,n}[/itex], I would agree with you that, undoubtedly, [itex]A[/itex] has [itex]m[/itex] rows and [itex]n[/itex] columns.
I think I see your point. In C, M[i1][i2] could be referring to anything from a simple matrix of numbers to a set of complicated structure pointers. So the math matrix interpretation is not as paramount as I was assuming. I will concede the point.In this case, you are right.
But still, "the C syntax is much more natural for a mathematician." does not make much sense to me as C does not even imply what the "outer" array represents. If instead you made a direct reference to a given example of C that used [row][column] you would make sense.
Having been thinking on and off about this for half a day, I now believe that the quoted passage is fundamentally wrong on two accounts.
1. While not explicitly stating that, it suggests very heavily that Fortran's internal representation is superior. It has been pointed out in this thread that modern hardware makes this rather dubious. But more fundamentally, the statement is based on (assumed) performance of matrix multiplication. But that may not be a meaningful metric; a program solving a linear system or finding eigenvalues may not have a single matrix multiplication. Would Fortran's representation be superior in this case?
2. C has no intrinsic knowledge of "rows" or "columns". A multi-dimensional arrays in C is a block of memory with a particular relationship between indices and the address in the block. The programmer is free to interpret any index as a row or as a column. The situation is similar in Fortran, except that the relationship is not part of the language, but is an implementation detail. The programmer can assign "row" or "column" meaning to Fortran's indices regardless. In fact, so can a mathematician, provided that notation is consistent and well-defined.
1. While not explicitly stating that, it suggests very heavily that Fortran's internal representation is superior. It has been pointed out in this thread that modern hardware makes this rather dubious. But more fundamentally, the statement is based on (assumed) performance of matrix multiplication. But that may not be a meaningful metric; a program solving a linear system or finding eigenvalues may not have a single matrix multiplication. Would Fortran's representation be superior in this case?
for (i = 0; i < 5000; ++i) {
for (j = 0; j < 5000; ++j) {
/* Do something with arr[i][j] */
}
}
do j=1,5000
do i=1,5000
! Do something with arr(i,j)
end do
end do
The situation is similar in Fortran, except that the relationship is not part of the language, but is an implementation detail. The programmer can assign "row" or "column" meaning to Fortran's indices regardless.
The point isn't that
If you're wondering why this matters, the reason is that there are different levels of memory on a computer.
My impression upon reading and re-reading the passage several times, was that the author was of the opinion that Fortran's representation is superior to C's.
I would like to stay out of analyzing this with respect to modern hardware. Otherwise I would have to point out that in the examples given by you, you would find it very difficult to measure any impact of the loop ordering on performance, if compiled with a modern optimizing compiler and run on a modern CPU.
$ time ./arrloop
real 0m0.752s
user 0m0.223s
sys 0m0.526s
$ time ./arrloop2
real 0m5.853s
user 0m5.248s
sys 0m0.598s
#define ASIZE 20000
int arr[ASIZE][ASIZE];
int main(void)
{
int i, j, res = 0;
for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
arr[i][j] = 2 * i + j;
}
}
for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
res += arr[i][j];
}
}
return res;
}
I increased the array size to 20000x20000 to make the code run longer.
That makes a huge difference. My comment was about the 5K version that you originally posted.
The original version also did not contain a dedicated single "read" loop. That loop is where the majority of time will be spent in the 20K version. Take it out, and you will get very different results.
That said, if you're counting on me using a specific array size, I think you're missing the point.
Commenting out the first loop entirely
Quote the opposite. You brought up the topic of hardware. Then you should understand that caches, translation look-aside buffers, write buffers, prefetchers, etc, all contribute to performance. Size matters a lot when it comes to this.
The "read loop is the second loop :)
#define ASIZE 5000
int arr[ASIZE][ASIZE];
int main(void)
{
int i, j, res = 0;
for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
arr[i][j] = 2 * i + j;
}
}
/* for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
res += arr[i][j];
}
} */
return res;
}
Best of four tries: 15 ms vs. 85 ms user time. (Slowest times: 22 ms vs. 119 ms.)
-@hatsComp ~/Projects
$ gcc array.c
-@hatsComp ~/Projects
$ ./a
-@hatsComp ~/Projects
$ time ./a
real 0m0.165s
user 0m0.062s
sys 0m0.062s
-@hatsComp ~/Projects
$ nano array2.c
-@hatsComp ~/Projects
$ gcc array2.c
.
-@hatsComp ~/Projects
$ time ./a
real 0m0.596s
user 0m0.514s
sys 0m0.015s
-@hatsComp ~/Projects
$
This is not the result that I get on my Windows laptop. I get 31 - 46 milliseconds for either version consistently (wall clock time; 15 ms is the time quantum on Windows). The laptop runs on Intel Core i7, some two-three years old version, 64 bit mode. What is your CPU?
This is not the result that I get on my Windows laptop. I get 31 - 46 milliseconds for either version consistently (wall clock time; 15 ms is the time quantum on Windows). The laptop runs on Intel Core i7, some two-three years old version, 64 bit mode. What is your CPU?
Are you sure you aren't comparing the uncommented code with the commented code?
Did you use a different compliler than GCC?
#include <chrono>
#include <stdio.h>
#define ASIZE 5000
int arr[ASIZE][ASIZE];
int main()
{
int i, j, res = 0;
/* first loop - touch all pages */
for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
arr[i][j] = 2 * i + j;
}
}
static auto const now = []() {return std::chrono::high_resolution_clock::now();};
static auto const us = [](decltype(now() - now()) dur) { return std::chrono::nanoseconds(dur).count() / 1000.0; };
auto const a = now();
for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
arr[i][j] = 2 * i + j;
}
}
auto const b = now();
for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
arr[j][i] = 2 * i + j;
}
}
auto const c = now();
for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
res += arr[i][j];
}
}
auto const d = now();
for (i = 0; i < ASIZE; ++i) {
for (j = 0; j < ASIZE; ++j) {
res += arr[j][i];
}
}
auto const e = now();
printf("%f, %f, %f, %f\n", us(b - a), us(c - b), us(d - c), us(e - d));
return res;
}
See attached image.
I have never heard that before. Why wouldn't one be able to implement a data structure that stores a matrix by its columns using C? Is that true?