# Textbook wrong?

Tags:
1. Dec 12, 2014

### mafagafo

See attached image.

I have never heard that before. Why wouldn't one be able to implement a data structure that stores a matrix by its columns using C? Is that true?

#### Attached Files:

• ###### note.png
File size:
19.2 KB
Views:
104
2. Dec 12, 2014

### Staff: Mentor

What this is saying is that for the definition below, C stores the elements in what is called row-major order.
Code (Text):
int arr[3][3] = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}};
In the memory of a C program, the numbers are stored as 1, 1, 1, 2, 2, 2, 3, 3, 3, in consecutive memory locations. Fortran, in contrast, stores array elements in column-major order, so that the same array would appear in memory as 1, 2, 3, 1, 2, 3, 1, 2, 3, in consecutive memory locations.

So, no, the textbook is correct.

Last edited: Dec 12, 2014
3. Dec 12, 2014

### mafagafo

I get it now. Thanks.

But there are two problems still:
First, your declaration seems wrong.
Mark44: Fixed the first array index, which is now 3.

Second, why wouldn't you use {{1, 2, 3}, {1, 2, 3}, {1, 2, 3}} for this same matrix? If this was properly documented, I don't see a reason why not to (supposing the performance gain would be noticeable).

Last edited by a moderator: Dec 12, 2014
4. Dec 12, 2014

### Staff: Mentor

What same matrix? The matrix (the terminology is "array" in programming) I was talking about is the one in my example.
Neither of the points you raised is a problem, except possibly in your understanding. My declaration isn't "wrong." I get to declare my own array any way that I want, as long as I follow the syntactic rules of C. What I posted is a proper declaration for a two-dimensional array in C.

If I had wanted the array to be as you wrote it, with 1, 2, and 3 in each row, I would have declared it that way. As I wrote my declaration, the first row is 1, 1, 1, and the second row is 2, 2, 2, and the third row is 3, 3, 3. As I said before, memory would look like this: 1 1 1 2 2 2 3 3 3, with the numbers appearing in consecutive memory locations.

What I was saying is that if I had declared the array as I did in post #2, memory would look as I described it.

5. Dec 12, 2014

### mafagafo

" My declaration isn't "wrong." "
int arr[2][3] = {{1, 1, 1}, {2, 2, 2}, {3, 3, 3}};
Is your compiler okay with this?
Mark44 edit: Changed to int arr[3][3] = ...

"If I had wanted the array to be as you wrote it, with 1, 2, and 3 in each row, I would have declared it that way. As I wrote my declaration, the first row is 1, 1, 1, and the second row is 2, 2, 2, and the third row is 3, 3, 3. As I said before, memory would look like this: 1 1 1 2 2 2 3 3 3, with the numbers appearing in consecutive memory locations."

Yes, I get it. And pretty much every living creature also does. What I was telling you is that it is perfectly possible to get the same disposition in memory you would get in Fortran by just declaring each column in what would be each row. Then operate on that if it offers better performance.

Last edited by a moderator: Dec 12, 2014
6. Dec 12, 2014

### mafagafo

And even when the author says that "the most widely used algorithms (...) Fortran.", from what I have seen MatLab is pretty widely used both in industry and research and its implemented mostly on C.

7. Dec 12, 2014

### Staff: Mentor

Yes, that 2 should be a 3 in the declaration. I wish you had been more specific. I thought you meant the numbers I put in my array, not that the first array index was too small.

8. Dec 12, 2014

### Staff: Mentor

I don't know this for certain, but I would bet that a lot of the back-end code is implemented in Fortran. There are lots of math libraries, such as LINPACK and many others that do the heavy lifting for matrix inversion, eigenvalue/eigenvector calculation, solving differential equations, and so on.

9. Dec 12, 2014

### mafagafo

Yes, I've seen some stuff of numerical linear algebra packages after Googling about it and coming here.

But the most important part, which you did not refer to in your last answers is

Am I correct? You just need to write your matrices differently to get what would be the same disposition on memory.

Code (C):
int a[b][c];
It seems counter intuitive that iterating (suppose we have the array above) all values of b for each value of c should be faster, but if it is, then I learned something new.

10. Dec 12, 2014

### ellipsis

Interestingly, some of the scripting functionality compiles to Java (it even allows Swing access). It used to be Fortran, but they switched it to C because at the time Fortran didn't have dynamic memory allocation.

11. Dec 12, 2014

### FactChecker

It's not a question of what you can do in C, it is a question of what the compiler automatically does by default for you. M[row][column] is the most natural way to reference elements of a matrix M. In C if you want a matrix, M, to have an element in row=r and column=c referenced by M[r][c], then the compiler forces where those elements are stored consecutively:
M[1][1], M[1][2], M[1][3],... M[2][1], M[2][2], ...

The reason no one makes a structure to change the C order is that matrix calculations are easy to do efficiently the way it is now. It's easy to define loop calculations that use the matrix numbers from memory in sequence.

P.S. The advice in the book regarding the speed of calculations is very debatable. There are many steps and optimizations between your code and the speed of the resulting execution. There are look-ahead data fetches and guesses about which paths your code execution will take that make it virtually impossible for you to anticipate what will be fast.

12. Dec 14, 2014

### voko

Nope. This is matter of convention. A program consistently using the other convention, especially if well document, would read just as natural.

The C language API for LAPACK supports both conventions: http://www.netlib.org/lapack/lapacke.html#_array_arguments.

13. Dec 14, 2014

### wle

The issue isn't Fortan's way of storing arrays being better than C's or vice versa. The languages simply follow different conventions which you should keep in mind when you write code that loops over arrays.

In C, the array element arr[2][2] is always directly followed by the array element arr[2][3] in memory. Technically, in terms of the C language definition and for the example array defined above, if p is a pointer to the array element arr[2][2], then p + 1 is guaranteed to point to the array element arr[2][3] and p + 3 points to the array element arr[3][2].

Fortran is a bit less straightforward. First of all, Fortran has no concept of pointer arithmetic, so there's no language-level way to define where array elements are going to be placed in memory. As far as I know, that means it's simply a convention that compiler writers follow to store arrays by column first in memory rather than by row. Second, even that convention isn't necessarily always true. I actually experimented several years ago with gfortran and passing arrays between C and Fortran to figure out what gfortran was doing with arrays. Here's what I remember for a two-dimensional array:
• Internally, gfortran creates two "multiplier" integer variables to keep track of where array elements are located. Calling the variables m and n, it would compute something like m * i + n * j to figure out where the array element arr(i,j) is located in memory.
• For a newly created array, m = 1 and n is set to the number of columns, so array elements in the same column are stored sequentially in memory.
• Certain array manipulation functions, like the transpose and array slicing functions, work by changing these values. For example, transpose(arr) works by interchanging the values of m and n rather than moving all the array elements around in memory.

14. Dec 14, 2014

### FactChecker

Perhaps I should have been more specific. In mathematics, matrix indices or subscripts are normally in row,column order. So the C syntax is much more natural for a mathematician.

15. Dec 14, 2014

### mafagafo

I think you got it wrong. It is not about C syntax. C does not have any idea of what a matrix is. Yes, if we talk about $A_{m,n}$, I would agree with you that, undoubtedly, $A$ has $m$ rows and $n$ columns.

But what I suggested, and voko also talked about, is using the "reversed" convention when writing it using C in order to get the same memory disposition that Fortran would generate by following the "normal" convention.

I also find array_identifier[row][col] to be simpler, but nonetheless suggested using it the other way around to get better performance. voko not only say that it is acceptable to do it the other way around, but that "A program consistently using the other convention, especially if well document, would read just as natural.".

16. Dec 14, 2014

### FactChecker

I stand by my statement. What voko complained about was my statement: "M[row][column] is the most natural way to reference elements of a matrix M.". I then admitted that I should have referred to the math convention.

Mrow,column <=> M[row][column] is a
straightforward mapping to the long-standing convention in math.

17. Dec 14, 2014

### mafagafo

In this case, you are right.

But still, "the C syntax is much more natural for a mathematician." does not make much sense to me as C does not even imply what the "outer" array represents. If instead you made a direct reference to a given example of C that used [row][column] you would make sense.

18. Dec 14, 2014

### FactChecker

I think I see your point. In C, M[i1][i2] could be referring to anything from a simple matrix of numbers to a set of complicated structure pointers. So the math matrix interpretation is not as paramount as I was assuming. I will concede the point.

19. Dec 14, 2014

### voko

Having been thinking on and off about this for half a day, I now believe that the quoted passage is fundamentally wrong on two accounts.

1. While not explicitly stating that, it suggests very heavily that Fortran's internal representation is superior. It has been pointed out in this thread that modern hardware makes this rather dubious. But more fundamentally, the statement is based on (assumed) performance of matrix multiplication. But that may not be a meaningful metric; a program solving a linear system or finding eigenvalues may not have a single matrix multiplication. Would Fortran's representation be superior in this case?

2. C has no intrinsic knowledge of "rows" or "columns". A multi-dimensional arrays in C is a block of memory with a particular relationship between indices and the address in the block. The programmer is free to interpret any index as a row or as a column. The situation is similar in Fortran, except that the relationship is not part of the language, but is an implementation detail. The programmer can assign "row" or "column" meaning to Fortran's indices regardless. In fact, so can a mathematician, provided that notation is consistent and well-defined.

20. Dec 14, 2014

### mafagafo

As a side note, nothing has been mentioned here to support the claim that "(...) the most widely used professional algorithms for matrix computations are written in fortran (...)".

Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook