Does the order in which you run two 'for' loops matter?

  • #1

Summary:

For example, suppose I want to run through a table with m rows and n columns. Is it faster if I check the values of the table row-wise or column-wise?

Main Question or Discussion Point

It's a simple question, and I feel like the order in which you run two loops should not matter with regard to how long the script takes to run. You'd still have to run through ##m \cdot n## cells no matter the order: if you check column-wise, you check all m entries in each of the n columns; similarly, if you check row-wise, you check all n entries in each of the m rows. In other words, I feel it would take the same amount of time regardless of the order in which you nest the 'for' loops, for the sole reason that multiplication is a commutative operation.
 

Answers and Replies

  • #2
jedishrfu
Your array is stored in memory in either row-major or column-major order, depending on the programming language. Consequently, you might get a speed boost if the successive elements you process are neighbors in memory.
 
  • #3
Ibix
Science Advisor
Insights Author
As jedishrfu says, maybe. One way of making an M×N array is to define a one-dimensional array with MN elements and provide some method of accessing the first N elements as (1,m), the next N elements as (2,m), and so on. You can do this explicitly in languages like C and C++. In that case you get a slight speed boost by having one loop that runs MN times instead of nested loops, because you don't have to do the inner-loop initialisation and don't have to translate (n,m) into n×N+m.

But sometimes you need to know the row and column location, or maybe the language doesn't let you play games with array structures and pointers. In that case you have to use two loops and the order of the loops probably doesn't matter.
 
  • #4
rcgldr
Homework Helper
The Wikipedia article covers this and notes which languages use row-major or column-major order:

https://en.wikipedia.org/wiki/Row-_and_column-major_order

In the case of hardware, when a fixed-size matrix, such as an n-row by 512-byte-column matrix, needs to be efficient in both row and column order, the memory addressing is set up so that both row-major and column-major traversal end up as column access order for the RAM.
 
  • #5
FactChecker
Science Advisor
Gold Member
Optimization techniques can cause unexpected timing results. I once thought I was being smart by traversing an array of data first one way, then backward, and alternating after that. My idea was that the most recently used data would still be in memory when the algorithm turned around. It ran slower: optimizing techniques include "look-ahead" retrieval of data. I recommend conducting some timing tests of alternative approaches if speed is an important issue. And if speed is important, select the highest optimization level that you can.
 
  • #6
jedishrfu
The idea of neighboring values speeding your computation is due to the instruction cache used by some processors anticipating the locations you will next access.
 
  • #7
jedishrfu said: "the idea of neighboring values speeding your computation is due to the instruction cache used by some processors anticipating the locations you will next access."

It's really the data cache. The main effect is that data is moved from main memory to the caches in cache lines.
A common size of a cache line on 64 bit processors is 64 bytes.
If you do not read from consecutive locations, you might use only 1 double-precision number (8 bytes) per 64-byte cache line that you read. Strided access over a large block of memory could thus be up to 8 times slower.

If you have only a small array that fits in the smallest data cache, it might not matter however.
A 32k level one cache can contain a 64x64 array of double precision values.

If you do more calculations with each array value, the extra delay might also disappear. There is a data prefetcher that can recognize regular access patterns and load your block into the cache before you need it.
 
  • #8
.Scott
Homework Helper
Another potential factor is optimization performed by the compiler.

If you process the elements in the order they occur in memory, the compiler is more likely to recognize that you are treating the array as one-dimensional. The result could be the saving of some of the array indexing arithmetic and the availability of one more register.

But this would be a very minor effect.
 