Does the order in which you run two 'for' loops matter?


Discussion Overview

The discussion revolves around whether the order of nested 'for' loops affects the performance of a script, particularly in relation to array traversal in programming. It explores concepts of memory storage, optimization techniques, and the impact of cache behavior on execution time.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Exploratory

Main Points Raised

  • Some participants propose that the order of loops does not matter in terms of the total number of cells processed, as both approaches involve iterating through the same number of elements.
  • Others argue that memory storage order (row major vs. column major) can influence performance, suggesting that accessing neighboring elements in memory may yield a CPU performance boost.
  • A participant mentions that using a one-dimensional array with a specific access method can provide speed advantages over nested loops, although this may depend on the programming language and specific requirements for accessing row and column data.
  • One post highlights that optimization techniques can lead to unexpected performance results, recommending timing tests for different approaches if speed is a priority.
  • There is a discussion about the role of the instruction cache and data cache in performance, noting that accessing consecutive memory locations can enhance efficiency due to cache line behavior.
  • Another participant points out that compiler optimizations may also play a role, particularly when processing elements in memory order, potentially reducing indexing overhead.
  • It is mentioned that the ability of the compiler to utilize vector instructions may also be affected by the order of loop execution.

Areas of Agreement / Disagreement

Participants express a range of views on the impact of loop order on performance, with no clear consensus reached. Some agree on the importance of memory access patterns, while others emphasize the theoretical equivalence of loop execution order.

Contextual Notes

Limitations include varying definitions of performance based on specific programming languages, the potential influence of compiler optimizations, and the specific characteristics of the hardware being discussed.

Eclair_de_XII
TL;DR
For example, suppose I wanted to run through a table with m rows and n columns. Is it faster to check the values of the table row-wise or column-wise?
It's a simple question, and I feel like the order in which you run two loops would not matter with regard to how long the script takes to run. You'd still have to run through ##m \cdot n## cells no matter the order. If you check column-wise, you check all m entries in each of the n columns; similarly, if you check row-wise, you check all n entries in each of the m rows. In other words, I feel it would take the same amount of time to run the code regardless of the order in which you nest one 'for' loop within another, for the sole reason that multiplication is commutative.
 
Your array is stored in memory in either row-major or column-major order, depending on the programming language. Consequently, you might get a CPU performance boost by processing successive elements of the array when they are neighbors in memory.
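To make the point concrete, here is a minimal C sketch (the function names are mine, not from the thread): C stores arrays in row-major order, so the first version's inner loop walks neighboring addresses, while the second jumps a whole row's width between accesses. Both compute the same sum.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 256
#define COLS 256

/* Sum the matrix touching memory contiguously: C stores a[i][j]
 * row by row, so the inner loop walks neighboring addresses. */
double sum_row_wise(double a[ROWS][COLS]) {
    double s = 0.0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}

/* Same arithmetic, but the inner loop jumps COLS * sizeof(double)
 * bytes between accesses -- a cache-unfriendly stride in C. */
double sum_col_wise(double a[ROWS][COLS]) {
    double s = 0.0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            s += a[i][j];
    return s;
}
```

On arrays too large for the caches, the row-wise version is typically the faster of the two on cached hardware, though only timing on your own machine settles it.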
 
As jedishrfu says, maybe. One way of making an M×N array is to define a one-dimensional array with MN elements and provide some method of accessing the first N elements as (1,m), the next N elements as (2,m), and so on. You can do this explicitly in languages like C and C++. In that case you get a slight speed boost from having one loop that runs MN times instead of nested loops, because you don't have to do the inner loop initialisation and don't have to translate (n,m) into n×N+m.

But sometimes you need to know the row and column location, or maybe the language doesn't let you play games with array structures and pointers. In that case you have to use two loops, and the order of the loops probably doesn't matter.
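A sketch of the flattened-array idea in C (the Matrix type and helper names are illustrative, not from the thread): element (row, col) lives at flat index row·ncols + col, and a whole-array operation can then use a single loop over all M·N cells with no per-element index translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* An M x N matrix stored as one flat block of M*N doubles.
 * Element (row, col) lives at index row * ncols + col. */
typedef struct {
    size_t nrows, ncols;
    double *data;
} Matrix;

Matrix mat_alloc(size_t nrows, size_t ncols) {
    Matrix m = { nrows, ncols, calloc(nrows * ncols, sizeof(double)) };
    return m;
}

double mat_get(const Matrix *m, size_t row, size_t col) {
    return m->data[row * m->ncols + col];
}

void mat_set(Matrix *m, size_t row, size_t col, double v) {
    m->data[row * m->ncols + col] = v;
}

/* A single loop over all M*N cells: no inner-loop setup and no
 * row * ncols + col arithmetic per element. */
double mat_sum(const Matrix *m) {
    double s = 0.0;
    size_t total = m->nrows * m->ncols;
    for (size_t k = 0; k < total; k++)
        s += m->data[k];
    return s;
}
```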
 
The wiki article covers this and notes which languages are row major or column major:

https://en.wikipedia.org/wiki/Row-_and_column-major_order

In the case of hardware, when a fixed-size matrix, such as an n-row by 512-byte-column matrix, needs to be efficient in both row and column order, the memory addressing is set up so that both row-major and column-major order end up as column access order for the RAM.
 
Optimization techniques can cause unexpected timing results. I once thought that I was being smart by traversing an array of data first one way, then backward, alternating after that. My idea was that the most recently used data would still be in memory when the algorithm turned around. It ran slower. Optimization techniques include "look-ahead" retrieval of data. I recommend that you conduct timing tests of the alternative approaches if speed is an important issue. And if speed is important, select the highest compiler optimization level that you can.
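In that spirit, a minimal C timing harness, assuming clock() from <time.h> is precise enough for your workload (for short workloads, repeat the call and take the best of several runs; the function names are mine):

```c
#include <assert.h>
#include <time.h>

/* Time one call of fn() in CPU seconds.  clock() measures processor
 * time, not wall time; resolution varies by platform. */
double time_call(double (*fn)(void), double *result) {
    clock_t t0 = clock();
    double r = fn();
    clock_t t1 = clock();
    if (result)
        *result = r;  /* return the value so the work is not dead code */
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* A toy workload: sum the integers 0 .. 999999. */
double workload(void) {
    double s = 0.0;
    for (long i = 0; i < 1000000; i++)
        s += (double)i;
    return s;
}
```

Compare the two candidate traversal orders by passing each one to time_call and looking at the returned times rather than guessing.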
 
The idea of neighboring values speeding up your computation is due to the instruction cache used by some processors anticipating the locations you will access next.
 
jedishrfu said:
The idea of neighboring values speeding up your computation is due to the instruction cache used by some processors anticipating the locations you will access next.
It's really the data cache. The main effect is that data is moved from main memory to the caches in cache lines.
A common size of a cache line on 64 bit processors is 64 bytes.
If you do not read from consecutive locations, you might use only one double-precision number (8 bytes) from each 64-byte cache line you read, so traversing a large block of memory could be up to 8 times slower.

If you have only a small array that fits in the smallest data cache, it might not matter however.
A 32k level one cache can contain a 64x64 array of double precision values.

If you do more calculations with each array value, the extra delay might also disappear. There is a data prefetcher that can recognize regular access patterns and load your data into the cache before you need it.
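The arithmetic above can be sketched in C, assuming 64-byte cache lines, a 32 KiB L1 data cache, and 8-byte doubles (typical but platform-dependent values):

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64        /* a common line size on 64-bit CPUs */
#define L1_DATA_BYTES (32 * 1024)  /* a typical level-one data cache */

/* How many doubles one cache-line fill delivers: 64 / 8 = 8,
 * hence the worst-case factor of 8 for strided access. */
size_t doubles_per_line(void) {
    return CACHE_LINE_BYTES / sizeof(double);
}

/* Side length of the largest square array of doubles that fits
 * entirely in the L1 data cache: 64 x 64 x 8 bytes = 32 KiB. */
size_t max_square_side(void) {
    size_t n = 0;
    while ((n + 1) * (n + 1) * sizeof(double) <= L1_DATA_BYTES)
        n++;
    return n;
}
```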
 
Another potential factor is optimization performed by the compiler.

If you process the elements in the order they occur in memory, the compiler is more likely to recognize that you are treating the array as one-dimensional. The result could be the saving of some of the array indexing arithmetic and the availability of one more register.

But this would be a very minor effect.
 
It can also affect whether the compiler can use vector instructions (e.g. SSE).
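As an illustration (the axpy name is mine), a unit-stride loop of the kind auto-vectorizers handle well:

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i] over contiguous memory.  Unit-stride loops like
 * this are the pattern auto-vectorizers turn into SIMD instructions
 * (e.g. SSE/AVX on x86); a column-wise, strided version of the same
 * update is much harder for the vectorizer. */
void axpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

With GCC or Clang, compiling at -O2/-O3 and inspecting the assembly (or using -fopt-info-vec with GCC) shows whether the loop was actually vectorized.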
 
