C++ swap routines: pointers vs. references

In summary, @sysprog posted a code for a swap function that uses XOR in another thread in this section. The function ensures that it does not swap a location with itself by using three XORs instead of one. The conversation then moves onto discussing swap routines using references and pointers, and how the underlying assembly code for both versions is identical despite the differences in calling semantics. One drawback of using references is that you cannot tell if the function being called is using references without looking at the prototype or actual function. It is also mentioned that creating a swap function may make it harder for the compiler to optimize the code, as an optimized swap can often be done with a single XCHG instruction or even just relabeling registers.
  • #1
37,626
9,862
@sysprog posted this code for a nifty swap function that uses XOR in another thread in this section.
sysprog said:
C:
void XorSwap( int* x, int* y )
{
/* ensure that you're not swapping a location with itself 
(3 XORs would zero it out as surely as 1 XOR would) */
if (x != y)
  {
    *x ^= *y; /* xor x with y and store the result at x */
    *y ^= *x; /* xor y with x and store the result at y */
    *x ^= *y; /* xor x with y and store the result at x */
/* x and y have been swapped without a 3rd memory location */
  }
}
This put me in mind of a couple of swap routines that I did awhile back: one with references and one with pointers. Unlike the example above, both routines do use a third memory location. Rather than wander off-topic, I thought I'd just start a new thread.

Here's the references version:
C++:
// Swap two numbers, using reference parameters
// x and y are references to the a and b variables in main()
void swapRefs(int& x, int& y)
{
    int temp;
    temp = x;
    x = y;
    y = temp;
}
This function is called like this:
C++:
int a = 5;
int b = 8;
swapRefs(a, b);
After swapRefs() returns, a's value will be 8 and b's value will be 5.

Here's the pointers version:
C++:
// Swap two numbers, using pointer parameters
// x and y are pointers to the a and b variables in main()
void swapPtrs(int* x, int* y)
{
    int temp;
    temp = *x;
    *x = *y;
    *y = temp;
}
This function is called like so:
C++:
int a = 5;
int b = 8;
swapPtrs(&a, &b);
After swapPtrs() returns, a's value will be 8 and b's value will be 5, which is what you would expect.

So far, this shouldn't be too surprising, and may even be a bit ho-hum.
What's very interesting to me in a very nerdy way is that, even though the calling semantics of the two swap functions and what happens in the function bodies are quite different, the underlying assembly code (in the MSFT Visual Studio implementation) is identical.

swapRefs() assembly code - the lines of C++ code are shown in comments.
Code:
//    int temp;
//    temp = x;
mov         eax,dword ptr [x]  
mov         ecx,dword ptr [eax]  
mov         dword ptr [temp],ecx  
//    x = y;
mov         eax,dword ptr [x]   
mov         ecx,dword ptr [y]  
mov         edx,dword ptr [ecx]  
mov         dword ptr [eax],edx  
//    y = temp;
mov         eax,dword ptr [y]  
mov         ecx,dword ptr [temp]  
mov         dword ptr [eax],ecx

swapRefs() assembly code
Code:
//    int temp;
//    temp = *x;
mov         eax,dword ptr [x]  
mov         ecx,dword ptr [eax]  
mov         dword ptr [temp],ecx  
//    *x = *y;
mov         eax,dword ptr [x]  
mov         ecx,dword ptr [y]  
mov         edx,dword ptr [ecx]  
mov         dword ptr [eax],edx  
//    *y = temp;
mov         eax,dword ptr [y]  
mov         ecx,dword ptr [temp]  
mov         dword ptr [eax],ecx
 
  • Like
  • Informative
Likes Chris Miller, jim mcnamara, jtbell and 2 others
Technology news on Phys.org
  • #2
Nice observation and commentary @Mark44.
 
  • #3
I've always assumed that references are generally implemented the same way as pointers (memory locations that contain the addresses of other memory locations), and I've probably seen statements to that effect in various places over the years, but I think this is the first time I've seen actual assembly language code demonstrating it.
 
  • Like
Likes sysprog
  • #4
One issue with using references is that looking at a call, you can't tell if the function being called is using references or not, without having to look for a prototype or the actual function to see if it is using references.
 
  • #5
rcgldr said:
One issue with using references is that looking at a call, you can't tell if the function being called is using references or not, without having to look for a prototype or the actual function to see if it is using references.
Why is that an issue? Even a very good disassembler can't reproduce the original high-level language source code. I think that if you have the original source code then it's not an issue, and that if you have only the object code then it doesn't matter what technique was used in the high-level language code that was compiled to produce it, as at that point you're working with the machine language.
 
  • #6
Not a good idea to create a swap function, it makes it much harder for the compiler to optimise the code: in many cases an optimised swap will simply involve a single XCHG instruction. 3 register-to-register instructions.

In some cases a swap may not take any instructions but simply a relabelling of the registers by the compiler!
 
Last edited:
  • #7
pbuk said:
Not a good idea to create a swap function, it makes it much harder for the compiler to optimise the code: in many cases an optimised swap will simply involve 3 register-to-register instructions.
If you do the swap with inline code, rather than calling a function, it takes instructions to copy the value in variables to registers. Here's the generated assembly code (VS) for the usual technique of swapping the values in two variables. As before, the C/C++ code is in // comments. Also, each assembly line has comments detailing what is happening.
Code:
//    int temp;
//    temp = b;
008E298E  mov         eax,dword ptr [b]               ; copy b to EAX
008E2991  mov         dword ptr [temp],eax       ; copy b to temp
//    b = a;
008E2994  mov         eax,dword ptr [a]              ; copy a to EAX
008E2997  mov         dword ptr [b],eax              ; copy a to b
//    a = temp;
008E299A  mov         eax,dword ptr [temp]      ; copy temp to EAX
008E299D  mov         dword ptr [a],eax             ; copy temp to a
Are you suggesting that some compilers can optimize away some of this code? Such as by the use of XOR, as @sysprog showed?
 
  • #8
Mark44 said:
If you do the swap with inline code, rather than calling a function, it takes instructions to copy the value in variables to registers.
In many cases at least one of the values will already be in a register, and at least one of the swapped results will be used again before the registers are used for another purpose. To see this in practice you need to look at an real example of a tight-loop construct (which is where optimisation makes a difference).
 
  • #9
rcgldr said:
One issue with using references is that looking at a call, you can't tell if the function being called is using references or not, without having to look for a prototype or the actual function to see if it is using references.
sysprog said:
Why is that an issue? Even a very good disassembler can't reproduce the original high-level language source code. I think that if you have the original source code then it's not an issue, and that if you have only the object code then it doesn't matter what technique was used in the high-level language code that was compiled to produce it, as at that point you're working with the machine language.
That is not what I meant. The issue is having to look at two different locations in source code to see if a function is pass by reference (references) or pass by value. C++ should have used a different syntax for the calling code, so that it would be clear in the calling code that a call is using references on one or more parameters.
 
  • #10
Mark44 said:
Are you suggesting that some compilers can optimize away some of this code? Such as by the use of XOR, as @sysprog showed?
A compiler will at least be able to produce:
mov register1, [memory1]
mov register2, [memory2]
mov [memory1], register2
mov [memory2], register1
If you turn on all optimizations with visual studio , it doesn't matter if you use a function for swap. The compiler will eliminate the function anyway.

I tried it in a bubble sort program, and the swap consisted of only 2 stores, since the 2 numbers had just been loaded into registers to compare them
Code:
            if (array[i] > array[i + 1]) {
mov         eax,dword ptr [ecx*4+0C94394h]   // eax = array[i+1]
mov         edx,dword ptr array (0C94390h)[ecx*4]  //edx = array[i]
cmp         edx,eax 
jle         main+92h (0C91092h)    // skip the swap if edx <= eax 
                Swap(array[i], array[i + 1]);
mov         dword ptr array (0C94390h)[ecx*4],eax  //array [i] = eax 
mov         dword ptr [ecx*4+0C94394h],edx   // array[i+1] = edx

The Xor method is horrible. You still need two loads and two stores to accomplish the swap, and you only add extra code with computations that depend on each other and can't be done in parallel.
The compiler should be able to see that it is equivalent to the other method, but I can't get it to optimize it away.
 
  • Like
Likes pbuk
  • #11
willem2 said:
The Xor method is horrible. You still need two loads and two stores to accomplish the swap, and you only add extra code with computations that depend on each other and can't be done in parallel.
Each of the 3 Xor instructions that accomplish the swap does a fetch and a store.

This is from the IBM System 370 Principles of Operation (1975) manual:
The sequence A EXCLUSIVE-ORed B, B EXCLUSIVE-ORed A, A EXCLUSIVE-ORed B results in the exchange of the contents of A and B without the use of an auxiliary buffer area.​

and:
The execution of XI and XC consists in fetching a first-operand byte from main storage and subsequently storing the updated value. These fetch and store accesses to a particular byte do not necessarily occur one immediately after the other. Thus, the instruction EXCLUSIVE OR cannot be safely used to update a shared location in main storage if the possibility exists that another CPU or a channel may also be updating the location. For XI, only one byte is stored.​

I think that it's not "horrible" ##-## we used it to conserve memory back when that mattered even more than it does now ##-## and we were fully aware of the multitasking and multiprocessing considerations ##\dots##
 
Last edited:
  • #12
sysprog said:
I think that it's not "horrible" − we used it to conserve memory back when that mattered even more than it does now − and we were fully aware of the multitasking and multiprocessing considerations
I wasn't thinking about multitasking/processing, but about a single processor (or processor core) executing more than one instruction at the same time (superscalar execution)
This can happen with two load/store pairs of the non-xor swap instruction. They completely independent. This can not happen with the xor solution, because all xor's but the first depend on earlier xor's. The first xor depends on two memory loads, and can't start until they are both done.
 
  • #13
In today's world, both CPU speed and memory seem to be abundant. However, there can still be cases where saving memory is more important than saving time. My point is simply that we can't make a blanket statement saying that optimization must always favor speed over space.
 
  • Like
Likes pbuk, sysprog and Vanadium 50
  • #14
willem2 said:
I wasn't thinking about multitasking/processing, but about a single processor (or processor core) executing more than one instruction at the same time (superscalar execution)
This can happen with two load/store pairs of the non-xor swap instruction. They completely independent. This can not happen with the xor solution, because all xor's but the first depend on earlier xor's. The first xor depends on two memory loads, and can't start until they are both done.
How are they "completely independent"? If you're using a swap method that requires a 3rd location, e.g. [copy A to C, copy B to A, copy C to B] then how does being able to run more than one instruction in a single clock cycle allow you to fetch from and store to a single location without having to do that sequentially? Don't you need to store A to C before you overwrite A from B? And don't you need to have already stored to C from A before you use the value from C to overwrite B?

According to the Intel manual https://www.intel.com/content/dam/w...4-ia-32-architectures-optimization-manual.pdf

The execution core of the Intel Core microarchitecture is superscalar and can process instructions out of order. When a dependency chain causes the machine to wait for a resource (such as a second-level data cache line), the execution core executes other instructions. This increases the overall rate of instructions executed per cycle (IPC).​

That's great, but I think that it cannot relieve us of the sequentiality of the swap process, whether we use a 3rd memory location or not.
 
Last edited:
  • #15
pbuk said:
Not a good idea to create a swap function, it makes it much harder for the compiler to optimise the code: in many cases an optimised swap will simply involve a single XCHG instruction. 3 register-to-register instructions.

In some cases a swap may not take any instructions but simply a relabelling of the registers by the compiler!
I'm curious regarding both of those statements ##-## it seems to me that swapping values between 2 memory locations always requires 3 elements, whether they are logic gates, or microinstructions sequenced within an instruction, or normal instructions ##-## regarding the XCHG instruction:

from http://www.c-jump.com/CIS77/ASM/DataTypes/T77_0200_exchanging_integers.htm:

You can exchange data between registers or between registers and memory, but not from memory to memory.​

##-## they reference register pairs ##-## it looks to me like you could use it for part of a tag sort, but not for a plain bubble sort for records longer than a register pair, and as you presumably know, even with a tag sort, you have to read each whole record once, and write each whole record once, but to do the sort, you get to swap just the tags ##\dots##

On IBM mainframes, we have the Compare and Swap instruction, the microcode for which is proprietary, but inferable from its functions, which are documented:

https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzahw/rzahwrzahwcascasco.htm
 
  • #16
willem2 said:
The Xor method is horrible. You still need two loads and two stores to accomplish the swap, and you only add extra code with computations that depend on each other and can't be done in parallel.
The compiler should be able to see that it is equivalent to the other method, but I can't get it to optimize it away.
The XOR variant is not equivalent to the regular swap. Consider what happens if the pointers are identical...
 
  • #17
glappkaeft said:
The XOR variant is not equivalent to the regular swap. Consider what happens if the pointers are identical...
Did you look at the C code in the quote in post #1 in this thread? ##-## it checks for that via if (x != y) ##-## so that the XORs happen only if the values are not equal ##-## there's no need to swap them if they're equal ##\dots##
 
  • #18
sysprog said:
Did you look at the C code in the quote in post #1 in this thread? − it checks for that via if (x != y) − so that the XORs happen only if the values are not equal − there's no need to swap them if they're equal …

Yes I did. The reason you check for the pointers not being equal is not because you don't need to swap them in that case but that the algorithm fails (it zeros both pointers).

Relevant to my reply is that I don't expect an optimizing compiler to dare optimize the XOR version by replacing it with a swap version or be smart enough to do so. Partially because pointer assignment and XOR are very different but also this corner case with aliasing.

BTW: I really like the XOR swap but using has been discouraged for a long time. Doing it with a tmp pointer is something the compiler can understand and optimize (MOV-elimination, OoO is possible, etc). It is not uncommon that "clever" optimizations results in slower code due to the compiler not being able to understand what is safe to do. On a modern computer the pointer swap should use the same amount of memory (edit: registers) and be faster than the XOR swap.
 
  • #19
glappkaeft said:
Yes I did. The reason you check for the pointers not being equal is not because you don't need to swap them in that case but that the algorithm fails (it zeros both pointers).
I think that the fact that you don't need to swap them if they're equal is why you can get away with not trying to do so if they're equal, and so you're not consequently failing, and you're successfully swapping the 2 values without the 3rd memory location, every time they're unequal and you want to swap them.

from https://en.wikipedia.org/wiki/XOR_swap_algorithm
However, the algorithm fails if x and y use the same storage location, since the value stored in that location will be zeroed out by the first XOR instruction, and then remain zero; it will not be "swapped with itself". This is not the same as if x and y have the same values. The trouble only comes when x and y use the same storage location, in which case their values must already be equal. That is, if x and y use the same storage location, then the line:​
X := X XOR Y​
sets x to zero (because x = y so X XOR Y is zero) and sets y to zero (since it uses the same storage location), causing x and y to lose their original values.​

I cited the instance of equality only as being a sufficient justification for not trying to do a swap in the first place; I think that it's likely that anyone who understands step-by-step how the 3-XOR swap works, will also recognize that XORing something against itself will zero it out.
Relevant to my reply is that I don't expect an optimizing compiler to dare optimize the XOR version by replacing it with a swap version or be smart enough to do so. Partially because pointer assignment and XOR are very different but also this corner case with aliasing.
IBM uses PL/S III for its system code and that compiler calls the assembler and thereby allows putting assembly language directly in-line with the otherwise-PL/1-like code, and that means that the programmer can more easily do his own post-auto-optimization 'manual' optimization. IBM doesn't (generally) provide PL/S III to its customers (there have been 'unofficial' versions cobbled together using the PL/1 compiler and the assembler), so when we're writing system code, and we want the capabilities of the PL/1 Optimizing Compiler (it's a procedural and imperative language), and still want to do assembly language things, we do something like DCL SUB01 ENTRY EXTERNAL OPTIONS(ASM,INTER,NOMAP);, then separately assemble the subroutine, and then resolve the resulting subroutine address location in the compiled code via the linkage editor ##-## that's less convenient than the PL/S III capability to auto-call the assembler and not need to resort to a linked external subroutine but it still works.
BTW: I really like the XOR swap but using has been discouraged for a long time. Doing it with a tmp pointer is something the compiler can understand and optimize (MOV-elimination, OoO is possible, etc). It is not uncommon that "clever" optimizations results in slower code due to the compiler not being able to understand what is safe to do. On a modern computer the pointer swap should use the same amount of memory (edit: registers) and be faster than the XOR swap.
I've seen it frequently in IBM mainframe assembly language and PL/S III code (mostly from pre-'Object Code Only' days) and in machine language when I'm reading dumps ##-## only infrequently in Intel or Motorola or C code. The speed of the 3-XOR swap compared to that of the 3rd memory location swap depends in part on the specifics of the machine architecture.
 
Last edited:
  • #20
Mark44 said:
@sysprog posted this code for a nifty swap function that uses XOR in another thread in this section.

This put me in mind of a couple of swap routines that I did awhile back: one with references and one with pointers. Unlike the example above, both routines do use a third memory location. Rather than wander off-topic, I thought I'd just start a new thread.

Here's the references version:
C++:
// Swap two numbers, using reference parameters
// x and y are references to the a and b variables in main()
void swapRefs(int& x, int& y)
{
    int temp;
    temp = x;
    x = y;
    y = temp;
}
This function is called like this:
C++:
int a = 5;
int b = 8;
swapRefs(a, b);
After swapRefs() returns, a's value will be 8 and b's value will be 5.

Here's the pointers version:
C++:
// Swap two numbers, using pointer parameters
// x and y are pointers to the a and b variables in main()
void swapPtrs(int* x, int* y)
{
    int temp;
    temp = *x;
    *x = *y;
    *y = temp;
}
This function is called like so:
C++:
int a = 5;
int b = 8;
swapPtrs(&a, &b);
After swapPtrs() returns, a's value will be 8 and b's value will be 5, which is what you would expect.

So far, this shouldn't be too surprising, and may even be a bit ho-hum.
What's very interesting to me in a very nerdy way is that, even though the calling semantics of the two swap functions and what happens in the function bodies are quite different, the underlying assembly code (in the MSFT Visual Studio implementation) is identical.

swapRefs() assembly code - the lines of C++ code are shown in comments.
Code:
//    int temp;
//    temp = x;
mov         eax,dword ptr [x]
mov         ecx,dword ptr [eax]
mov         dword ptr [temp],ecx
//    x = y;
mov         eax,dword ptr [x] 
mov         ecx,dword ptr [y]
mov         edx,dword ptr [ecx]
mov         dword ptr [eax],edx
//    y = temp;
mov         eax,dword ptr [y]
mov         ecx,dword ptr [temp]
mov         dword ptr [eax],ecx

swapRefs() assembly code
Code:
//    int temp;
//    temp = *x;
mov         eax,dword ptr [x]
mov         ecx,dword ptr [eax]
mov         dword ptr [temp],ecx
//    *x = *y;
mov         eax,dword ptr [x]
mov         ecx,dword ptr [y]
mov         edx,dword ptr [ecx]
mov         dword ptr [eax],edx
//    *y = temp;
mov         eax,dword ptr [y]
mov         ecx,dword ptr [temp]
mov         dword ptr [eax],ecx
The 3 XOR instruction swap code is analogous to a 3 XOR gate planar crossover hardware function, while the non-planar crossover use of the z axis is analogous to using a 3rd memory location, and of course you can use 4 NANDs to take the place of an XOR:

1632674819539.png
 

1. What is the difference between using pointers and references in C++ swap routines?

Pointers and references are both ways to pass variables to functions in C++. However, pointers and references have some key differences. Pointers hold the address of a variable, while references act as an alias for a variable. This means that changes made to a reference will affect the original variable, while changes made to a pointer will not automatically affect the original variable. In C++ swap routines, using pointers allows for swapping the values stored at the addresses pointed to by the pointers, while using references allows for swapping the values of the referenced variables.

2. Which is more efficient, using pointers or references in C++ swap routines?

In terms of efficiency, using references is typically faster than using pointers. This is because references are implemented as pointers in the background, so there is no extra overhead of creating a new variable for the address. Additionally, references are guaranteed to be non-null, while pointers can be null, which can lead to errors if not checked properly. Therefore, it is generally recommended to use references instead of pointers for C++ swap routines.

3. Can you use both pointers and references in the same C++ swap routine?

Yes, it is possible to use both pointers and references in the same C++ swap routine. This can be useful if you need to swap the values of two variables using pointers, but also want to track the original addresses of the variables using references. However, it is important to be careful when using both pointers and references in the same function to avoid any potential errors.

4. Are there any situations where using pointers may be more beneficial than using references in C++ swap routines?

There are certain situations where using pointers may be more beneficial than using references in C++ swap routines. For example, if you need to pass a null value to the swap function, using a pointer would be the only option since references cannot be null. Additionally, if you need to change the address of a variable within the swap routine, using pointers would be necessary.

5. Are there any potential risks or drawbacks to using pointers and references in C++ swap routines?

One potential risk of using pointers and references in C++ swap routines is the possibility of causing memory leaks or dangling references. This can happen if the pointers or references are not properly managed and point to invalid memory locations. Additionally, using pointers and references can make code more complex and harder to understand, so it is important to use them carefully and only when necessary.

Similar threads

Replies
63
Views
3K
  • Programming and Computer Science
Replies
7
Views
4K
  • Programming and Computer Science
Replies
4
Views
3K
  • Programming and Computer Science
Replies
1
Views
1K
  • Programming and Computer Science
Replies
12
Views
1K
  • Programming and Computer Science
Replies
4
Views
2K
  • Programming and Computer Science
Replies
11
Views
4K
  • Engineering and Comp Sci Homework Help
Replies
12
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
4
Views
2K
  • Programming and Computer Science
Replies
30
Views
4K
Back
Top