XOR algorithm to find the missing number in a given array

In summary: I don't know. It looks like that line is doing something with the array indexes, but I don't understand what it is.The algorithm assigns the value at index position i in the array to the variable x1, and the value at index position (i+1) in the array to x2.
  • #1
Taylor_1989
402
14
TL;DR Summary
I am having an issue understanding how the given algorithm, is actually working.
Recently I have been reading up about the logical operations OR, AND and XOR. Whilst reading about XOR I cam across a method using XOR to find the missing number in a given array, but I just can't understand how or really why it works. The algorithm is displayed below.

Find missing number in array:
A = [2, 3, 1, 4, 6, 7]

x1 = A[0]
x2 = 1

for i in range(1, len(A)):
    x1 = x1 ^ A[i]
   
for i in range(2, len(A)+2):
    x2 = x2 ^ i

print('--'*10)

print(x1 ^ x2)

What I can seem to understand is two things.
1. Why for the second loop has the array length been extend by 2?

2. In effect from what see is that the algorithm is comparing the last elements within each for loop, but I just understand how this gives the missing number? As I understand XOR compare the binary number of the given integer and if both integers are the same will result in 0 or if not the same will result in a new integer i.e 16 ^ 15 = 1. But why compare the array and update x1 and why compare x2 with i and update.

I have runt the print out of each line of the itterartion to see if I could gain some insight to what going on but to be honest confused me more. Could maybe some explain in laymen terms if possible.
 
Technology news on Phys.org
  • #2
The code is specific to the length of A and the values in A and assumes the values in A are sequential and go from 1 to 7, but is missing one value, so the length of A is 6 instead of 7. x1 is initialized to A[0], then the rest of A is xor'ed to x1 in the first loop. x2 is initialized to 1, the smallest value in A, then the values 2 through 7 are xor'ed to x2. After both loops, x1 ^ x2 will equal the missing number.

A more general version of this. mn is the minimum value in A.

Code:
A = [2, 3, 1, 4, 6, 7]

x1 = 0
x2 = 0
mn = min(A)

for i in range(0, len(A)):
    x1 = x1 ^ A[i]
    x2 = x2 ^ (i+mn)
x2 ^= len(A)+mn

print(x1 ^ x2)
 
Last edited:
  • Like
Likes Taylor_1989
  • #3
Allow me to add that the main point of this algorithm is to observe that if you XOR the same number twice it comes out zero. This also extends to a sequence of numbers where each number is somewhere in the list twice. This then means that if exactly one number where missing in such a list of double numbers then the total XOR would come out as that missing number.

Regarding code one can usually come a long way in understanding by using some descriptive names. For instance, below you do not see the +2 you were puzzled about because its been split into a max_value calculation and a +1 for the range argument (since the second argument to range is the stop value, not included in the range result):

Python:
actual_values = [2, 3, 1, 4, 6, 7]

min_value = min(actual_values)
max_value = min_value + len(actual_values)
desired_values = range(min_value, max_value + 1)

missing = 0
for x in actual_values:
    missing ^= x
for x in desired_values:
    missing ^= x

print(missing)
 
  • Like
  • Informative
Likes anorlunda and Taylor_1989
  • #4
Just for info, there's a common functional idiom for accumulating a value from a sequence. Python is one of the languages that allows it:
Python:
from functools import reduce    # Only needed in Python 3.

A = [2, 3, 1, 4, 6, 7]

xor = lambda x, y: x^y    # xor(x, y) returns x^y.

x1 = reduce(xor, A)
x2 = reduce(xor, range(1, len(A)+2))

print(x1^x2)    # Prints 5.
Of course, a lot of people still prefer plain loops for this sort of thing.
 
Last edited:
  • Like
Likes Taylor_1989
  • #5
This algorithm is an interesting trick. Say that X a sequence of any number of different numbers, and A is that same sequence except one (in this case, A is missing c}.

X = {a,b,c,d,e}
A = {a,b,d,e}

How this algorithm works? First it xors all values in A into a variable x; that's what the first loop is doing.

x = a ^ b ^ d ^ e

Then the second loop xors all values in X in the same variable, that is what the second loop is doing:

x = ( a ^ b ^ d ^ e ) ^ ( a ^ b ^ c ^ d ^ e )

then, rearranging:

x = ( a ^ a ) ^ ( b ^ b ) ^ ( c ) ^ ( d ^ d ) ^ ( e ^ e )

then, whatever w, w ^ w = 0, and w ^ 0 = w; therefore:

x = 0 ^ 0 ^ c ^ 0 ^ 0 = c = missing value

Per the question, why it's going to len(A)+2, that's because len(A) is 6, so range[2,8] = { 2, 3, 4, 5, 6, 7 }; now, why did it exclude 1? that's because it already included 1 from the very first assignment; so, by including 1, you're xoring with { 1, 2, 3, 4, 5, 6, 7 }, which is the list of all possible values from 1 to 7.

By the way, you don't really need to use xors for this; + and would work the same way as

x = ( sum of all values in X ) - ( sum of all values in A ) = missing value

or, in code:

A = [2, 3, 1, 4, 6, 7] x1 = A[0] for i in range(1, len(A)) begin x1 = x1 + A[i] end x2 = 1 for i in range(2, len(A)+2) begin x2 = x2 + i end print( x2 - x1 )
 
  • Like
Likes Taylor_1989
  • #6
Hmm... actually one could also use * to do the same thing...

x = ( multiplication of all values in X ) / ( multiplication of all values in A ) = missing value

I wonder what's the theory behind this... say you have two operators, op1 and op2... then if this holds true:

a == ( a op1 b ) op2 b

then the same trick works with these operators; that's why it works with xor/xor, +/- and */division... but which other operators would also work for this algorithm... interesting question; I guess rotation operators would also work, because they are associative and have an inverse. But then I'm no mathematician.
 
  • #7
Not a mathematician myself, but it seems the trick works if you have some subset ##S## of an abelian group (of numbers or some other type of object), you're given a subset ##S'## of ##S## with one element missing, and the problem is to identify that missing element.

Being a group means each object has an inverse (itself for xor and ##-x## and ##1/x## for addition and multiplication) and the group operation is associative (it doesn't matter where you put parentheses). Abelian means the group operation is also commutative (i.e., ##x \ast y = y \ast x##). Commutativity and associativity seem important so that you can move the elements around into any order and get them to cancel out.

Rotations in a 2D plane satisfy these conditions (a bit trivially, since it reduces to simple addition of the rotation angles modulo ##2 \pi## radians). Rotations in 3 or more dimensions don't since they're generally not commutative.
 
Last edited:
  • #8
A-ha! Abelian groups! That's the word, thank you!

I stand corrected on the 3D rotations - thank you!
 
  • #9
Just a small comment that using plus/minus or similar in a a practical implementation quickly will bring on the issue of overflow.

The characteristic of xor being both symmetric and closed on its data type (i.e. no overflow) is very useful in both theoretic algorithms and practical implementations, and its use is quite common in fields like cryptography.
 
  • #10
hmm.. good point... let's see if that holds true on additions with modulus; say modulus 10

ie, say I'm a complement-10 cpu that can only store numbers from 0 to 9, and I have

X= { 1, 2, 3, 4, 5, 6, 7, 8 }
A = { 1, 2, 3, 5, 6, 7, 8 }

so A is missing 4; then

sum of X mod 10 = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 6
sum of A mod 10 = 1 + 2 + 3 + 5 + 6 + 7 + 8 = 2

so in this case ( 6 - 2 ) mod 10 = 4, which is correct

but say the missing value was 7, then

sum of X mod 10 = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 6
sum of A mod 10 = 1 + 2 + 3 + 4 + 5 + 6 + 8 = 9

then ( 6 - 9 ) mod 10 = -3, but -3 does not exist in this CPU (it only holds 0..9), so it would underflow and give the complement-10 of it, which is 10-3 = 7

so I suspect that algorithm would still work (if one disables overflow/underflow) for integer numbers in a CPU that uses complement-N (say complement-2, which everybody uses, or complement-10 for my hypothetical cpu above); but I suspect you're right, that it would not work for cpus that use other representations, like BCD, signal/magnitude or complement-1, nor it would work for any of the common floating point representations (as they are usually in signal/magnitude, plus there's the pesky problem of the NaN overflow thingie). I suspect the xor algorithm would not work for any of those either.
 
  • #11
Filip Larsen said:
Just a small comment that using plus/minus or similar in a a practical implementation quickly will bring on the issue of overflow.

This depends on the language. Python integers are not machine register integers; their size is only limited by available memory. Still, this just substitutes one possible problem for others (efficiency, and in very very extreme cases maybe exhaustion of memory) that xor doesn't have.

On the other hand, bit fiddling with xor for maximum efficiency somehow seems a bit out of place in a high level language like Python.
fbs7 said:
hmm.. good point... let's see if that holds true on additions with modulus; say modulus 10

Well in general, the set ##\{0, \dotsc, N-1\}## with addition modulo ##N## is an abelian group, so yes, you could take all the additions modulo some number ##N## to keep the magnitudes of intermediate results under control and it would still work, as long as you're sure ##N## is larger than the numbers in the set you're dealing with.

The set ##\{1, \dotsc, N-1\}## with multiplication modulo ##N## is also an abelian group if ##N## is a prime number: https://en.wikipedia.org/wiki/Group_(mathematics)#Modular_arithmetic. In that case, for every integer in the set there's another integer that acts as its inverse.
 
  • #12
fbs7 said:
( 6 - 9 ) mod 10 = -3
In math, the sign of modulo is the same as the sign of the dividend (unless the remainder == 0), so (6-9) mod 10 = 7. This is different than the remainder with computers and/or programming languages where the sign of remainder is the same as the sign of the dividend. This won't be an issue with unsigned integers (which are always positive) either. Say the sum of the array is a 32 bit unsigned integer 0xFFFFFFFC, and the sum of the sequential values is 0x00000003, then 0x00000003 - 0xFFFFFFFC = 0x00000007.
 
  • #13
rcgldr said:
In math, the sign of modulo is the same as the sign of the dividend (unless the remainder == 0), so (6-9) mod 10 = 7. This is different than the remainder with computers and/or programming languages where the sign of remainder is the same as the sign of the dividend. This won't be an issue with unsigned integers (which are always positive) either. Say the sum of the array is a 32 bit unsigned integer 0xFFFFFFFC, and the sum of the sequential values is 0x00000003, then 0x00000003 - 0xFFFFFFFC = 0x00000007.

Ah, got it - thank you!
 
  • #14
wle said:
On the other hand, bit fiddling with xor for maximum efficiency somehow seems a bit out of place in a high level language like Python.
But calculating checksums with XOR goes back before Python, or any computer capable of running a high level language, existed. You can implement an 8-bit XOR with many fewer relays (or transistors) than an adder, it can run much faster, and is more flexible (you can send a 7 bit message over an 8 bit channel and the XOR checksum will be the same).
 

FAQ: XOR algorithm to find the missing number in a given array

What is the XOR algorithm?

The XOR algorithm (Exclusive OR) is a bitwise operation that compares two binary numbers and returns a 1 if they are different and 0 if they are the same.

How does the XOR algorithm find the missing number in a given array?

The XOR algorithm takes advantage of the fact that XOR operations are commutative and associative, meaning that the order of operations does not matter. By XOR-ing all the numbers in the array with each other, the duplicates will "cancel out" and the missing number will remain.

Can the XOR algorithm only be used for finding missing numbers in arrays?

No, the XOR algorithm can also be used for other purposes such as checking for duplicates in an array or for encrypting data.

What is the time complexity of the XOR algorithm for finding a missing number in a given array?

The time complexity of the XOR algorithm for finding a missing number in a given array is O(n), where n is the length of the array. This is because the algorithm only requires one pass through the array to find the missing number.

Is the XOR algorithm suitable for large arrays?

Yes, the XOR algorithm is suitable for large arrays as it has a time complexity of O(n) and does not require any additional space. This makes it a more efficient solution compared to other algorithms for finding missing numbers in arrays.

Back
Top