Algorithm for checking if a set of digits are all distinct

  • Context: High School 
  • Thread starter Thread starter etotheipi
  • Start date Start date
  • Tags Tags
    Algorithm Set
Click For Summary

Discussion Overview

The discussion revolves around finding an efficient algorithm to determine if a set of 9 digits contains all distinct digits from 1 to 9. Participants explore various methods, including mathematical checks, programming scripts, and bit manipulation techniques.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • Some participants suggest checking the sum of the digits (45) and the product (362,880) as necessary conditions for distinctness, but express uncertainty about their sufficiency.
  • One participant mentions a brute-force approach and notes exceptions found in specific digit sets.
  • Another participant proposes checking the prime factorization of the product to identify repeated digits based on divisibility rules.
  • Several participants discuss the efficiency of their scripts, comparing execution times and noting differences in complexity.
  • A method using bit manipulation is introduced, where a 16-bit integer is used to track the presence of digits, with a comparison to 511 to check for distinctness.
  • Some participants express concerns about the reliability of their algorithms, particularly in identifying duplicates correctly.
  • There is a discussion about the performance of different implementations, with some participants claiming their methods are faster or more efficient than others.
  • One participant highlights a bug in another's code that leads to incorrect conclusions about digit repetition.

Areas of Agreement / Disagreement

Participants generally agree on the need for an efficient method to check for distinct digits but present multiple competing approaches and express uncertainty about the effectiveness of their proposed solutions. Disagreements arise regarding the correctness and efficiency of specific algorithms.

Contextual Notes

Some methods rely on assumptions about the input data, such as the digits being integers. There are unresolved issues regarding the sufficiency of mathematical conditions for distinctness and the accuracy of certain algorithms in identifying duplicates.

Who May Find This Useful

Readers interested in algorithm design, programming efficiency, and mathematical problem-solving in the context of digit manipulation may find this discussion relevant.

etotheipi
If I give you 9 digits ##u_1, u_2, \dots, u_9##, is there an operation/set of operations you can perform to check whether all the digits from 1 to 9 are represented in that set? Just asking because my solution was a boring brute force check.

I don't think anything useful becomes of the product ##u_1 u_2 \dots u_9## since, for instance, there exist other sets of 9 digits that multiply to ##9!## (e.g. 653346274).

I wondered whether anyone knew if a nicer method is possible?
 
Last edited by a moderator:
  • Like
Likes   Reactions: JD_PM and member 587159
Mathematics news on Phys.org
etotheipi said:
If I give you 9 digits ##u_1, u_2, \dots, u_9##, is there an operation/set of operations you can perform to check whether all the digits from 1 to 9 are represented in that set? Just asking because my solution was a boring brute force check.

I don't think anything useful becomes of the product ##u_1 u_2 \dots u_9## since, for instance, there exist other sets of 9 digits that multiply to ##9!## (e.g. 653346274).

I wondered whether anyone knew if a nicer method is possible?
If the 9 digits are distinct, their sum will be 45. If their product also is 362,880 (9!), that would suggest that they are distinct. The conditions that their sum is 45 and their product is 362,880 are necessary, but I haven't convinced myself that they are sufficient conditions.
 
  • Like
Likes   Reactions: etotheipi and member 587159
Mark44 said:
but I haven't convinced myself that they are sufficient conditions.

This can be brute-forced by a computer (didn't check myself, but it sounds plausible).
 
  • Like
Likes   Reactions: etotheipi
I just wrote quick script to test this and it turns out there is one exception. The other one it threw up was 124445799.
Code:
found = ["123456789"]

for i in range(100000000,1000000000):
    num = i
    product = 1
    sum = 0

    while(i>=1):
        product *= i%10
        sum += i%10
        i//=10

    if(product == 362880 and sum == 45):
        ascending = "".join(sorted(str(num)))
        new = True
        for j in found:
            if(ascending == j):
                new = False
        if(new):
            found.append(ascending)

print(found)
OUTPUT:
['123456789', '124445799']
 
Last edited by a moderator:
  • Like
Likes   Reactions: member 587159
Here's another check: 9! = ##2^7 \cdot 3^4 \cdot 5 \cdot 7##
If the product is divisible by any higher powers of the four factors above, at least one digit is repeated. In other words, the product can't be evenly divisible by ##2^8## or ##3^5## or ##5^2## or ##7^2##. And I'm assuming that the sum of the 9 digits is 45.
 
etotheipi said:
I just wrote quick script to test this and it turns out there are a few exceptions. The first one it threw up was
124445799.

Code:
for i in range(100000000,1000000000):
    num = i
    product = 1
    sum = 0

    while(i>=1):
        product *= i%10
        sum += i%10
        i//=10

    if(product == 362880 and sum == 45):
        ascending = "".join(sorted(str(num)))
        if(ascending != "123456789"):
            print(num)
Why are you checking numbers in the range 100,000,000 to 999,999,999? If the product is larger than 362,880 (9!), then there have to be some repeated digits.

I think I understand your reasoning -- If all 9 digits are 9, the product is 387,420,489, which is within your range, but that's brute force with a bludgeon.
 
  • Like
Likes   Reactions: Janosh89 and etotheipi
Mark44 said:
I think I understand your reasoning -- If all 9 digits are 9, the product is 387,420,489, which is within your range, but that's brute force with a bludgeon.

Whoops you're quite right, I wasn't thinking too much about that part 😂, a sledgehammer approach indeed. Luckily modern computing power makes fairly short work of my sloppiness anyway!
 
  • Haha
Likes   Reactions: member 587159
Start off with a 16 bit integer set to 0
For each digit, set the corresponding bit
Compare the integer with 511
 
  • Like
Likes   Reactions: etotheipi
Merlin3189 said:
Start off with a 16 bit integer set to 0
For each digit, set the corresponding bit
Compare the integer with 511

This is fairly similar to my initial approach but your method is much snappier; I just scanned through the result and checked for no zeroes but you're right it would be quicker to just compare with ##2^9 -1##. I think this is probably the best solution, since a number theory approach is perhaps overly complicated. Thanks!
 
  • #10
etotheipi said:
If I give you 9 digits ##u_1, u_2, \dots, u_9##, is there an operation/set of operations you can perform to check whether all the digits from 1 to 9 are represented in that set?
{...snip...}
I wondered whether anyone knew if a nicer method is possible?
If the solution does not have to include math operations, then
if a digit repeats, then not all digits from 1 to 9 are represented.

create a temporary set ##temp[1:9]##; zero ##temp##​
for each ##u_i##​
if (##u_i ∈ temp##) then return (false)​
let ##temp_i = u_i##​
end for​
return (true)​

For brevity, I left out the details for checking if digits repeat.
{Edit: removed check for zero since already zeroed ##temp##.}
 
  • Like
Likes   Reactions: etotheipi
  • #11
Setting digits is the quickest option. It's also one that comes with minimal code:
Code:
sum=0
for k in u:
  sum+=1<<k
return sum==1022
This is valid python code. It assumes "u" is a set of integers, it doesn't check that. If they might not be integers this needs to be verified as well.
 
  • Love
Likes   Reactions: etotheipi
  • #12
Here's an implementation of what I mentioned earlier.
Python:
from time import perf_counter
t1 = perf_counter()
count = 0
list = [1, 2, 3, 5, 5, 5, 7, 8,9]
sum = 0
prod = 1
for i in range (0, 9):
    sum += list[i]
    prod *= list[i]
if (sum == 45):
    if (prod % 256 == 0):
        print ("Duplicate digit! Too many 2s")
    elif (prod % 81 == 0):
        print ("Duplicate digit! Too many 3s")
    elif (prod % 25 == 0):
        print ("Duplicate digit! Too many 5s")
    elif (prod % 49 == 0):
        print ("Duplicate digit! Too many 7s")
    else:
        print("No duplicate digits...")
else:
    print("Sum is incorrect")
t2 = perf_counter()
print("t2: ", t2, " t1: ", t1)
print("Elapsed time: ", (t2-t1), " seconds")
@etothepi, I tried out your version, and it took longer to run than I wanted to wait -- maybe about 5 minutes before I CTRL-C-ed out.
The version above runs in 0.0002833 seconds, or just under 3 milliseconds, and prints the message that there are too many 5s.
Edit: just under 0.3 msec.
 
Last edited:
  • Like
Likes   Reactions: etotheipi
  • #13
Mark44 said:
The version above runs in 0.0002833 seconds, or just under 3 milliseconds, and prints the message that there are too many 5s.
0.3 milliseconds?
I ran my function a million times and it finished in 2-3 seconds. The runtime doesn't depend on the input, so I didn't vary that.

As discussed before, checking the powers of the primes isn't a guarantee that every digit occurs. 124445799 has the same digit sum and digit product as 123456789. Your 25 line code takes 100 times as long as the 4 line code and has a bug.

etotheipi's script was designed to look for a bug in your script. It's not a surprise that it takes longer because it does a vastly more complicated task than checking if every digit occurs once in a set of 9 digits.
 
  • Like
Likes   Reactions: etotheipi
  • #14
Mark44 said:
@etothepi, I tried out your version, and it took longer to run than I wanted to wait -- maybe about 5 minutes before I CTRL-C-ed out.
The version above runs in 0.0002833 seconds, or just under 3 milliseconds, and prints the message that there are too many 5s.
Although I guess this is mainly because you are checking one list whilst my one was checking about 900000000 of them :wink:
mfb said:
Setting digits is the quickest option. It's also one that comes with minimal code:
This is valid python code. It assumes "u" is a set of integers, it doesn't check that. If they might not be integers this needs to be verified as well.

This looks like magic... very neat!
 
  • Like
Likes   Reactions: mfb
  • #15
It's possible to make that code more pythonic and less portable, of course:
Code:
return sum(1<<k for k in u)==1022
 
  • Informative
Likes   Reactions: etotheipi
  • #16
mfb said:
As discussed before, checking the powers of the primes isn't a guarantee that every digit occurs. 124445799 has the same digit sum and digit product as 123456789. Your 25 line code takes 100 times as long as the 4 line code and has a bug.
My code catches 124445799, saying that it has too many 3s.

I ran my code 1,000,000 times as well, but commented out the line where it prints "too many 3's," because I/O is inherently slow. I don't dispute that your algorithm is faster, but running my code a million times took only 7.6 sec, which is less than 3 times as slow, not 100 times as slow. Your claim that my code takes 100 times as long doesn't take into account that your code was running from cache most of the time, whereas my one time through wasn't.

And what's the bug in my code? It successfully noted that 124445799 has at least one duplicate digit.
 
  • #17
Mark44 said:
My code catches 124445799, saying that it has too many 3s.
Uh, it's worse than I expected. Another bug was hiding the more fundamental flaw of the program logic. It also complains about too many 3s for correct sets. You would need to check for 243, not 81. But if you do that then 124445799 will pass. Of course it does: It has the same sum and product as the correct set.
 
  • #18
I'm lazier than that:
PHP:
<?php
echo strlen($u) == strlen(count_chars($u, 3));
?>
Or:
Python:
u = str(u);
print len(u) == len(set(u));
As a bonus, it works with any other set of characters - not just digits - of any length!
 
  • Like
  • Love
Likes   Reactions: Merlin3189 and etotheipi
  • #19
I admit I miscounted the exponent on 3, and that even with that correction, my algorithm misclassifies 124445799. I'll take a harder look at my algorithm.
 
  • #20
Fixed the miscount on the exponent on 3. This code checks that the sum of the squares of the digits is correct. It produces the correct answer for 123456789 and 124445799. Are there any that it miscategorizes?
Python:
from time import perf_counter
t1 = perf_counter()
count = 0
list = [1, 2, 4, 4, 4, 5, 7, 9, 9]
sum = 0
prod = 1
sum_sq = 0
for i in range (0, 9):
sum += list[i]
prod *= list[i]
   sum_sq += list[i]*list[i]
if (sum != 45):
   print("Sum is incorrect")
elif (prod != 362880):
   print("Product is incorrect")
elif (sum_sq != 285):
   print("Sum of squares is incorrect")
else:
   if (prod % 256 == 0):
      print ("Duplicate digit! More than 7 factors of 2")
   elif (prod % 243 == 0):
      print ("Duplicate digit! More than 4 factors of 3")
   elif (prod % 25 == 0):
      print ("Duplicate digit! More than 2 factors of 5")
   elif (prod % 49 == 0):
      print ("Duplicate digit! More than 2 factors of 7")
   else:
      print("No duplicate digits...")
t2 = perf_counter()
print("t2: ", t2, " t1: ", t1)
print("Elapsed time: ", (t2-t1), " seconds")
 
  • #21
mfb said:
It's possible to make that code more pythonic and less portable, of course:
Code:
return sum(1<<k for k in u)==1022
When you're setting flag bits, I would call this more assembly-like. Although it's not intuitively obvious at first glance, yours is a very elegant solution.
Here's a more-or-less equivalent version in ARMv7:
[CODE title="ARMv7"]@ Constants
.eqv PR_STR, 0x69
.eqv HALT, 11
.eqv LIST_SIZE, 9
.eqv STD_OUT, 1

.data
List: .word 1, 2, 4, 4, 4, 5, 7, 9, 9
MagicNo: .word 0x3FE @ Bit pattern 11 1111 1110, or decimal 1022
NoDupe: .asciz "No duplicate digits"
Dupe: .asciz "Duplicate digits"

.text
main:
mov r0, #0 @ Accumulator
ldr r1, =List
mov r2, #LIST_SIZE @ Loop counter
mov r3, #1
ldr r4, =MagicNo
ldr r4, [r4]

Loop:
ldr r5, [r1] @ Get a number from List
lsl r5, r3, r5 @ r5 <- 1 << r5
add r0, r0, r5 @ Add to sum
add r1, r1, #4 @ Increment address in List
subs r2, r2, #1 @ Decrement loop counter
bne Loop

subs r2, r0, r4 @ r2 <- sum - MagicNo
beq NoDupeDigit @ If equal, no duplicate digits
mov r0, #STD_OUT @ Otherwise, there are duplicates
ldr r1, =Dupe
swi PR_STR
b Done

NoDupeDigit:
ldr r1, =NoDupe
swi PR_STR
Done:
swi HALT[/CODE]
 
  • #22
I would do some basic checks, sort the digits in ascending order and compare the result to "123456789".
 
  • Like
Likes   Reactions: etotheipi
  • #23
Mark44 said:
Are there any that it miscategorizes?
124445799 is the only case where you get the same sum and product as 123456789, so once you catch that there is nothing left that is misclassified. Note that checking for the product is sufficient, you don't need to check divisibility any more. If the product is 362880 then your divisibility checks have to pass.
I still don't see the point of the approach. It takes longer, it is harder to understand, harder to generalize, and most importantly it relies on an advance check of every single set of 9 digits to make sure there are no further counterexamples because there is no theorem guaranteeing the uniqueness of properties you check.
 
  • #24
I would accumulate a frequency array of all the digits. Freq( "digit" ) += 1.
I would verify that all digits had a frequency of one. Maybe check the product of all frequencies = 1.
 
  • #25
Baluncore said:
I would accumulate a frequency array of all the digits. Freq( "digit" ) += 1.
I would verify that all digits had a frequency of one. Maybe check the product of all frequencies = 1.

Assuming you did initialise all frequencies to 0, if you use = rather than +=, then each frequency ends up as 0 or 1, so you can check by adding rather than multiplying, which used to be faster, even for integers.

And it's probably faster than my method using bits rather than words to store the frequencies.
 
  • #26
Merlin3189 said:
Assuming you did initialise all frequencies to 0, ...
How could it be frequency, or quickly done otherwise?
Merlin3189 said:
... if you use = rather than +=, then each frequency ends up as 0 or 1, ...
But then it is presence and not frequency, and you must assume the correct number of digits were input.
Multiplication was slower than addition once, but now it is faster than indexing and fetching data.
 

Similar threads

  • · Replies 1 ·
Replies
1
Views
1K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 8 ·
Replies
8
Views
2K
  • · Replies 14 ·
Replies
14
Views
5K
Replies
29
Views
6K
  • · Replies 19 ·
Replies
19
Views
4K
  • · Replies 7 ·
Replies
7
Views
2K
  • · Replies 1 ·
Replies
1
Views
4K
Replies
19
Views
4K
  • · Replies 42 ·
2
Replies
42
Views
8K