Faulty Shuffles? The random shuffle

  • Thread starter anorlunda
  • Start date
  • Tags
    Random
In summary: To get a truly random selection you would need to show every picture at least once.In summary, a random playlist has songs that are randomly picked from the collection.
  • #1
anorlunda
Staff Emeritus
Insights Author
11,308
8,732
When I play music, or when I look at a slideshow, I like to use the random selection. I have about 1000 songs, and 25000 pictures in my libraries. Taking pictures for example, if I view 100 pictures per day, then it should be 25000/100=250 days, on the average, before I see a repetition of the same picture. What I actually see are certain items repeating 2-3 times in the first 100 day after day.

In fact, it seems like the random shuffle picks a deck of 64 or so items, then randomly selects from that short list. Even harder to explain, even though I restart my apps daily, a few pictures (or songs) appear in the list almost every day, so that there is an apparent bias in the choice of the 64.

By my definition, a random shuffle of N items should look like this on taken from Knuth's Art of Computer Programming. https://en.wikipedia.org/wiki/Fisher–Yates_shuffle

The problem seems the same using software from Microsoft, Google, Apple, Samsung, and Motorola, and even the AM/FM radio in my boat, and the problem has been around for at least 10 years. Is there a popular open source library for doing random shuffles, that is both ubiquitous to manufacturers, and and so faulty in its design? Or do I misunderstand the meaning of "random" in a playlist?
 
Technology news on Phys.org
  • #2
anorlunda said:
By my definition, a random shuffle of N items should look like this on taken from Knuth's Art of Computer Programming. https://en.wikipedia.org/wiki/Fisher–Yates_shuffle.
That's your definition. (A lot of other people have the same concept.) It's not very "random". A "random" playlist is actually the antithesis of "random". A truly random selection would mean that every song played (or every picture shown) is randomly picking from your collection.

Here's an example of "random". I switched jobs about seven months ago, to a small company with about 25 employees at the time. It took me three months to find the job; the switch happened to coincide with my birthday. Even odder, my birthday happened to be exactly the same as that of one of the company's founders. How odd is that? The answer is, not very. The odds are better than 50-50 that two people share the same birthday in a group of 25 people. I just happened to be the one whose birthday coincided with someone else's.

This is the birthday problem. In a group of 23 people, the odds are just slightly over 50-50 that two people will share the same birthday. In a collection of 1000 songs, you should expect to hear some song replayed within 38 songs.
 
  • #3
Thanks for the repy D H. I'm sure that you're right that what I expect linguisticly is different than the software, but the birthday problem alone does not match the statistics.

First, let's be more precise with language. Think of poker. A random shuffle shuffle's all 52 cards, after which a hand of any size can be dealt. A random draw picks one card at random, then puts the card back. A two card hand from a shuffled deck has zero probability of having the same card twice. A two card hand by random draw has a 1/52 probability of having the same card twice. Imagine the chaos if a poker hand had five Ace of Spades.

Now, the birthday problem. If there are N songs, and D plays per session, the probability of repeated plays is
 
  • #4
Yes. That is annoying. I finally made a Perl program which wrote a temporary playlist that was truly random and sent it to Microsoft Windows Media Player. I haven't tried the Media Player 'random" for a long time, so maybe it is better now. But from your comment, I guess it has not improved.

Edit: My program is random without replacement. Each song is on the list only once.
 
Last edited:
  • #5
D H said:
That's your definition. (A lot of other people have the same concept.) It's not very "random". A "random" playlist is actually the antithesis of "random". A truly random selection would mean that every song played (or every picture shown) is randomly picking from your collection.
There is random with replacement and there is random without replacement. Both are legitimate uses of the term. Although I never did a statistical study of it, the behavior of Windows Media Player didn't seemed like either of those. I think the OP is looking for random without replacement.
 
  • #6
Randomness with replacement explains the observations I think. In the first 100 days 10'000 pictures were shown, on average every picture got picked 0.4 times. To a very good approximation we have a Poisson distribution, so we expect:
6700 pictures that have been shown exactly 1 time
1250 pictures that have been shown exactly 2 times
180 pictures that have been shown exactly 3 times
18 pictures: that have been shown exactly 4 times
One or two pictures that have been shown 5 times
 
  • Like
Likes Timo and FactChecker
  • #7
anorlunda said:
When I play music, or when I look at a slideshow, I like to use the random selection. I have about 1000 songs, and 25000 pictures in my libraries. Taking pictures for example, if I view 100 pictures per day, then it should be 25000/100=250 days, on the average, before I see a repetition of the same picture. What I actually see are certain items repeating 2-3 times in the first 100 day after day.

In fact, it seems like the random shuffle picks a deck of 64 or so items, then randomly selects from that short list. Even harder to explain, even though I restart my apps daily, a few pictures (or songs) appear in the list almost every day, so that there is an apparent bias in the choice of the 64.

By my definition, a random shuffle of N items should look like this on taken from Knuth's Art of Computer Programming. https://en.wikipedia.org/wiki/Fisher–Yates_shuffle

The problem seems the same using software from Microsoft, Google, Apple, Samsung, and Motorola, and even the AM/FM radio in my boat, and the problem has been around for at least 10 years. Is there a popular open source library for doing random shuffles, that is both ubiquitous to manufacturers, and and so faulty in its design? Or do I misunderstand the meaning of "random" in a playlist?

Yeah, I have had this too... Specifically with phones seemingly favoring songs I bought from their stores, over mp3s I loaded on manually. I would imagine as a programmer, if nothing else, there may often be a learning process to attempt to note songs the user might like, (based on what you've skipped or listened to in the past, favorited, or recently purchased) that shapes the random curve. Truly random algorithms will give you strange repetitions, but they won't repeat, and they won't favor a certain set of items long term.
 
  • #8
Seems they would simply use a pseudo random number generator to index into the list. So what happens with random () % listLength. It also depends on how often they seed the generator. Seed it with the date when the player starts?

No doubt though that "random" can seem less random than one would intuitively expect. Humans are designed to detect patterns.

So, if there are 365 songs there is 50/50 there is a repeat withinin any 23 selections if it is really random(and nearly 99.9% in any 70). So, human that you are, you WILL notice.
 
  • Like
Likes Fooality
  • #9
meBigGuy said:
Seems they would simply use a pseudo random number generator to index into the list. So what happens with random () % listLength. It also depends on how often they seed the generator. Seed it with the date when the player starts?

No doubt though that "random" can seem less random than one would intuitively expect. Humans are designed to detect patterns.

So, if there are 365 songs there is 50/50 there is a repeat withinin any 23 selections if it is really random(and nearly 99.9% in any 70). So, human that you are, you WILL notice.

That's what I was wondering with this thread after I posted: Are the new songs jumping out because they are truly played more often, or because they are something I note more when they are played?
 
  • #10
I know that the "shuffle" function in Spotify isn't truly random. The reason is simply that most people don't actually like truly random shuffles since it could e.g. mean the same song being played twice in a row. Our instinctive idea for what is random is very different from real randomness, i.e. he "problem" here is that we are all susceptible to Gambler's fallacy.
Hence, Spotify -and I presume many other media players- uses some sort of "nearly-random" algorithm which takes human psychology into account.

https://labs.spotify.com/2014/02/28/how-to-shuffle-songs/
 
  • #11
f95toli said:
I know that the "shuffle" function in Spotify isn't truly random. The reason is simply that most people don't actually like truly random shuffles since it could e.g. mean the same song being played twice in a row. Our instinctive idea for what is random is very different from real randomness, i.e. he "problem" here is that we are all susceptible to Gambler's fallacy.
You misunderstand the word shuffle. See post #3. Suppose I shuffle a deck of cards imperfectly; a pseudo-random shuffle. When I deal them, there is still no repetition of cards all the way to the bottom of the deck. You are thinking of random draw (see post #3) not shuffle.

It is not gambler's fallacy to expect that a poker hand can not have two aces of spades. No. When we deal cards from a deck, there is zero chance of repeated cards. That has nothing to do with gambler's fallacy.


meBigGuy said:
So, if there are 365 songs there is 50/50 there is a repeat withinin any 23 selections if it is really random(and nearly 99.9% in any 70). So, human that you are, you WILL notice.
D H also mentioned the birthday problem.

But using Wikipedia's approximate formula for the birthday problem [p(n,d)=1-e^(-n^2/2d)], if I have 25,000 pictures in my library, and if I select a sample of 122 pictures, the probability of getting one duplicate is 50%. https://en.wikipedia.org/wiki/Birthday_problem#Approximations

But I am getting 4-5 duplicates per hundred. That is 5-6 times more duplicates than the birthday problem predicts.

What I want is a simple shuffle, like shuffling a deck of cards. By definition, a shuffle has zero repeats until we reach the bottom of the deck. That is true regardless of the quality of the shuffle.

But is it simpler to code a random draw than a shuffle. A random draw needs no memory of the past, and no storage proportional to deck size.

But a random draw has a property worse than repeats. How long must you wait until all song/pictures in the library have been played at least once? That probability is always <1 even with an infinitely long wait time.
 
  • #12
anorlunda said:
But is it simpler to code a random draw than a shuffle. A random draw needs no memory of the past, and no storage proportional to deck size.
There are algorithms for a pseudo-random shuffle that need constant memory independent of the input size (well, logarithmic, as you have to store the last location or step number).

anorlunda said:
But I am getting 4-5 duplicates per hundred.
You memorize the pictures in chunks of 100?

anorlunda said:
How long must you wait until all song/pictures in the library have been played at least once? That probability is always <1 even with an infinitely long wait time.
The expectation value is (approximately) N ln(N) or 250,000 songs. After 300,000 pictures there is a 15% chance left that one pictures didn't get selected, after 350,000 this drops to 2%.
 
  • #13
In the vernacular of APL, random with replacement is called roll, random without replacement is called deal. I don't know if roll and deal terminology is used outside of APL.
 
  • #14
Random in programming, as in a "good" PRNG the usual definition - mayb the use of drand48() or even GNU rand(): the longterm distribution of results is flat, i.e. for a "random" sample of n=1..10; sampled 1000 times, the expected resultset will have 100 ones, 100 twos, etc. This means, in practical terms, that for every sample you have an even chance of getting any of the numbers 1...10, and the previous number you got has ZERO influence on the next choice. So you can get repetitions.

What the anorlunda seems to want on a simple level is an algorithm for dealing cards. The sample sequence is random, but all entries eventually get returned, with no duplications. Every sample DOES affect the subsequent sample. Because the sample you got is removed from the constellation of possible choices. On a more advanced level, the removed and then excluded sample for cryptographically good PRNG's is called a 'nonce'.

https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator
 
  • #15
anorlunda said:
You misunderstand the word shuffle. See post #3. Suppose I shuffle a deck of cards imperfectly; a pseudo-random shuffle. When I deal them, there is still no repetition of cards all the way to the bottom of the deck. You are thinking of random draw (see post #3) not shuffle.

It is not gambler's fallacy to expect that a poker hand can not have two aces of spades. No. When we deal cards from a deck, there is zero chance of repeated cards. That has nothing to do with gambler's fallacy.

I understood what you meant. However, I used a bad example (song) when describing what I meant with Gamber's fallacy in this case. What I should have written is "song, artist or even genre". If you read the blog post I linked to you will see that Spotify (and I presume most other media players) try to "shuffle" not just the songs but also the artists etc. Hence, it follows that different song will be assigned different probabilities: if you have 10 songs with artist A and 100 songs with artist B it could potentially follow (depending on algorithm) that the songs from artist A will be played more often than those of artist B in order to avoid having several songs by artist B being played consecutively. I suspect this is one reason why they sometimes get it wrong.
Also, note that the algorithms the media players (or at least Spotify) use do their best to actually fulfill Gambler's fallacy: they try to give us what we expect, not what is mathematically correct.

I do believe there are media players out there that just "shuffle" the songs, I am almost certain the my Squeezebox at home does a "proper" shuffle.
 
  • #16
The word "shuffle" implies changing the order of a list. It does not imply that items in the list can appear more than once.
 

1. What is a faulty shuffle?

A faulty shuffle refers to a random shuffling process that does not produce a truly random distribution of elements. This can be caused by various factors, such as human error or algorithmic flaws.

2. How do faulty shuffles affect data analysis?

Faulty shuffles can significantly impact data analysis results by introducing bias or skewness into the data. This can lead to incorrect conclusions and inaccurate predictions.

3. What are some common causes of faulty shuffles?

Some common causes of faulty shuffles include inadequate mixing or shuffling techniques, biased selection of elements, or flaws in the shuffling algorithm itself.

4. How can we prevent faulty shuffles?

To prevent faulty shuffles, it is important to use proper shuffling techniques, such as the Fisher-Yates shuffle, which ensures a truly random distribution. It is also important to regularly check and test shuffling algorithms for any potential biases or flaws.

5. Are there any real-life examples of faulty shuffles?

Yes, faulty shuffles have been observed in various scenarios, such as in casinos where faulty shuffling techniques have been used to cheat players, or in DNA sequencing where faulty shuffles can lead to incorrect genetic analysis results.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
2K
  • General Discussion
Replies
2
Views
663
  • Programming and Computer Science
Replies
1
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
1
Views
1K
  • Computing and Technology
Replies
30
Views
2K
  • General Discussion
Replies
22
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
9
Views
3K
  • Computing and Technology
Replies
9
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
23
Views
3K
Back
Top