Faulty Shuffles? The random shuffle

1. Aug 21, 2015

Staff: Mentor

When I play music, or when I look at a slideshow, I like to use the random selection. I have about 1000 songs, and 25000 pictures in my libraries. Taking pictures for example, if I view 100 pictures per day, then it should be 25000/100=250 days, on the average, before I see a repetition of the same picture. What I actually see are certain items repeating 2-3 times in the first 100 day after day.

In fact, it seems like the random shuffle picks a deck of 64 or so items, then randomly selects from that short list. Even harder to explain, even though I restart my apps daily, a few pictures (or songs) appear in the list almost every day, so that there is an apparent bias in the choice of the 64.

By my definition, a random shuffle of N items should look like this on taken from Knuth's Art of Computer Programming. https://en.wikipedia.org/wiki/Fisher–Yates_shuffle

The problem seems the same using software from Microsoft, Google, Apple, Samsung, and Motorola, and even the AM/FM radio in my boat, and the problem has been around for at least 10 years. Is there a popular open source library for doing random shuffles, that is both ubiquitous to manufacturers, and and so faulty in its design? Or do I misunderstand the meaning of "random" in a playlist?

2. Aug 22, 2015

D H

Staff Emeritus
That's your definition. (A lot of other people have the same concept.) It's not very "random". A "random" playlist is actually the antithesis of "random". A truly random selection would mean that every song played (or every picture shown) is randomly picking from your collection.

Here's an example of "random". I switched jobs about seven months ago, to a small company with about 25 employees at the time. It took me three months to find the job; the switch happened to coincide with my birthday. Even odder, my birthday happened to be exactly the same as that of one of the company's founders. How odd is that? The answer is, not very. The odds are better than 50-50 that two people share the same birthday in a group of 25 people. I just happened to be the one whose birthday coincided with someone else's.

This is the birthday problem. In a group of 23 people, the odds are just slightly over 50-50 that two people will share the same birthday. In a collection of 1000 songs, you should expect to hear some song replayed within 38 songs.

3. Aug 22, 2015

Staff: Mentor

Thanks for the repy D H. I'm sure that you're right that what I expect linguisticly is different than the software, but the birthday problem alone does not match the statistics.

First, let's be more precise with language. Think of poker. A random shuffle shuffle's all 52 cards, after which a hand of any size can be dealt. A random draw picks one card at random, then puts the card back. A two card hand from a shuffled deck has zero probability of having the same card twice. A two card hand by random draw has a 1/52 probability of having the same card twice. Imagine the chaos if a poker hand had five Ace of Spades.

Now, the birthday problem. If there are N songs, and D plays per session, the probability of repeated plays is

4. Aug 22, 2015

FactChecker

Yes. That is annoying. I finally made a Perl program which wrote a temporary playlist that was truly random and sent it to Microsoft Windows Media Player. I haven't tried the Media Player 'random" for a long time, so maybe it is better now. But from your comment, I guess it has not improved.

Edit: My program is random without replacement. Each song is on the list only once.

Last edited: Aug 22, 2015
5. Aug 22, 2015

FactChecker

There is random with replacement and there is random without replacement. Both are legitimate uses of the term. Although I never did a statistical study of it, the behavior of Windows Media Player didn't seemed like either of those. I think the OP is looking for random without replacement.

6. Aug 22, 2015

Staff: Mentor

Randomness with replacement explains the observations I think. In the first 100 days 10'000 pictures were shown, on average every picture got picked 0.4 times. To a very good approximation we have a Poisson distribution, so we expect:
6700 pictures that have been shown exactly 1 time
1250 pictures that have been shown exactly 2 times
180 pictures that have been shown exactly 3 times
18 pictures: that have been shown exactly 4 times
One or two pictures that have been shown 5 times

7. Aug 25, 2015

Fooality

Yeah, I have had this too... Specifically with phones seemingly favoring songs I bought from their stores, over mp3s I loaded on manually. I would imagine as a programmer, if nothing else, there may often be a learning process to attempt to note songs the user might like, (based on what you've skipped or listened to in the past, favorited, or recently purchased) that shapes the random curve. Truly random algorithms will give you strange repetitions, but they won't repeat, and they won't favor a certain set of items long term.

8. Aug 29, 2015

meBigGuy

Seems they would simply use a pseudo random number generator to index into the list. So what happens with random () % listLength. It also depends on how often they seed the generator. Seed it with the date when the player starts?

No doubt though that "random" can seem less random than one would intuitively expect. Humans are designed to detect patterns.

So, if there are 365 songs there is 50/50 there is a repeat withinin any 23 selections if it is really random(and nearly 99.9% in any 70). So, human that you are, you WILL notice.

9. Aug 31, 2015

Fooality

That's what I was wondering with this thread after I posted: Are the new songs jumping out because they are truly played more often, or because they are something I note more when they are played?

10. Sep 3, 2015

f95toli

I know that the "shuffle" function in Spotify isn't truly random. The reason is simply that most people don't actually like truly random shuffles since it could e.g. mean the same song being played twice in a row. Our instinctive idea for what is random is very different from real randomness, i.e. he "problem" here is that we are all susceptible to Gambler's fallacy.
Hence, Spotify -and I presume many other media players- uses some sort of "nearly-random" algorithm which takes human psychology into account.

https://labs.spotify.com/2014/02/28/how-to-shuffle-songs/

11. Sep 3, 2015

Staff: Mentor

You misunderstand the word shuffle. See post #3. Suppose I shuffle a deck of cards imperfectly; a pseudo-random shuffle. When I deal them, there is still no repetition of cards all the way to the bottom of the deck. You are thinking of random draw (see post #3) not shuffle.

It is not gambler's fallacy to expect that a poker hand can not have two aces of spades. No. When we deal cards from a deck, there is zero chance of repeated cards. That has nothing to do with gambler's fallacy.

D H also mentioned the birthday problem.

But using Wikipedia's approximate formula for the birthday problem [p(n,d)=1-e^(-n^2/2d)], if I have 25,000 pictures in my library, and if I select a sample of 122 pictures, the probability of getting one duplicate is 50%. https://en.wikipedia.org/wiki/Birthday_problem#Approximations

But I am getting 4-5 duplicates per hundred. That is 5-6 times more duplicates than the birthday problem predicts.

What I want is a simple shuffle, like shuffling a deck of cards. By definition, a shuffle has zero repeats until we reach the bottom of the deck. That is true regardless of the quality of the shuffle.

But is it simpler to code a random draw than a shuffle. A random draw needs no memory of the past, and no storage proportional to deck size.

But a random draw has a property worse than repeats. How long must you wait until all song/pictures in the library have been played at least once? That probability is always <1 even with an infinitely long wait time.

12. Sep 3, 2015

Staff: Mentor

There are algorithms for a pseudo-random shuffle that need constant memory independent of the input size (well, logarithmic, as you have to store the last location or step number).

You memorize the pictures in chunks of 100?

The expectation value is (approximately) N ln(N) or 250,000 songs. After 300,000 pictures there is a 15% chance left that one pictures didn't get selected, after 350,000 this drops to 2%.

13. Sep 3, 2015

rcgldr

In the vernacular of APL, random with replacement is called roll, random without replacement is called deal. I don't know if roll and deal terminology is used outside of APL.

14. Sep 3, 2015

Staff: Mentor

Random in programming, as in a "good" PRNG the usual definition - mayb the use of drand48() or even GNU rand(): the longterm distribution of results is flat, i.e. for a "random" sample of n=1..10; sampled 1000 times, the expected resultset will have 100 ones, 100 twos, etc. This means, in practical terms, that for every sample you have an even chance of getting any of the numbers 1...10, and the previous number you got has ZERO influence on the next choice. So you can get repetitions.

What the anorlunda seems to want on a simple level is an algorithm for dealing cards. The sample sequence is random, but all entries eventually get returned, with no duplications. Every sample DOES affect the subsequent sample. Because the sample you got is removed from the constellation of possible choices. On a more advanced level, the removed and then excluded sample for cryptographically good PRNG's is called a 'nonce'.

https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator

15. Sep 4, 2015

f95toli

I understood what you meant. However, I used a bad example (song) when describing what I meant with Gamber's fallacy in this case. What I should have written is "song, artist or even genre". If you read the blog post I linked to you will see that Spotify (and I presume most other media players) try to "shuffle" not just the songs but also the artists etc. Hence, it follows that different song will be assigned different probabilities: if you have 10 songs with artist A and 100 songs with artist B it could potentially follow (depending on algorithm) that the songs from artist A will be played more often than those of artist B in order to avoid having several songs by artist B being played consecutively. I suspect this is one reason why they sometimes get it wrong.
Also, note that the algorithms the media players (or at least Spotify) use do their best to actually fulfill Gambler's fallacy: they try to give us what we expect, not what is mathematically correct.

I do believe there are media players out there that just "shuffle" the songs, I am almost certain the my Squeezebox at home does a "proper" shuffle.

16. Sep 4, 2015

FactChecker

The word "shuffle" implies changing the order of a list. It does not imply that items in the list can appear more than once.