Testing Randomness in a Set of 200+ Data Points

Summary: To test the randomness of a set of 200 positive integers generated over ten years, compute the mean and standard deviation to establish a baseline for typical values. Determine whether the data follows a Gaussian distribution, as deviations from this can indicate non-randomness. Identify what constitutes "abnormally high" values, such as those exceeding three standard deviations above the mean. Visualizing the data through graphs or boxplots can help identify unusual trends or outliers in specific years. Establishing clear criteria for abnormality is essential for accurate analysis.
qspeechc:
Hi everyone.

It's been years since I've done any stats, so I need a bit of help, please. I want to include the analysis in a blog post I'm going to write (not here on PF), so I don't want to give away too many details :p I apologise for my terrible understanding of stats; please be patient!

Anyway, over ten years I have 20 data points for each year, i.e. 200 in total, which are positive integers. In practice they are never higher than 2000, although conceivably they could be. The assumption is that each number is generated randomly.

1) How do I test if a given data point is too large to be random, given that the other numbers tend to be smaller?

2) A 'source' produces one data point a year, how can I test if this source is producing abnormally high numbers over the ten years?

Thank you for any help.
 
For both cases I would compute the mean and standard deviation of the distribution of your 200 data points.
1) Check whether they follow a typical distribution (most notably the Gaussian distribution). If they do, anything you would not expect under that distribution might be a real effect. You cannot be sure without a clear model, but this method gives you a good idea.
2) Compute the source's mean and the standard error of that mean, and see whether it is compatible with the first distribution.
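A minimal sketch of this approach in Python with NumPy, using made-up placeholder data (the generated integers, the example value 1900, and the choice of which ten points belong to the "source" are all illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 200 positive integers (10 years x 20 values)
data = rng.integers(1, 2000, size=200)

# Baseline for "typical" values
mean = data.mean()
sd = data.std(ddof=1)

# 1) How surprising is a single value under a normal approximation?
# A z-score far above ~3 would be unexpected if the data were Gaussian.
value = 1900
z = (value - mean) / sd

# 2) Is one source's ten yearly values abnormally high on average?
# Compare the source's mean to the overall mean via the standard error.
source = data[:10]                     # placeholder: this source's ten values
se = sd / np.sqrt(len(source))
z_source = (source.mean() - mean) / se
```

As the reply notes, a large z-score only suggests something unexpected; without a model of how the numbers are generated it cannot prove non-randomness.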
 
If all the values are integers they can't be exactly "normally distributed", even if they are symmetric. (The normal distribution is just a convenient description of patterns often found in data anyway; no real data is truly normal.) But even if they are symmetric, the process outlined above would indicate only that a value is an "outlier". That does not prove it is non-random; it simply identifies it as unusual in size.

As a first step you need to specify what qualifies as "abnormally high". Do you have a specific cutoff for that? If not, you need a rule such as: any value more than 3 standard deviations above the mean, or more than 1.5 × IQR above the third quartile. Once this is done you might
* graph the data over time and look for which years, if any, have unusually large values
* look at a plot of each year's data (a boxplot, say) to check just that group

But again, a more precise definition of what you mean by "abnormally high" is where you need to begin.
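Both cutoff rules mentioned above can be sketched briefly in Python. This is a toy illustration on made-up data (the year range and the random values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 10 years x 20 positive integers per year
years = {year: rng.integers(1, 2000, size=20) for year in range(2014, 2024)}
all_values = np.concatenate(list(years.values()))

# Cutoff 1: more than 3 standard deviations above the mean
mean = all_values.mean()
sd = all_values.std(ddof=1)
sd_cutoff = mean + 3 * sd

# Cutoff 2: more than 1.5 x IQR above the third quartile
q1, q3 = np.percentile(all_values, [25, 75])
iqr_cutoff = q3 + 1.5 * (q3 - q1)

# Check each year's group separately for values exceeding a cutoff
for year, values in years.items():
    high = values[values > iqr_cutoff]
    if high.size:
        print(year, high)
```

With roughly uniform data like this toy sample, the 1.5 × IQR rule may flag nothing at all; the point is only that the cutoff must be chosen and stated before the analysis, not read off from the results.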
 
