Testing Randomness in a Set of 200+ Data Points

SUMMARY

This discussion focuses on testing the randomness of a dataset of 200 positive integers, 20 per year over ten years. The recommended starting point is to compute the mean and standard deviation of the data and to check whether the values follow a typical distribution such as the Gaussian, so that unusually large values can be identified. The importance of defining what constitutes "abnormally high" is emphasized, with suggested thresholds such as three standard deviations above the mean or 1.5 times the interquartile range (IQR) above the third quartile. Visualization techniques such as time series graphs and per-year boxplots are also advised for further analysis.

PREREQUISITES
  • Understanding of basic statistics, including mean and standard deviation
  • Familiarity with Gaussian distribution and its properties
  • Knowledge of outlier detection methods, such as IQR and standard deviation thresholds
  • Experience with data visualization techniques, including boxplots and time series graphs
NEXT STEPS
  • Learn how to compute and interpret the mean and standard deviation in datasets
  • Study Gaussian distribution characteristics and how to test for normality
  • Explore outlier detection techniques using IQR and standard deviation methods
  • Investigate data visualization tools for creating boxplots and time series graphs
USEFUL FOR

Data analysts, statisticians, and researchers interested in evaluating the randomness of datasets and identifying outliers in time series data.

qspeechc
Hi everyone.

It's been years since I've done any stats, so I need a bit of help, please. I want to include it in a blog post I'm going to do (not here on PF), so I don't want to give away too many details :p I apologise for my terrible understanding of stats, please be patient!

Anyway, over ten years I have 20 data points for each year, i.e. 200 in total, which are positive integers. In practice they are never higher than 2000, although conceivably they could be. The assumption is that each number is generated randomly.

1) How do I test if a given data point is too large to be random, given that the other numbers tend to be smaller?

2) A 'source' produces one data point a year. How can I test if this source is producing abnormally high numbers over the ten years?

Thank you for any help.
 
For both cases I would compute the mean and standard deviation of the distribution of your 200 data points.
1) You can check whether they follow a typical distribution (most notably the Gaussian distribution). If they do, any value that would be very unlikely under this distribution might be some real effect. You cannot be sure without a clear model, but you can get a good idea with that method.
2) Compute the mean of the source's ten values and the expected deviation of that mean (its standard error), and see whether it is compatible with the first distribution.
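
A minimal sketch of the check in point 2, assuming the ten values from the source can be treated as independent draws from the same distribution as the pooled 200 points. The function name `source_mean_z` and all the numbers are hypothetical and only for illustration. It compares the source's mean with the pooled mean, in units of the standard error (pooled standard deviation divided by the square root of the number of source values).

```python
import math
import statistics

def source_mean_z(all_values, source_values):
    """Return how many standard errors the source's mean lies above the pooled mean."""
    pooled_mean = statistics.mean(all_values)
    pooled_sd = statistics.stdev(all_values)  # sample standard deviation
    # Expected deviation of the mean of n independent draws (assumption: independence).
    std_error = pooled_sd / math.sqrt(len(source_values))
    return (statistics.mean(source_values) - pooled_mean) / std_error

# Hypothetical data, not from the thread: 200 pooled points and 10 source points.
pool = [100, 120, 90, 110, 105, 95, 115, 85, 130, 50] * 20
source = [130, 125, 140, 135, 128, 132, 138, 127, 133, 131]

z = source_mean_z(pool, source)
print(f"source mean is {z:.2f} standard errors from the pooled mean")
```

A large positive value suggests the source's mean is hard to reconcile with the pooled distribution. Note that in the original setup the source's values are themselves part of the 200 points, so this comparison is only approximate.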
 
If all the values are integers they can't be "normally distributed", even if they are symmetric. (The normal distribution is just a convenient description of patterns often found in data anyway; no data is truly normal.) But even if they are symmetric, the process outlined above would indicate only that a value is an "outlier" - that doesn't mark it as non-random, it simply identifies it as unusual in size.
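
To illustrate why an outlier is not automatically non-random, here is a rough calculation under an idealised assumption that the 200 values are independent draws from an exactly normal distribution (which, as noted, integer data cannot be). It estimates the chance that at least one of the 200 purely random values lands more than 3 standard deviations above the mean.

```python
from statistics import NormalDist

n_points = 200  # size of the dataset in the question

# Chance that a single draw lies more than 3 standard deviations above the mean,
# under an exactly normal model (an idealisation).
p_single = 1 - NormalDist().cdf(3)

# Chance that at least one of n independent draws does so.
p_at_least_one = 1 - (1 - p_single) ** n_points

print(f"P(single value > 3 sd)      = {p_single:.5f}")
print(f"P(at least one of {n_points} > 3 sd) = {p_at_least_one:.3f}")
```

Under these assumptions the chance is roughly one in four, so a single such value in 200 points is not strong evidence against randomness on its own.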

As a first step you need to specify what qualifies as "abnormally high". Do you have a specific cutoff for that? If not, then you need to choose one, such as any value more than 3 standard deviations above the mean, or more than 1.5 × IQR above the third quartile. Once this is done you might
* graph the data over time and look to see which years, if any, have unusually large values
* look at a plot of each year's data (boxplot?) to check just that group

But again, you need to begin by describing more specifically what you mean.
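
The two example cutoffs and the per-year check could be sketched as follows. This is only an illustration: the function names `upper_cutoffs` and `flag_by_year` and the sample records are hypothetical, and the quartiles come from `statistics.quantiles` with its default method (other quartile conventions give slightly different fences).

```python
import statistics
from collections import defaultdict

def upper_cutoffs(values):
    """Return the two example thresholds for 'abnormally high' values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default method
    return {
        "3sd": mean + 3 * sd,          # 3 standard deviations above the mean
        "iqr": q3 + 1.5 * (q3 - q1),   # 1.5 x IQR above the third quartile
    }

def flag_by_year(records, cutoff):
    """Group the values above the cutoff by year; records are (year, value) pairs."""
    flagged = defaultdict(list)
    for year, value in records:
        if value > cutoff:
            flagged[year].append(value)
    return dict(flagged)

# Hypothetical records, not from the thread.
records = [(2001, v) for v in [10, 12, 11, 13, 12, 11, 10, 14]]
records += [(2002, v) for v in [12, 11, 13, 10, 12, 11, 13, 90]]

cutoffs = upper_cutoffs([value for _, value in records])
for name, cutoff in cutoffs.items():
    print(name, round(cutoff, 2), flag_by_year(records, cutoff))
```

The grouped output is a text stand-in for the suggested plots: it shows which years, if any, contain values above the chosen cutoff.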
 
