Is this a waveform or random data?

MickN · Aug 27, 2013

Hi there,

I have recently come across some data that is supposed to be random, but I don't think it is. I graphed it out and it sure doesn't look random. I also ran a couple of statistical tests such as the runs test, and they all say "not random." Visually, the data looks like a piece of a complex waveform (I've done a lot of work with sound synthesis, and the data looks like something I'd see in the window of an FM synthesizer). If the data in question is not random, it is really very important. If the data is forming a complex waveform, the implications are profound. I am not revealing what the data is right now, because anyone who knows what it is will say "impossible," and their brains will turn off. I will tell you this about the data: it consists of 74 numbers only. Each number is supposed to be the sum of an extremely large number of random, independent factors. A reasonable metaphor would be: imagine you have a million dice. Imagine that you (or something) roll them all every second for the entire day, and sum up the results of all of them. Imagine doing that for 74 days. That's what this data is like. I think data like that should obey the central limit theorem and be normally distributed. This data is not - not even close. And when you graph it out, this is what it looks like:

I'd really like to know if people agree with me that this data is not random, and if they agree it looks like part of a waveform. Of course it doesn't have to be both. The non-randomness of it is more obvious when you focus in on one period at a time. For instance, look at days 46 to 63, below. See how the first half (days 46 to 54) is a mirror image of the second half (days 54 to 63):

Would that happen randomly? Now look at days 34 to 45.

See how it makes two slanted S shapes? And finally, check out the first 34 days:

That sure doesn't look like it's going up at a random rate. It looks like a steady, smooth curve. And the whole thing comes real close to a simple sine wave:

Please let me know what you think. After a few people have responded, I'll reveal what the data is, and everybody will say "holy @#&*" Thanks :)

Drakkith · Aug 27, 2013

Or you could just reveal the data and not try to play us.
Perhaps it would help us give you an answer.

MickN · Aug 27, 2013

I'm not trying to play anybody. If you don't want to help me, don't.

Drakkith · Aug 27, 2013

MickN said:

I'm not trying to play anybody. If you don't want to help me, don't.

Holding back info isn't a very good way to get started here on PF. Whether we find out now or later isn't going to affect what we think about the data in the end. Heck, you don't even have your Y axis labeled on your graph! C'mon! We can't even tell what the scale is! How can you expect help if you don't give us all the data to make our decision?

256bits · Aug 28, 2013

Stock market trend
http://www.tradingonlinemarkets.com/Beat_the_Market/Stock_Market_Trend_Analysis.htm

If one takes 60-day lengths somewhere on the given graphs, one can always find a pattern to fit along a partial sine wave.
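256bits's point can be tried directly. The sketch below is a hypothetical illustration, not something run in this thread: it builds a 74-step Gaussian random walk (random by construction) and grid-searches sine shapes for the best Pearson correlation. The seed, the period grid, and the phase grid are arbitrary assumptions; try a few seeds to see how often a purely random series lines up well with part of a sine wave.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_sine_correlation(ys, periods=range(10, 301, 2), n_phases=36):
    """Grid-search sin(2*pi*t/period + phase) and return (r, period, phase)
    for the shape most correlated with ys. Correlation ignores offset and
    scale, so the sine's mean and amplitude never need fitting."""
    best = (-2.0, None, None)
    for period in periods:
        for k in range(n_phases):
            phase = 2 * math.pi * k / n_phases
            shape = [math.sin(2 * math.pi * t / period + phase) for t in range(len(ys))]
            r = pearson_r(shape, ys)
            if r > best[0]:
                best = (r, period, phase)
    return best

# A 74-step Gaussian random walk: every step is independent noise.
random.seed(1)
walk = [0.0]
for _ in range(73):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

r, period, phase = best_sine_correlation(walk)
print(f"best r = {r:.3f} (period {period} steps, phase {phase:.2f} rad)")
```

Because the search picks the best of thousands of candidate shapes, a high correlation from this kind of fit is weak evidence of periodicity on its own.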

sophiecentaur · Aug 28, 2013

The autocorrelation function of a random waveform of infinite length is a delta function, or so I seem to remember. To analyse that data, you could look at its autocorrelation function and get some idea of how 'random' it is.
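A minimal sketch of the sample autocorrelation function (ACF), assuming the common "biased" estimator that divides by n at every lag. For simulated white noise it comes out as 1 at lag 0 and near 0 at other lags (roughly within ±2/√n), a finite-sample stand-in for the delta function. The 5,000-point simulation is an arbitrary choice; with only 74 yearly values, the lag estimates would be much noisier.

```python
import random

def sample_acf(xs, max_lag):
    """Sample autocorrelation r_k = c_k / c_0 for k = 0..max_lag, where
    c_k = (1/n) * sum over t of (x_t - mean) * (x_{t+k} - mean)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

# Simulated Gaussian white noise: independent draws, constant mean and variance.
random.seed(0)
white = [random.gauss(0.0, 1.0) for _ in range(5000)]
print([round(r, 3) for r in sample_acf(white, 5)])
```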

walk_w/o_aim · Aug 28, 2013

sophiecentaur said:

The autocorrelation function of a random waveform of infinite length is a delta function, or so I seem to remember. To analyse that data, you could look at its autocorrelation function and get some idea of how 'random' it is.

Assuming the waveform is wide sense stationary (WSS), the autocorrelation function is only a delta function if the underlying process is white noise, i.e. the random variables at distinct times are uncorrelated. Even if the sample autocorrelation looks nothing like a delta function, it could still be random, but with correlation between the random variables at distinct times.

With that said, I think there's also a problem with the suggested approach of taking the sample autocorrelation directly. If we assume the waveform is purely noise, it is highly unlikely that it is WSS or almost WSS (there does not appear to be a constant mean). In this case, the sample autocorrelation doesn't really tell us much.

The waveform is more likely to be the sum of a deterministic waveform and some random noise. The sine wave in the original post seems like a good enough fit for the deterministic part. Perhaps the original poster could subtract the sine wave from the waveform, and then take the sample autocorrelation - that may lead somewhere. I still don't think the random part is white, but it could be some type of coloured noise.
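A sketch of the suggested procedure under stated assumptions: the sinusoid is fitted by a coarse grid search over period and phase with least-squares amplitude and offset (one simple approach among many), and the test signal is synthetic (a slow sine plus AR(1) coloured noise), not the crime data. White residuals would give an ACF that is a spike at lag 0; AR(1) noise with coefficient 0.7 should instead show positive, decaying values at small lags. With 74 points and a flexible fit, part of the noise gets absorbed by the fit, so the residual ACF is only a rough guide.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def fit_sinusoid(ys, periods, n_phases=36):
    """Grid search over period and phase; amplitude and offset by least squares.
    Returns the fitted values with the smallest sum of squared errors."""
    best_sse, best_fit = math.inf, None
    for period in periods:
        for k in range(n_phases):
            s = [math.sin(2 * math.pi * t / period + 2 * math.pi * k / n_phases)
                 for t in range(len(ys))]
            a, b = fit_line(s, ys)
            fit = [a * si + b for si in s]
            sse = sum((y - f) ** 2 for y, f in zip(ys, fit))
            if sse < best_sse:
                best_sse, best_fit = sse, fit
    return best_fit

def sample_acf(xs, max_lag):
    """Sample autocorrelation r_k = c_k / c_0 (biased estimator, divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

# Hypothetical test signal: a slow sine plus AR(1) coloured noise, 74 samples.
random.seed(2)
noise, e = [], 0.0
for _ in range(74):
    e = 0.7 * e + random.gauss(0.0, 1.0)   # AR(1): keeps 70% of the previous value
    noise.append(e)
signal = [10 * math.sin(2 * math.pi * t / 150) + w for t, w in enumerate(noise)]

fit = fit_sinusoid(signal, periods=range(50, 301, 5))
residual = [y - f for y, f in zip(signal, fit)]
print("residual ACF:", [round(r, 2) for r in sample_acf(residual, 5)])
```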

sophiecentaur · Aug 28, 2013

walk_w/o_aim said:

Assuming the waveform is wide sense stationary (WSS), the autocorrelation function is only a delta function if the underlying process is white noise, i.e. the random variables at distinct times are uncorrelated. Even if the sample autocorrelation looks nothing like a delta function, it could still be random, but with correlation between the random variables at distinct times.

With that said, I think there's also a problem with the suggested approach of taking the sample autocorrelation directly. If we assume the waveform is purely noise, it is highly unlikely that it is WSS or almost WSS (there does not appear to be a constant mean). In this case, the sample autocorrelation doesn't really tell us much.

The waveform is more likely to be the sum of a deterministic waveform and some random noise. The sine wave in the original post seems like a good enough fit for the deterministic part. Perhaps the original poster could subtract the sine wave from the waveform, and then take the sample autocorrelation - that may lead somewhere. I still don't think the random part is white, but it could be some type of coloured noise.

That all makes sense - cheers.
I know there are all sorts of statistical tests for recognising patterns and significance. I think, to make them useful, you need to know as much as possible about the system the data is from.

But we still don't know the purpose of the OP. Was it some kind of 'test' - with the answer to be given when we have finished answering such a vague question?

MickN · Aug 28, 2013

Thank you everybody for chiming in! I apologize for the secrecy. The reason I took that approach is I was worried that if people knew what the data represented, that would bias their opinion as to whether or not the data was random. The reason I went as far as removing the axis labels and numbers is I thought someone might google the numbers and then find out what the data was. I can't imagine why I thought anybody would be that desperate to figure out what the data was; I guess I was kidding myself. I am a little surprised at the notion that you have to know what the data represents in order to tell if it is random or not. I know you can't always tell if something is random just by looking at it, but surely sometimes you can. For instance, do you need to know what the following data is in order to tell that something non-random is happening?

How about this?

Or this?

or this?

The data in question is the total number of crimes committed per year in the US from 1938 to 2011. This data should appear random. Crime rates are supposed to be the results of a large number of independent, random factors. Supposedly, this is the reason nobody can ever predict or explain crime rates. This by itself should make the data appear random. When you take into account the fact that there are literally hundreds of millions of people in the U.S., each one deciding independently of each other whether or not to commit a crime, if there is even just a little bit of randomness in the decision-making process for each individual, the yearly total should be like rolling 300 million dice over and over, all year, and then adding it up. If there ever was data that should be normally distributed, it's total crimes per year. But it isn't. Here's a histogram I whipped up:

That's not a histogram of a process with a lot of randomness in it. If something is the result of a large number of independent factors, the average result should be typical. It should occur frequently. You can see above that with total crime, the average results are very rare, and the extreme highs and lows are far more common. This is a histogram of something that is swinging, like a pendulum, which brings me to my next point. If there is harmonic motion in the crime data, which there would have to be if it formed a complex waveform, the implications would be profound. It would mean there is a fundamental frequency. All complex waveforms have a fundamental frequency. In this case it would be a fundamental frequency of crime, or, as I prefer to call it, a fundamental frequency of evil. There would also have to be a "restorative force." All harmonic motion requires a restorative force that resists and opposes disturbances, restoring the medium to its preferred state. In this case the restorative force would be resisting and opposing evil - a force that permeates the universe (or, at least, permeates the USA) resisting and opposing evil.
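The pendulum comparison can be checked numerically, assuming only a pure sinusoid sampled at evenly spaced times (a hypothetical signal, not the crime data). Such a signal spends more time near its extremes than near its mean, so its histogram is U-shaped (the arcsine distribution) rather than bell-shaped. The converse does not hold: a U-shaped histogram can also come from non-periodic data, for example a series that moves quickly between two long plateaus, so the histogram alone does not establish harmonic motion.

```python
import math

def histogram(values, lo, hi, n_bins):
    """Count values into n_bins equal-width bins on [lo, hi]; hi goes in the last bin."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

# A pure sine sampled evenly in time over 100 whole periods.
samples = [math.sin(2 * math.pi * t / 100) for t in range(10_000)]
counts = histogram(samples, -1.0, 1.0, 10)
for i, c in enumerate(counts):
    left = -1.0 + 0.2 * i
    print(f"[{left:+.1f}, {left + 0.2:+.1f}) {'#' * (c // 50)}")
```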

Stop laughing! Even if it isn't a waveform, the lack of randomness is important. By the way, the individual crime categories, murder, theft, rape, etc., don't show the same kind of symmetries seen in the totals, except for the years around 1991/2. Check out murders per year:

This one shows theft, rape and murder:

I've run the "runs" test on the totals and various subcategories, like, property crime rates in California, or rape rates in DC, (using this website: http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/Randomness.htm) and the results are always "strong or very strong evidence against randomness."

For those of you that know what "Benford's Law" is, crime rates do not follow Benford's law. Here are some charts:

I won't try to explain Benford's law, except to say the lack of adherence to it suggests that something systematic is going on. Which I think is obvious from the graphs themselves. Like I said, whether or not the data forms a waveform, the lack of randomness is important. Crime appears to be a deterministic, dynamical system.

Ok, so that's basically it. I do think I have an idea as to what "systematic" thing is going on with the crime rates, which I'll explain if anyone is interested. In the meantime, I'm still very interested in knowing whether people agree or disagree that the data is not random, and whether it is, is not, or may be a complex waveform (or part of a complex waveform). Thanks again!

By the way. The reason I have posted this in the physics forum instead of a social sciences forum is I figured people with experience in the physical sciences would be more likely to know what complex waveforms tend to look like, and what randomness looks like. I'm not sure that makes sense, now that I think about it. Anyway I'd like to know what everybody thinks, of course.

Mick

Oh, I guess I should post the original chart with all the numbers & labels, etc. Here you go:

jbriggs444 · Aug 28, 2013

MickN said:

The data in question is the total number of crimes committed per year in the US from 1938 to 2011. This data should appear random. Crime rates are supposed to be the results of a large number of independent, random factors. Supposedly, this is the reason nobody can ever predict or explain crime rates. This by itself should make the data appear random. When you take into account the fact that there are literally hundreds of millions of people in the U.S., each one deciding independently of each other whether or not to commit a crime, if there is even just a little bit of randomness in the decision-making process for each individual, the yearly total should be like rolling 300 million dice over and over, all year, and then adding it up. If there ever was data that should be normally distributed

While reported crime rates can be thought of as the result of huge numbers of independent die rolls, there is the niggling problem that the dice are loaded differently across time.

The propensity of events to be reported, the propensity of people to commit crimes, the set of activities that qualify as crimes, the size of the population. All of those change over time.

If we could go back and replay 1950 over and over and over again and plot the crime statistics for every such trial then that "should" look random. But we can't do that.
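The loaded-dice point can be sketched with a hypothetical model: a fixed population of 300 million, each person independently having a small per-year probability p of a counted event, with p drifting slowly. The drift curve is invented for illustration, and the population is held constant for simplicity (in reality it is one of the things that changed). Because each yearly total is a sum of hundreds of millions of independent trials, its random scatter is tiny and the totals trace p almost exactly; the smooth shape comes from how the dice are loaded, not from a lack of randomness in the individual rolls.

```python
import math
import random

def simulate_totals(population, probs, seed=0):
    """Yearly totals when each of `population` people independently 'rolls'
    with a per-year probability p. Uses the normal approximation to the
    binomial, N(n*p, n*p*(1-p)), which is accurate for n this large."""
    rng = random.Random(seed)
    return [population * p + rng.gauss(0.0, math.sqrt(population * p * (1 - p)))
            for p in probs]

POPULATION = 300_000_000                      # held fixed here for simplicity
years = list(range(1938, 2012))
# Hypothetical loading: 1% in 1938, peaking at 5% in 1991, then falling.
probs = [0.01 + 0.04 * math.sin(math.pi * (y - 1938) / 106) for y in years]
totals = simulate_totals(POPULATION, probs)
expected = [POPULATION * p for p in probs]

worst = max(abs(t - e) / e for t, e in zip(totals, expected))
print(f"largest random deviation from the loaded-dice curve: {worst:.4%}")
```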

MickN · Aug 28, 2013

Thanks. I am interpreting what you've said as: it is not possible to tell if the data is random or not just by looking at the graphs I've provided. Is that fair?

jbriggs444 said:

While reported crime rates can be thought of as the result of huge numbers of independent die rolls, there is the niggling problem that the dice are loaded differently across time.

The propensity of events to be reported, the propensity of people to commit crimes, the set of activities that qualify as crimes, the size of the population. All of those change over time.

If we could go back and replay 1950 over and over and over again and plot the crime statistics for every such trial then that "should" look random. But we can't do that.

cjl · Aug 28, 2013

I don't see any reason to expect crime rates to be completely random. Why would you expect them to be?

(I think any attempt to fit them to a sinusoid is likely incorrect though - why would crime rates be periodic? Also, it's pretty difficult to establish periodicity with less than a third of a complete period worth of data)

Nugatory · Aug 28, 2013

MickN said:

Thanks. I am interpreting what you've said as: it is not possible to tell if the data is random or not just by looking at the graphs I've provided. Is that fair?

Not just by looking at graphs. You can get a "not random" answer that way, or a "maybe random" answer, but not "definitely random". However, if you google around for "statistical tests for randomness" you'll find pointers to some mathematical techniques that may be able to answer the question.

(Of course, for all the reasons already discussed in this thread, you will find that the sequence of crime-rate numbers from year to year is not random.)

MickN · Aug 28, 2013

Actually, I do not expect crime rates to be completely random. They appear to me not to be random at all. However, the "experts," the criminologists and economists who try to explain the causes of crime, say things like this:

"With all of the random factors that influence the amount of criminal conduct, it is virtually impossible to fully explain or precisely predict the crime rate at any point in time. "
That's from http://www.questia.com/library/1G1-54700685/understanding-the-time-path-of-crime

But when I look at those graphs, I don't see random changes, I see systematic changes - with long, intricate patterns, lasting years, being repeated, or reversed - throughout the whole thing. Furthermore, there's the central limit theorem, which says the more a measurement is like the sum of independent variables with equal influence on the result, the more normality it exhibits. That means a histogram of the data should appear "bell" shaped. In case you don't know, a histogram is a chart of the frequency of the results, i.e., how many times were there 1-2 million crimes committed in a year, how many times were there 2-3 million crimes committed, etc. Like this:
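The bell shape described here is easy to reproduce, assuming every trial is drawn from the same distribution. The sketch below (ten dice per trial and 20,000 trials are arbitrary choices) sums fair dice and bins the totals, the same kind of counting as "how many years had 1-2 million crimes." Note the identically-distributed assumption; several replies in this thread question whether yearly crime totals satisfy it.

```python
import random
from collections import Counter

random.seed(0)
N_DICE, N_TRIALS, BIN = 10, 20_000, 5

# Each trial is the sum of N_DICE independent fair dice.
sums = [sum(random.randint(1, 6) for _ in range(N_DICE)) for _ in range(N_TRIALS)]

# Count how many sums fall in each bin of width BIN (10-14, 15-19, ...).
counts = Counter((s // BIN) * BIN for s in sums)
for left in sorted(counts):
    print(f"{left:>2}-{left + BIN - 1:<2} {'#' * (counts[left] // 100)}")
```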

The crime data histogram should look like that too, but, instead, it looks like the opposite:

Anyway, it may or may not be significant that the crime data does not appear random, but, can I, for the record, get you to state that, in your opinion, it does not look random?

Thanks

cjl said:

I don't see any reason to expect crime rates to be completely random. Why would you expect them to be?

(I think any attempt to fit them to a sinusoid is likely incorrect though - why would crime rates be periodic? Also, it's pretty difficult to establish periodicity with less than a third of a complete period worth of data)

mfb · Aug 28, 2013

MickN said:

Thanks. I am interpreting what you've said as: it is not possible to tell if the data is random or not just by looking at the graphs I've provided. Is that fair?

It is certainly true that the data has trends: in 1938 there were fewer committed (or reported) crimes than today.
This alone does not tell us anything; there are literally hundreds of factors that changed with time and could have an influence on the crime rate. Here are some of them:
- the total population is an obvious one - in a larger population, you expect more crimes. It would be useful to plot crimes/population to reduce the impact of that factor
- the fraction of the population living in towns / big cities. Those tend to have higher crime rates (per inhabitant) worldwide.
- the age distribution. Babies cannot commit crimes, for example.
- police presence, fraction of crimes that gets reported, detection rate, punishments, ...
- some crime categories are completely new - there was no way to be criminal via the internet in 1938.
- ...

MickN said:

This by itself should make the data appear random.

If the environment stays the same. It does not. There is certainly some randomness - it is random if there are 998,173 or 997,467 crimes in a specific year. For those large numbers, the effect of randomness is negligible. If you look at very small numbers (like murders in a small town), you also get correlations between crimes - a serial killer can increase the local rate significantly, for example. A new police station can suddenly reduce the number of other crimes, and so on.

There are so many factors influencing the crime rate that it is basically impossible to make accurate predictions. That is pseudo-randomness, and it can lead to all sorts of funny non-random-looking distributions.

Oh, by the way, your values do not span enough orders of magnitude to apply Benford's law.
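The size of the purely random part can be estimated with a standard rule of thumb, assuming the yearly count behaves like a Poisson count of independent events (an idealization): the standard deviation is the square root of the mean.

```latex
\sigma_N = \sqrt{N}, \qquad \frac{\sigma_N}{N} = \frac{1}{\sqrt{N}}, \qquad
N \approx 10^{6} \;\Rightarrow\; \sigma_N \approx 10^{3}, \quad \frac{\sigma_N}{N} \approx 10^{-3} = 0.1\%
```

That is the same order as the difference in the example above (998,173 versus 997,467).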

Nugatory · Aug 28, 2013

MickN said:

Actually, I do not expect crime rates to be completely random. They appear to me not to be random at all. However, the "experts," the criminologists and economists who try to explain the causes of crime, say things like this:

"With all of the random factors that influence the amount of criminal conduct, it is virtually impossible to fully explain or precisely predict the crime rate at any point in time. "

You may be confusing two different things here:

First, it is possible for a large number of individually random events to combine to produce results that are not even slightly random; an example would be the way that we can predict with exquisite accuracy the pressure of a given amount of gas at a given temperature and volume - even though that gas is made up of an enormous number of randomly moving particles. Thus, it's not at all inconsistent to say that there are many random factors yet there are strong trends in the outcomes.

Second, just because a sequence of numbers (such as the crime rate from year to year) shows non-random trends, it does not follow that these trends can be easily explained and predicted; it depends on the complexity of the underlying system. We can predict the behavior of gases because each individual gas molecule is just like every other one and obeys remarkably simple rules; the same is not true of individual humans in a complex environment. So again, there's no contradiction in saying that there are trends yet we cannot explain or predict them.

MickN · Aug 28, 2013

mfb said:

There are so many factors influencing the crime rate that it is basically impossible to make accurate predictions. That is pseudo-randomness, and it can lead to all sorts of funny non-random-looking distributions.

So you're saying you think the crime rates are exhibiting pseudo-randomness, is that correct?

sophiecentaur · Aug 28, 2013

Pseudorandomness means it is generated by an algorithm. That's not how any natural process works. There could be chaotic factors or the model is just too complicated to analyse.
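For contrast, here is pseudorandomness in the computer-science sense: a linear congruential generator, one of the oldest such algorithms. The constants are the commonly quoted Numerical Recipes values, used here as an illustrative choice, not a recommendation. The output looks random but is completely determined by the seed.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: x_{k+1} = (a*x_k + c) mod m,
    yielding each state scaled into [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

def take(gen, n):
    """First n values from a generator."""
    return [next(gen) for _ in range(n)]

first = take(lcg(42), 5)
second = take(lcg(42), 5)
print(first == second)  # prints True: same seed, same "random" sequence
```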

mfb · Aug 28, 2013

Pseudo-randomness, though not in the sense the word is used in computer science: crime rates certainly have some deterministic features that cannot be predicted in any meaningful way, so these effects look random.

MickN · Aug 28, 2013

It is interesting that you mention gas pressure, and the exquisite accuracy with which it can be predicted. Below is a graphical representation of the vapor pressure model.

The equation that so accurately predicts vapor pressure, and which created that red line, is y = exp(a + b/x + c ln(x)), where a = 8.77727455314E+004, b = -2.03300596916E+007, and c = -1.02099944799E+004.

However, that is not vapor pressure data that you see plotted there. That's the crime data, sticking to that equation with a correlation coefficient over 0.99. Are you beginning to see my point? The crime data isn't slightly non-random. It isn't random at all. The randomness is minuscule. There is clearly a non-random, dynamical system in effect here.
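The quoted curve form can be evaluated directly. This sketch uses the coefficients exactly as posted and assumes x is the calendar year; the post states neither that nor the units of y. Under that assumption, the exponent a + b/x + c ln(x) has a single stationary point at x = b/c, about 1991, consistent with the 1991 turning point described in the thread.

```python
import math

# Coefficients exactly as quoted in the post. Assumption: x is the calendar year.
A = 8.77727455314e4
B = -2.03300596916e7
C = -1.02099944799e4

def vapor_pressure_model(x, a=A, b=B, c=C):
    """y = exp(a + b/x + c*ln(x)), the curve form quoted in the post."""
    return math.exp(a + b / x + c * math.log(x))

# Stationary point of the exponent: d/dx (a + b/x + c ln x) = -b/x**2 + c/x = 0,
# so x = b/c. The second derivative there is b/x**3, negative because b < 0,
# so this is the model's maximum.
peak_x = B / C
print(f"model peaks at x = {peak_x:.2f}")
for year in (1938, 1960, 1991, 2011):
    print(year, round(vapor_pressure_model(year), 3))
```

The exponent is a small difference between terms of order 10^5, so the coefficients have to be used at full precision; rounding them to a few digits changes the curve drastically.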

Sure, there are lots of independent, random factors influencing criminal behavior, but the overall effect they have on the year-to-year changes seen in the total is nothing compared to the non-random factor that caused it to go from approximately 2 million crimes per year to almost 16 million crimes per year from 1938 to 1991, and then back down again.

By the way, I am not saying the total number of crimes is going to continue to follow the vapor pressure model. The crime data fits virtually any mathematical model that creates a curve. Here's the Gaussian model, for example:

And here's a sinusoid:

Again I say, the crime rates appear to be, not random, but a dynamical system. There's no way of knowing what model actually mimics the processes that determine how many crimes are committed each year. Nevertheless, those processes are so regular and non-random that the data has a 0.99 correlation coefficient with a sine wave. No random process is going to spend 70 years mimicking a sine wave.

Nugatory said:

You may be confusing two different things here:

First, it is possible for a large number of individually random events to combine to produce results that are not even slightly random; an example would be the way that we can predict with exquisite accuracy the pressure of a given amount of gas at a given temperature and volume - even though that gas is made up of an enormous number of randomly moving particles. Thus, it's not at all inconsistent to say that there are many random factors yet there are strong trends in the outcomes.

Second, just because a sequence of numbers (such as the crime rate from year to year) shows non-random trends, it does not follow that these trends can be easily explained and predicted; it depends on the complexity of the underlying system. We can predict the behavior of gases because each individual gas molecule is just like every other one and obeys remarkably simple rules; the same is not true of individual humans in a complex environment. So again, there's no contradiction in saying that there are trends yet we cannot explain or predict them.

Dale · Aug 28, 2013

Hi MickN, you have some misconceptions about randomness.

MickN said:

It consists of 74 numbers only. Each number is supposed to be the sum of an extremely large number of random, independent factors. A reasonable metaphor would be: imagine you have a million dice. Imagine that you (or something) roll them all every second for the entire day, and sum up the results of all of them. Imagine doing that for 74 days. That's what this data is like.

Here is the first misconception. If you rolled one die each day and recorded the individual roll, that would be random. If you roll a million dice and sum them up each day, then that will look very non-random on a plot like this. In fact, I just wrote a brief computer program to do exactly that and got 3.50E6 every day. It is only in the 4th significant digit where there is any variation.
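A minimal sketch of that kind of simulation. The program itself isn't shown in the thread, so this is a reconstruction under stated assumptions: one roll of a million fair dice per simulated day, using Python's standard `random` module. The variance of one fair die is 35/12, so a sum of N dice has mean 3.5N and standard deviation sqrt(35N/12), about 1,708 for N = 1,000,000. Rolling every second, as in the original metaphor, would pile up 86,400 times as many dice per day and shrink the relative spread by a further factor of about 294.

```python
import math
import random

random.seed(0)
N_DICE, N_DAYS = 1_000_000, 3

# Each simulated day: roll a million fair dice once and add them up.
daily_totals = [sum(random.choices(range(1, 7), k=N_DICE)) for _ in range(N_DAYS)]
for day, total in enumerate(daily_totals, start=1):
    print(f"day {day}: {total:,} ({total:.3e})")

# Theory: mean 3.5 * N, standard deviation sqrt(N * 35 / 12).
sd = math.sqrt(N_DICE * 35 / 12)
print(f"expected spread: about {sd:,.0f}, i.e. {sd / (3.5 * N_DICE):.3%} of the mean")
```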

MickN said:

The non-randomness of it is more obvious when you focus in on one period at a time. For instance, look at days 46 to 63, below. See how the first half (days 46 to 54) is a mirror image of the second half (days 54 to 63):

This is not a test of randomness. If you look closely at any random sequence you should be able to find small spots that randomly appear to have some sort of pattern.

MickN said:

Would that happen randomly? Now look at days 34 to 45.
...
See how it makes two slanted S shapes? And finally, check out the first 34 days:
...
That sure doesn't look like it's going up at a random rate. It looks like a steady, smooth curve. And the whole thing comes real close to a simple sine wave:
...
Please let me know what you think.

Again, none of this looking for nice shapes in the noise has anything to do with randomness. If it is random you would expect to see this kind of thing happen anyway.

If you want to determine if a signal is random then you must apply a statistical test for randomness. You cannot simply go about eyeballing stuff and saying "wow, look at that". One of the easiest statistical tests for randomness is the runs test:
http://en.wikipedia.org/wiki/Wald–Wolfowitz_runs_test

Run that on the data and see what it says. Looking at it, I think that it is probably non-random, but not for the reasons that you mention. Also, I don't know why you would think that crime rates should be random or normally distributed.
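A sketch of the Wald-Wolfowitz runs test about the median, using the usual normal approximation (reasonable for samples of a few dozen). Dropping values equal to the median is one common convention, assumed here. The test counts how often the series crosses its median and compares that with what independent, identically distributed values would give; any smooth trend produces very few runs and is rejected, whatever causes the trend. The two demo series are hypothetical.

```python
import math
import random
import statistics

def runs_test(xs):
    """Wald-Wolfowitz runs test about the median, normal approximation.
    Returns (number of runs, z score, two-sided p-value)."""
    med = statistics.median(xs)
    above = [x > med for x in xs if x != med]   # drop ties with the median
    n1 = sum(above)                             # values above the median
    n2 = len(above) - n1                        # values below the median
    n = n1 + n2
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail area
    return runs, z, p

random.seed(3)
smooth = [math.sin(2 * math.pi * t / 74) for t in range(74)]   # one slow cycle
noise = [random.gauss(0.0, 1.0) for _ in range(74)]            # i.i.d. noise
for name, series in (("smooth", smooth), ("noise", noise)):
    runs, z, p = runs_test(series)
    print(f"{name}: runs={runs}, z={z:+.2f}, p={p:.2g}")
```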

EDIT: I see that you have already run the runs test:

MickN said:

I've run the "runs" test on the totals and various subcategories, like, property crime rates in California, or rape rates in DC, (using this website: http://home.ubalt.edu/ntsbarsh/Business-stat/otherapplets/Randomness.htm) and the results are always "strong or very strong evidence against randomness."

So clearly it is not random.

MickN said:

For those of you that know what "Benford's Law" is, crime rates do not follow Benford's law.

Benford's law doesn't apply here. For Benford's law you need data which covers a much broader range. Also, Benford's law has nothing to do with randomness; it is used as an indication of whether or not data has been falsified.
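Benford's law predicts leading-digit frequencies P(d) = log10(1 + 1/d). The sketch below compares that with the leading digits of evenly spread hypothetical values between 2 million and 16 million, roughly the range of yearly totals described in the thread (these are not the real data). When values span less than one order of magnitude, the range alone dictates which leading digits can occur and how often.

```python
import math
from collections import Counter

def benford_probability(d):
    """Benford's law: probability that the leading significant digit is d (1-9)."""
    return math.log10(1 + 1 / d)

def leading_digit(x):
    """Leading significant digit of a positive number, via scientific notation."""
    return int(f"{x:e}"[0])

# Hypothetical values evenly spread over 2 to 16 million (not the real data).
values = [2e6 + i * (14e6 / 73) for i in range(74)]
observed = Counter(leading_digit(v) for v in values)
print("digit  observed  Benford")
for d in range(1, 10):
    print(f"{d:>5}  {observed[d] / len(values):8.3f}  {benford_probability(d):7.3f}")
```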

analogdesign · Aug 29, 2013

I think this is just one giant strawman, MickN. The fact that there are a lot of variables that influence crime is not evidence that the number of crimes should be random. You seem to imply that the Central Limit Theorem applies here but the CLT only applies when the different components are not highly cross-correlated.

So many things go into crime rates, such as social conditions, economic conditions, demographics (e.g. average age of population changes), enforcement conditions, trends in drug policy and judicial policy and so on. I would say none of these inputs are random, and they are highly, highly correlated, in often exceedingly complex ways.
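The caveat about correlated inputs can be illustrated with a toy model (hypothetical, not a model of crime): each of n components is a shared "common shock" plus its own independent noise. With independent components the variance of their average falls like 1/n, which is what lets a sum of many small factors settle down; with a shared factor it stays near the shared factor's variance however large n gets. In this model the theoretical values are 1/n and 1 + 1/n.

```python
import random
import statistics

def variance_of_average(n_components, shared_weight, n_trials=1000, seed=0):
    """Each component = shared_weight * common + own noise (both standard normal).
    Returns the sample variance, across trials, of the average of the components."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        common = rng.gauss(0.0, 1.0)          # one shock shared by every component
        total = sum(shared_weight * common + rng.gauss(0.0, 1.0)
                    for _ in range(n_components))
        averages.append(total / n_components)
    return statistics.variance(averages)

for n in (10, 100, 400):
    indep = variance_of_average(n, shared_weight=0.0)
    shared = variance_of_average(n, shared_weight=1.0)
    print(f"n={n:>3}: independent {indep:.4f}   shared factor {shared:.4f}")
```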

You're approaching the data too much like an astrologer and not enough like an astronomer, if you catch my drift.


How would you describe the data

It is random

It is not random, but does not look like it is part of a complex waveform

It is not random, and I don't know if it looks like it is part of a complex waveform

It is not random, and does look like it is part of a complex waveform
