Gaussian signal, extract uniform distribution of values

Mr Peanut · Apr 27, 2014

Hello,

From an offset zener diode breakdown circuit, I have collected a set of bytes from an ADC. The values distribute normally as integers between 0 and 1024 with a mean of 512. I would like to use the data to create a set of random integers that distribute uniformly.

So far, I have tried taking the subset of the values that lie between 100 and 999, then:

val = val/100
val = val - floor(val)
val = val * 100

This gives me a uniform distribution of values between 0 and 99 (provided I collect enough data).

Is there a better way? Perhaps one that provides more than 100 possible values.

chogg · Apr 28, 2014

You can't have a normal distribution between 0 and 1024 (not to mention a normal distribution which gives integers!). It could be approximately normal, though. What's the standard deviation?

chogg · Apr 28, 2014

Thinking more about it: do you by any chance have a binomial distribution? If so, then forget my question about standard deviation; your distribution is already determined: (n, p) = (1024, 0.5).

If your data don't look like this, I guess you could fit them to a beta-binomial, or some other suitable discrete distribution with finite support...

Anyway, I also wanted to say that your procedure won't give you uniform random numbers anyway. They probably look uniform, but you're fooling yourself. Try computing the actual probability distribution for any two numbers from 0 to 99: I'd be extremely surprised if they turn out to be the same.

You say it's uniform "provided [you] collect enough data". Ironically, it only looks uniform if you don't collect enough data! :-)

Mr Peanut · Apr 28, 2014

Noise from a reverse-biased diode has a Gaussian PDF. In my experiments, the AC voltage leaving a reverse-biased zener sub-circuit is ~5 mA. An oscilloscope monitoring the output shows a promising fuzz. I amplify the signal using a 2-stage amplifier that magnifies the signal by ~500X and centers it at 2.5 V above ground (max ~5 V, min ~0). This signal is fed into a serial ADC with 10-bit resolution over an input range of 0 - 5 volts. I sample the output bytes at intervals of 50 ms. Typically I collect 32768 samples. The results, if plotted sequentially, mimic the oscilloscope, displaying a fuzzy band of noise with increasing point density towards a horizontal trend line at 512.

I can determine the number of times each integer occurs in the 32768 data set. Plotting the integer values vs the number of occurrences per value shows a typical Gaussian distribution. The mean of the distribution is 512 (+/- 10 run to run). The standard deviation varies depending on the characteristics of the zener diode, but ~180 is typical.

Simultaneously plotting an appropriately scaled normal PDF computed for values between 0 and 1024 using the data set's mean and standard deviation convinces me that I have sampled a Gaussian distribution. The power spectrum shows one prominent peak at zero and a low, but fuzzy, baseline elsewhere. Computing means, minima, and maxima of successive dyadic splits of the data set indicates that no unwanted clumping of values is occurring.

I would like to exploit the noise and generate random numbers. To do this, I need to determine a way to create a uniform distribution that exploits the stochastic variation in the data set.

The one way I have found is to discard all values less than 100 or greater than 999. Then, removing the most significant digit from each value results in a data set with values between 0 and 99. The probability density function of the resulting data set is a uniform distribution. And, the more points in the data set, the less chatter there is in the distribution. But... the diversity of the values is limited to 0 – 99. Clearly, I can collect 10 data sets and get 0 – 999, etc.

Is there a better way? If not, I'll probably rework the circuit as a random bit generator and assemble the integers bitwise.

chogg · Apr 28, 2014

Mr Peanut said:

Noise from a reverse-biased diode has a Gaussian PDF. In my experiments, the AC voltage leaving a reverse-biased zener sub-circuit is ~5 mA. An oscilloscope monitoring the output shows a promising fuzz. I amplify the signal using a 2-stage amplifier that magnifies the signal by ~500X and centers it at 2.5 V above ground (max ~5 V, min ~0). This signal is fed into a serial ADC with 10-bit resolution over an input range of 0 - 5 volts. I sample the output bytes at intervals of 50 ms. Typically I collect 32768 samples. The results, if plotted sequentially, mimic the oscilloscope, displaying a fuzzy band of noise with increasing point density towards a horizontal trend line at 512.

That helps me understand. I was confused because Gaussians have continuous output and no boundaries, but your system is discrete and bounded.

I thought the reverse-biased zener diode had Poisson noise, not Gaussian? But presumably in the range you're measuring a Gaussian would be a good approximation.

Mr Peanut said:

I can determine the number of times each integer occurs in the 32768 data set. Plotting the integer values vs the number of occurrences per value shows a typical Gaussian distribution. The mean of the distribution is 512 (+/- 10 run to run). The standard deviation varies depending on the characteristics of the zener diode, but ~180 is typical.

±10 strikes me as too much variation. I would have expected 180 / sqrt(32768), which is more like ±1. (See standard deviation of the mean.)
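For anyone who wants to check that estimate, here is a minimal sketch of the arithmetic. It assumes the samples are independent and uses the typical figures quoted in this thread (standard deviation ~180, 32768 samples per run); the variable names are only illustrative.

```python
import math

sigma = 180.0       # typical standard deviation quoted in the thread
n_samples = 32768   # samples collected per run, as quoted in the thread

# Standard deviation of the mean for independent samples: sigma / sqrt(n)
standard_error = sigma / math.sqrt(n_samples)
print(f"expected run-to-run scatter of the mean: about {standard_error:.2f} counts")
```

If the observed run-to-run scatter is much larger than this, the samples may be correlated or the offset may be drifting between runs.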

Mr Peanut said:

Simultaneously plotting an appropriately scaled normal PDF computed for values between 0 and 1024 using the data set's mean and standard deviation convinces me that I have sampled a Gaussian distribution. The power spectrum shows one prominent peak at zero and a low, but fuzzy, baseline elsewhere. Computing means, minima, and maxima of successive dyadic splits of the data set indicates that no unwanted clumping of values is occurring.

I would like to exploit the noise and generate random numbers. To do this, I need to determine a way to create a uniform distribution that exploits the stochastic variation in the data set.

The one way I have found is to discard all values less than 100 or greater than 999. Then, removing the most significant digit from each value results in a data set with values between 0 and 99. The probability density function of the resulting data set is a uniform distribution. And, the more points in the data set, the less chatter there is in the distribution. But... the diversity of the values is limited to 0 – 99. Clearly, I can collect 10 data sets and get 0 – 999, etc.

Is there a better way? If not, I'll probably rework the circuit as a random bit generator and assemble the integers bitwise.

Again: using the least significant digits may look uniform, but I'd be extremely surprised if it actually is. I would expect if your experiment is highly repeatable, and you take a large number of samples, there would be a residual pattern in the frequencies. I'd expect a relative standard deviation for each number of 1/sqrt(327), or about 5%. I'm not surprised you can't detect the non-randomness at the 5% level. And sqrt() goes down rather slowly, so you'd have to take a lot of samples to see it.
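A rough way to see the size of that residual pattern is to compute the exact probabilities under an idealised model. The sketch below assumes a perfect Gaussian with mean 512 and standard deviation 180, an ADC that rounds to the nearest integer, and the 100 to 999 cut described above. None of these is guaranteed to match the real circuit, and the function names are made up for illustration.

```python
import math

MEAN, SIGMA = 512.0, 180.0   # assumed values, taken from figures in the thread
N_SAMPLES = 32768            # samples per run, as quoted in the thread

def normal_cdf(x, mean=MEAN, sigma=SIGMA):
    """CDF of the assumed Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def last_two_digit_probs(lo=100, hi=999):
    """Exact probability of each value 0..99 after keeping only lo..hi."""
    probs = [0.0] * 100
    for code in range(lo, hi + 1):
        # Assumption: the ADC rounds, so `code` covers [code - 0.5, code + 0.5)
        p = normal_cdf(code + 0.5) - normal_cdf(code - 0.5)
        probs[code % 100] += p
    total = sum(probs)            # renormalise after discarding the tails
    return [p / total for p in probs]

probs = last_two_digit_probs()
# Systematic spread between the most and least likely value, relative to 1/100
spread = (max(probs) - min(probs)) / 0.01
# Statistical scatter per value for one run: 1 / sqrt(counts per value)
noise = 1.0 / math.sqrt(N_SAMPLES / 100)
print(f"systematic spread: {spread:.2%}, sampling noise per value: {noise:.2%}")
```

Under these assumptions the systematic tilt is on the order of one percent, well below the roughly 5% sampling noise of a single run, which is why a histogram of one run can look flat even though the underlying probabilities are not equal.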

Anyway... why even have the ADC? Couldn't you just take analog readings? It seems like digitizing discards some information which you could be using.

If you pass each value through the cumulative distribution function (CDF) of its own distribution, the result is uniformly distributed from 0 to 1. For discretized data such as you have, that will only be approximately true (but it may be good enough).

On the other hand, your bitwise generator idea sounds good. I'd have a lot more confidence in a 50/50 distribution from random noise. Even if you're a bit off (no pun intended!), you still have almost as much entropy. A crappy biased generator with p=0.4 (instead of p=0.5) still has 97% of the entropy per bit.
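The 97% figure comes from the binary entropy function, H(p) = -p log2(p) - (1 - p) log2(1 - p). A small sketch for checking it (the function name is just illustrative):

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a single bit that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0   # a certain outcome carries no entropy
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

for p in (0.5, 0.45, 0.4):
    print(f"p = {p}: {binary_entropy(p):.3f} bits per bit")
```

A fair bit gives exactly 1 bit, and p = 0.4 gives about 0.971 bits.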

Stephen Tashi · Apr 29, 2014

Mr Peanut said:

and centers it at 2.5 V above ground (max ~5 V, min ~0). This signal is fed into a serial ADC with 10-bit resolution over an input range of 0 - 5 volts.

How stable is the above arrangement as the circuits age, change temperature, etc.? Should the mathematical technique be robust enough to work if things get out of calibration?

Mr Peanut · Apr 29, 2014

Thanks Chogg,

Back to the circuit board for me. Bitwise seems to be the way to go.
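For readers following along, here is one possible sketch of the bitwise approach. It is not the method used in the final project linked later in the thread. It uses von Neumann debiasing, a standard technique that was not discussed above and that assumes the raw bits are independent of each other, followed by packing bits into integers. Both function names are hypothetical.

```python
def von_neumann(bits):
    """Yield unbiased bits from independent but possibly biased bits.

    Bits are read in non-overlapping pairs: 01 -> 0, 10 -> 1,
    and the pairs 00 and 11 are discarded.
    """
    it = iter(bits)
    for a, b in zip(it, it):   # zip over the same iterator pairs up neighbours
        if a != b:
            yield a

def pack_bits(bits, width):
    """Pack bits (most significant first) into integers of `width` bits.

    Any leftover bits that do not fill a whole integer are dropped.
    """
    value, count = 0, 0
    for bit in bits:
        value = (value << 1) | bit
        count += 1
        if count == width:
            yield value
            value, count = 0, 0

raw = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # made-up raw bits
clean = list(von_neumann(raw))
print(clean, list(pack_bits(clean, 4)))
```

The price of the debiasing step is throughput: at best one output bit per four raw bits on average, and fewer if the source is biased.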

Mr Peanut · Jul 20, 2014

Here's a final summary of the project:

http://www.codeproject.com/Articles/795845/Arduino-Hardware-Random-Sequence-Generator-with-Ja

I felt that this link should be in the thread for future interest.

FactChecker · Jul 20, 2014

As @chogg said, if you plug your original numbers into the Cumulative Distribution Function (CDF) of the normal distribution, the resulting values should be uniform on (0,1). Unfortunately, the CDF of the normal distribution has no closed-form expression (it is an integral), so you may need to use a table of values or a numerical routine. There are very accurate tables available. MATLAB has a function, normcdf, that can be used.
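For those without MATLAB, the same transform can be sketched with Python's standard library. This assumes the readings really are Gaussian and that the mean and standard deviation are known (512 and 180 are the typical figures from this thread). The function name and the sample readings are made up for illustration.

```python
from statistics import NormalDist

def to_uniform(values, mean, sigma):
    """Map Gaussian-distributed values to (0, 1) using the normal CDF."""
    dist = NormalDist(mu=mean, sigma=sigma)
    return [dist.cdf(v) for v in values]

readings = [152, 512, 700, 871]   # made-up ADC readings
uniform = to_uniform(readings, mean=512.0, sigma=180.0)
print([round(u, 3) for u in uniform])
```

Because a 10-bit ADC produces at most 1024 distinct readings, the output can take at most 1024 distinct values, so the result is only approximately uniform, as noted earlier in the thread.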
