# Gaussian signal, extract uniform distribution of values

1. Apr 27, 2014

### Mr Peanut

Hello,

From an offset zener diode breakdown circuit, I have collected a set of bytes from an ADC. The values are normally distributed integers between 0 and 1024, with a mean of 512. I would like to use the data to create a set of random integers that are uniformly distributed.

So far, I have tried taking the subset of the values that lie between 100 and 999, then:

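    val = val / 100          # e.g. 537 -> 5.37
    val = val - floor(val)   # keep only the fractional part, e.g. 0.37
    val = val * 100          # back to the last two digits (0..99), e.g. 37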

This gives me a uniform distribution of values between 0 and 99 (provided I collect enough data).
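The same filter-then-strip step can be sketched in Python with integer arithmetic. This is a minimal sketch: the function name `last_two_digits` is hypothetical, and integer modulo is swapped in for the divide/floor/multiply sequence because floating-point round-off in those steps can leave a value like 36.999... instead of 37.

```python
def last_two_digits(values, lo=100, hi=999):
    """Keep values in [lo, hi] and return each one's last two decimal digits (0..99).

    Integer modulo gives the same result as val/100, minus floor, times 100,
    without any floating-point rounding.
    """
    return [v % 100 for v in values if lo <= v <= hi]
```

For example, `last_two_digits([50, 100, 537, 999, 1000])` drops 50 and 1000 and returns `[0, 37, 99]`.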

Is there a better way? Perhaps one that provides more than 100 possible values.

2. Apr 28, 2014

### chogg

You can't have a normal distribution between 0 and 1024 (not to mention a normal distribution which gives integers!). It could be approximately normal, though. What's the standard deviation?

3. Apr 28, 2014

### chogg

Thinking more about it: do you by any chance have a binomial distribution? If so, then forget my question about standard deviation; your distribution is already determined: $(n, p) = (1024, 0.5)$.

If your data don't look like this, I guess you could fit them to a beta-binomial, or some other suitable discrete distribution with finite support...

Anyway, I also wanted to say that your procedure won't give you truly uniform random numbers. They probably look uniform, but you're fooling yourself. Try computing the actual probabilities of any two numbers from 0 to 99: I'd be extremely surprised if they turned out to be the same.
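One way to do that calculation, as a sketch under stated assumptions: it assumes the ADC codes follow a normal distribution with mean 512 and standard deviation 180 (the figures reported later in the thread), that code k collects the continuous interval [k − 0.5, k + 0.5), and it ignores clipping at the ADC rails. Under those assumptions the largest and smallest of the 100 probabilities differ by roughly 1%, mostly because the tails cut off below 100 and above 999 remove different amounts from different residues.

```python
import math

MU, SIGMA = 512.0, 180.0   # assumed: mean and standard deviation reported later in the thread

def normal_cdf(x, mu=MU, sigma=SIGMA):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_code(k):
    # Assumption: ADC code k collects the continuous interval [k - 0.5, k + 0.5).
    return normal_cdf(k + 0.5) - normal_cdf(k - 0.5)

def last_two_digit_probs(lo=100, hi=999):
    """Exact probability of each last-two-digit value 0..99, given lo <= code <= hi."""
    kept = range(lo, hi + 1)
    total = sum(p_code(k) for k in kept)      # probability of passing the filter
    probs = [0.0] * 100
    for k in kept:
        probs[k % 100] += p_code(k) / total   # renormalize to the kept subset
    return probs

probs = last_two_digit_probs()
print(f"smallest: {min(probs):.6f}  largest: {max(probs):.6f}")
```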

You say it's uniform "provided [you] collect enough data". Ironically, it only looks uniform if you don't collect enough data! :-)

4. Apr 28, 2014

### Mr Peanut

Noise from a reverse-biased diode has a Gaussian PDF. In my experiments, the AC voltage leaving a reverse-biased zener sub-circuit is ~5 mV. An oscilloscope monitoring the output shows a promising fuzz. I amplify the signal using a 2-stage amplifier that magnifies it by ~500X and centers it at 2.5 V above ground (max ~5 V, min ~0 V). This signal is fed into a serial ADC with 10-bit resolution over an input range of 0 to 5 V. I sample the ADC output at 50 ms intervals. Typically I collect 32768 samples. The results, if plotted sequentially, mimic the oscilloscope, displaying a fuzzy band of noise whose point density increases toward a horizontal trend line at 512.
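The numbers in this setup can be checked with a few lines of arithmetic. A sketch, using the gain, input range, resolution, and sample period from the post; `adc_code` is a hypothetical ideal converter, and a real part's transfer function will differ in its details. One ADC step is 5/1024 ≈ 4.9 mV at the ADC input, or about 9.8 µV referred back through the ~500X gain, and one 32768-sample run at 50 ms per sample takes about 1638 s (roughly 27 minutes). Note that an ideal 10-bit converter has 1024 codes, 0 through 1023.

```python
ADC_BITS = 10
V_REF = 5.0               # ADC input range 0-5 V, from the post
GAIN = 500.0              # approximate amplifier gain, from the post
SAMPLE_PERIOD_S = 0.050   # 50 ms between samples
N_SAMPLES = 32768

lsb_volts = V_REF / 2**ADC_BITS            # size of one ADC step at the ADC input
lsb_at_diode = lsb_volts / GAIN            # the same step referred back to the diode
acq_time_s = N_SAMPLES * SAMPLE_PERIOD_S   # time to collect one data set

def adc_code(v_in):
    """Hypothetical ideal converter: quantize, then clamp to codes 0..1023."""
    code = int(v_in / V_REF * 2**ADC_BITS)
    return max(0, min(2**ADC_BITS - 1, code))
```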

I can determine the number of times each integer occurs in the 32768 data set. Plotting the integer values vs the number of occurrences per value shows a typical Gaussian distribution. The mean of the distribution is 512 (+/- 10 run to run). The standard deviation varies depending on the characteristics of the zener diode, but ~180 is typical.
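Counting occurrences and summarizing them can be sketched as follows (the function name `histogram_stats` is hypothetical, and the sample standard deviation with an n − 1 denominator is a choice, not something stated in the post):

```python
import math
from collections import Counter

def histogram_stats(samples):
    """Return (counts per value, mean, sample standard deviation) for integer ADC codes."""
    counts = Counter(samples)   # value -> number of occurrences
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    var = sum(c * (v - mean) ** 2 for v, c in counts.items()) / (n - 1)
    return counts, mean, math.sqrt(var)
```

Plotting `counts` (value on the x axis, occurrences on the y axis) gives the histogram described above.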

Simultaneously plotting an appropriately scaled normal PDF, computed for values between 0 and 1024 from the data set's mean and standard deviation, convinces me that I have sampled a Gaussian distribution. The power spectrum shows one prominent peak at zero frequency (the DC component, which comes from the nonzero mean of 512) and a low, but fuzzy, baseline elsewhere. Computing means, minima, and maxima of successive dyadic splits of the data set indicates that no unwanted clumping of values is occurring.
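"Appropriately scaled" here means multiplying the PDF by the number of samples, since each ADC code is a bin of width 1. A minimal sketch of that scaling, plus a Pearson chi-square statistic as one way (swapped in, not used in the post) to put a number on the visual comparison; the function names are hypothetical, and bins with fewer than 5 expected counts are skipped, a common rule of thumb.

```python
import math

def expected_counts(n, mu, sigma, lo=0, hi=1023):
    """Normal PDF scaled by the sample count n; each ADC code is a bin of width 1."""
    scale = n / (sigma * math.sqrt(2.0 * math.pi))
    return {k: scale * math.exp(-0.5 * ((k - mu) / sigma) ** 2) for k in range(lo, hi + 1)}

def chi_square(counts, expected, min_expected=5.0):
    """Pearson chi-square over bins whose expected count is >= min_expected.

    Returns (statistic, number_of_bins_used).
    """
    stat, bins = 0.0, 0
    for k, e in expected.items():
        if e >= min_expected:
            stat += (counts.get(k, 0) - e) ** 2 / e
            bins += 1
    return stat, bins
```

When the mean and standard deviation are estimated from the same data, the statistic is roughly compared against a chi-square distribution with about `bins - 3` degrees of freedom; a value far above that suggests the normal model does not fit.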

I would like to exploit the noise and generate random numbers. To do this, I need to determine a way to create a uniform distribution that exploits the stochastic variation in the data set.

The one way I have found is to discard all values less than 100 or greater than 999. Then, removing the most significant digit from each value results in a data set with values between 0 and 99. The probability density function of the resulting data set is a uniform distribution. And, the more points in the data set, the less chatter there is in the distribution. But... the diversity of the values is limited to 0 – 99. Clearly, I can collect 10 data sets and get 0 – 999, etc.

Is there a better way? If not, I'll probably rework the circuit as a random bit generator and assemble the integers bitwise.
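Assembling integers bitwise could look like this sketch. `next_bit` is a hypothetical callable standing in for the hardware bit source and is assumed to return independent, unbiased bits. For ranges that are not a power of two, rejection sampling (draw just enough bits, retry when the result is too large) keeps the output exactly uniform, whereas taking the result modulo n would not.

```python
from typing import Callable, Iterable

def bits_to_int(bits: Iterable[int]) -> int:
    """Assemble bits, most significant first, into an unsigned integer."""
    value = 0
    for b in bits:
        value = (value << 1) | (b & 1)
    return value

def uniform_below(n: int, next_bit: Callable[[], int]) -> int:
    """Uniform integer in [0, n): draw ceil(log2(n)) bits, retry if the result is >= n."""
    k = max(1, (n - 1).bit_length())
    while True:
        v = bits_to_int(next_bit() for _ in range(k))
        if v < n:
            return v
```

With 10 bits, `bits_to_int` covers 0 to 1023 directly; for 0 to 99, `uniform_below(100, next_bit)` uses 7 bits per attempt and rejects 28 of the 128 outcomes.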

5. Apr 28, 2014

### chogg

That helps me understand. I was confused because Gaussians have continuous output and no boundaries, but your system is discrete and bounded.

I thought the reverse-biased zener diode had Poisson noise, not Gaussian? But presumably in the range you're measuring a Gaussian would be a good approximation.

±10 strikes me as too much variation. I would have expected $180/\sqrt{32768}$, which is more like ±1. (See standard deviation of the mean.)
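Written out, assuming the 32768 samples are independent:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} = \frac{180}{\sqrt{32768}} \approx \frac{180}{181.0} \approx 0.99$$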

Again: using the least significant digits may look uniform, but I'd be extremely surprised if it actually is. If your experiment is highly repeatable and you take a large number of samples, I would expect a residual pattern to show up in the frequencies. With 32768 samples spread over 100 values (about 327 per value), I'd expect a relative standard deviation in each count of $1/\sqrt{327}$, or about 5%. I'm not surprised you can't detect the non-randomness at the 5% level. And that noise shrinks only as the square root of the sample count, so you'd have to take a lot of samples to see it.
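The 5% figure is the scatter that even a perfect uniform source would show at this sample size. A sketch that checks it by simulation, with Python's pseudo-random generator standing in for an ideal uniform source (an assumption for illustration only); the exact multinomial value is $\sqrt{(1-p)/(Np)}$ with $p = 1/100$, which is very close to $1/\sqrt{327}$.

```python
import math
import random
from statistics import mean, pstdev

def relative_count_spread(n_samples=32768, n_values=100, seed=1):
    """Relative standard deviation of per-value counts from a simulated uniform source."""
    rng = random.Random(seed)   # pseudo-random stand-in for an ideal uniform source
    counts = [0] * n_values
    for _ in range(n_samples):
        counts[rng.randrange(n_values)] += 1
    return pstdev(counts) / mean(counts)

# Theory for comparison: sqrt((1 - p) / (N p)) with p = 1/100 and N = 32768.
theory = math.sqrt((1 - 1 / 100) / (32768 / 100))
```

Any real bias smaller than this scatter is hidden, and halving the scatter requires four times as many samples.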

Anyway... why even have the ADC? Couldn't you just take analog readings? It seems like digitizing discards some information which you could be using.

If you pass each value through the cumulative distribution function (CDF) of its distribution, the results will be uniformly distributed from 0 to 1. For discretized data such as yours, that will only be approximately true (but it may be good enough).
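A minimal sketch of this transform, assuming the normal model with the thread's mean of 512 and standard deviation of 180; the function names are hypothetical. The normal CDF is computed with the error function, and the result is mapped to an integer in [0, m).

```python
import math

MU, SIGMA = 512.0, 180.0   # assumed: the thread's reported mean and standard deviation

def normal_cdf(x, mu=MU, sigma=SIGMA):
    """Phi((x - mu) / sigma), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def to_bucket(code, m, mu=MU, sigma=SIGMA):
    """Map one ADC code to an integer in [0, m) via the probability integral transform."""
    u = normal_cdf(code, mu, sigma)   # approximately uniform on (0, 1)
    return min(int(u * m), m - 1)     # guard against u == 1.0
```

Because there are only 1024 possible input codes, `u` takes at most 1024 distinct values, so the buckets are only approximately equiprobable, and they drift further from uniform if `MU` or `SIGMA` no longer match the circuit.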

On the other hand, your bitwise generator idea sounds good. I'd have a lot more confidence in a 50/50 distribution from random noise. Even if you're a bit off (no pun intended!), you still have almost as much entropy. A crappy biased generator with p=0.4 (instead of p=0.5) still has 97% of the entropy per bit.
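The 97% figure follows from the binary entropy function, $H(p) = -p\log_2 p - (1-p)\log_2(1-p)$. The sketch below computes it and also shows von Neumann debiasing, a standard technique not mentioned in the thread, which removes a fixed bias from independent bits at the cost of throughput (on average it keeps $p(1-p) \le 1/4$ output bits per input bit).

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a bit that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def von_neumann(bits):
    """Von Neumann debiasing over non-overlapping pairs: 01 -> 0, 10 -> 1, drop 00 and 11.

    Output is unbiased if the input bits are independent with a fixed bias.
    """
    out = []
    for a, b in zip(bits[0::2], bits[1::2]):
        if a != b:
            out.append(a)
    return out
```

For example, `binary_entropy(0.4)` is about 0.971 bits, and `von_neumann([0, 1, 1, 0, 0, 0, 1, 1])` returns `[0, 1]`.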

6. Apr 29, 2014

### Stephen Tashi

How stable is the above arrangement as the circuits age, change temperature, etc.? Should the mathematical technique be robust enough to work if things get out of calibration?

7. Apr 29, 2014

### Mr Peanut

Thanks Chogg,

Back to the circuit board for me. Bitwise seems to be the way to go.

8. Jul 20, 2014

### Mr Peanut

9. Jul 20, 2014

### FactChecker

As @chogg said, if you plug your original numbers into the cumulative distribution function (CDF) of the normal distribution, the resulting values should be uniform on (0,1). The normal CDF is an integral with no elementary closed form, but it can be written with the error function, $\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$, which most math libraries provide; very accurate tables are also available. In MATLAB, the normcdf function computes it directly.
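If assuming a fixed normal mean and standard deviation is a concern (for example, as the circuit drifts with age or temperature), one alternative, not proposed in the thread, is to estimate the CDF empirically from a separate calibration batch and re-run the calibration periodically. A sketch with a hypothetical `empirical_cdf` helper; the mid-rank convention (counting half of the tied samples) is one choice among several.

```python
from bisect import bisect_left, bisect_right

def empirical_cdf(calibration):
    """Build a mid-rank empirical CDF from a calibration run of ADC codes."""
    data = sorted(calibration)
    n = len(data)

    def cdf(x):
        below = bisect_left(data, x)   # calibration samples strictly less than x
        upto = bisect_right(data, x)   # calibration samples less than or equal to x
        return (below + upto) / (2 * n)

    return cdf
```

For example, with `cdf = empirical_cdf([3, 0, 2, 1])`, `cdf(0)` is 0.125 and `cdf(3)` is 0.875.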

Last edited: Jul 20, 2014