## Expectation value for CDF fails...

Hi,

(Sorry for the slight misnomer in the title... I can't edit it!)

I'm working through several problems computing expectation values and variances of sub-samples of, and operations on, the normal distribution, and I am having trouble getting results that agree with my numerical simulations.

I have several Gaussian random generators written in C, and am using vectors of length 1 million to 10 billion for numerical experiments; all generators appear to converge to at least 4 decimal places of accuracy at the sample sizes I am using. For example, in μ(σ) notation, adding 0(1) + 0(1) gives 0(1.414), exactly as expected.
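A quick stdlib-Python version of that sanity check (this is just a sketch, not the attached C program): sum pairs of independent N(0,1) draws and confirm the result is N(0, sqrt(2)).

```python
import math
import random
import statistics

random.seed(1)

# Sum pairs of independent N(0,1) samples; the sum should be N(0, sqrt(2)).
n = 200_000
sums = [random.gauss(0.0, 1.0) + random.gauss(0.0, 1.0) for _ in range(n)]

mean = statistics.fmean(sums)
stdev = statistics.pstdev(sums)
print(f"mean  ~ {mean:.3f} (expect 0)")
print(f"stdev ~ {stdev:.3f} (expect {math.sqrt(2):.3f})")
```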

The random generator is crude, but replacing it doesn't change the result.
The program is attached...

So, for my first (trivial) problem (and hopefully the same mistake is in my other questions!):
When I compute Ʃ |0(1)|/n, I get 0.8933.

However, when I try to solve this problem analytically, I get a different result.
Could someone point out where I have made a mistake, please? :)

I think the pdf for a Gaussian is:
$$p(x)= \sqrt{1 \over 2\pi}e^{-{1\over2}x^2}$$

Since abs() only uses half of the Gaussian curve, the pdf (with domain [0,∞)) becomes:
$$p(x)= 2\sqrt{1 \over 2\pi}e^{-{1\over2}x^2}$$

To compute the expectation value, all I ought to do is evaluate:
$$\overline{X}=\int\limits_0^{\infty}x \, p(x)\,dx$$

$$\overline{X}=2\sqrt{1\over2\pi}\left(\left.-e^{-{1\over2}x^2}\right|_0^{\infty}\right)=2\sqrt{1\over2\pi}(0-(-1))=\sqrt{2\over\pi}≈0.7979$$

Which is, of course, wrong....
So, what did I do that messed the result up?
Attached Files:
 compute.txt (3.2 KB)

 Recognitions: Homework Help Science Advisor I believe your algebra is right and your program wrong. I set up a spreadsheet using RAND() and the Box-Muller method to generate sets of 600 random values with N(0,1) distribution. The averages of the absolute values hovered around .75 to .85.

 Quote by haruspex I believe your algebra is right and your program wrong. I set up a spreadsheet using RAND() and the Box-Muller method to generate sets of 600 random values with N(0,1) distribution. The averages of the absolute values hovered around .75 to .85.
Hmm...
I just wrote a small test program in Python to double-check, as it uses a ratio of uniform [0,1) randoms to generate a Gaussian -- and it does indeed come up with 0.7978 for 10 million samples.
So it does have to be my program.
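For reference, the whole check fits in a few lines of stdlib Python (a sketch, independent of the attached program): the mean of |X| for X ~ N(0,1) should come out near sqrt(2/π) ≈ 0.7979.

```python
import math
import random
import statistics

random.seed(2)

# Mean of |X| for X ~ N(0,1): analytically sqrt(2/pi) ~ 0.7979.
n = 200_000
mean_abs = statistics.fmean(abs(random.gauss(0.0, 1.0)) for _ in range(n))
print(f"E|X| ~ {mean_abs:.4f} (expect {math.sqrt(2 / math.pi):.4f})")
```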

Thanks.

Edit:
Ohhhh! I just found it: I was still taking the square root of the answer and computing the standard deviation -- not the mean/expectation...
Sigh... Thanks again, and very much!

--Andrew.

## Expectation value for CDF fails...

Unfortunately, my other mistakes are not in fact related...

Side note: for those downloading the program out of curiosity, there are bugs in it (the rejection method's density needs to be multiplied by 1/sqrt(2 pi), etc.).

I have since found & written much better Gaussian random number generators, and I'd rather share good code than leave broken code around. I am now attempting experiments with around a trillion computations, so it is becoming important to SERIOUSLY optimize the generator for speed AND accuracy. I'm sure some of you would like to do the same -- but easily... I'd like to help.

To my surprise, the Box-Muller method really isn't that fast, although it is reasonably accurate. Computing the logarithm appears to be the slowest math function involved in Gaussian generation -- and (also to my surprise) almost all *accurate* generators call it on every computation.

I have come up with a much better method (and of course, found out someone else discovered it before me...) but I am re-writing it to avoid the logarithm function as much as possible. This gives a 10x speed improvement on my machine -- so it IS worth chasing after....

On to my mistake... this one has to be analytical:
I am attempting to compute the standard deviation of two N(0,1) distributions added together; the answer should be sqrt(2), but I get sqrt(4). I'm not sure where I lost the constant -- maybe I forgot something when I did a change of variables. I just looked it up, and I am tired enough that maybe I am just blind to the step I mis-copied.

The probability of each data element is:
$p(x)={1 \over k}(e^{-{1\over2}x^2}e^{-{1\over2}y^2})$
$k=\int \limits_{- \infty }^{ \infty } \int \limits_{ - \infty }^{ \infty } e^{-{1 \over 2}(x^2+y^2)}dxdy=\pi$

The variance for addition is:
$\sigma^2(x,y)=(x+y)^2$

Which makes the weighted variance:
$p(x,y) \sigma^2(x,y) = {1 \over \pi}(x+y)^2e^{-{1 \over 2}x^2}e^{-{1 \over 2}y^2}={1 \over \pi }(x+y)^2e^{-{1 \over 2}(x^2+y^2)}$

So the combined equation yields:
$\overline{ \sigma ^2 }= \int \limits_{- \infty }^{ \infty } \int \limits_{ - \infty }^{ \infty }{1 \over \pi }(x+y)^2e^{-{1 \over 2}(x^2+y^2)}dxdy$

And I think this is where I might have messed up? I needed a change of coordinate systems/variables...
$\overline{ \sigma ^2 }= \int \limits_{ 0 }^{ 2 \pi } \int \limits_{ 0 }^{ \infty }{1 \over \pi }(r^2+2r\cos( \theta ) \times r\sin( \theta ) )e^{-{1 \over 2}r^2}\, r \, dr\, d \theta$
$\overline{ \sigma ^2 }= {1 \over \pi }\int \limits_{ 0 }^{ \infty }\int \limits_{ 0 }^{ 2 \pi } (1+ 2\cos( \theta ) \sin( \theta ) )r^3e^{-{1 \over 2}r^2} \, d \theta \, dr$
$\overline{ \sigma ^2 }= {1 \over \pi }\int \limits_{ 0 }^{ \infty } \left. ( \theta - \cos^2( \theta ) ) \right|_{ 0 }^{ 2 \pi } r^3e^{-{1 \over 2}r^2} \, dr$
$\overline{ \sigma ^2 }= {1 \over \pi }\int \limits_{ 0 }^{ \infty } 2 \pi \times r^3e^{-{1 \over 2}r^2} \times dr$
$\overline { \sigma ^2 }= 2 \left. ( -e^{-{1 \over 2} r^2} (r^2+2)) \right|_{ 0 }^{ \infty }$
$\overline { \sigma ^2 }= 4$

Sigh, It must be simple -- but I don't see it....

 Recognitions: Homework Help Science Advisor $k=\int \limits_{- \infty }^{ \infty } \int \limits_{ - \infty }^{ \infty } e^{-{1 \over 2}(x^2+y^2)}dxdy=2\pi$ Wrt methods for generating N(0,1) values from the uniform distribution of RAND, I've read that simply averaging batches of a dozen or so gives good enough results for most purposes, and much faster. Haven't tested that.

 Quote by haruspex $k=\int \limits_{- \infty }^{ \infty } \int \limits_{ - \infty }^{ \infty } e^{-{1 \over 2}(x^2+y^2)}dxdy=2\pi$ Wrt methods for generating N(0,1) values from the uniform distribution of RAND, I've read that simply averaging batches of a dozen or so gives good enough results for most purposes, and much faster. Haven't tested that.

I copied that result directly from a table... Figures -- I didn't notice that the table's integral was of exp(-(x^2+y^2)), not of exp(-(x^2+y^2)/2)...
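For the record, redoing the final steps with the corrected $k=2\pi$ (everything else unchanged):

$$\overline{\sigma^2}={1 \over 2\pi}\int\limits_0^{\infty} 2\pi \times r^3 e^{-{1\over2}r^2}\,dr=\left.\left(-e^{-{1\over2}r^2}(r^2+2)\right)\right|_0^{\infty}=2$$

which gives $\sigma=\sqrt{2}$, as it should.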
Thanks for releasing me from mathematical hell...

Regarding the generation of N(0,1);
I have experimented with that, and I'll point out some issues -- and future directions I plan on testing...

When I benchmark individual operations inserted at measurable places within the program I submitted, including any penalties from OS exceptions, etc., I come up with the following approximate times for 1 billion iterations:

* or + of doubles --> 1 to 2 seconds (indistinguishable)
exp() --> 46 seconds
sin(), cos() --> 15 seconds
sqrt() --> 36 seconds
Rand01() --> 60 seconds
log() --> 87 seconds (1 minute 27 seconds)
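For anyone wanting to reproduce this kind of comparison without my C harness, here is a rough stdlib-Python sketch of the same measurement; the absolute numbers will differ by machine and language, only the relative ordering is the point.

```python
import math
import random
import timeit

random.seed(3)
xs = [random.uniform(0.1, 10.0) for _ in range(10_000)]

# Time a few of the math functions discussed; absolute numbers depend on
# the machine, but the relative ordering is what matters here.
for name, fn in [("exp", math.exp), ("log", math.log),
                 ("sqrt", math.sqrt), ("sin", math.sin)]:
    t = timeit.timeit(lambda: [fn(x) for x in xs], number=20)
    print(f"{name:5s}: {t:.4f} s for 200k calls")
```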

Generating decent random numbers isn't a fast operation. I improved the generator I posted before to use far fewer operations, to fit well in the CPU cache, and to remove I/O and Unix dependencies... and it still wasn't fast.

Averaging 12 of them would be far slower than using the log() function...
I plan on exploring the Mersenne Twister core generator to see if it is much faster, but I doubt it.

I tried a three-sum RAND() generator; it produced significant errors and was already slow.

There are a handful of Gaussian generation techniques that I find in the literature...
Trimming by rejection of either a pure Rand function, or a distorted version of it;
Addition of normalized (standard deviation scaled) RAND numbers -- because sums of Gaussians are also Gaussians...
Piecewise linear selection of random or "binned" values (the Ziggurat method), and several other hybrids...
And transformation based generators.
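As a concrete instance of the "addition of normalized RANDs" family, here is the textbook sum-of-12-uniforms sketch: U(0,1) has variance 1/12, so twelve summed, minus 6, approximate N(0,1). This is a sketch only; as noted above its accuracy is poor, and the output can never leave [-6, 6].

```python
import random
import statistics

random.seed(4)

def clt_gauss() -> float:
    # Sum of 12 U(0,1) has mean 6 and variance 12 * (1/12) = 1,
    # so subtracting 6 gives an approximate N(0,1) sample.
    # Caveat: the result can never fall outside [-6, 6].
    return sum(random.random() for _ in range(12)) - 6.0

samples = [clt_gauss() for _ in range(100_000)]
print(f"mean  ~ {statistics.fmean(samples):.3f}")
print(f"stdev ~ {statistics.pstdev(samples):.3f}")
```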

wrt summing RAND()s: I think the addition of normalized RANDs might be fast if applied to, say, a triangle distribution made by summing two RAND() values. That is something I will probably try.
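Sketching that triangle idea (this is just the standard fact that the sum of two independent U(0,1) values has a triangular density on [0, 2]; the helper name is hypothetical):

```python
import random
import statistics

random.seed(5)

def triangle() -> float:
    # Sum of two independent U(0,1): triangular density on [0, 2],
    # peaked at 1, with variance 2 * (1/12) = 1/6.
    return random.random() + random.random()

samples = [triangle() for _ in range(100_000)]
print(f"mean     ~ {statistics.fmean(samples):.3f} (expect 1.000)")
print(f"variance ~ {statistics.pvariance(samples):.3f} (expect {1/6:.3f})")
```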

The transformation class of generators is the one I believe can be made the fastest. I haven't figured out how to code the Ziggurat, but I would like to try it -- it is the only serious contender I can see by estimating the operations used.

The main problem is that CPU cache timings are non-intuitive on newer processors, and it isn't trivial to find out what causes slowdowns in memory-access-bound routines. The Ziggurat method relies on tables in memory, just as my random generator does. That is typically a blazingly fast style of computation -- but, to my surprise, it really isn't in the routines I benchmarked. So, although some sites claim the Ziggurat to be the fastest routine, I wonder whether that refers to a hand-optimized version or a generic one. I don't want to produce CPU-specific code, so I would rather test code written only at the C level.

I found a generator very close to the idea I have for a transformation -- it is called (of all things, as I like Python...) the Monty Python method...

But the simplified description makes it clear that it too uses either the log function or the exp() function for a very large percentage of the operations. (No actual code was available in the descriptions I read.)

With my idea, I can reduce that percentage massively -- so that only 5% or fewer of the samples will use the exp() function (and perhaps twice); I hope to hit the add/multiply baseline of two seconds, plus about 6% of the exp time as a penalty (around 3-4 seconds total). I figure that's the best I am capable of coding with what I know...

We'll find out soon!

Let the real fun begin -- I got stuck (of course) in the tail region of the Gaussian random generator I am making. It is going to be quite fast and accurate regardless of whether I fix a small error in the tail region, but I have a question about how to correct it in a certain way. I'm attempting to make this post readable by breaking it up into chapters. I also attached a graph which I will refer to constantly -- and unless you're blind, it will be extremely helpful for explaining obscure sentences... (I did my best to be clear, even if you are blind... ;) ) The question itself can be found by looking at the "-------" chapter headings.

-------------------------------------------------------------
General/generic background

All generators of random distributions produce random numbers in proportion to the differential area under the curve of choice. So, in principle, one can locate a point "inside" the distribution by choosing two uniform random numbers -- one for the y axis and one for the x axis -- and then seeing if the y value is "under" the curve (i.e., the x,y pair falls in the area under the curve). This method forces every point in the randomized uniform rectangle to have equal likelihood -- and since the number of points (any y < curve's_y(x)) making up the vertical line under the curve at each x is proportional to the probability of choosing that x value by random chance, that is how often one should accept an x value that was uniformly chosen at random. Therefore, all x,y based generators accept an x value *iff* a random y value is "under" the curve's y(x) value.

The easiest way to generate the probability distribution is to randomly select an x,y pair and throw away (reject) values until a pair is found *inside* the area of the distribution curve (i.e., found under the curve); this final value is the only one returned. The method has one unfortunate consequence when doing Gaussian generation: the number of rejects is *very* large.
Since the reject area outside a Gaussian grows rapidly as x goes beyond the 3 sigma mark (i.e., the curve's y becomes tiny), all practical Gaussian rejection generators must truncate the maximum x value they will randomly choose (I used 8 sigma as the limit in my last program). This of course introduces error, but it keeps the program from taking arbitrarily long to pick a number.

The Monty Python method of generating the Gaussian distribution uses the x,y technique, but attempts to drastically reduce the rejection area outside the Gaussian distribution (i.e., any random_y(x) > Gaussian_y(x) points). The way the M.P.M. does this is to choose a fixed randomizer x,y rectangle which doesn't enclose the entire Gaussian curve's y values; rather, the rectangle excludes regions where acceptable y values exist and hence can never be picked -- BUT those excluded areas are re-mapped into areas inside the randomizer rectangle which would otherwise be rejected. Hence reject points for one part of the Gaussian become acceptable points in another part; only re-mapping of the x value is required. Even then, compromises are required for computation speed, since not all areas can be mapped into the box efficiently by simple translations and rotations. These compromises inevitably leave some reject areas inside the randomizer rectangle. But *if* the area of the randomizer rectangle is made exactly equal to the total area enclosed under the Gaussian, then any reject area *inside* the x,y rectangle must exactly equal all the areas not packed inside that rectangle -- and, generally speaking, this happens to be the "tail" area previously mentioned...

----------------------------------------------------------
How I chose to partition the Gaussian for a M.P.M. approach
In the graph, there is a blue rectangle which has the same total area as the Gaussian curve it maps; call this the areaBox. Notice that at small x values (x<spin and x<2spin) the Gaussian curve is inside the areaBox and divides it vertically into an accept and a reject region; the accept region is a2, and the reject is anything above a2.

To remove the rejection area above a2 as much and as efficiently as possible (speed is an issue), it is filled by simply rotating area a0 by 180 degrees into the translated area a0r (the rotation is around the point x=spin, y=top_of_areaBox). Because a0's x domain runs from 0 to spin, the rotated area starts at x=spin and ends at x=2*spin; the latter x always falls short of the maximum x value of the areaBox, which makes the packing imperfect.

The point x=spin is an arbitrary constant, but two issues affect which value is best: it must be less than or equal to 1, and it determines the height of the areaBox. For maximum usage of the areaBox, spin needs to be as large as possible -- but there is a tail curve-fitting trade-off which makes spin values in the range 0.95 to 0.98 very desirable. Spin may never be greater than 1.0, because a Gaussian curve has an inflection point there, and greater values would cause a2 to overlap with a0r. Once x=spin is chosen, the two areas a2 and a0r (found in the region x=spin to x=2spin) still have a y-oriented reject area between them, called e0. The requirement that spin<1 also guarantees a small un-packed area to the right of e0 which ends at the edge of the areaBox; this latter region is, beneficially, less than 5% of the x values in the areaBox. Hence, by virtue of area conservation, the rejection areas of e0 and of the areaBox beyond x>2spin must exactly add up to the tail region found immediately after x=2spin and extending to infinity.
This tail is troublesome to implement -- but I discovered something useful:

-------------------------------------------------------------------------------------------------------------
A very useful discovery regarding the problematic tail region and its efficient generation...

If the otherwise-useless area e0 is mirrored across x=2spin (as drawn), such that x values in e0 become mirrored x values in e0m, it reduces the tail area to the cyan colored region. And this tail region, if graphed without the e0m below it, happens to be almost exactly another Gaussian curve... Recursively calling my version of the Monty Python random Gaussian generator will produce this tail perfectly, with all values of x to infinity "potentially" returned (the recursive Gaussian x values add).

I curve-fit the value of spin and a scale factor (s), which transforms the mirrored x values of e0m, so as to maximize the Gaussian fit to the tail area. The worst-case error is around 0.6% (see the cyan error plot, which is in percent). Since the e0m correction only extends a distance of ~1 spin, the correction ends at around x=3*spin, and the remainder of the equation reverts to the tail of the *original* Gaussian; this discontinuity is therefore the cause of the maximum error. However, the max error occurs roughly after the 3 sigma mark, which means it affects only about 3 of 1000 random numbers (though, as the call is recursive, the uncorrected errors compound *very* slightly...). Notice, however, that the tail portion corrected by e0m (i.e., with 0.2% error remaining) is executed far more often -- around 3% of the generator calls -- and is far more important to minimize error in. I don't have an exact estimate of the total displaced area of all the errors, but it isn't very big; I am thinking < 0.2% * 4% = 0.008% (and perhaps quite a bit less, because of exactly where it occurs...).

The exact anti-derivative of the tail region for the corrected e0m portion is listed at the end of the post.
It is invalid after the e0m portion...

------------------------------------------------------------------
Question: how to transform a perfect Gaussian distribution into the almost-Gaussian required for the tail region to be correct...

I know that, given an area formula for a distribution (an anti-derivative), one can use a method called inversion to produce random samples from that distribution using a single uniform random number; it simply requires plugging a uniform random number into the inverse function of the anti-derivative. What I need is a distribution generator equal to the tail area in cyan -- it isn't an exact Gaussian, but VERY close to one. Since I know the anti-derivative of the tail region analytically, and I know the anti-derivative of the ideal Gaussian, I think there must be a way to convert the Gaussian into my tail distribution. How do I do this? A generic discussion of converting any distribution into any other via calculus/integrals would be very helpful (it doesn't need to be this exact problem; just a *clear example* of how this might be done). Is anyone able to help?

-------------------------------------------------------------------------------------
Appendix: information to reproduce my graphs and equations...

The attached graph was produced with the following gnuplot equations. They may, of course, be used in other graphing programs that understand the C-like ternary operator (condition)?(evaluate when true):(evaluate when false). Gnuplot also does not raise errors on divide by zero; it simply does not plot those points -- I find that useful for limiting where an equation is displayed.

n(x)=exp(-0.5*x**2)
serf(x)=erf(2**-0.5*x)
t(x)=area/n(spin)
areaBox(x)=(x < t(spin)) ? n(spin) : 0
error(x)=(x>=0) && (x<=spin) ? 2*n(spin) - n(x) - n(2*spin-x) : 0
rot180(x)=(x>spin) && (x<=2*spin) ? 2*n(spin)-n(2*spin-x) : error(x-2*spin)
residual(x,s)=(x>=0) ? ( n(x+2*spin) - error(x/s)*s ) : 1/0
tail(x)=residual(x,s)/residual(0,s)

Note: semantically, residual computes the corrected tail region -- tail(x) is a slight misnomer; it actually scales the tail region to 1.0 to ease comparing it to a normal curve.

The following constants were the ones I thought made the final tail fit to a Gaussian nearly optimal:

s=1.02803051369197 -- scales the mirrored e0m's x values in the tail; this adds a degree of curve-fitting freedom. The scaling preserves area.
spin=0.966804869573191 -- the best value of spin to make the tail region as Gaussian as possible while keeping e0 relatively small (trade-off issues).
stdS=0.609743713386175 -- the ideal Gaussian to be matched against in the tail region has a standard deviation in x which isn't unity; this estimates the actual value.

The area of the tail function, in the corrected e0m region only, ought to be (barring mistakes...):
$$\int n(x')-e_{0m}(x')\,dx' = \sqrt{\pi \over 2}\left( serf(x+2spin) + serf(x) - serf(2spin-x) \right)-2x \times n(spin)$$
After this region, it simply reverts to the tail of the original Gaussian, and is just plain old erf() scaled and translated appropriately...
Attached Thumbnails
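As a generic illustration of the inversion recipe referred to above, using a distribution whose anti-derivative inverts in closed form (the exponential, not my tail curve): the CDF is F(x) = 1 - e^(-x), so plugging a uniform u into F^(-1)(u) = -ln(1 - u) yields exponential samples. The same recipe would apply to the tail curve once its anti-derivative is inverted (numerically, if need be).

```python
import math
import random
import statistics

random.seed(10)

# Inversion recipe: if F(x) is the CDF, then F^{-1}(U) with U ~ U(0,1)
# has distribution F.  For the exponential, F(x) = 1 - exp(-x), so
# F^{-1}(u) = -ln(1 - u).
samples = [-math.log(1.0 - random.random()) for _ in range(100_000)]
print(f"mean ~ {statistics.fmean(samples):.3f} (expect 1.000)")
print(f"P(X > 1) ~ {sum(s > 1.0 for s in samples) / len(samples):.3f} "
      f"(expect {math.exp(-1):.3f})")
```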

 Quote by haruspex $k=\int \limits_{- \infty }^{ \infty } \int \limits_{ - \infty }^{ \infty } e^{-{1 \over 2}(x^2+y^2)}dxdy=2\pi$ Wrt methods for generating N(0,1) values from the uniform distribution of RAND, I've read that simply averaging batches of a dozen or so gives good enough results for most purposes, and much faster. Haven't tested that.

Well, this is a good example of why we cannot trust everything we read on the Internet.

This method is not faster but actually horribly slow; RAND generation is one of the bottlenecks in this kind of situation. Generating a good random number is expensive, so the ratio of RAND numbers per distribution number should ideally be 1/1; in some difficult situations you might settle for 2/1, but a ratio of 'a dozen or so'/1 is by no means reasonable. The moment you read something like that, there is no need to keep reading (or keep reading with a raised eyebrow). And not just that: in this approach, speed is not the only problem; accuracy is terrible as well, since it relies entirely on the Central Limit Theorem... so yeah.

 Quote by andrewr Question: How to transform one distribution into a very slightly disturbed distribution in order to correct the error... I know that given an area formula of a distribution (anti-derivative) -- one can use a method called inversion to produce random samples from that distribution using a single uniform random number; It simply requires plugging the uniform random into the anti-function of the anti-derivative. What I need is a distribution equal to the tail area in cyan -- it isn't an exact Gaussian -- but VERY close to one. Since I know the anti-derivatives of the tail region, analytically, And I know the anti-derivative of the ideal Gaussian -- I think there must be a way to convert the Gaussian into my tail distribution. How do I do this? A generic discussion with a simple worked example would be very helpful (it doesn't need to be this problem here, just a clear example of how this might be done.) Is anyone able to help?
Hi Andrew,

The two fastest approaches that I know of for generating normally distributed random numbers are the inverse method and Box-Muller. The Ziggurat method, though mathematically sound, shows a divergence that increases in proportion to the number of threads when implemented on parallel hardware; a good example of the wisdom "theory is better than practice in theory, but practice is better than theory in practice".

So, since you are trying to work out a Ziggurat-like algorithm, maybe you want to check this paper first (Section 5.4): http://www.mendeley.com/research/sta...ous-computing/

The inversion method relies on a very efficient, machine-accurate implementation of the inverse normal probability function. This is the method used by default in R (which IMHO says something about it).

Box-Muller is also good, since you have a RAND/Normal ratio of 1/1 and there are machine-specific methods (GPU, math co-processor, etc.) to calculate sin, cos, and exponentials of imaginary numbers very efficiently.
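For concreteness, a minimal Box-Muller sketch in stdlib Python -- the standard trigonometric form, not any particular optimized implementation:

```python
import math
import random
import statistics

random.seed(6)

def box_muller():
    # Two independent U(0,1) in -> two N(0,1) out.
    # Note both outputs share the same radius r = sqrt(-2 ln u1).
    u1 = 1.0 - random.random()   # shift to (0, 1] to avoid log(0)
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

samples = [x for _ in range(50_000) for x in box_muller()]
print(f"mean  ~ {statistics.fmean(samples):.3f}")
print(f"stdev ~ {statistics.pstdev(samples):.3f}")
```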

 Quote by viraltux Hi Andrew, The two fastest approaches that I know of to generate random normal distributions are the inverse method and the Box-Muller. The Ziggurat method, though mathematically sound, when implemented in parallel computing you get an increasing divergence proportional to the number of threads; a good example for the wisdom "theory is better than practice in theory, but practice is better than theory in practice" So since you are trying to work out a Ziggurat like algorithm maybe you want to check this paper first (Section 5.4): http://www.mendeley.com/research/sta...ous-computing/
I read through the paper. The application I have is just basic experimentation -- I am actually more interested in the analytical solutions a mathematician would come up with than in the software; my need for moderate software speed is just to verify what I come up with analytically. But I'd like to run tests with around a trillion samples, and with the generators I have available, that takes close to a week of single-CPU time.

I'd love to use Xilinx Virtex-5 chips -- but that is out of budget for a hobby project; and the older Xilinx 4K-series chips are way too slow (I still have some...).
Also, as this is a one-shot study, buying GPUs is also a bit of a waste (I am here to keep in practice more than to solve a problem with \$ attached).

 The inversion method relies in a very efficient coding with machine accurate results of the inverse normal probability function. This is the method used by default in R (which IMHO says something about it)
Yes, I am sure the inverse method is very accurate -- however, the inverse of erf() has no closed form, so I am not sure how the polynomial curve fits used for R's inverse function were developed to be both machine accurate AND fast.
I would be interested in looking at such polynomials and experimenting with them, but I don't know how I would create them from scratch in any reasonable time frame.
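(An aside for later readers: Python's standard library now ships an inverse normal CDF, statistics.NormalDist.inv_cdf, reportedly a rational-approximation implementation in the same spirit as R's qnorm -- so the inversion method can be tried without deriving the polynomials yourself. A sketch:)

```python
import random
import statistics

random.seed(7)
nd = statistics.NormalDist()  # standard N(0, 1)

# Inversion method: plug a uniform sample into the inverse CDF.
samples = [nd.inv_cdf(random.random()) for _ in range(100_000)]
print(f"mean  ~ {statistics.fmean(samples):.3f}")
print(f"stdev ~ {statistics.pstdev(samples):.3f}")
print(f"inv_cdf(0.5) = {nd.inv_cdf(0.5)}")  # median of N(0,1) is 0
```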

 Box-Muller is also good since you have a ratio RAND/Normal of 1/1 and there are specific machine develop methods (GPU, math co-processor, etc... ) to calculate sin, cos and exponentials of imaginary numbers very efficiently.
Yes, but on Intel CPUs (which is a big portion of the audience here) the log function is often the slowest math operation available in the co-processor-enabled math libraries (gcc / Visual Studio C libraries). The basic benchmark I posted earlier is a sample of what I have come across; Intel does not appear to have a very efficient hardware implementation of ln() on most of their processors.

As for Box-Muller: the reference implementation I gave in the program at the start of the thread actually requires TWO RANDs for one result. Although some people tout the orthogonality of sin and cos as a way to produce two results, in practical applications the relationship between the pair seriously skews the results of some calculations... Also, it requires one log operation per run, and on Intel that is very slow.

The article made some excellent points about random number generators needing to use the word size of the CPU; I used bytes for simplicity, and that incurs memory-access (copying) penalties which slow the generator down significantly.

Since I am not using parallel computing, I am not sure what the divergence issue really is -- but I would be interested in your comments. Is it a synchronization issue between the thread outputs of the random generation?

For my purposes, I want to learn about the calculus involved in warping one random distribution into another -- and figure out ways to test a generator's overall accuracy.
I figure that if I keep a running list of 4 Gaussians made by my generator, then linear combinations of these results will have *significantly* improved properties;
since linear operations are so fast as to have almost no penalty, that is a practical solution to my generator's problem... but I'd like to go farther than the knee-jerk pragmatic solution, just for knowledge's sake.
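A sketch of the kind of linear combination I mean (hypothetical coefficients): an average of four draws, rescaled by 1/sqrt(4) so the variance stays at 1, at the cost of four draws per output.

```python
import random
import statistics

random.seed(9)

def combined_gauss() -> float:
    # The sum of 4 independent N(0,1) draws has variance 4, so dividing
    # by sqrt(4) = 2 restores unit variance while averaging out some
    # per-draw generator artifacts.
    return sum(random.gauss(0.0, 1.0) for _ in range(4)) / 2.0

samples = [combined_gauss() for _ in range(50_000)]
print(f"mean  ~ {statistics.fmean(samples):.3f}")
print(f"stdev ~ {statistics.pstdev(samples):.3f}")
```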

Thanks for the link,
--Andrew.

 Quote by andrewr Yes, I am sure the inverse method is very accurate -- however, the inverse of erf() has no closed form -- so I am not sure how the polynomial curve fits would have been developed for the inverse function used in R that are machine accurate AND fast. I would be interested in looking at such polynomials, and to experiment with them -- but I don't know how I would create them from scratch in any reasonable time frame.
I doubt very much that they are using simple polynomials to approximate the inverse but, luckily, R is a FOSS project, which means you can hack the code all you want: http://cran.r-project.org/sources.html

 Quote by andrewr As for box-muller, The reference implementation I gave in the program at the start of the thread actually requires TWO RAND's for 1 result. Although some people tout the orthogonal nature of sin and cos to produce two results -- in practical applications, their relationship seriously skews the result of calculations... Also, it requires one log operation each run -- and on Intel that is very slow.
If you get skewed data, you're doing something wrong; with Box-Muller, two RANDs give you two independent, normally distributed values.

 Quote by andrewr Since I am not using parallel computing -- I am not sure what the issue concerning divergence really is; but I would be interested in your comments -- is this a synchronization issue between thread outputs of random generation?
I haven't seen the details of the Ziggurat algorithm's parallel implementation myself, but I don't think it is a sync issue; rather, it is something related to the inner complexity of the Ziggurat algorithm itself. Maybe you can find more info in the paper's references.

 Quote by andrewr For my purposes, I want to learn about the calculus involved in warping one random distribution into another -- and figure out ways to test a Generator's overall accuracy. I figure that if I keep a running list of 4 Gaussians made by my generator -- that linear combinations of these results will have *significantly* improved properties; since linear operations are so fast as to have almost no penalty -- that is a practical solution to my generator's problem... but I'd like to go farther than the knee-jerk pragmatic solution just for knowledge's sake. Thanks for the link, --Andrew.
Well, if you care more about the theory than the practice, that puts Ziggurat-like algorithms back on the table, and the conversation shifts entirely to the math side.

I don't think, though, that the linear combination is a good idea. You're right that it has virtually no penalty in itself, but consider that you will have to generate 4 numbers to get only one with improved properties, and that defeats the whole purpose; by doing so you make your algorithm around 4 times slower than it should be.

Oh wow, I was going to recommend you check out the work of Mr. Marsaglia (http://en.wikipedia.org/wiki/George_Marsaglia), and I just found out he passed away last year!! Well... long live the King. That guy really was the man when it came to randomness and computing.

Anyway, there is plenty of bibliography on this subject, so you will not run short of it.

You know, whatever method you see around, it always has two steps: first generate random numbers from 0 to 1, and second, use those numbers to generate the particular distribution. Back when I was working with random numbers and generators, I thought about the possibility of creating an algorithm that would do both at the same time -- that is, an algorithm that directly generates a normal distribution... Well, easier said than done, but I still think that approach might prove fruitful.

Anyway, have fun

 Quote by viraltux I doubt very much they are using simple polynomials to approximate the inverse but, luckily, R is a FOSS project which means you can hack the code all you want: http://cran.r-project.org/sources.html
Excellent, I'll look there.

 If you get skewed data you're doing something wrong, with Box-Muller two RAND give you two independent normally distributed values.
Yes and no. There is a relationship between them: both always share the same "radius". So if one computes the squares of the two "independent" variables and adds them, the correlation with the NON-Gaussian radius becomes part of the result. In most applications this is irrelevant -- but not always. In a truly independent generation, the squares of the results will not correlate with a non-Gaussian variable in any way. In typical applications the correlation only matters if the two values are returned sequentially -- so if one simply delays the return randomly, the problem can be removed... but the log calculation is still very slow on an Intel chip.

 Well, if you care more about the theory than the practice that puts Ziggurat like algorithms back on the table and the conversation shift entirely to the math side.
Yes... I don't wish to get stuck forever on the random issue, but I do want to learn about the calculus of doing the transformations; that part goes beyond random number generation and into error propagation when Gaussian variables and measured data are operated on mathematically. Surprisingly (for example), multiplying two Gaussian variables does not produce a true Gaussian -- and the typical error-propagation formula found in standard texts fails under certain conditions...

I could develop an empirical way of dealing with the issue -- but I'd prefer to be *able* to analyze it more robustly. Therefore I do need to understand how operations/polynomials/etc. transform a random distribution....
That's my ultimate goal.

 I don't think though the linear combination is a good idea. You're right that the linear combination has virtually no penalty but you have to think that you will have to generate 4 numbers to get only one with improved properties and that defeats the whole purpose; by doing so you are making your algorithm around 4 time slower that it should be.
I can see why you think that -- but there is an alternative: one can simply keep an array of four numbers, subtract the oldest number from the previous sum, and add a newly generated one. Each new result then depends only on the history plus one new random number. Also, randomly flipping the sign of the result, or of two of the numbers in the sum, prevents any averaging effect from showing up between adjacent returned results.

Sign flipping only consumes one random bit .. and extra bits can be saved between generator calls. etc. etc.

Just as a note; Some literature I have seen talks about using an array of "seed" Gaussian values and then doing vector operations on them to produce a derived set.
(SSE or Altivec type programs).

This second set is then only partially sampled (some values thrown away) -- and the process repeated. It's used on some supercomputing platforms because vector operations are cheap -- and everything else *very* expensive....

 Oh wow, I was going to recommend you to check the work of Mr. Marsaglia http://en.wikipedia.org/wiki/George_Marsaglia and I just found out he passed away last year!! Well... Long life the King. That guy actually was the man when it came to randomness and computing. Anyway, there are lots of bibliography on this subject, so you will not get short of it.
Thanks, I'm sure that will help. I have heard his name.... but never saw the article !

 You know, whatever the method you see around they always have two steps; firstly to generate random numbers from 0 to 1 and secondly to use those numbers to generate a particular distribution. Back when I was working with random numbers/generators I thought about the possibility to create an algorithm that would do both at the same time, that is, an algorithm that directly generates a Normal distribution.... Well, easier said than done but I still think that this approach might prove fruitful. Anyway, have fun
Yes, that's about what I have noticed too... but without knowing how to do the calculus on transformations, I'm not going to be able to succeed there...

as for fun: I am... !!

 Quote by andrewr Yes and no. There is a relationship between them that they both always have the same "radii"; so that if one computes the square of the two "independent" variables and adds them, the correlation with the NON-Gaussian radius becomes part of the result. In most applications this is irrelevant -- but not always. In a true independent generation, the square of the results will not correlate with a non-Gaussian variable in any way. In typical applications, the correlation is only important if the two values are returned sequentially -- so if one simply delays the return randomly, the problem can be removed... but the log calculation is still very slow on an Intel chip.
I disagree on this one. If I understand you correctly you're saying you just found some kind of correlation in the Box-Muller procedure that breaks somehow the independence of the two variables, right? You prove that and you become the new King... Long Live the King

 Quote by andrewr Surprisingly, (for example) the multiply operation on two Gaussian variables does not produce a true Gaussian -- and the typical error propagation formula found in standard texts fails under certain conditions...
Why would this be surprising?

 Quote by andrewr I can see why you think that -- but there is an alternative; One can simply have an array of four numbers, and then subtract the first number off the last result and add a new final number randomly generated. The new result only depends on the history -- plus one new random number. Also -- Randomly flipping the sign of the result, or of two of the numbers in the sum can prevent any averaging effect to show up between adjacent results that are returned. Sign flipping only consumes one random bit .. and extra bits can be saved between generator calls. etc. etc. Just as a note; Some literature I have seen talks about using an array of "seed" Gaussian values and then doing vector operations on them to produce a derived set. (SSE or Altivec type programs).
These linear relationships for getting the next number are common in algorithms that generate RAND numbers, but we are talking about generating a Gaussian once you have the RAND, and going recursive-linear again would be overkill, even assuming it works, which I can't clearly see it would.

Besides, sign flipping is not the issue; the issue is knowing when you have to flip the sign, and for that you need another good random-bit generator which, in turn, would just add more overhead to the process.

 Quote by andrewr as for fun: I am... !!
Then welcome to PF! I think we need some of that here

 Recognitions: Homework Help Science Advisor
Fwiw, I've tried another approach. The idea is to map the range (0,∞) down to (0,1):

Given a target PDF f(x), map x to z as x = g(z), with g(0) = 0, g(1) = +∞, and g monotonic on (0,1). Let h(z) = f(g(z)).g'(z), so that h(z).dz = f(x).dx.

The procedure is then:
- Pick z, y from uniform distributions on (0,1), (0,K) respectively.
- If h(z) > y, return g(z); else pick afresh.

We need the following to be true:
- h and g are speedily computed;
- h(z) <= K for all z in (0,1) (so set K = this max);
- the integral of h over (0,1) (= I = ∫f(x).dx for x > 0) is a largish fraction of K.

E.g. take g(z) = 1/(1-z) - 1. Then h(z) peaks at z = 0.5 with h(0.5) = 4e^{-0.5} ≈ 2.43, and the integral of f(x) for x > 0 is √(π/2). This makes the 'occupancy' I/K ≈ 0.52. Not great, but perhaps good enough.

I tried it and it certainly does produce half a bell curve. The remaining problem is to flip half of the returned values to negative x.
 haruspex,
That looks very interesting -- I'll see if I can code it (if I understand it correctly) and see how it does. Once I have it running I'd like to examine the math you did to get the result, too, and compare it against the Monty Python version I came up with.

As an aside, I did another one of those analytical problems that perform operations on Gaussians -- in this case I did N(0,1) + N(0,1), but with the proviso that *only* values with the same sign were added (100% Cyhelsky skew correlation...); the result was:

$\overline { \sigma }= \sqrt{ 2 + { 4 \over \pi } } \approx 1.809209646$

And that is correct by numerical simulation also. The Pearson covariance for the same case is just:

${ 2 \over \pi } \approx 0.636619772$

I am curious: this is a sort of auto-correlation that I haven't found referenced anywhere online -- but I have found it very useful (as an approximate decimal constant up until now, since I couldn't solve it analytically before!), and there is only one author who seems to be doing work in the same area -- a Czech by the name of Cyhelsky; but his work isn't online except the reference at Wikipedia. What would this constant be called? E.g., I'm thinking it's the auto-correlation constant of Gaussian noise -- but is there a systematic name for constants like this one? No use inventing a name if one already exists....

A picture is worth....

As far as the Monty Python method goes, it is only slightly faster than the pipelined Box-Muller method.... Raw uncorrected test on 100 million samples:

MPM = 37 secs
BM = 47 secs

The raw version, without tail correction -- has a small but visible defect in the tail.
Picture #1 (Which messes up calculations...)

When I do a 4 sum correction, as mentioned before -- the speed is slightly slower than the pipelined BM; but the tail errors do go away.
Picture #2

However, even in picture #2 -- the effects of adjacent samples are still a problem. I would need to shuffle results so as not to output them sequentially.
i.e.: neither BM nor MPM is an optimal solution..

This time, I hit a snag of a different kind:
What I found is that conditional statements ("if") mixed into chains of computations are a bad combination, resulting in CPU pipeline disruptions... *sigh*

Hence, the only real speedup possible is by improving the random generator; that can definitely make MPM faster than BM, but it isn't worth the time investment as the speedup isn't even 100%.

But I am beginning to think this may be a dead end; the processes being discussed are highly sensitive to the CPU hardware...

I included your random generator in the computation code/text file, and it benchmarked at 149 seconds...

(Amazing, huh? I would have expected those simple equations to have done better,
but rejection always is very expensive time wise... ;) )

Thank you very much for the example, though, it does indeed produce a bell curve -- (exact) -- and the calculus you used to convert one mapping to another is more useful to me than the code in any event. It helps me understand what can/can't be done to transform equations.
Attached Thumbnails

Attached Files
 compute.txt (11.9 KB, 1 views)

 Quote by viraltux I doubt very much they are using simple polynomials to approximate the inverse but, luckily, R is a FOSS project which means you can hack the code all you want: http://cran.r-project.org/sources.html
For the inversion method, they have source code called qnorm.c; and FYI, in it they have several 5th-degree polynomials (and higher!). The code would be exceedingly slow -- but yes, it would be accurate. They are using ratios of polynomials to produce the erf function -- and not just erf for the normal(0,1), but for any standard deviation and mu.
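For what it's worth, the general shape of such a ratio-of-polynomials (rational) approximation is just two Horner evaluations and one division -- sketched below with placeholder coefficients that are NOT the actual qnorm.c constants:

```c
/* Horner's rule: evaluate c[0] + c[1]x + ... + c[n-1]x^(n-1)
   with n-1 multiplies and n-1 adds. */
static double horner(const double *c, int n, double x)
{
    double r = c[n - 1];            /* start from the highest-order term */
    for (int i = n - 2; i >= 0; i--)
        r = r * x + c[i];
    return r;
}

/* Generic rational-approximation skeleton: numerator / denominator.
   Coefficients are placeholders for illustration only. */
double rational_approx(double x)
{
    static const double num[6] = { 1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125 };
    static const double den[5] = { 1.0, 0.3, 0.09, 0.027, 0.0081 };
    return horner(num, 6, x) / horner(den, 5, x);
}
```

Real implementations additionally split the domain into a central region and tails, with a separate coefficient set for each.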

In R -- the actual random number generation code for rand_norm(); (snorm.c?),
has several methods besides inversion -- Kinderman's method from 1976 and Forsythe's method are included. I already have a later version of Kinderman's code (1977, with Monahan) in my own code. It is *much* shorter and just as accurate; but alas -- it's still slow.

 ls of the Ziggurat algorithm parallel implementation myself, but I don't think it is a sync issue but rather something related to the inner complexity of the Ziggurat algorithm itself. Maybe you can find more info in the paper's references.
Yes, I see some references to an issue -- but I don't quite understand it; perhaps when I have more time I'll get into it in more detail. The Ziggurat clearly has issues and is fairly complicated.

I'm wondering if there isn't a way to use sin() and cos(), or perhaps a polynomial and sqrt(), since these are far faster than log() and might allow the BM method to be sped up significantly without much work.
Hmm....

Thanks again for the links, they were informative.