## Expectation value for CDF fails...

 Quote by andrewr For the inversion method, they have source code called qnorm.c; and FYI in it they have several 5th degree polynomials (and higher!). The code would be exceedingly slow -- but yes, it would be accurate. They are using ratios of polynomials to produce the erf function; and not just erf of the normal(0,1), but of any std deviation and mu.
Well, I said I doubted very much that they use just polynomials, because the tails cannot be nicely approximated with them; so it seems R uses, for the tails, a ratio of polynomials, which is not itself a polynomial.

But I disagree that using polynomials makes the code "exceedingly slow"; in fact, polynomials are computationally very fast structures to evaluate, since an nth-degree polynomial costs on the order of n multiply-adds (Horner's scheme). That means that most of the time the cost to evaluate such a function in R will be around 4 multiply-adds, which is great.

 Quote by andrewr I'm wondering if there isn't a way to use sin() and cos() or perhaps a polynomial and sqrt() since these are far faster than Log, and might allow the BM method to be sped up significantly without much work. Hmm.... Thanks again for the links, they were informative.
There are many performance techniques we can use to speed up algorithms; you can really do wonders if you're determined.

One family of these tricks is based on altering the algorithm in ways that make it mathematically slower but, once the machine architecture is taken into account, actually faster. For these techniques you have to know your machine.

Another family of techniques is based on breaking the code into pieces and reusing calculations to avoid redundancy. For instance, if you need to calculate $sin(x)$ and $cos(x)$ (which you do in BM), you do not want to call the separate $sin(x)$ and $cos(x)$ functions; instead you break open the code for $sin(x)$ and $cos(x)$ and merge it into a single $sincos(x)$ function, which will be nearly twice as fast and, thus, will make your code much faster.

The same goes for the $log(x)$ function. For BM, remember that $log(x)$ is only applied to numbers within (0,1], which means that using a general-purpose $log(x)$ implementation is a waste of resources, since the higher-order terms of its polynomial add virtually nothing to the accuracy on that interval. So let's say $log(x)$ uses a polynomial of degree 10; since you know you will never evaluate numbers greater than 1, you can create a function $log01(x)$ that uses only a polynomial of degree 5, is as machine-accurate as $log(x)$ on (0,1], and is roughly twice as fast.

So, those are a couple of tricks that can really speed up your code if you use them instead of applying $sin(x)$, $cos(x)$, and $log(x)$ right away.

And actually the fun goes ooooon... But you're fun, right?

 Quote by viraltux Well I said I doubted very much they use just polynomials cause the tails cannot be nicely approximated with them, so it seems R uses for the tails a ratio of polynomials which it is not a polynomial itself.
Yes, you are undoubtedly correct. Exactly. I just didn't understand what you meant...

 But I disagree that using polynomials makes the code "exceedingly slow" in fact polynomials are computationally very fast structures to calculate since you have a cost O(n-1) for nth degree polynomials. That means that most of the time the cost to calculate such function in R will be around O(4), which is great.
Not really .... In theory, yes -- but in practice, no.
"R" is using code from the 1960's-80's which was for CISC machines.
Cache latencies and piplining issues, were not yet part of the programming milieu;
Foss for example, has several *if* statements in the code deciding when to use which polynomial ... that's exactly what slowed my code for the MPM down horribly compared to my estimate of it's speed. (I'll explain later.)

On most modern machines, the brute force operations of addition and multiplication both have the same time delay, so it's more like 7 operations to do a 4th degree polynomial.

7 operations is about the same time as it takes to compute sin or cos to the full double precision on my machine.... But a 4th degree poly is nowhere near able to achieve the same precision.

If you look at the source code I already posted -- you'll notice that I use second and third degree polynomials to avoid computing exp(-0.5*x**2) as much as possible.
I use these polynomials > 98% of the time, and compute exp() for the remainder.
I even did alignments for power of two operations where possible, since that doesn't even require a multiply instruction; but only an integer add...

For a particular number of repetitions -- I think it was around a billion -- I have time measurements for each of the math-coprocessor operations on my machine.

multiply and divide : 2 seconds (or less)
sin() and cos(): 15 seconds
sqrt(): 36 seconds.
exp(): 46 seconds.
log(): 87 seconds.

In theory, then, I expected my program to spend 98% of the time doing polynomials at a weight of *less* than 8 seconds, and 2% of the time running exp() at 46+2+2 ~ 50 seconds;
that works out to <9 seconds average...
Box muller (ignoring multiplies) would be 87+36+15=138 seconds.
(Which isn't far off from reality...)

Hence, the MPM method ought to be on the order of 15 times faster than BM.
But it's nowhere *NEAR* that much faster -- it is, in fact, only *slightly* faster.

If anyone sees a coding mistake, let me know -- but I don't think there is anything wrong with the way I wrote the program.

The main issue for my machine -- and these issues are *VERY* common in all desktop machines, whether PowerPC or Intel, and even some ARM processors --
is that a failed branch prediction must flush a very deep pipeline in the processor.
On an Intel Celeron, the pipeline depth is around 40. That means that every time an "if" statement changes the flow of the machine unexpectedly, the CPU momentarily runs around 40 times slower than normal. A single *if* statement can be very expensive, time-wise: a 2.5GHz processor is reduced to 62MHz for a brief moment. There is no way to program preferred branch directions in the C language, either; it's all up to the CPU's automatic discretion (if it has any bug-free ability at all...).

 There are many performance techniques we can use to speed up algorithms, you can really do wonders if you're determine.
The main thing I need to do is calculate a single exact formula to produce the Gaussian -- with no *if* statements included in the algorithm. These techniques you mention are all very good (I'm an EE, and although I do hardware -- I'm *very* familiar with these issues.)

 So, I just explained a couple of tricks that can really speed up your code if you use them instead applying $sin(x)$ , $cos(x)$ and $log(x)$ right away. And actually the fun goes ooooon... But you're fun, right?
Yes, I'm into fun. Lots of it.
I really do want to stay away from machine architecture dependent coding.
That means I can't achieve the "highest" performance -- but it also means I can share the code with anyone; I know from hardware design experience that the relative timings of my machine's math coprocessor hold for many other processors as well.
Sin and cos don't appear to be implementable, by any polynomial method I know, faster than just using the built-in functions. Log(), however, could easily be replaced -- and I have some experimental (non-polynomial) formulas that might be well worth curve fitting....

I'm learning a bunch about how a uniform random variable plugged into a formula produces a pdf! E.g.: x = f( u1 ) has the pdf d/dx f^-1( x ).
This appears to be key in designing a more easily computed Gaussian generator.

There is much that I am studying in BoxMuller, some of which is just for knowledge -- other parts explorations for speed-up.

One fairly interesting path is to consider sin and cos: since a function expressed as sin() is at 90 degrees to one expressed with cos(), I was thinking it may be possible to use this relationship to compute an anti-function of something expressed as sin(f(x)).

exp(-0.5*x**2) is slow to compute; but when I graph acos( exp( -0.5*x**2 ) ), the resulting graph appears to be a hyperbola of some kind that would be *VERY* easily computed using only multiplication, division, and addition. Hence, it would be around 2.5 times faster to compute the cosine of a different function than to compute the exponential itself... (And it may be *MUCH* faster than computing the anti-exponential directly as Box-Muller does...!)

Does anyone happen to know what the formula of the hyperbola is for acos( exp(...) ) ?

 Quote by andrewr exp(-0.5*x**2) is slow to compute; but when I graph acos( exp( -0.5*x**2 ) ), The resulting graph appears to be a hyperbola of some kind
Doesn't look like that to me. There's a right-angle at the origin (gradient -1 to the left, +1 to the right), and an asymptote at y = π/2.
I guess if you look at x > 0 only, it does resemble a hyperbola. Taking the gradient near the origin to represent the other asymptote, the angle between asymptotes on the same branch is 3π/4. So the hyperbola should look like $y^2 = \tan^2(3\pi/8)x^2 - c$, rotated clockwise by 3π/8 and suitably translated.
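For what it's worth (my own check, not from the thread), a small-x expansion makes the corner at the origin explicit: using $e^{-x^2/2} = 1 - x^2/2 + O(x^4)$ and $\arccos(1-t) = \sqrt{2t}\,(1 + t/12 + O(t^2))$,

$$\arccos \left ( e^{-x^{2}/2} \right ) = |x| \left ( 1 - { x^{2} \over 12 } + O(x^{4}) \right )$$

so the curve leaves the origin along $y = |x|$ (gradient ∓1) and flattens toward the asymptote $y = \pi/2$; a true hyperbola has no such corner.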

 Quote by andrewr R" is using code from the 1960's-80's which was for CISC machines. Cache latencies and piplining issues, were not yet part of the programming milieu;...
OK, I admit you are fun and you caught my curiosity, so I downloaded the code, checked qnorm.c, and found this:

Code:
/*-- use AS 241 --- */
/* double ppnd16_(double *p, long *ifault)*/
/*      ALGORITHM AS241  APPL. STATIST. (1988) VOL. 37, NO. 3

Produces the normal deviate Z corresponding to a given lower
tail area of P; Z is accurate to about 1 part in 10**16.

(original fortran code used PARAMETER(..) for the coefficients
and provided hash codes for checking them...)
*/
if (fabs(q) <= .425) {/* 0.075 <= p <= 0.925 */
r = .180625 - q * q;
val = q * (((((((r * 2509.0809287301226727 +...
So, since the algorithm was published in 1988, this code seems to have been developed in the early 90s rather than the 60s. Nonetheless, notice that 85% of the time you get your answer back after checking just one if statement.

But anyway, since you also say

 Quote by andrewr an "if" statement changes the flow of the machine -- the CPU runs around 40 times slower than normal. A single *if* statement can be very expensive, time wise.... The main thing I need to do is calculate a single exact formula to produce the Gaussian -- with no *if* statements included in the algorithm. These techniques you mention are all very good (I'm an EE, and although I do hardware -- I'm *very* familiar with these issues.)
That surprises me, because I thought that if statements were supposed to be computationally cheap. OK, it was long, long ago in a galaxy far, far away when I finished my degree in Computer Science and had my hands on assembler, delving into CPU architecture, and since you are an EE who knows his hardware I cannot really argue the deep details. But I did the following little experiment on my Intel Atom (yeah, I know, but that's what I can afford now) in a high-level language (Ruby, my cup of tea along with C).

So I am going to calculate the number 5000000 two ways: 10 million additions of 0.5, versus 5 million additions of 1 guarded by 10 million if statements. These are the results:

Code:
x = 0; 10000000.times{|i| x+=0.5 } #time 5.388s
x = 0; 10000000.times{|i| x+=1 if i%2 == 0} #time 4.236s
So the if statement not only doesn't slow the calculation down but actually makes it around 20% faster! So I am not sure why you want to avoid the if statement so badly. Are you sure it has such a dramatic influence on your machine?

 Quote by andrewr I really do want to stay away from machine architecture dependent coding. That means I can't achieve the "highest" performance -- but it also means I can share the code with anyone; I know from hardware design experience that the relative timing of my machine's math co. hold for many other processors as well. Sin and Cos don't appear to be implementable by any method I know which will be faster using polynomials, than just using the built-in function. Log(), however, could easily be replaced -- and I have some experimental (non-polynomial) formulas that might be well worth curve fitting.... I'm learning a bunch about how a uniform random plugged into a formula produces a pdf! eg: f( u1 ) --> a pdf of --> d/dx f^-1( x ). This appears to be key in designing a more easily computed Gaussian generator. There is much that I am studying in BoxMuller, some of which is just for knowledge -- other parts explorations for speed-up. One fairly interesting path is to consider sin and cos: Since a function expressed as sin() is at 90 degrees to one expressed with cos(); I was thinking it may be possible to use this relationship to compute an anti-function of something expressed as sin(f(x)).
You have the fsincos instruction in assembler, which actually eliminates the redundancy I explained before, but since I am not sure the instruction is found in every architecture, I guess you might not want to use it. But if you are using sin, I'd say the cheapest option is to use cos as well; any other high-level trick, it seems to me, is not going to be faster than that.

 Quote by andrewr exp(-0.5*x**2) is slow to compute; but when I graph acos( exp( -0.5*x**2 ) ), The resulting graph appears to be a hyperbola of some kind that would be *VERY* easily computed using only multiplication, division, and addition. Hence, it would be around 2.5 times faster than computing the exponential itself, to compute the cosine of a different function... (And it may be *MUCH* faster than computing the anti-exponential directly as BoxMuller does...!) Does anyone happen to know what the formula of the hyperbola is for acos( exp(...) ) ?
I agree with haruspex it does not look anything like a hyperbola.

 Quote by andrewr Hence, the MPM method ought to be on the Order of 15 times fasterthan BM. But it's nowhere *NEAR* that much faster -- it is, in fact, only *slightly* faster.
Maybe you want to check this The Monty Python Method for Generating Random Variables from master Marsaglia. You may have a look at his code and try again.

 Quote by viraltux OK, I admit you are are fun and you caught my curiosity, so I downloaded the code and checked qnorm.c and I found this: Code: /*-- use AS 241 --- */ /* double ppnd16_(double *p, long *ifault)*/ /* ALGORITHM AS241 APPL. STATIST. (1988) VOL. 37, NO. 3 Produces the normal deviate Z corresponding to a given lower tail area of P; Z is accurate to about 1 part in 10**16. (original fortran code used PARAMETER(..) for the coefficients and provided hash codes for checking them...) */ if (fabs(q) <= .425) {/* 0.075 <= p <= 0.925 */ r = .180625 - q * q; val = q * (((((((r * 2509.0809287301226727 +... So since the algorithm was published in 1988 at least this code seems it was developed in the early 90s rather than the 60s.
That may have been when this particular piece of code was published, but not necessarily when it was developed. Fortran was being phased out by 1985 -- I still learned it as an EE. Either way, much of the code is older (and there is nothing wrong with old code that is well written...).

 Nonetheless, pay attention that 85% of the time you will get your answer back by just checking one if statement. But anyway, since you also say ... That surprises me because I thought that if statements where suppose to be computationally cheap. OK, it was long long ago in a far far away galaxy when I finished my degree in Computer Science and I had my hands on Assembler delving in the CPU architecture, and since you are an EE and you know your hardware I cannot really discuss deep into it, but I did the following little experiment in my Intel Atom (yeah, I know, but that's what I can afford now) with this code in a high level language (Ruby, my cup of tea join with C) So I am going to calculate the number 5000000 comparing 1 million additions of 0.5 vs 5000000 additions of 1 within 1 million if statements, these are the results: Code: x = 0; 10000000.times{|i| x+=0.5 } #time 5.388s x = 0; 10000000.times{|i| x+=1 if i%2 == 0} #time 4.236s So the if statement not only doesn't slow the calculation but actually makes it around 20% faster! So I am not sure why you want to avoid the if statement so badly. Are you sure it has such a dramatic influence in your machine?

I take it Ruby is an interpreted language? There might be something in the interface...
I have no idea why doing more work would take less time! Your example is absolutely baffling!

Now, in Ruby, since you have nothing after the if statement -- am I understanding correctly that the syntax means: add 1 only when the index is even, and do nothing when odd? (I'm not very familiar with Ruby... I have done more PHP, to be honest.)

BUT: I'm sure that my code isn't a factor of 10x+ faster, which in theory it ought to be... Of course, I didn't account for the random generator's share of the processing time -- I was just expecting a much quicker response than I got! Hence I clearly made a mistake *somewhere*. (P.S. My unsure nature is why I asked if anyone saw a mistake in my code.)

In any event:
The issue is called "branch prediction". If the processor has hardware smart enough to decide which path is *more* likely, then the penalty only occurs on the occasional pass where the prediction is wrong. The rest of the time, the "if" statements are *very* cheap.

Supposedly (I haven't checked the comment made by a friend) Intel processors keep some kind of statistic to predict which is more likely on the next pass. That includes the Atom.

However, even if my code were somehow wrong -- if your example is portable into C as your cup of tea (are YOU sure!) -- then my code ought to have been *much* *much* *much* ******MUUUUCHH********** faster !

 You have the instruction fsincos in assembler, which actually eliminates redundancies as I explained before, but since I am not sure the instruction is found in every architecture I guess you might not want to use it. But if you are using sin I'd say the cheapest would be using cos as well, any other high level trick seems to me is not going to be faster than that.
Yes, and I think that shows up on many other architectures -- I need to check the ARM processors, as that is one I'd like to run a calculator on in the future (not quite an Android app... but close).

However, even if I just got the speedup from using sin() vs. exp(), I'd be quite happy!
It's quick enough....

 I agree with haruspex it does not look anything like a hyperbola.
*sigh* Yes... it doesn't when graphed over both negative and positive x; but haruspex was bright enough to figure out what I meant -- e.g., it resembles one when graphed from 0 to positive.

I have come across an almost identical shape before in another project some time ago, and it ended up being a ratio of something like (this is going to be wrong, of course) x/(3+x); there may have been square terms in it, etc. I don't remember... but there was something that caused the shape change at zero.

I really am going to have to come back to that problem once I find my backup of the earlier work -- hopefully that has the answer I am seeking. It will be a good estimate, regardless ...

I do find it irritating that I can't just solve the problem analytically; I mean, Euler's identity has the exponential in it already: cos( t ) + i*sin( t ) = exp( i*t ) -- so converting from exp to cos() or sin() just shouldn't be that hard!

I am sure I have seen an exact answer to this problem in a CRC handbook or something. It wasn't a very complicated expression.... But, I don't see how to solve it analytically myself.

If anyone does KNOW the exact answer -- please share, please.... ;) (Even if it ISN'T a hyperbola..!)

 Maybe you want to check this The Monty Python Method for Generating Random Variables from master Marsaglia. You may have a look at his code and try again.
Well, I'll certainly benchmark his for speed against mine using the same random generator; and I'll do a plot of the output to see how it compares against the bell curve; but it will be a few days before I have time...
If his is faster and already has the tail solved correctly -- it might just be better to use his code and optimize the random generator. (I have a potential trick for that.... !)

But:
I have another issue I'm working out analytically, that is more important to solve first
(next post.)

Thanks so much for the link; that may be good enough.
 Ok, I mentioned earlier that products of Gaussians aren't Gaussians. In this first part of the problem, I am interested in solutions where the Gaussians being multiplied aren't correlated. This problem comes up quite often in error propagation formulas, and so is interesting to study. As an example, I attached two such operations, appropriately labeled -- and they both obviously result in non-Gaussians.

A typical reference (USA/NIST) indicates that for multiplication statistics one ought to use:

$$s_{area }= \sqrt { width^{2} \times s _ {length} ^ {2} + length ^ {2} \times s _ {width} ^ 2 }$$

http://www.itl.nist.gov/div898/handb...ion5/mpc55.htm

The note says that this particular formula ignores correlation -- which (since there is none) is irrelevant to my problem. BUT -- when I try to solve the problem analytically, I get a different formula than shown in NIST. NIST also lists an exact formula, which includes correlation, and it is a solution I don't understand: it appears to have many terms I wouldn't know how to interpret. If anyone can show whether it reduces to the other formula in the absence of correlation, I'd be interested.

I looked around for other listings of error propagation formulas, and I generally come across this same formula shown in NIST, e.g. in Stowers' research for medical applications: research.stowers-institute.org/efg/Report/PropagationOfErrors.pdf And in typical papers I find elsewhere, there is one alternate variation I often see: http://www.physics.uc.edu/~bortner/l...tion%20htm.htm All they are really doing in the alternate form is converting to ratios or percentages and then applying the same propagation formula as NIST. However, that variation of the formula will also give a divide by zero for the problem shown in attachment #2... which is actually worse...

So, what is *really* wrong here?
:) I *SERIOUSLY* can't find what I did wrong when attempting to derive NIST's formula; and this is just the simple case... ugh!

In order to solve the equation, I am going to assume the normals are in a special form, with standard deviation always equal to one. It makes for fewer variables during the calculus, and reduces the odds I will make a mistake... *sigh*. So, start with two normals N(a,1) and N(b,1), and solve for the product. In this post, I'm going to solve for the mean; if no one sees a mistake, I'll then solve for the deviation.

Begin: the probability of each multiplication element is:

$$p(X,Y)={ 1 \over 2\pi } ( e^{-{ 1 \over 2}(x-a)^2} e^{-{ 1 \over 2}(y-b)^2} )$$

The elements of the mean for multiplication are:

$$\mu_{e} ( X,Y )=xy$$

Which makes the weighted mean's elements:

$$p(X,Y) \mu_{e} ( X,Y ) = { 1 \over 2\pi } xy( e^{-{ 1 \over 2}(x-a)^2} e^{-{ 1 \over 2}(y-b)^2} )$$

Combining yields:

$$\mu = \int \limits _{ - \infty }^{ \infty } \int \limits_{ - \infty }^{ \infty }{ 1 \over 2 \pi } xy( e^{-{ 1 \over 2 }(x-a)^2} e^{-{ 1 \over 2 }(y-b)^2} ) dxdy$$

Changing to polar coordinates to allow fixed radii vs probability pairs: $r^{2}=(x-a)^{2}+(y-b)^{2}$ with $x-a=cos( \theta )r$, $y-b=sin( \theta )r$, and $dxdy=r \times d \theta dr$:

$$\mu = \int \limits_{ 0 }^{ \infty } \int \limits _{0}^{ 2 \pi } { 1 \over 2 \pi } (cos( \theta )r+a)(sin( \theta )r+b) e^{-{ 1 \over 2}r^2} (r \times d \theta dr)$$

$$\mu = \int \limits_{ 0 }^{ \infty } \int \limits _{0}^{ 2 \pi } { 1 \over 2 \pi } (sin( \theta )cos( \theta )r^{2} + cos(\theta)br + sin( \theta )ar + ab) re^{-{ 1 \over 2}r^2} d \theta dr$$

All of the integrated trigonometric terms will have the same value at 0 and $2 \pi$ and may be safely ignored.

$$\mu = { 1 \over 2 \pi } \int \limits_{ 0 }^{ \infty } \left . (ab \theta) \right | _{0}^{ 2 \pi } re^{-{ 1 \over 2}r^2} \times dr$$

$$\mu = ab \int \limits_{ 0 }^{ \infty } r e^{-{ 1 \over 2}r^2} \times dr$$

$$\mu = ab$$

So, multiplying N(5,2) * N(3,7) will be the same as 2*7*( N(5/2,1) * N(3/7,1) ), and that would yield: 5/2 * 3/7 = 15/14, times 2*7 = 15. In general, then, the mean is always the product of the original means for uncorrelated Gaussians. That, everyone seems to agree on... but did I mess up, or is this "so far, so good"?

Attached Thumbnails

 Quote by andrewr I take it ruby is an interpreted language? There might be something in the interface... I have no idea why doing more work would take less time... ! Your example is absolutely baffling...! Now, in Ruby, since you have nothing after the if statement -- am I understanding correctly -- the syntax to mean only add 1 when the index is even? eg: do nothing when odd? (I'm not very familiar with Ruby... I have done more PHP do be honest.) BUT: I'm sure that my code isn't a factor of 10x + faster; which in theory is ought to be... However, even if my code were somehow wrong -- if your example is portable into C as your cup of tea (are YOU sure!) -- then my code ought to have been *much* *much* *much* ******MUUUUCHH********** faster !
Ported to C

Code:
#include <stdio.h>

int main(int argc, char *argv[]) {
    double x = 0;
    for (int i = 0; i < 100000000; i++) { x += 0.5; }                   /* time 1.244s */
    for (int i = 0; i < 100000000; i++) { if (i % 2 == 0) { x += 1; } } /* time 0.904s */
    printf("x is %f\n", x);
    return 0;
}
Now I am sure... The if statement, you gotta love it

 I do find it irritating I can't just solve the problem analytically; I mean Euler's identity has the exponent in it already: cos( t ) + i*sin( t ) = exp( i*t ) -- so converting from exp to cos() or sin() just shouldn't be that hard. !
Well, it's not; just use the expi function (for imaginary arguments) if you don't want to use fsincos. But if you're interested in the real part, I'd say sin and cos are not going to help you.

 I am sure I have seen an exact answer to this problem in a CRC handbook or something. It wasn't a very complicated expression.... But, I don't see how to solve it analytically myself. If anyone does KNOW the exact answer -- please share, please.... ;) (Even if it ISNT a hyperbola..!)
The point is that you can already approximate exp efficiently with a ratio of polynomials, and you're looking for a ratio of polynomials that calculates asin(exp), only to then apply sin again... does this make sense?

 If his is faster and already has the tail solved correctly -- it might just be better to use his code and optimize the random generator. (I have a potential trick for that.... !)
Bring it on!

 Quote by andrewr Ok, I mentioned earlier that the product of Gaussians -- aren't Gaussians. In this first part of the problem, I am interested in solutions where the Gaussians being multiplied aren't correlated. This problem comes up quite often in error propagation formulas; and so is interesting to study. As an example, I attached two such operations; appropriately labeled -- and they both obviously result in non-Gaussians. A typical reference (USA/NIST) indicates that for multiplication statistics one ought to use: $$s_{area }= \sqrt { width^{2} \times s _ {length} ^ {2} + length ^ {2} \times s _ {width} ^ 2 }$$ http://www.itl.nist.gov/div898/handb...ion5/mpc55.htm The note says that this particular formula ignores correlation -- which (since there is none) -- is irrelevant to my problem. BUT -- When I try to solve the problem analytically, I get a different formula than shown in NIST. NIST also list an exact formula, which includes correlation -- and is a solution I don't understand. It appears to have many terms that I wouldn't know exactly what they are.... If anyone can show if it reduces to the other formula in absence of correlation, I'd be interested.
I think the NIST article is a bit misleading; the general formula is not for the variance of a product of two variables, but for the square of the standard error of the product of the means of two samples. Pay attention to the n: when it goes to infinity, the standard error goes to zero. So this formula does not measure the variance of a product as it claims.

You can learn to derive the general formula for the product of two variables from this paper:

http://www.jstor.org/discover/10.230...47699110527507

In the general expression derived in this paper you can see that when X and Y are not correlated many terms disappear, and the remaining expression is the one you posted here: $s_{area }= \sqrt { width^{2} \times s _ {length} ^ {2} + length ^ {2} \times s _ {width} ^ 2 }$
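For reference, the exact uncorrelated case works out as follows (my transcription; with independent $X \sim (a, \sigma_x^2)$ and $Y \sim (b, \sigma_y^2)$, using $E[X^2] = a^2 + \sigma_x^2$):

$$V(XY) = E[X^{2}]E[Y^{2}] - a^{2}b^{2} = a^{2} \sigma_{y}^{2} + b^{2} \sigma_{x}^{2} + \sigma_{x}^{2} \sigma_{y}^{2}$$

so the NIST-style formula keeps the first two terms and drops the $\sigma_{x}^{2}\sigma_{y}^{2}$ term, which only matters when the relative errors are large.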

 Now I am sure... The if statement, you gotta love it
Yes,
I do understand why it's faster now; I didn't notice before that the predicate of the if statement was an integer test. So you are doing one integer operation versus one floating-point operation per iteration; and although modulus often costs more than one operation (a division plus a multiply and subtract), I think integer division produces the remainder directly in assembly...

If you switch the variable i to a float or double, then your code has the same characteristics as mine: the predicate becomes a floating-point comparison...

But just getting a rough estimate of the speed of the *if* from your loop (ignoring the for loop itself, as that is not a true if statement but a repeat statement in assembly, with perfect branch prediction), I am pretty sure my earlier estimate of the pipeline-break delay (which is based on Intel's own data, not measurement) doesn't explain why my code is so much slower than expected...

Hmmm....

 Well, it's not, just use the expi (for imaginary numbers) function if you don't want to use fsincos. But if you're interested in the real part I'd say the sin, cos is not going to help you.
I'll have to come back to this..... hold that thought for a while...
Yes, your ideas *do* make sense.

 Quote by viraltux I think the NIST article is a bit misleading, the general formula is not for the variance of a product of two variables, but for the square of the standard error for the product of the means of two samples. Pay attention to the n; when it goes to infinity the standard error goes to zero. So this formula does not measure the variance of a product as it claims. You can learn to derive the general formula for the product of to variables with this paper: http://www.jstor.org/discover/10.230...47699110527507 In the general expression derived in this paper you can see that when X and Y are not correlated many terms disappear and the remaining expression is the one you post here: $s_{area }= \sqrt { width^{2} \times s _ {length} ^ {2} + length ^ {2} \times s _ {width} ^ 2 }$
OK, and when I scan through the paper -- the initial steps do verify that he takes the mean to always be the simple product of means as well...

And I see that the next step, computing the variance, also starts the same as mine... but then he uses simple algebra; and although I used calculus, I should have gotten the same answer -- and I just double-checked, and I still can't find my mistake.

To make it worse, my answer actually agrees better with my posted experiments... which could just be a fluke....

I wonder, is there some kind of assumption in Goodman that I missed?
This is my path to it:

The variances for multiplication are:

$$\sigma ^{2} ( X,Y )=( xy - ab )^{2}$$

And the weighted variances:

$$p(X,Y) \sigma^{2} (X,Y) = { 1 \over 2\pi }( xy - ab )^{2} ( e^{-{ 1 \over 2}(x-a)^2} e^{-{ 1 \over 2}(y-b)^2} )$$

$$\sigma ^{2} = \int \limits _{ - \infty }^{ \infty } \int \limits_{ - \infty }^{ \infty }{ 1 \over 2 \pi } (xy-ab)^2( e^{-{ 1 \over 2 }(x-a)^2} e^{-{ 1 \over 2 }(y-b)^2} ) dxdy$$

Changing to polar coordinates to allow fixed radii vs probability pairs:

$r^{2}=(x-a)^{2}+(y-b)^{2}$ and $x-a=cos( \theta )r$ and $y-b=sin( \theta )r$ and $dxdy=r \times d \theta dr$

$$\sigma ^{2} = { 1 \over 2 \pi } \int \limits_{ 0 }^{ \infty } \int \limits _{0}^{ 2 \pi } ((cos( \theta )r+a)( sin( \theta )r+b) - ab)^{2} e^{-{ 1 \over 2}r^2} (r \times d \theta dr)$$

Expanding the xy substitution inside the parenthesis, factoring common terms...

$$\sigma^{2} = { 1 \over 2 \pi } \int \limits_{ 0 }^{ \infty } \int \limits _{0}^{ 2 \pi } (sin( \theta )cos( \theta )r^{2} + cos(\theta)br + sin( \theta )ar + ab -ab)^2 re^{-{ 1 \over 2}r^2} d \theta dr$$

$$\sigma^{2} = { 1 \over 2 \pi } \int \limits_{ 0 }^{ \infty } \int \limits _{0}^{ 2 \pi } ( sin( \theta )cos( \theta )r + ( cos(\theta)b + sin( \theta )a ) )^2 r^3e^{-{ 1 \over 2}r^2} d \theta dr$$

$$\sigma^{2} = { 1 \over 2 \pi } \int \limits_{ 0 }^{ \infty } \int \limits _{0}^{ 2 \pi } g( r, \theta ) r^3e^{-{ 1 \over 2}r^2} d \theta dr$$

Expanding the trigonometric terms' square.

$$g( r, \theta ) = sin^2( \theta )cos^2( \theta )r^2 + 2sin( \theta )cos^2( \theta )br + 2sin^2( \theta )cos( \theta )ar + (cos^{2}( \theta )b^{2} + 2cos( \theta )sin( \theta )ab + sin^{2}( \theta )a^{2})$$

Integrating the expansion with respect to $\theta$, r being constant.

$$\int \limits _{ 0 }^{ 2 \pi }g( \theta, r )d \theta =$$

$$\left . \left ( {(4 \theta - sin(4 \theta)) \over 32}r^{2} - { 2 \over 3 } cos^3( \theta )br + {2 \over 3} sin^{3}( \theta )ar + { \theta + sin( \theta )cos( \theta ) \over 2 }b^2 - {cos( 2 \theta ) \over 2}ab+{ \theta - sin( \theta )cos( \theta ) \over 2} a^2 \right ) \right | _{0}^{2 \pi }$$

Any trigonometric term will have the same value at 0 and $2 \pi$ and cancel, leaving.

$$\int \limits _{ 0 }^{ 2 \pi }g( \theta, r )d \theta = \left . \left ( { \theta \over 8 }r^{2} + { \theta \over 2 }b^2 +{ \theta \over 2} a^2 \right ) \right | _{0}^{2 \pi } = 2 \pi \left ( {r^{2} \over 8} + { a^{2} \over 2 } + { b^{2} \over 2 } \right )$$

Now, back-substituting the result...

$$\sigma^{2} = { 1 \over 2 \pi } \int \limits_{ 0 }^{ \infty } 2 \pi \left ( {r^{2} \over 8} + { a^{2} \over 2 } + { b^{2} \over 2 } \right ) r^3e^{-{ 1 \over 2}r^2} dr$$

$$\sigma^{2} = { 1 \over 8 } \int \limits_{ 0 }^{ \infty } r^5e^{-{ 1 \over 2}r^2} dr + { a^{2} + b^{2} \over 2 } \int \limits_{ 0 }^{ \infty } r^3e^{-{ 1 \over 2}r^2} dr$$

$$\sigma^{2} = { 1 \over 8 } \left . \left ( -e^{-{ x^2 \over 2 }}(x^4 + 4x^2 + 8 ) \right ) \right | _{ 0 }^{ \infty } + { a^{2} + b^{2} \over 2 } \left . \left ( -e^{-{ 1 \over 2 }x^{2} }( x^2 + 2 ) \right ) \right | _{ 0 }^{ \infty }$$

$$\sigma^{2} = { 1 \over 8 } \left ( 0 - -(1)(0+0+8) \right ) + { a^{2} + b^{2} \over 2 }( (0) - (-2) )$$

$$\sigma^{2} = 1 + \left ( { a^2 + b^2 } \right )$$

Taking the final result, and extending it to values with arbitrary stds;

$$N( a, \sigma _ {a} ) \times N( b, \sigma _ {b} ) = \sigma _ {a}\sigma _ {b} \times N( { a \over \sigma _ {a} }, 1 ) \times N( { b \over \sigma _ {b} }, 1 )$$

$$N( a, \sigma _ {a} ) \times N( b, \sigma _ {b} ) \rightarrow \mu = ab , \sigma = \sqrt { ( \sigma _ {a} b )^{2} + ( \sigma _ {a}\sigma _ {b} )^{2} + ( a \sigma _ {b} )^{2} }$$

So, my formula only differs in one term from that posted on NIST and all over the internet...

SO HERE'S Andrew's megalomania version.... It's me against the world (and likely to lose...! again...)

I get,
$$\sigma _ {width \times length} = \sqrt{ width ^ {2} \times \sigma _ {length} ^ {2} + \sigma _ {length} ^ {2} \times \sigma _ {width} ^ {2} + length ^ {2} \times \sigma _ {width} ^ {2} }$$

Oy!, and so how did I blow it this time? (I thought I was getting better at the calculus up and until now...)

 Quote by andrewr Yes, I do understand why it's faster now; I didn't notice before that the predicate of the if statement was an integer. So, you are doing one integer operation or one floating point operation; although modulus is often two operations (division, multiplication, subtraction) -- I think that integer division has a remainder in assembly... If you switch the variable i, to a float or double, then your code has the same characteristics as mine does. The predicate of the float is a float...
Well, but obviously it depends on the condition; the more calculations in the condition, the slower it is going to get! My point is that the poor old if statement by itself is not to blame.

 Quote by andrewr But just getting a rough estimate of the speed of the *if* with your loop (ignoring the for loop itself, as that is not a true if statement but a repeat statement in assembly -- perfect branch prediction) I am pretty sure my earlier estimate of the pipeline delay breaking (which is based on intel's own data, not measurement) doesn't explain why my code is so much slower than expected....
But anyway, one thing you can try is to change whatever conditions you have for something as simple and predictable as 1==1 and, this way, you can measure how much your conditions slow the program. At least you'll know how much blame you can place on that.

 Quote by andrewr So, my formula only differs in one term from that posted on NIST and all over the internet... SO HERE'S Andrew's megalomania version.... It's me against the world (and likely to loose...! again...) I get, $$\sigma _ {width \times length} = \sqrt{ width ^ {2} \times \sigma _ {length} ^ {2} + \sigma _ {length} ^ {2} \times \sigma _ {width} ^ {2} + length ^ {2} \times \sigma _ {width} ^ {2} }$$ Oy!, and so how did I blow it this time? (I thought I was getting better at the calculus up and until now...)
You're right!!! The Internet is wrong!!!!

Well, after NIST got the second formula wrong, I should not have trusted the first one either!! And now NIST goes out of my bookmarks and, yesss, there it goesssss.

So I went back to check Goodman's formula because I could not see anything wrong in your development, and there it was: the extra term that you've just calculated is there in Goodman's formula! To be precise, it is $E(Δx^2Δy^2)$, which happens not to be zero when X, Y are independent, unlike the rest of the terms that go away in that situation!

Oh my God... if you cannot trust random guys in the Internet who're you gonna trust now.

 Quote by viraltux Well, but obviously it depends on the condition, the more calculations in the condition the slower is going to get! My point is that the poor old if statement by itself is not to blame.
You may be right; but my point was that the if statement is the only thing left *unless* there is a mistake in my code someplace;

perhaps I ought to have counted the instructions in detail before making the comment....

In reality, I ought to re-measure *everything* -- and re-test; but I am hesitant to do so -- as fixing the random generator needs to be done first to avoid wasting a lot of time -- and that's why I haven't yet.

 But anyway, one thing you can try is to change whatever conditions you have for something as simple and predictable as 1==1 and, this way, you can measure how much your conditions slow the program. At least you'll know how much blame you can place on that.
Oh, how much bliss would be my ignorance...

No, I really can't -- if I set it to 1==1, the gcc compiler will remove the "if" altogether -- and since it takes the "if" to determine the percentage of execution of each path -- such a change must at least include a calculation that allows me to estimate the performance hit... I can only remove fixed computations, such as that for exp().

There are, inside intel's processor, several hardware optimizations that overlap;
there is out-of-order execution of instructions, pipeline-delay, cache-delay, and math co-processor de-coupling. If I had made the hardware myself, it would be easy to weigh which is causing what -- but even professional software engineers tell me that the published specifications do not always agree with performance. eg: some were attempting to make an optimized movie player for Intel -- only to find that the optimized version ran slower than the un-optimized version because the processor did not optimize in the same way the intel engineers said it would.

I would like to think that *none* of these are causing the problem -- however, I have no alternate explanation at this time and it doesn't look to be simple. (But I wouldn't be surprised if it was just a dumb mistake of mine or a transcription error made much earlier...)

(P.S. Considering we have optimizations uncertainties in the loop -- *MULTIPLIED* by other uncertainties -- what formula ought I use to calculate the range (product) of the uncertainties? )

Here's the code I wrote, it's short enough to inspect right here:
PHP Code:
 double MontyPythonGaussianCore(void) {
    // A rapid partitioning of the bell curve into optimized computation areas
    // that can be additively re-mapped for fast generation of a normal distribution.
    // The program as shown does not have tail correction....
    // And hasn't been carefully debugged; test it before using it...
    //
    // See:
    // http://www.physicsforums.com/attachment.php?attachmentid=48387&d=1339819476
    // http://www.physicsforums.com/showthread.php?t=611422
    //
    static const double spin  = 0.966804869573191;
    static const double spin2 = 0.966804869573191*2.0; // ensure compute once...
    static const double iTop  = 1.59576912160573;      // 1.0 / sqrt(pi/8)

    double z = Random01()*2.;
    double u1, lowerN, upperN, ebar;

    if (z <= spin) return z; // Area region A1 always returns...
    if (z > spin2) { // Tail region computation for sure... GO recursive!
        return spin2 + MontyPythonGaussianCore()*0.609743713386175;
        // A correction ought to go here, but I'm ignoring it for now.
    }

    // Otherwise, it must be in regions A2, A3, error,
    // and requires a "y" point.
    u1 = Random01();

    // Compute an estimated bound -- and if too close -- compute exactly.
    ebar   = 0.008; // Maximum polynomial error is 0.8% (<0.5% avg).
    lowerN = ((0.27*z) - 1.57009)*z + 2.26458;
    // The following if operation could be re-ordered with the returns,
    // depending on which is faster, a calculation -- or a cache miss...
    if (u1      <= lowerN) return z;
    if (u1-ebar <= lowerN) {
        lowerN = exp(-0.5*z*z) * iTop; // Compute the exact value.
        if (u1 <= lowerN) return z;    // Barely below/on the curve of A2.
    }

    // At this point, the data *must* be above curve A2.
    // Curves e0 and A3 are mirrored...
    z = spin2 - z;
    ebar = 0.00246; // Maximum polynomial error is 0.246% (<~0.15% avg.)
    upperN = ((-0.2922*z + 0.9374)*z - 0.0171)*z + 0.40485;
    if (u1        > upperN) return z;
    if (u1 + ebar > upperN) {
        upperN = 2.0 - exp(-0.5*z*z) * iTop;
        if (u1 >= upperN) return z;
    }

    // There is no other possibility; the value is an e0.
    // Return it as a tail region data point...
    return spin2 + z*1.02803051369197;
}

The first "*if*" statement returns ~48% of the time; a true multiply is not needed since a multiply by 2.0 is the same as an exponent adjust of +1; so -- this code *OUGHT* to be reaaaally fast. I'd guesstimate -- 1% of BM time or better...

So, in order for the whole algorithm to be only slightly faster than BM (say 80% of BM's time as a random figure...), we would need 1% * 0.48 + x%*0.52 = 80%;

So, that's about x% = 152%; eg: the remaining path would have to run about 50% slower than all of BM on average... ?!

Considering that BM's log operation alone takes 80+ seconds per billion operations (not counting the rest of the algorithm), and I am only using very short polynomials at under 2 s per billion adds or multiplies -- I find it surprising that my code is even on the same order of magnitude...

Scan through the code, and do an estimate of what you think the execution time ought to be on average.... or better; could you isolate the specific thing in the code which you suspect is causing the problem -- and I'll replace that with some kind of dummy statement and see what happens ?

Cheers...

 Quote by viraltux You're right!!! The Internet is wrong!!!!
ME?!

 Well, after NIST got the second formula wrong, I should not have trusted the first one either!! And now NIST goes out of my bookmarks and, yesss, there it goesssss.
Now look what I caused...

I do understand ... but ... but ...
That's mildly disturbing ... Everything from the medical research community, the universities (and especially the physics departments) all the way to the USA's political parties appear to be using that formula from NIST;

.... and it's not quite right in the head, eh?

 So I went back to check Goodman's formula cause I could not see anything wrong in your development, and there it was, the extra term that you've just calculated is there in Goodman's formula! to be precise is $E(Δx^2Δy^2)$ which happens not to be zero when X,Y are independent unlike the rest of the terms that go away in that situation! Oh my God... if you cannot trust random guys in the Internet who're you gonna trust now.
I like interactive human beings rather than static web pages... they can be talked with...
I'd write to NIST -- but I doubt they even read their e-mail...

hmmm......
I'm looking again at the page you linked to from Jstor; They give the first page to read -- and then the rest is purchase only.

At 8 pounds, it looks to be rather expensive to just randomly read many articles from them. When the authors mention "a sketch of specializations and applications", do you know what they are referring to?

I am puzzling over what happened a bit, and am interested in the other terms...
Jstor breaks it into three lines, the first line is identical to what I wrote for un-correlated; the remaining two lines, then are for the correlated cases.

Now, it's obvious that when there is no correlation -- that anything multiplied by a correlation constant of zero, goes away.... Third line goes *poof*.
Q1: In the Jstor article, is the covariance constant a Pearson value, or something else?

But the second line is not so obvious; This is why I had to trust what you said regarding what did and didn't go away.... (I'm humble, I've made my 10 mistakes already...)

I am thinking that it goes to zero because the expectation value of Δx is by definition zero, and the same is true of Δy.

I showed in the first part of my proof that for uncorrelated Gaussians, the mean of a product is the product of the means.

But, once we have a term like Δy * Δy , THAT is NO longer a Gaussian, and so I am no longer sure why: E( Δx Δy**2 ) = 0; Assuredly E( Δx * Δy ) WOULD be zero; but I don't know about the combination.

What do you think, Is it going to be true in general, or is it because there are two partial products that interact?

E(x)*E( Δx Δy**2 ) + E(y)*E( Δx**2 Δy ) = 0.

In my own graphed examples...
I was noticing skew (and kurtosis?) in the final products of Gaussians (previous post), and I am also interested in knowing if there are generalized / simple ways to determine how skewed or how much kurtosis one can expect from a particular multiplication....

 Quote by andrewr There are, inside intel's processor, several hardware optimizations that overlap; there is out-of-order execution of instructions, pipeline-delay, cache-delay, and math co-processor de-coupling. If I had made the hardware myself, it would be easy to weigh which is causing what -- but even professional software engineers tell me that the published specifications do not always agree with performance. eg: some were attempting to make an optimized movie player for Intel -- only to find that the optimized version ran slower than the un-optimized version because the processor did not optimize in the same way the intel engineers said it would. I would like to think that *none* of these are causing the problem -- however, I have no alternate explanation at this time and it doesn't look to be simple. (But I wouldn't be surprised if it was just a dumb mistake of mine or a transcription error made much earlier...) (P.S. Considering we have optimizations uncertainties in the loop -- *MULTIPLIED* by other uncertainties -- what formula ought I use to calculate the range (product) of the uncertainties? )
You asked this right before reading my latest post, right?

But yeah, I myself learned techniques to evaluate how efficient code should be and blah, blah, but at the end of the day it turns out that hardware, compiler and settings have such an impact that "non-optimal" code ends up being the fastest. As one of my favorite wisdoms goes, "Theory is better than practice in theory, but practice is better than theory in practice" (I think I already posted this wisdom in this thread... geee I'm growing old)

 The first "*if*" statement returns ~48% of the time; A true multiply is not needed since a multiply by 2.0 is the same as an exponent adjust of +1; So -- This code *OUGHT* to be reaaaally fast. I'd guestimate -- 1% of BM time or better...
I think I spotted something;

You should order conditions so that the most likely scenario is checked first; so in second place, instead of if (z>spin2), which barely ever happens, you should check if (z<=spin2) for regions A2, A3, error... so check the order of the other if statements too and make sure that the likely case comes first.

One comment: you use exp(-0.5*z*z) when z > spin and z <= spin2, so z is roughly in [1..2]; then you transform z = spin2 - z, so z is now roughly in [0..1], and -0.5*z*z is in [-0.5..0]. Since you know this, instead of approximating exp() over its whole domain you only need to approximate exp on that small, known interval.

In a parallel thread I asked about ways to approximate functions with ratios of polynomials and someone mentioned the Padé approximant. It turns out that there are books with chapters dedicated only to approximating the exp function with this kind of ratio. Will this be faster than just using exp? I don't have that feeling but, anyway, there you go.

 Quote by andrewr ME?!
Yeah! How you like that!

 I do understand ... but ... but ... That's mildly disturbing ... Everything from the medical research community, the universities (and especially the physics departments) all the way to the USA's political parties appear to be using that formula from NIST;

First time I didn't see the gov in the URL and I just thought it was a random guy copying the wrong formula but... reaaaaaaally? So many people rely on this formula? That's worrying, because the error actually makes the variance over-optimistic, so if some engineer out there is using this to assess risk in a device he is designing, we are in trouble!! I mean, really?

Physics departments too? Well, they say this year the LHC might announce the discovery of the Higgs boson; if they are using this formula somewhere you can be a hell of a party pooper: "Yeah... LHC? Yeah, Andrew speaking, I was bored and then I discovered the Internet is wrong so, yeah, since you already made clowns of yourselves with the faster-than-light neutrino thingy I just thought I might spare you some further embarrassment.... Have a nice day"

 At 8 pounds, it looks to be rather expensive to just randomly read many articles from them. When the authors mention "a sketch of specializations and applications", do you know what they are referring to?
No idea, I just saw the development for Goodman's calculation and posted it without having a second look and without checking whether NIST was right because, hey, they are NIST! I only went back when you said your calculations were rendering something different.

 But the second line is not so obvious; This is why I had to trust what you said regarding what did and didn't go away.... (I'm humble, I've made my 10 mistakes already...) I am thinking that it goes to zero because the expectation value of Δx is by definition zero, and the same is true of Δy. I showed in the first part of my proof that for uncorrelated Gaussians, the mean of a product is the product of the means. But, once we have a term like Δy * Δy , THAT is NO longer a Gaussian, and so I am no longer sure why: E( Δx Δy**2 ) = 0; Assuredly E( Δx * Δy ) WOULD be zero; but I don't know about the combination. What do you think, Is it going to be true in general, or is it because there are two partial products that interact? E(x)*E( Δx Δy**2 ) + E(y)*E( Δx**2 Δy ) = 0.
I think you are letting the Δ symbol scare you away; everything looks more "mathy" and scary with Greek symbols, so now let's take Δx=A and Δy**2 = B. We know that, when A and B are independent, we have E(AB) = E(A)E(B) (the paper proves it generally, but you actually proved this yourself for the Gaussian case), so if E(A)=0 you know for sure E(AB)=0. So, since E(Δx)=E(Δy)=0, every term where one of them appears alone to the first power goes away.

 Quote by viraltux Yeah! How you like that! First time I didn't see the gov in the URL and I just thought it was a random guy just copying the wrong formula but... reaaaaaaally? So many people rely in this formula? That's worrying because actually the error makes the variance over optimistic, so if some engineer out there is using this to assess risk in the device he is designing we are in trouble!! I mean, really?
I haven't been able to find anyone, without searching for "Goodman" specifically, who is publishing the right formula. They may exist, of course -- searches only surface commonly found items. But even Wikipedia and a dozen other information gatherers don't seem to notice the problem at all.

http://en.wikipedia.org/wiki/Propagation_of_uncertainty

Their formula is:
$$f = AB$$
$$\left(\frac{\sigma_f}{f}\right)^2 = \left(\frac{\sigma_A}{A}\right)^2 + \left(\frac{\sigma_B}{B}\right)^2 + 2\frac{\sigma_A\sigma_B}{AB}\rho_{AB}$$

So, if the pair A, B is *un-correlated*, $\rho_{AB}$ is zero ... and this formula loses the product-of-deviations term as well.

Assuming Goodman is right (and I agree with him, and NIST actually cites him... but doesn't quote him correctly...) -- then the problem seems to show up wherever the technique of using partial derivatives to compute the error propagation formula is used; eg: as is shown in the Wikipedia article section "Non-linear Combinations".

What do you think? I'm not so sure this is important for many typical applications, as the error term is going to be a small fraction of the total standard deviation; but I'd be interested in looking at LHC's work to see what formula they do use... (Not that I'd correct them if they were wrong ... I'd just like to see if they use it at all.)

CERN's site, though, doesn't appear to have direct links to open papers showing their methods.

I do like sites like ARXIV, as they allow/encourage public inspection of their work ... I wish there were more establishments that did that.