Expectation value for CDF fails

andrewr · Jun 30, 2012

viraltux said:

You're right! The Internet is wrong!

ME?!

Well, after NIST getting wrong the second formula I should not have trusted the first either! And now NIST goes out of my bookmarks and, yesss, there it goesssss.

Now look what I caused...

I do understand ... but ... but ...
That's mildly disturbing ... Everything from the medical research community, the universities (and especially the physics departments) all the way to the USA's political parties appear to be using that formula from NIST;

... and it's not quite right in the head, eh?
o:)

So I went back to check Goodman's formula cause I could not see anything wrong in your development, and there it was, the extra term that you've just calculated is there in Goodman's formula! to be precise is E(Δx^2Δy^2) which happens not to be zero when X,Y are independent unlike the rest of the terms that go away in that situation!

Oh my God... if you cannot trust random guys in the Internet who're you going to trust now.

I like interactive human beings rather than static web pages... they can be talked with...
I'd write to NIST -- but I doubt they even read their e-mail...

hmmm...
I'm looking again at the page you linked to from Jstor; They give the first page to read -- and then the rest is purchase only.

At 8 pounds, it looks to be rather expensive to just randomly read many articles from them. When the authors mention "a sketch of specializations and applications", do you know what they are referring to?

I am puzzling over what happened a bit, and am interested in the other terms...
Jstor breaks it into three lines, the first line is identical to what I wrote for un-correlated; the remaining two lines, then are for the correlated cases.

Now, it's obvious that when there is no correlation -- that anything multiplied by a correlation constant of zero, goes away... Third line goes *poof*.
Q1: In the Jstor article, is the covariance constant a Pearson value, or something else?

But the second line is not so obvious; This is why I had to trust what you said regarding what did and didn't go away... (I'm humble, I've made my 10 mistakes already...)

I am thinking that it goes to zero because the expectation value of Δx is by definition zero, and the same is true of delta y.

I showed in the first part of my proof that for uncorrelated Gaussians, the mean of a product is the product of the means.

But, once we have a term like Δy * Δy , THAT is NO longer a Gaussian,

and so I am no longer sure why: E( Δx Δy**2 ) = 0; Assuredly E( Δx * Δy ) WOULD be zero; but I don't know about the combination.

What do you think, Is it going to be true in general, or is it because there are two partial products that interact?

E(x)*E( Δx Δy**2 ) + E(y)*E( Δx Δy**2 ) = 0.

In my own graphed examples...
I was noticing skew (and kurtosis?) in the final products of Gaussians (previous post), and I am also interested in knowing if there are generalized / simple ways to determine how skewed or how much kurtosis one can expect from a particular multiplication...

viraltux · Jul 1, 2012

andrewr said:

There are, inside intel's processor, several hardware optimizations that overlap;
there is out-of-order execution of instructions, pipeline-delay, cache-delay, and math co-processor de-coupling. If I had made the hardware myself, it would be easy to weigh which is causing what -- but even professional software engineers tell me that the published specifications do not always agree with performance. eg: some were attempting to make an optimized movie player for Intel -- only to find that the optimized version ran slower than the un-optimized version because the processor did not optimize in the same way the intel engineers said it would.

I would like to think that *none* of these are causing the problem -- however, I have no alternate explanation at this time and it doesn't look to be simple. (But I wouldn't be surprised if it was just a dumb mistake of mine or a transcription error made much earlier...)

(P.S. Considering we have optimizations uncertainties in the loop -- *MULTIPLIED* by other uncertainties -- what formula ought I use to calculate the range (product) of the uncertainties? )

You asked this right before reading my latest post, right?

But yeah, I myself learn techniques to evaluate how efficient code should be and blah, blah but, at the end of the day, turns out that hardware, compiler, settings end up having such impact that "non-optimal" code turns out to be the fastest one. As one of my favorite wisdoms goes "Theory is better than practice in theory, but practice is better that theory in practice" (I think I already posted this wisdom in this thread... geee I'm growing old)

The first "*if*" statement returns ~48% of the time; A true multiply is not needed since a multiply by 2.0 is the same as an exponent adjust of +1; So -- This code *OUGHT* to be reaaaally fast. I'd guestimate -- 1% of BM time or better...

I think I spotted something;

You should place conditions in a way that you check first the most likely scenario, so in second place instead if (z>spin2) which barely happens you should check if (z<=spin2) for regions A2,A3,error... so check the order of the other if statements too and make sure that likely comes first.

One comment, you use exp(-0.5*z*z) when if (z>spin) and if (z<=spin2) so roughly in [1..2], then you transform z = spin2 - z; so roughly again z is now in [0..1] and then, again -0.5*z*z is in [-0.5..0] so you calculate exp([-0.5..0]). So since you know this instead approximating exp() you should only need approximate exp within [-0.5..0].

In a parallel thread I asked about ways to approximate functions with ratios of polynomials and someone mentioned: Pade Aproximant Turns out that there are books with chapters dedicated only on how to approximate the exp function with this kind of ratios. Will this be faster than just using exp? I don't have that feeling but, anyway, there you go.

viraltux · Jul 1, 2012

andrewr said:

ME?!

Yeah! How you like that!

I do understand ... but ... but ...
That's mildly disturbing ... Everything from the medical research community, the universities (and especially the physics departments) all the way to the USA's political parties appear to be using that formula from NIST;

First time I didn't see the gov in the URL and I just thought it was a random guy just copying the wrong formula but... reaaaaaaally? So many people rely in this formula? That's worrying because actually the error makes the variance over optimistic, so if some engineer out there is using this to assess risk in the device he is designing we are in trouble! I mean, really?

Physics departments too? Well, they say this year the LHC might announce the discovery of the Higgs boson, if they are using this formula somewhere you can be a hell of party popper

"Yeah... LHC? Yeah, Andrew speking, I was bored and then I discovered the Internet is wrong so, yeah, since you already made a clown of yourselves with the faster than light neutrino thingy I just thought I might spare you some further embarrassment... Have a nice day"

At 8 pounds, it looks to be rather expensive to just randomly read many articles from them. When the authors mention "a sketch of specializations and applications", do you know what they are referring to?

No idea, I just saw the development for Goodman's calculation and posted it without having a second look and without checking if NIST was right because, hey, they are NIST! I only went back when you said your calculations where rendering something different.

But the second line is not so obvious; This is why I had to trust what you said regarding what did and didn't go away... (I'm humble, I've made my 10 mistakes already...)

I am thinking that it goes to zero because the expectation value of Δx is by definition zero, and the same is true of delta y.

I showed in the first part of my proof that for uncorrelated Gaussians, the mean of a product is the product of the means.

But, once we have a term like Δy * Δy , THAT is NO longer a Gaussian, and so I am no longer sure why: E( Δx Δy**2 ) = 0; Assuredly E( Δx * Δy ) WOULD be zero; but I don't know about the combination.

What do you think, Is it going to be true in general, or is it because there are two partial products that interact?

E(x)*E( Δx Δy**2 ) + E(y)*E( Δx Δy**2 ) = 0.

I think you are letting the Δ symbol scare you away, everything looks more "mathy" and scary with Greek symbols so now let's take Δx=A and Δy**2 = B. We know that, when A and B are independent, we have E(AB) = E(A)E(B) (The paper proves it generally but actually you proved yourself this one too for a Gaussian), so if E(A)=0 you know for sure E(AB)=0. So, since E(Δx)=E(Δy)=0 every time they appear everything goes away.

andrewr · Jul 3, 2012

viraltux said:

Yeah! How you like that!

First time I didn't see the gov in the URL and I just thought it was a random guy just copying the wrong formula but... reaaaaaaally? So many people rely in this formula? That's worrying because actually the error makes the variance over optimistic, so if some engineer out there is using this to assess risk in the device he is designing we are in trouble! I mean, really?

I haven't been able to find anyone without searching for "Goodman" specifically, who is publishing the right formula. They may exist, of course -- searches are just about commonly found items. But, even Wikipedia and a dozen other information gatherer's don't seem to notice the problem at all.

http://en.wikipedia.org/wiki/Propagation_of_uncertainty

Their formula is:
f = AB
\left(\frac{\sigma_f}{f}\right)^2 = \left(\frac{\sigma_A}{A}\right)^2 + \left(\frac{\sigma_B}{B}\right)^2 + 2\frac{\sigma_A\sigma_B}{AB}\rho_{AB}

So, if pair A,B is *un-correlated*, \rho_{AB} is zero ... this formula looses product of deviations term, as well.

Assuming Goodman is right (and I agree with him, and NIST actaully cites him...but doesn't quote him correctly...) -- then, the problem seems to show up where the technique of using partial derivatives to compute the error propagation formula is used; eg: as is shown in the Wikipedia article section "Non-linear Combinations".

What do you think? I'm not so sure this is important for many typical applications, as the error term is going to be a small fraction of the total standard deviation; but I'd be interested in looking at LHC's work to see what formula they do use... (Not that I'd correct them if they were wrong ... I'd just like to see if they use it at all.)

CERN's site, though, doesn't appear to have direct links to open papers showing their methods.

I do like sites like ARXIV, as they allow/encourage public inspection of their work ... I wish there were more establishments that did that.

viraltux · Jul 4, 2012

andrewr said:

I haven't been able to find anyone without searching for "Goodman" specifically, who is publishing the right formula. They may exist, of course -- searches are just about commonly found items. But, even Wikipedia and a dozen other information gatherer's don't seem to notice the problem at all.

http://en.wikipedia.org/wiki/Propagation_of_uncertainty

Their formula is:
f = AB
\left(\frac{\sigma_f}{f}\right)^2 = \left(\frac{\sigma_A}{A}\right)^2 + \left(\frac{\sigma_B}{B}\right)^2 + 2\frac{\sigma_A\sigma_B}{AB}\rho_{AB}

So, if pair A,B is *un-correlated*, \rho_{AB} is zero ... this formula looses product of deviations term, as well.

Assuming Goodman is right (and I agree with him, and NIST actaully cites him...but doesn't quote him correctly...) -- then, the problem seems to show up where the technique of using partial derivatives to compute the error propagation formula is used; eg: as is shown in the Wikipedia article section "Non-linear Combinations".

I am listening to the X files theme tune while reading this... it's just appropriate.

Well, Wikipedia got it nearly right because first they say:

Any non-linear function, ''f(a,b)'', of two variables, ''a'' and ''b'', can be expanded as: f\approx f^0+\frac{\partial f}{\partial a}a+\frac{\partial f}{\partial b}b

which is true but then they forget the approximate symbol from that moment on.

So the article should use the ≈ sign for every no-linear case and explain that for the special case ab we have Goodman's exact formula. The discoverer maybe should do the honors and fix it, but if you don't I'll do it myself; I am a big supported of Wikipedia philosophy and I'd like it to have it right.

What do you think? I'm not so sure this is important for many typical applications, as the error term is going to be a small fraction of the total standard deviation; but I'd be interested in looking at LHC's work to see what formula they do use... (Not that I'd correct them if they were wrong ... I'd just like to see if they use it at all.)

CERN's site, though, doesn't appear to have direct links to open papers showing their methods.

I am not civil engineer or EE, so I don't know in which scenarios this formula is applied, but every time σ is big compared to a,b then the approximation is horrible. But anyway, scenarios with big uncertainties can be found a lot in Astronomy though I don't know how often they deal with non-linear scenarios, and so it goes for the LHC at CERN, if they're dealing with big uncertainties in this non-linear scenario then their results are going to be way over-optimistic.

And just as appropriate as the X file music for this part of the thread is this today's piece of news:

http://www.reuters.com/article/2012/07/04/us-science-higgs-idUSBRE86008K20120704

Geeee... what a timing!

I do like sites like ARXIV, as they allow/encourage public inspection of their work ... I wish there were more establishments that did that.

True, when I end up in a site with the 10€ per paper deal I have the feeling you got to be millionaire to do research. The only thing I don't like about Arxiv is that they are too restrictive to whom they allow to publish, I know they do it for quality's sake but, instead censorship, they should implement a network of trust tool; this way you only see papers from people you trust and, people they trust to the level you want; just like the network of trust schema for encryption keys.

Unfortunately I haven't found any relevant site like that, so... yeah.

andrewr · Jul 4, 2012

viraltux said:

I am listening to the X files theme tune while reading this... it's just appropriate.

Well, Wikipedia got it nearly right because first they say:
which is true but then they forget the approximate symbol from that moment on.

So the article should use the ≈ sign for every no-linear case and explain that for the special case ab we have Goodman's exact formula. The discoverer maybe should do the honors and fix it, but if you don't I'll do it myself; I am a big supported of Wikipedia philosophy and I'd like it to have it right.

I use them enough that I donate to their fundraisers. One of these days I need to create an account and learn the process of helping them edit their articles...
Go ahead and fix it yourself... If you want to give clues on how you did that, I'd appreciate knowing... Do you edit the live article, or do you make a corrected draft and have it peer reviewed by other volunteers?

I am not civil engineer or EE, so I don't know in which scenarios this formula is applied, but every time σ is big compared to a,b then the approximation is horrible. But anyway, scenarios with big uncertainties can be found a lot in Astronomy though I don't know how often they deal with non-linear scenarios, and so it goes for the LHC at CERN, if they're dealing with big uncertainties in this non-linear scenario then their results are going to be way over-optimistic.

In engineering, most items are macroscopic; but an extreme application might be the estimation of a some property of a the space of a transistor; in such an app, there could be around 20 atoms in diameter (eg: the Atom processor transistors, anyone?) and an atom's outer electron wave packet may extend out 3 to 5 atoms...

so; we might expect sigma to be on the order of 1/6 to 1/4 of a or b. If it were 1/4, then relatively speaking; that gives -- sqrt( 1(1/4)**2 + (1/4*1/4)**2 + 1(1/4)**2 / ...
about 1.5% error...

The error, does, of course grow rapidly and becomes horrible as one approaches one atom... (EE's *ARE* having trouble getting down that small ;) )

Geeee... what a timing!

Yes, cern (I'm sure) has reason for concern...
QM packets for single particles are rather, well, a large percentage of the measured value...
But they also have a staff of Mathematicians, I'd bet...

FYI: I will get back to the random generator...
I just have priorities with solving the analytical problems first; but I definitely want to make a good test generator...

I'm about ready to start tackling correlation, kurtosis, and skew...

Those uglies show up whenever multiplication of Gaussians happen, even when the muliplicands are not correlated, as I mentioned in the graphs at the bottom of post 23.

https://www.physicsforums.com/showpost.php?p=3977418&postcount=23

When Goodman produces a result, he is simply giving us the variance and mean of the operation from Gaussians; but as we know, the results themselves are NOT Gaussians...

The purpose of NIST (National Institute of Standards and Technology) is to encourage cooperation (national and international) by setting up/maintaining common and useful standards for measurement and result reporting.

NIST recommends (and even for international cooperation) that once data is collected; the result ought to always be reported (and potentially distorted/corrected) such that the results can be used *as if* they were Gaussian variables.

Since that's the case, I'm interested in a calculator/set of functions which manipulate Gaussians and do the corrections/distortions necessary to preserve the reporting format of a Gaussian.

Ultimately, I'd like to build a calculator which uses the Cyhelsky Skew idea to carry around a pair of Gaussians that can simulate the properties of non-Gaussian distributions. The calculator would carry these values around internally, and use them to improve the precision of intermediate calculation steps -- but always report a "best" Gaussian as the result.

We know that a calculator can approximate functions by polynomials; Since addition of Gaussians always produces Gaussians, addition and subtraction are not a problem, but multiplication *is* -- So, my goal is really making some kind of correction to multiplication results during intermediate calculator steps...

If I just pretend the output of Goodman's formula is a Gaussian, I'm going to get pretty bad results in the next Gaussian operation I perform. So, Here's what I am thinking in General: If I add a near-Gaussian to a true Gaussian, the result will be closer to a true Gaussian than the original (Central limit theorem). Often it is *MUCH* closer to a Gaussian...!

So, what I am thinking is that I could take the output of multiplication, (M), and then add it to an arbitrary but true un-correlated Gaussian (X) (doing so either element by element or using Calculus); Then save the final mu, and the sigma, of the result; This result can be used to back-compute a true Gaussian that will simulate the results of addition better than the sigma and mu of M would in the first place...

So, I'm thinking: set m, s = mu of ( M + X ), sigma of (M + X );
Then, set Mn = N( mu,sigma ) - X.

Then Mn can be used in subsequent calculations with better results.
What do you think, is this a decent approach -- or are there other simpler approaches already in use in the general mathematical community that you are aware of?

viraltux · Jul 5, 2012

andrewr said:

I use them enough that I donate to their fundraisers. One of these days I need to create an account and learn the process of helping them edit their articles...
Go ahead and fix it yourself...

That's all right, I will, but before I go on commenting with the rest of your post I just let you know that I got a threat of a ban from a mentor. Just telling you in case I happen to decay into photons

OK?

viraltux · Jul 5, 2012

andrewr said:

If you want to give clues on how you did that, I'd appreciate knowing... Do you edit the live article, or do you make a corrected draft and have it peer reviewed by other volunteers?

Yes you do edit live, but articles are under surveillance of the authors or the people that know about the subject, when someone changes the articles they immediately get a email warning about the change and they disagree they can revert the change immediately.

When disagreement happens then you talk about it and the discussion remains in the talk section, that is why is a good idea to check that section when an article is flag as biased (usually those with political content)

andrewr said:

When Goodman produces a result, he is simply giving us the variance and mean of the operation from Gaussians; but as we know, the results themselves are NOT Gaussians...

The purpose of NIST (National Institute of Standards and Technology) is to encourage cooperation (national and international) by setting up/maintaining common and useful standards for measurement and result reporting.

NIST recommends (and even for international cooperation) that once data is collected; the result ought to always be reported (and potentially distorted/corrected) such that the results can be used *as if* they were Gaussian variables.

Since that's the case, I'm interested in a calculator/set of functions which manipulate Gaussians and do the corrections/distortions necessary to preserve the reporting format of a Gaussian.

Ultimately, I'd like to build a calculator which uses the Cyhelsky Skew idea to carry around a pair of Gaussians that can simulate the properties of non-Gaussian distributions. The calculator would carry these values around internally, and use them to improve the precision of intermediate calculation steps -- but always report a "best" Gaussian as the result.

We know that a calculator can approximate functions by polynomials; Since addition of Gaussians always produces Gaussians, addition and subtraction are not a problem, but multiplication *is* -- So, my goal is really making some kind of correction to multiplication results during intermediate calculator steps...

If I just pretend the output of Goodman's formula is a Gaussian, I'm going to get pretty bad results in the next Gaussian operation I perform. So, Here's what I am thinking in General: If I add a near-Gaussian to a true Gaussian, the result will be closer to a true Gaussian than the original (Central limit theorem). Often it is *MUCH* closer to a Gaussian...!

So, what I am thinking is that I could take the output of multiplication, (M), and then add it to an arbitrary but true un-correlated Gaussian (X) (doing so either element by element or using Calculus); Then save the final mu, and the sigma, of the result; This result can be used to back-compute a true Gaussian that will simulate the results of addition better than the sigma and mu of M would in the first place...

So, I'm thinking: set m, s = mu of ( M + X ), sigma of (M + X );
Then, set Mn = N( mu,sigma ) - X.

Then Mn can be used in subsequent calculations with better results.
What do you think, is this a decent approach -- or are there other simpler approaches already in use in the general mathematical community that you are aware of?

Well, I am not sure I understand the problem you are trying to solve here, one thing is to apply transformations to data so that it looks to us like something familiar e.g. a Normal distribution; we do that all the time, and I assume this is what NIST means. Right? So for instance, if X follows a Normal distribution with μ>>0 and μ>>σ, and you have Z = X^2 then NIST would recommend you don't share Z but rather SQRT(Z) explaining the transformation you did, in this case SQRT. Am I right on what NIST asks?

But then it seems like if you were trying to find a way to calculate the most similar Normal distribution to something which is not Normal in which case... this is weird; for instance, no matter how you tune μ and σ you will never get anything close to an Exponential distribution.

But anyway, If you want to know how close are two distributions there is a family of probability distribution metrics that will tell you exactly that (e.g. Fortet-Mourier metric) If this is what you want you can use the metric d(Mn,M) to find the μ and σ that minimize it; This way you will have the Normal distribution Mn that would explain better M.

Now, if you want the fanciest of techniques for this then you go with those using Mixtures of Distributions, which basically are methods that use overlapped distributions. You can integrate this overlapping in a Kalman Filter model (I think EE work with these to treat noise in signals) and have a very nice approximation of the products of Normal distributions by just using Normal distributions overlapped in the filter.

Sooo.. yeah.

andrewr · Jul 14, 2012

viraltux said:

Well, I am not sure I understand the problem you are trying to solve here, one thing is to apply transformations to data so that it looks to us like something familiar e.g. a Normal distribution; we do that all the time, and I assume this is what NIST means. Right? So for instance, if X follows a Normal distribution with μ>>0 and μ>>σ, and you have Z = X^2 then NIST would recommend you don't share Z but rather SQRT(Z) explaining the transformation you did, in this case SQRT. Am I right on what NIST asks?

No, not quite (Though I may be under-reading them) ; They are being very simplistic. They only want the mu and sigma of a data-set reported. Period.
For example 31.412(5) would have a mu of 31.412 and a sigma of 0.005. (The parenthesis references the significance of the ls-digit..., and represents 1 sigma)

NIST does allow one to report a two sigma mark, etc, using special notation -- but essentially, it's the same thing -- a mere number. One would report the mu and sigma of Z in your example; and consider what operations the client would perform on it -- given that, we could adjust mu and sigma that we report in order to reduce the error the client will experience using our number.

But then it seems like if you were trying to find a way to calculate the most similar Normal distribution to something which is not Normal in which case... this is weird; for instance, no matter how you tune μ and σ you will never get anything close to an Exponential distribution.

Yes, quite true. However, I am operating only on "assumed" Gaussians reported by someone else; (Thankfully NOT exponenetial!) and the only operations I am performing on it are add subtract multiply and divide. For add and subtract, the operation does a convolution on the data elements making the result "look" more and more Gaussian.

Multiplication for items with σ << μ also happens to look very Gaussian...

However, multiplication and division which don't obey the above relationship introduce serious Kurtosis and increasing skew. (A problem...)

Inside my calculator, I only wish to improve the result of doing a series of multiplications and additions compared to the standard of assuming pure Gaussians at every step.

If I wanted to do sin( statistic ), then I have an issue about propagation of error; but as we have seen, the typical industry standard covariance approach is missing an important term... It is generally accurate for typical operations, but it isn't robust and totally fails in some. I don't want a total failure, *ever*; just graceful degradation...

More complicated (typical) operations like computing sin( statistic ) can be done using a polynomial, with coefficients having NIST style values nnnnn(ssss); and then, the basic add subtract multiply divide operations of the hypothetical better calculator will keep track of the error and automatically by +.-,*,/ , properly tracing the error regardless of input; it keeps track, also, of the truncation error :) making the approximation polynomial conservatively accurate (a worthy goal...)

I am not interested in getting "exact" results, but I do want a more robust calculation with improved estimation on all *intermediate* steps inside the calculator -- eg: than if I were to assume pure Gaussians are the result of every operation. BUT -- I'd like a *simple* and robust improvement, and not an extremely fancy and complicated one.

But anyway, If you want to know how close are two distributions there is a family of probability distribution metrics that will tell you exactly that (e.g. Fortet-Mourier metric) If this is what you want you can use the metric d(Mn,M) to find the μ and σ that minimize it; This way you will have the Normal distribution Mn that would explain better M.

Yes, that seems correct; that is basically what I want to do -- although, it isn't always a pure Gaussian I want to convert to; eg: I do want the calculator to handle basic skew and possibly Kurtosis... but only in a very limited way, during intermediate steps.

I'll look up Forted-Mourier later, but I do want your opinion on the basic idea I had earlier and will explain here...

Now, if you want the fanciest of techniques for this then you go with those using Mixtures of Distributions, which basically are methods that use overlapped distributions. You can integrate this overlapping in a Kalman Filter model (I think EE work with these to treat noise in signals) and have a very nice approximation of the products of Normal distributions by just using Normal distributions overlapped in the filter.

Sooo.. yeah.

The Kalman filter might be used in an application such as CNC trajectory control and statistical process control of said CNC. But, even there -- I don't tend to use it as there are more robust methods I developed for high speed CNC work -- eg: there are better ways to calibrate sensors, and use *much* faster computation techniques to avoid needing to worry about interpolating sample times. It's also quite complicated and computation intensive...

(Brag) -- using an older 3000 series Xilinx chip -- which was only a 70 MHz *effective* flip flop rate -- (ignore the 100MHz chip spec if you look it up -- as that's for an un-wired flip-flop!) -- but none the less, I was able to compute 128 bit wide real time trajectories in that chip at well over 16 times the speed of a 200MHZ Intel 486 processor could even compute it, let alone do IO after the computation (That CPU was the fastest of that time period.) The servo loop time of my chip was in the nano-second region. Interpolation using Kalman filters simply isn't necessary...! ;) And a 3000 series chip was only around $13 each.

Anyway, I'm not interested in anything so complicated -- I am using a mixture distribution already. Cyhelsky skew gives a relative measure of how much data is above a mean vs. below. That in turn, automatically relates the sigma on each side of the mean (one sided sigmas, don't know the proper name...) to the skew; and they become increasingly different as skew increases; The final result of a simple mixed distribution can trivially be converted to a single sigma Gaussian, which would be the exact same as if the distribution's data points were used without reference to the side of the mean which they were on.

What I did, is use half a bell curve for each side of the mean -- eg: with the sigma set different for each side. That allows the calculator to handle skew in a basic way -- *but* that method does introduce a discontinuity at the mean.
However, this mixed distribution *did* allow me to develop the mathematics to handle adding skewed distributions.

Using the numerical simulations, I have noticed that even the first or second operation (say addition) smears the discontinuity out very much. Eg: the convolution has that effect... which is very desirable.

The mathematics I developed happens to apply quite accurately to subsequent steps, and since each convolution due to addition causes the subsequent result to look *more* Gaussian, the accuracy does not degenerate with more steps (It beats the **** out of what using ONLY pure Gaussians can do.)

The representation is good enough for use right now for addition of Skewed data sets, although I'd like to replace the mixed distribution of half bell curves with one that is equivalent to that mixed distribution after a *single* operation with a true un-skewed Gaussian. The new representation would automatically smear out the discontinuity, and produce curves which more realistically representing typical data sets with tails going to zero, and a single mode.

Since multiplication tends to have almost the same curve (visually) as some of my mixture distributions after a *single* addition, I figure I can use this new mixture distribution as a basis for representing results of multiplications and divisions, internal to the calculator. (The only problem that stops me from finishing the calculator.)

So, all operations, inside the calculator, can be represented by a single data distribution type -- a mixed distribution based on a Cyhelsky skewed Gaussian...

The final result, of course, has to be reported in NIST format -- so I will need to convert an internal representation to an un-skewed Gaussian; I also need, internal to the calculator, to convert the output of multiplication to the "nearest" skewed Gaussian the calculator can represent . All conversions, then, have the same basic problem -- finding a reasonably "accurate" representation of any other arbitrary distribution.

So this is what I thought might be a good way to convert them ... It will, of course, convert a Gaussian back into the *same* Gaussian -- exactly -- so the calculator automatically performs the industry standard Gaussian error propagation formulas (with multiplication improved, appropriately of course!).

The conversion idea I have is this, the calculator's internal basis distribution is determined by three (or at most 4) statistics.

1 & 2) The Cyhelsky skew value, or equivalently -- both one sided sigmas
3) The mean
4) Potentially a Kurtosis correction value (not developed yet).

So, I thought -- ANY distribution, even if not the calculator's native representation; could be converted into those variables in this way:

Take any arbitrary distribution u; develop mathematics to add it to an uncorrelated skew Gaussian "x", and compute the resulting 4 statistics; ( or only TWO statistics if the result is to be a pure Gaussian...)

Then solve for an "x" such that the resulting sigma of "u+x"is exactly √2 that of x, and the mu of the result is exactly twice that of "x"; Skew result must remain exactly the same as x, and be a "Maximal" value, and Kurtosis, if developed, would also remain exactly that of "x", and be maximal.

If u *were* a true Gaussian, then x would have to be the same distribution with the *same* mu and sigma of "u".
Skew and Kurtosis can't be different from the Gaussian, or the final result will have a different Skew and Kurtosis than "x", and hence they must be zero...

For a "u" of any other distribution besides Gaussian...:
"x" tells us what skewed Gaussian will produce the same results as "u" when operated on by addition...

Since adding a Gaussian to a non-Gaussian, produces something *closer* to a Gaussian -- I conjecture the final result ought to converge toward a "perfect" Gaussian and be better than just naively accepting "u"'s statistics raw...

Now, I'll look up Fortet-Mourier, but I am curious what you think of my idea. (Saying it's bador flawed is fine... I'm just curious as always...)

andrewr · Jul 15, 2012

Conversion of one CDF into an approximation; Fortet-Mourier

I found this...

Fortet-Mourier or Wasserstein metric:
inf (|X-Y|)=\int _{ ℝ } | F _1 (x) - F _2 (x) | dx = \int \limits _{0} ^{1} | F _{1} ^{-1} (x) - F _{2} ^ {-1} (x) |

Where F1 and F2 are CDFs, eg: in my case, of an original and a converted distribution.

I'm not sure why the infimum is mentioned, but this metric appears to be the total probability of choosing a value that is in in error due to the conversion. So, minimizing this metric would match the distributions so that the total accumulated error of the conversion is minimum. It is more robust to outlier type errors, than say a metric based on the total variation of the error... and it is a decent starting point ...

; But I am pretty certain I need to use a metric that minimizes the error (in a particular sense) of the original distributions vs. the converted distribution *after* one of the four operators +,-,×,/ are applied to it and another value.

Since the result is just a number, I'm really wanting to minimize the error of a statistic measured in the converted distribution; eg: the standard deviation and mean for NIST purposes, and possibly the kurtosis and skew for my own records. It doesn't *really* matter to me how much the actual PDF/CDF of the conversion is in error because the actual distribution is not reported in the end.

So, what are your thoughts ? and, of course, others are welcome to answer as well.
I'm not trying to be exclusive... but just share ideas.

viraltux · Jul 16, 2012

andrewr said:

I found this...

Fortet-Mourier or Wasserstein metric:
inf ��(|X-Y|)=\int _{ ℝ } | F _1 (x) - F _2 (x) | dx = \int \limits _{0} ^{1} | F _{1} ^{-1} (x) - F _{2} ^ {-1} (x) |

Where F1 and F2 are CDFs, eg: in my case, of an original and a converted distribution.

I'm not sure why the infimum is mentioned...

The infimum makes sense when you consider the degrees of freedom given all possible X and Y with CDF F1 and F2, so if F1=F2 then there will be one X=Y and the infimum will be zero. But anyway, the scenario where I am familiar with these metrics (stochastic programming) we already have a particular X and we want the discretized version that better explains such X, so in this case the infimum would be with respect to the parameters of the discretized version.

For your problem you already have X too (it would be M in your example) and you want to find the Normal distribution (let's call it N) that better explains M, that is, to minimize d(M,N) and in this case the infimum will be with respect to the μ and σ of N. Once you have N you can use it instead M for your calculations and the metric will account for whatever kurtosis of skewness M might have, and thus, N should be your best shot to "Normalize" your data.

andrewr said:

But I am pretty certain I need to use a metric that minimizes the error (in a particular sense) of the original distributions vs. the converted distribution *after* one of the four operators +,-,×,/ are applied to it and another value.

I am not sure I understand the conversion you are trying to do and probably is because I don't quite understand the problem you're trying to solve, let's see, let's say you have the data X,Y,Z... and then you do all sort of operations H=X*Y+Z*Z*Z+X*Z+Z^2 and yet, all these operations are themselves another random variable H. Doing any intermediate step like conversion X*Y into a normal like distribution to then operate is a sin before the eyes of God.

I would just operate normally and if you need to "normalize" anything then do it with the final result H by minimizing d(H,N) and then returning the μ and σ of N.

But then you also say

NIST does allow one to report a two sigma mark

And now seems to me like if you don't even need to normalize anything! Simply return the μ of H and different σ for left and right if H happens to be skewed.

andrewr said:

Since the result is just a number, I'm really wanting to minimize the error of a statistic measured in the converted distribution; eg: the standard deviation and mean for NIST purposes, and possibly the kurtosis and skew for my own records. It doesn't *really* matter to me how much the actual PDF/CDF of the conversion is in error because the actual distribution is not reported in the end.

So, what are your thoughts ? and, of course, others are welcome to answer as well.
I'm not trying to be exclusive... but just share ideas.

Right, that's why I have the feeling I am missing something because I can't even see why you need the calculator at all. Do you have a numerical example? Something that shows where the problem is, how NIST asks you to report it, and how you are trying to solve the problem such procedure has? I think that might help everyone to jump in.

EDIT: In fact, we've been talking about so many different issues in this thread that people might feel they cannot really join in without reading the whole thing, maybe a good idea would be for you to open a new thread with this particular problem and then you'll have new blood and ideas on it.

andrewr · Jul 16, 2012

No, I don't have a numerical example. You would need to specify exactly what you would like to see -- it is impossible to generalize to all cases except using algebra.

But I simplified the question, and will add detail as people ask pertinent questions.
At least, I hope they do.

Simplified Q. is Here

I can link back to this thread only where apt., but starting a new thread is a good idea.

Expectation value for CDF fails

Similar threads

B A Little Probability Puzzle

I A variant of the Monty Hall problem

I What Are the Axioms of Fuzzy Logic and How Do They Extend Boolean Algebra?

I Please Explain (actually explain) The Monty Hall Problem

B How Rare Is Low Smartphone Usage Among Metro Travelers in Japan?

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Insights What Exactly is Dirac’s Delta Function? - Insight

Insights Relativator (Circular Slide-Rule): Simulated with Desmos - Insight

Insights Fixing Things Which Can Go Wrong With Complex Numbers