Improving propagation of error in successive calculations

andrewr · Jul 16, 2012

Hi,

When using data in calculations, errors can be introduced by the calculations themselves;
A common example of this kind of data is the atomic mass of some element; The data might be reported by NIST as: 190.233(20), which indicates that the mean is 190.233, and the maximum likelihood deviation is 0.020. Data from NIST is always assumed to be GAUSSIAN.

But doing operations on Gaussians (eg: ×,/ ) can produce resulting distributions that are not Gaussians:

png 1, example multiply

png 2, example multiply

Generally, people ignore this problem and just assume the result *is* still a Gaussian (not to mention that people typically don't use the exact formula for getting the sigma and mu of multiplication) Hence the error can compound rapidly as more operations are performed.

I'd like to do something a little better (but not much.), I'd like to create a single composite PDF/CDF that replaces the Gaussian and corrects somewhat for these introduced errors.
It is NOT practical to compute the exact CDF/PDF for a series of operations, but only for one or two. Hence, I want to design a single CDF/PDF (call it O) that can "optimize" the error.
Each operation's result would be measured (somehow) to produce the best fit parameters for a near-equivalent "O" distribution.

I am planning to keep only mu, sigma, skew, and possibly kurtosis at each step as definitive of a unique "O"

Example:
Given three uncorrelated Gaussians u,v; and and three "O" variables p o w
Now: consider the following calculations:
LET: o=u × v;
p=o + w, p=o × w, and p=o / w

I'd like the errors of the second set of operations to be about the same, and in some sense minimized. What metric would be appropriate to simultaneously minimize the errors of the second set of equations ? I can produce what I think is a decent CDF/PDF if necessary.

Stephen Tashi · Jul 17, 2012

andrewr said:

Hi,

The data might be reported by NIST as: 190.233(20), which indicates that the mean is 190.233, and the maximum likelihood deviation is 0.020.

Is the terminology "maximum likelihood deviation" used by NIST? What does it mean? Are you saying that 0.020 is the size of a certain number of standard deviations?

Generally, people ignore this problem and just assume the result *is* still a Gaussian

If you're talking about people who state various methods for error propagation I can believe that, since these methods vary greatly and often lack any mathematical justification. Probability theorists presumable know that the product and quotient of two independent gaussian random variables need not be a gaussian random variable.

(not to mention that people typically don't use the exact formula for getting the sigma and mu of multiplication) Hence the error can compound rapidly as more operations are performed.

In the case of the quotient of two gaussians, the first question to ask is if the variance and mean exist. (I don't know the answer offhand.) There are probability distributions that don't have means or variances. (The sample mean and variance will always exist so you might not detect this problem using simulations.)

I'd like to do something a little better (but not much.), I'd like to create a single composite PDF/CDF that replaces the Gaussian and corrects somewhat for these introduced errors.
It is NOT practical to compute the exact CDF/PDF for a series of operations, but only for one or two. Hence, I want to design a single CDF/PDF (call it O) that can "optimize" the error.
Each operation's result would be measured (somehow) to produce the best fit parameters for a near-equivalent "O" distribution.

You need to make the daydream more specific. Perhaps you mean that you want a family of distributions such that each member of the family is specifice by a values of 2 (or possibly 3 parameters) and such that the family is "closed" is in the following sense.

Using the notation X~ F(a,b) to mean "X is a random variable with distribution F(a,b)" and let the family of distributions be the set S.

If X and Y are independent random variables and X ~ F(a,b) and Y ~ G(c,d) and F and G are both elements of S then
1) XY has a distribution H(e,f) that is an element of S
2) X/Y has a distribution W(g,h) that is an element of S

And you want a method for computing (e,f) and (g,h) from (a,b) and (c,d).

What metric would be appropriate to simultaneously minimize the errors of the second set of equations ?

What do you mean by "errors". Errors in what?

My mind reading effort on that: Perhaps your main concern is the computation of the length of "confidence intervals" of a given quality (such as 0.99 = 99%). Perhaps by "error" you mean something that measures the difference between an actual interval and a interval calculated using distributions from the family S. That doesn't define "error" precisely, but it at least specifies that you are concerned about how two intervals differ.

andrewr · Jul 17, 2012

Stephen Tashi said:

Is the terminology "maximum likelihood deviation" used by NIST? What does it mean? Are you saying that 0.020 is the size of a certain number of standard deviations?

Yes, exactly 1.
The idea of maximum liklihood is essentially making an unbiased estimate of the true standard deviation; I don't think it necessary to go into furthur detail; That is mentioned not by NIST itself, but by ISO (International Standards Organization), and NIST uses it.
There are sub-hyperlinks here for independent study.

Engineering handbook, Type A evaluations

If you're talking about people who state various methods for error propagation I can believe that, since these methods vary greatly and often lack any mathematical justification. Probability theorists presumable know that the product and quotient of two independent gaussian random variables need not be a gaussian random variable.

But general Engineers don't -- and the omission is surprising since that is the target audience of the NIST handbooks, and the more detailed papers which contain the standard procedures. NIST is associated with keeping, for example, the exact length of 1 meter. etc.
They have to know these facts.
Notice, they reference Goodman, here, but don't clarify what and why something is an approximation and what is exact. A sample deviation is seldom "exact" -- and this formula from Goodman uses samples...

Error propagation Handbook
I derived the simple formula for the uncorrelated case as an exercise, and that's how I discovered the issue.
Re-derivation of Goodman's formula

In the case of the quotient of two gaussians, the first question to ask is if the variance and mean exist. (I don't know the answer offhand.) There are probability distributions that don't have means or variances. (The sample mean and variance will always exist so you might not detect this problem using simulations.)

In some divisions it doesn't; clearly if the mean of that one is dividing by is 0, there will be no mean after the division. I worked that out already.

You need to make the daydream more specific. Perhaps you mean that you want a family of distributions such that each member of the family is specifice by a values of 2 (or possibly 3 parameters) and such that the family is "closed" is in the following sense.

That's great -- but I figured it's also information overload -- esp. as I am not that advanced in the terminology. I'm not sure what you mean by daydream, though. A Gaussian is described as a single PDF -- but it is still a family of curves, isn't it?

Using the notation X~ F(a,b) to mean "X is a random variable with distribution F(a,b)" and let the family of distributions be the set S.

If X and Y are independent random variables and X ~ F(a,b) and Y ~ G(c,d) and F and G are both elements of S then
1) XY has a distribution H(e,f) that is an element of S
2) X/Y has a distribution W(g,h) that is an element of S

And you want a method for computing (e,f) and (g,h) from (a,b) and (c,d).

No; The result H(e,f) is not likely to be in set S. One has to choose something in S that is close distribution cdf/pdf wise -- and I am asking what kinds of metrics would be used to determine this closeness. I offered to provide a pdf/cdf as an example of set S; but I am open to other people's opinion as to what would constitute a robust set S. This is an empirical study.

What do you mean by "errors". Errors in what?

The result of each operation is a sigma, mu, skew, and possibly kurtosis. That's all I have mentioned.

Since H(e,f) is not in S -- locating something in S that is close to H(e,f) is going to introduce errors into sigma, mu, skew, and possibly kurtosis.

My mind reading effort on that: Perhaps your main concern is the computation of the length of "confidence intervals" of a given quality (such as 0.99 = 99%). Perhaps by "error" you mean something that measures the difference between an actual interval and a interval calculated using distributions from the family S. That doesn't define "error" precisely, but it at least specifies that you are concerned about how two intervals differ.

Mind reading? Please ask any more questions you need to. The handbook at NIST is also a good resource to understand the background of my problem.

I did mention it, and I thought an ISO standard would be recognized widely. My mistake?

chiro · Jul 17, 2012

With regard to division, you are going to get some crazy stuff for dividing two random variables if a zero (or that neighbourhood) is involved regardless of a zero mean: as long as you get significant probability in the region.

If you want to figure out the PDF given say X divided by Z there are a couple ways of doing this (and this will help you analyze your errors accumulate with regards to the number of variables you use).

The first way is based on transformations. For example in the X/Y distribution, it's better to find the distribution of 1/Y using transformation of random variables and then use results like moment generating functions and the characteristic equation in probability that converts a MGF to a PDF using a Fourier transform.

So the idea is you get the PDF of 1/Y using transformation principles, then you get the MGF of X/Y by calculating the MGF with respect to X/Y (so in other words e^(t[X/Y])f(x)g(y)dydx over the whole region of X and Y. Same kind of thing for calculating XY and its distribution.

For calculating X + Y where X,Y are any distribution with a known PDF and are not necessarily identical in distribution, but are definitely independent, then you use the convolution theorem to get the distribution of X and Y. this will take care of addition and subtraction (define -Y by taking the distribution and flipping it around the y-axis to get the distribution of the negative).

These four tools help you calculate any combination of final distributions using arithmetic of random variables.

Now it doesn't mean that it will always be easy because you will get some complex expressions, but the fact that they are all gaussian (i.e. the atomic random variables) makes it a bit better.

You can use numerical routines to calculate the final distributions, so don't spend too much time trying to do an analytic solution if it is consuming time.

Now finally we get to answering your question which is about errors in the distribution, and subsequently errors in things that depend on the distribution.

If you want to get an idea of how to check statistical significance of whether two distributions are 'the same' you use what is called a Goodness Of Fit test (GOF).

There are different ways of finding out how the error model that NIST gives vs the error model through your calculations is.

You can compare the expectation of the two distributions to get the average error which is a quick indicator, but the better approach is to highlight the ranges of values most likely to be used for a specific application and use those instead. You can do this through Bayesian Analysis, or you can simply re-create a new distribution taking the rest out (this is if the distribution doesn't reflect the values that will be used for your application: for example dividing by zero is not going to be common and the distribution of Y in X/Y should be different to begin with and not assume say a normal Gaussian).

Stephen Tashi · Jul 17, 2012

andrewr,

I didn't find the phrase "maximum liklihood deviation" in that link. When speaking of "standard deviation" or "probable error", it will be clearer to mathematicians if you use the term "standard deviation". (From a mathematical point of view, that NIST handbook is terrible. Even for a government document, it has a many wordy pages that have no significant content.)

As Chiro points says, the distribution of X/Y fails to have a mean or variance when X and Y are independent Gaussian random variables even in the case when the mean of Y is not zero. So you don't have any hope of finding these. To deal with X/Y you need a more realistic model of how experimental measurements are made. Perhaps it should reflect the fact that people discard measurements that appear absurd or that real lab instruments can only output a finite range of values.

What isn't clear to me about your quest is how it relates to any practical considerations. You state that your goal is to estimate the mean, variance and possibly the kurtosis of XY and X/Y. What type of decisions would be made based on knowing these? If we have a gaussian distribution then the mean and variance are most often used to make decisions affected by how often an observed value will be within a certain distance of the mean value. If we don't have a gaussian distribution, we can't use tables for the gaussian to estimate these probabilities. Shall we assume that the distribution of XY is shaped enough like a gaussian that we can use gaussian tables to approximate its probabilities?

andrewr · Jul 17, 2012

Stephen Tashi said:

andrewr,

I didn't find the phrase "maximum liklihood deviation" in that link. When speaking of "standard deviation" or "probable error", it will be clearer to mathematicians if you use the term "standard deviation".

Point noted:
I understood Maximum Likelihood, as mentioned somewhere in the notes on NIST -- likely in a sub-link, or regarding Type A, or Type B evaluations; to be the same as this:

http://en.wikipedia.org/wiki/Maximum_likelihood

So, a maximum liklihood deviation, is just a corrected unbiased estimate of standard deviation. eg:

The ISO guidelines are based on the assumption that all biases are corrected and that the only uncertainty from this source is the uncertainty of the correction. The section on type A evaluations of bias gives guidance on how to assess, correct and calculate uncertainties related to bias.

For future reference, ought I have said:
Just, "standard deviation" ? or should I have said "an arbitrary MLE of standard deviation?" or "Nists Type A evaluation?", which would have been most helpful?

I am thinking -- the latter two are more precise, but precision is problematic ?

(From a mathematical point of view, that NIST handbook is terrible. Even for a government document, it has a many wordy pages that have no significant content.)

That doesn't surprise me in the *slightest*. Unfortunately, darkness is the standard...

As Chiro points says, the distribution of X/Y fails to have a mean or variance when X and Y are independent Gaussian random variables even in the case when the mean of Y is not zero.

Yes, I wish I had said it better. I noticed that after I wrote it, but didn't have time to edit.
Sometimes it's just Easier to take the punishment... Sorry if you wasted some time, Chiro! But I am interested in how you approached the issue, and will spend time working through your comment.

So you don't have any hope of finding these. To deal with X/Y you need a more realistic model of how experimental measurements are made. Perhaps it should reflect the fact that people discard measurements that appear absurd or that real lab instruments can only output a finite range of values.

The production side of the issue and discarding of data is a "Type B" evaluation, and I'm not really given the option of controlling that.

I'm interested in *ARBITRARY* calculations performed on data, not so much the measurement itself. eg: I would simply take a value *given* by NIST and perform operations on that data.

I don't have a hope in finding the results of a calculation for an impossible situation -- I agree; but if the operation does represent something in the real world, and yet the calculation is impossible because a portion of the PDF produces invalid results -- then I would conjecture that the regions of the PDF where the operation (say division) is NOT invalid would represent what is really desired.

There are >= two choices, attempt an estimate based on the parts of the PDF which can be calculated (possibly with a warning), or simply report a failure in the calculation even if the result would be useful.

I think there is quite a bit of literature dealing with analogical problems -- eg: to remove outliers, for example: Peirce's annoying criterion, or Winsoring, or M-estimators. The ideal is impossible, but one can often guess a decent substitute.

I know when I do multiplication what kind of trend I cause in distorting the data; That may be useful for partial correction of results.

What isn't clear to me about your quest is how it relates to any practical considerations. You state that your goal is to estimate the mean, variance and possibly the kurtosis of XY and X/Y. What type of decisions would be made based on knowing these? If we have a gaussian distribution then the mean and variance are most often used to make decisions affected by how often an observed value will be within a certain distance of the mean value. If we don't have a gaussian distribution, we can't use tables for the gaussian to estimate these probabilities. Shall we assume that the distribution of XY is shaped enough like a gaussian that we can use gaussian tables to approximate its probabilities?

A sigma value, even if not for a Gaussian, is useful. If that's all I have as a result, then that's all I give.

NIST and ISO give recommendations in type A and type B evaluation papers talk about reporting, say, the 2,3,4, or even 5 sigma mark instead of a 1 sigma mark; eg: to allow pre-computed confidence intervals on data that isn't Gaussian, or rather -- isn't *close* to Gaussian. These techniques are to be avoided when possible...

In type A evaluations, the result is simply assumed (whether it truly is or not) to be a Gaussian. It is the user's responsibility to determine what is or is not safe to do with the results... (Good ol' government logic!), and it is the creator of the data's task to decide what aspect to report.

SO the answer to your question is: YES; we assume it is *near* enough to a Gaussian unless told otherwise, and given a help.

And, that is the best we *can* do.

andrewr · Jul 17, 2012

chiro said:

With regard to division, you are going to get some crazy stuff for dividing two random variables

Yes, and in multiplication too...as already plotted

See uncorrelated division plots attached. Most of divison's craziness is the same as for multiplication; Division is more extreme, but it behaves much in the same way. The divide by zero issues, typically don't happen even though they could. If in a sample of a million, the curve is still well defined even with zero at a significant percentage location -- the result is not unreasonable to use; Most people physically doing such an experiment wouldn't ever notice it either...!

If you want to figure out the PDF given say X divided by Z there are a couple ways of doing this...

Hmmm... and your focus exclusively on division, is perhaps because it is the most difficult?
I was able to work the pdf out analytically for multiplication, but I haven't tried division yet...

For calculating X + Y where X,Y are any distribution with a known PDF and are not necessarily identical in distribution, but are definitely independent, then you use the convolution theorem to get the distribution of X and Y. this will take care of addition and subtraction (define -Y by taking the distribution and flipping it around the y-axis to get the distribution of the negative).

Yes, I already did that, analytically. But Convolution was easier as a second try.

These four tools help you calculate any combination of final distributions using arithmetic of random variables.

Exactly; although a multiplication of a multiplication of a multiplication becomes impractical. esp: when the number of these in a row is arbitrary...
BUT: Using only +,-,*,/ -- I can implement all functions/operations of interest. :approve:

Now it doesn't mean that it will always be easy because you will get some complex expressions, but the fact that they are all gaussian (i.e. the atomic random variables) makes it a bit better.

Hmmm, I'm not sure why this is so difficult. I already did it for +,-,×, using an expression which is not a true Gaussian, but a mixed Gaussian representation that can represent skew well -- and I discovered that a sum of this particular representation even shows the ability to represent limited kurtosis well.

Division I haven't worked out -- so is it especially difficult?

Now finally we get to answering your question which is about errors in the distribution, and subsequently errors in things that depend on the distribution.

If you want to get an idea of how to check statistical significance of whether two distributions are 'the same' you use what is called a Goodness Of Fit test (GOF).

:zzz:
I Tried to ask once before, slightly differently... and
HOW will Fortet Morier help? GOF & GOOF off?

There are different ways of finding out how the error model that NIST gives vs the error model through your calculations is.

You can compare the expectation of the two distributions to get the average error which is a quick indicator, but the better approach is to highlight the ranges of values most likely to be used for a specific application and use those instead. You can do this through Bayesian Analysis, or you can simply re-create a new distribution taking the rest out (this is if the distribution doesn't reflect the values that will be used for your application: for example dividing by zero is not going to be common and the distribution of Y in X/Y should be different to begin with and not assume say a normal Gaussian).

Good points. I'm not sure how to answer these yet -- I am just experimenting with the idea, for example, using it in place of a significant figures calculator for some chemistry problems.
But, the actual application areas -- are any of the un-specialised undergraduate level problems in Physics, Chemistry, and Engineering. By the time master's level work is being done, they are going to want to make their own precise corrections; Most of the time, these problem areas we are encountering don't show up -- or cause below a 1% error; but not always... and I would like to at least detect when this is happening...

The only way I know to really answer the question you pose, it to actually use the idea -- and see how often it fails in practice.

What I am most concerned about avoiding, is an accumulation of easily corrected errors over several consecutive operations... Compound interest has (hopefully) a similar property a small correction early on -- can produce a *HUGE* difference in the total interest one has to pay.

The issue I'm discovering is that often one doesn't realize the problem exists when a graph of the PDF isn't available at each step of the calculation. Propagation of error formulas are completely blind...

Thanks Chiro, I still need help with the idea of a metric that simultaneously can be used to choose a good family of PDF's (set "S") over all four operators; or in measuring the one I already have developed somewhat.

But I think there is a subtlety in that idea which I am not doing well at explaining. I'll need to think about it a bit.

:redface:

chiro · Jul 17, 2012

I Tried to ask once before, slightly differently... and
HOW will Fortet Morier help? GOF & GOOF off?

Well the standard GOF measures are used for generating a hypothesis test concerning an observed histogram vs an expected histogram to either reject the hypothesis that the observed is not statistically significantly different from the expected, or to fail to reject that same hypothesis (with some level of statistical significance that can be made as high (or low) as you like).

The kinds of metrics you are referring to are a little different, and there are a variety of tests that are available depending on many factors like the distribution (there are for example, a variety of tests to see if a histogram is considered Gaussian for example: Shapiro-Wilk is one, but it is by no means the only one. You should run a search of goodness of fit for Gaussian/Normal distributions to see them all because testing for this assumption is at the centre of frequentist statistical methods which is why it has had a decent amount of exposure).

Stephen Tashi · Jul 18, 2012

andrewr said:

For future reference, ought I have said:
Just, "standard deviation" ? or should I have said "an arbitrary MLE of standard deviation?" or "Nists Type A evaluation?", which would have been most helpful?

What you ought to say depends on what you mean. "Standard deviation" itself is an ambiguous term.

Three main topics in statistics are distributions, statistics and estimators.

In the study of distributions, a distribution can have certain parameters, such as its standard deviation, which might also be called "the population standard deviation".The formal definition of "a statistic" is that it is a function of the values in a sample. Technically it is not a single numerical value (like 0.20) but people commonly use the term "statistic" to refer to a single numerical value of a statistic. In the context of samples, "standard deviation" means "sample standard deviation" or by common mis-use of the term, a particular value of the sample standard deviation. The definition of "sample standard deviation" is not standardized. Some books define it by a formula that uses a divisor of the sample size N and other use a divisor of N-1. It's an arbitrary choice.

Estimators are statistics used to estimate things - usually we estimate the parameters of a distribution. Since estimators are statistics they are functions. A particular value of an estimator should be called an "estimate" but people mis-use the terminology and say things like "The MLE estimator for the standard deviation is 0.20"). Estimators are random variables (as are all statistics) since they are functions of the values of random samples. As random variables, they have certain properties such as "maximum liklihood", "unbiased", "minimum variance", "least square error". There are different estimators for the population standard deviation and there is a definite relation between the formulas used and the properties of the estimator, so whether one divides by N,N-1 or N+1 in computing the standard deviation is not arbitrary if you intend the estimator to have certain properties. Virtualtux gave an interesting example of the estimators for the population variance in post #12 of the thread https://www.physicsforums.com/showthread.php?t=616643

(The NIST hanbook has words about assuming all bias is removed, but the MLE estimator of the variance is biased.)

haruspex · Jul 18, 2012

Stephen Tashi said:

In the context of samples, "standard deviation" means "sample standard deviation" or by common mis-use of the term, a particular value of the sample standard deviation. The definition of "sample standard deviation" is not standardized. Some books define it by a formula that uses a divisor of the sample size N and other use a divisor of N-1. It's an arbitrary choice.

No, it's not arbitrary. To get the s.d. of the sample you would always divide by N. You divide by N-1 to get the unbiased estimator of the s.d. of the underlying population. The reason is that you calculate the s.d. based on having estimated the mean of the population using the mean of the sample. Any error there will result in a smaller s.d. than if you knew the mean of the population and used that.
What's confusing is that e.g. Excel uses STDEVP to mean the first (divide by N) and STDEV to mean the second. The way to read this, that's consistent with that naming, is that both mean the standard deviation of the population, but for STDEVP the sample is the whole population.

Stephen Tashi · Jul 18, 2012

haruspex said:

No, it's not arbitrary.

Yes it is arbitrary. You're assuming "sample standard deviation" must be a different formula than "unbiased estimator for the population standard deviation". The formula for the unbiased estimator is not arbitrary. Books define the "sample variance" and "sample standard deviation" in various ways. My old statistics text by Mood, Graybill and Boes uses a divisor of N-1 for the sample variance. (There have been threads about this on the forum.) The opinions of the programmers who code Excel are not authoritative in this matter.

haruspex · Jul 18, 2012

Stephen Tashi said:

Yes it is arbitrary. You're assuming "sample standard deviation" must be a different formula than "unbiased estimator for the population standard deviation". The formula for the unbiased estimator is not arbitrary. Books define the "sample variance" and "sample standard deviation" in various ways. My old statistics text by Mood, Graybill and Boes uses a divisor of N-1 for the sample variance. (There have been threads about this on the forum.) The opinions of the programmers who code Excel are not authoritative in this matter.

As far as I have been able to ascertain, the root problem is that the term "sample variance" is ambiguous. There are two distinct concepts and a well defined formula for each: the variance of the sample (divide by N) and the unbiased estimator of the population variance (divide by N-1). The first is also also known as the biased sample variance, and the latter as the unbiased sample variance. As a result, the term "sample variance" is sometimes used to refer to one and sometimes to the other. See e.g. http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance. So there's not an arbitrary choice to be made, just an ambiguity in terminology to be negotiated.
I notice that that reference suggests the biased sample variance is a valid estimator for the variance of the population. When the population is very large compared with the sample that is provably false. I would guess that if the sample is known to be, say, half the population, then some compromise formula would be the least biased one.

andrewr · Jul 24, 2012

Stephen Tashi said:

What you ought to say depends on what you mean. "Standard deviation" itself is an ambiguous term.

Three main topics in statistics are distributions, statistics and estimators.

In the study of distributions, a distribution can have certain parameters, such as its standard deviation, which might also be called "the population standard deviation".

Correct, and the purpose of MLE, as I understood a particular NIST reference to it -- was to produce an estimate of the population's standard deviation; not the sample's deviation.

The formal definition of "a statistic" is that it is a function of the values in a sample.

I suppose I agree with that, but I saw a slightly different nuance in what I was taught...
eg:

"a large random sample is selected and the proportion [itex]\hat p[/itex] of people in this sample favoring the brand of coffee in question is calculated. The value [itex]\hat p[/itex] is now used to make an inference concerning the true proportion p.

Now, [itex]\hat p[/itex] is a function of the observed values in the random sample; since many random samples are possible from the same population, we would expect [itex]\hat p[/itex] to vary somewhat from sample to sample. That is, [itex]\hat p[/itex] is a value of a random variable that we represent by [itex]\hat p[/itex]. Such a random variable is called a statistic"
Probability and Statistics for Engineers and Scientists, Fifth Edition, Ronald E. Wadpole, Raymond H. Meyers

Technically it is not a single numerical value (like 0.20) but people commonly use the term "statistic" to refer to a single numerical value of a statistic.

So, what you are saying almost makes sense to me. In the book I learned from, quoted above, it appeared to me that the word statistic was a "value" of a *particular* sample; but
you seem to be equating it to *only* the function (itself) of a random variable? or do you mean, that a function could return -- say -- a vector, and hence it isn't a scalar (a "value" in that sense?).

To me, it seems the definition allowed for a contextual variation in usage.
I know that if I wanted to restrict it to a single value -- that I could say, "point estimate"; but I didn't think the extra words were necessary given what I thought was sufficient context.

In the context of samples, "standard deviation" means "sample standard deviation" or by common mis-use of the term, a particular value of the sample standard deviation.

I think I see; The word, standard deviation, by itself is supposed to be for the population, and is by definition divide by N; the other versions based on samples smaller than the full population are, I think, historical developments after the original formula and it's name was already given.

The definition of "sample standard deviation" is not standardized. Some books define it by a formula that uses a divisor of the sample size N and other use a divisor of N-1. It's an arbitrary choice.

My text defined it as N-1, based on there being a degree of freedom issue involved; But I am uncomfortable with that reasoning, for it would seem to apply even to the full population ! -- I understand N-1 to be a "Bessel correction" to the sample deviation. So I am aware of the non-standardization issue you speak of. I thought MLE would clarify that -- but it appears to have done the opposite ?

Estimators are statistics used to estimate things - usually we estimate the parameters of a distribution.

ibid: statistic.

Since estimators are statistics they are functions. A particular value of an estimator should be called an "estimate" but people mis-use the terminology and say things like "The MLE estimator for the standard deviation is 0.20").

I see, I do believe that particular miss-spelling and confounding is my own mistake and not NISTS,; I really did mean estimate, and if I had carefully proof-read, I might have caught that myself. I ought to have said, something more like: "The MLE's estimate and/or an unbiased estimate, for the standard deviation is 0.20"

But, I think that is awfully wordy.

Estimators are random variables (as are all statistics) since they are functions of the values of random samples. As random variables, they have certain properties such as "maximum liklihood", "unbiased", "minimum variance", "least square error". There are different estimators for the population standard deviation and there is a definite relation between the formulas used and the properties of the estimator, so whether one divides by N,N-1 or N+1 in computing the standard deviation is not arbitrary if you intend the estimator to have certain properties.

Sure, I understand that quite well.

Virtualtux gave an interesting example of the estimators for the population variance in post #12 of the thread https://www.physicsforums.com/showthread.php?t=616643

(The NIST hanbook has words about assuming all bias is removed, but the MLE estimator of the variance is biased.)

I think the MLE is not NECESSARILY unbiased, but it can be unbiased; (qualified). but this point is also a serious detour to my original question, as I don't *produce* this estimate in the first place; I only use it.

I'm not sure I understand Virtualtux's reference to the N-1 being "unbiased" in any event; Bessel correction does not completely remove bias for samples < population. In the limit N->inf, both N and N-1 and even N+1 are going to be unbiased for the error introduced by being off by 1, goes to zero in the limit (re-sampling of population is implied if population is not infinite). Hence, an MLE can be unbiased if the full population is known somehow. I do not know how NIST or ISO arrives at their solution, I only found references -- and as you yourself point out; they do seem to have errors ? right?

Again, though, what would have been clearest for me to pose as the OP question, now that I have given as much background as I know -- and have done my best to understand your patient pedantism.

I am thinking, the only mistake I truly made it to confound estimate with estimator.
Any other mistakes I made were simply things I was told or learned that I couldn't be expected to have judged with my level of knowledge.

haruspex · Jul 24, 2012

andrewr said:

My text defined it as N-1, based on there being a degree of freedom issue involved; But I am uncomfortable with that reasoning, for it would seem to apply even to the full population ! -- I understand N-1 to be a "Bessel correction" to the sample deviation.

The N-1 divisor is derivable from the assumption that the whole population is effectively infinite. I.e. large compared with the sample. As the sample size grows towards the full population size, so the unbiased estimate divisor tends to N, but I'm not aware of any actual formula for that.

I'm not sure I understand Virtualtux's reference to the N-1 being "unbiased"

It is unbiased in the sense that if you divide by N-1 then the expected error (in the variance) is zero. Note that this does not make the square root of it, the s.d., unbiased!
An MLE aims to find the peak in the probability distribution for the parameter, which is clearly a different aim from unbiased. Of course, it could happen to be unbiased.

Hence, an MLE can be unbiased if the full population is known somehow.

Not sure what you're saying there. Do you mean all datapoints known or merely the size of the whole population? If all datapoints, you can calculate the value exactly. Not sure what MLE would mean there.

Stephen Tashi · Jul 24, 2012

Andrewr,

Assuming we have made some progress in definitions, I'll return to the question of "What is the objective of the original post?".

I think one of the objectives is:
Given two independent Gaussian random variables X and Y and given maximum liklihood estimates (i.e. specific numerical values) of their means and standard deviations, but not the sample values or sample sizes from which these estimates are obtained, find the maximum likelihood estimates for the mean and variance of the random variable Z = XY.

I doubt there is not enough given information to solve this problem. So if you are going to solve something, you must make some assumptions. You can assume the population means and standard deviations of X and Y are exactly equal to the numerical values of the given estimates. Then we have the question: What are your requirements for the form of the answer?

One can solve the problem by simulation. One can solve the problem by numerical integration techniques. If these are unsatisfactory then what do you want a solution to look like? Can it be a table of values? Must it be a simple algebraic formula that a person can compute with a calculator? I don't know offhand how to produce an answer in simplified form. My intuition is that getting an approximate answer in simplified form is a reasonable mathematical daydream, but it's not a "well known" result (to me!). It sounds like a topic for a graduate level thesis and I suppose there are rare examples of people solving such problems on internet forums.

One of the other objectives in the original post is to solve a similar problem for the quotient X/Y of independent Gaussian random variables and we have agreed that mean and standard deviation of the distribution of X/Y may not exist. However, you feel it is still possible to make progress - which necessarily means that we must solve some different problem. I gather that you think we might modify the problem by "leaving out" the event that Y is exactly zero. My intuition is that this won't work. I think you'd have to leave out a finite interval around the value Y = 0.

As I mentioned before, if you want to solve the problems in a way that relates to actual physical measurements, you shouldn't use Gaussian distributions.

andrewr · Jul 24, 2012

haruspex said:

It is unbiased in the sense that if you divide by N-1 then the expected error (in the variance) is zero. Note that this does not make the square root of it, the s.d., unbiased!

Oh, oops -- yes, he did say variance. I was thinking Bessel correction -- which is for s.d.

Yes!
The mean of the square root of many point estimates of the variance, is biased; and the square root of the mean of many point estimates of the variance is also biased...

That's where a second correction needs to be made to un-bias an estimate of s.d.

I was thinking about the OP, and didn't notice that VirtualTux's comment didn't directly relate to s.d. ! :redface:

...

An MLE aims to find the peak in the probability distribution for the parameter, which is clearly a different aim from unbiased. Of course, it could happen to be unbiased.

Not sure what you're saying there. Do you mean all datapoints known or merely the size of the whole population? If all datapoints, you can calculate the value exactly. Not sure what MLE would mean there.

Detailed treatement of MLE as a method, was beyond the scope of the text I learned from.
One has to define some formula based on the population's distribution and some arbitrary sample which is to be maximized. In the case of a normal distribution, and some kind of undefined sampling from it -- I can't be certain of what constitutes the "probability" of a sample; (Chiro seems to have noticed this issue as well.) Samples may be of varying size and variable techniques in being obtained. Since NIST does not define these things, I am simply inferring from their statements that I can't use the knee-jerk response of assuming the simple formula given in my book for the S.D. MLE of a Gaussian is the only one possible.

eg: see a NIST intro to MLE; the same I was trying to indicate earlier, but as a direct link ...

http://www.itl.nist.gov/div898/handbook/eda/section3/eda3652.htm

NOTE, this is only the intro, BUT:
They do not say the estimator for small samples *is* biased, but only "can" be.
They also say that under a certain set of circumstances MLE estimates are *unbiased*.

I have to make assumptions as to why they say this, or I have to dig deeper into their site...
In reading through their archives more carefully, the subject of MLE becomes far too complex for me to follow -- let alone describe here in any detail. All I do know is that it is *somehow* used to arrive at the estimate for S.D. given by NIST when reporting data, and I infer that the simple formulas I learned for Gaussian MLE's are not the only possible ones.

Here is example data, directly from NIST, which I was referring to in the OP.
Iron, Atomic weight and ... composition

As you can see, it doesn't come with any information about population size, exact shape of distribution, etc. But, this is the best data there *is* for what I want to do.

I plan on using this data (and more) in various equations in chemistry and physics; I might use it to compute mass transfer, or electroplating mass or density, or ... ? And then, I would report a result of my calculation for Engineering purposes.

I myself, then, am not measuring anything; I am merely constructing a mathematical model that estimates what a measurement of an actual experiment would be. The present status quo, is to simply assume each calculation step results in a Gaussian, perhaps keeping track of any correlation introduced by mathematical operators, and then ignore the errors introduced by these imperfect assumptions; often these introduced errors are statistically insignificant anyway. So, in a sense, I'm just a glutton for punishment wanting a slightly better result... (precisely -- more robust).

andrewr · Jul 24, 2012

Stephen Tashi said:

Andrewr,

Assuming we have made some progress in definitions, I'll return to the question of "What is the objective of the original post?".

I think one of the objectives is:
Given two independent Gaussian random variables X and Y and given maximum liklihood estimates (i.e. specific numerical values) of their means and standard deviations, but not the sample values or sample sizes from which these estimates are obtained, find the maximum likelihood estimates for the mean and variance of the random variable Z = XY.

Yes, that look right. Although Z might be arbitrarily replaced with any number of operations and any number of Gaussian inputs.

I doubt there is not enough given information to solve this problem. So if you are going to solve something, you must make some assumptions. You can assume the population means and standard deviations of X and Y are exactly equal to the numerical values of the given estimates.

Yes, there is no choice but to make assumptions. I am simply assuming (for calculations) that the distribution is actually a Gaussian, and the mu and sigma when used as if they were *exact* values will introduce insignificant errors -- although I have no proof of this.

Then we have the question: What are your requirements for the form of the answer?
One can solve the problem by simulation. One can solve the problem by numerical integration techniques. If these are unsatisfactory then what do you want a solution to look like? Can it be a table of values? Must it be a simple algebraic formula that a person can compute with a calculator?

NIST often reports data to 6 or more significant figures. Simulation, which I am using to actually inspect what happens during mathematical operators (see graphs in earlier posts),
is impractical for that many digits of precision. But, simulation would be my preference if it were not excessively time and CPU intensive. The major reason I am doing analytical studies is because simulation quickly becomes intractable.

Numerical integrations (quadratures?) is something I haven't thought about; Your idea, however, is an excellent direction to explore...

Time wise, and memory /program footprint -wise, I am interested in building a *simple* calculator. SO, yes -- it needs to be possible to compute on a calculator with program-ability.

I don't know offhand how to produce an answer in simplified form. My intuition is that getting an approximate answer in simplified form is a reasonable mathematical daydream, but it's not a "well known" result (to me!). It sounds like a topic for a graduate level thesis and I suppose there are rare examples of people solving such problems on internet forums.

I am willing to put the time in; and study the issue carefully -- even if it takes me months.
I will, of course, not try to pester every-one with every niggling detail... Nor do this so quickly, that I use up your precious time!

However, I am pretty certain with the results I have already derived, that it isn't entirely impractical to do -- just very difficult to describe correctly.

One of the other objectives in the original post is to solve a similar problem for the quotient X/Y of independent Gaussian random variables and we have agreed that mean and standard deviation of the distribution of X/Y may not exist. However, you feel it is still possible to make progress - which necessarily means that we must solve some different problem. I gather that you think we might modify the problem by "leaving out" the event that Y is exactly zero. My intuition is that this won't work. I think you'd have to leave out a finite interval around the value Y = 0.

Hmm, well -- in the plot I showed, where zero was included -- the numerical simulation never hit zero in over a million trials. ( I doubt NIST even attempts that many actual measurements to produce a single result! ) My Graph still had a well defined shape and although one might question the precision of the sigma value, it has a very real robust virtue: It is so large as to be easily identified by eye as an unreliable result. This happens to be very uniformly true in problems where zero becomes a realistic value in the division...

If the calculation (which is designed/chosen by an Engineer/Chemist) is intrinsically un-reliable, it becomes NOTICABLE.

OTOH: There are many *practical* division problems with Gaussians (esp; where |mu| > sigma > 0 ), that are going to produce *very* reliable results (small sigma). Both of these cases are a very useful thing to an Engineer/Chemist since KNOWING how badly a particular division affects the sigma gives them some additional (and intrinsic) cross checking of their model.

As I mentioned before, if you want to solve the problems in a way that relates to actual physical measurements, you shouldn't use Gaussian distributions.

That's laudable, but I don't really see that I have a choice in the real world.
The calculations are going to be done with the assumption of Gaussians, the question is merely -- how often will inaccuracies introduced by the calculations give statistically false positive results?

Let me try and give a little perspective;
When doing an error propagation analysis for something which contains multiplication OR division, esp at the undergraduate level, and even much of the masters level, there is a cookbook approach used by a large portion of problem solvers:

http://en.wikipedia.org/wiki/Propagation_of_uncertainty

(! And NOTE: Viraltux has been correcting this page, which is GOOD ! )

Now, I believe these formulas are *VERY* mainstream, and show up even on mathematical oriented sites.

BUT: Notice, that for both multiplication and division, the formula is using a ratio of the error (std dev), to the mean. For multiplication, this is fine -- for, as I derived elsewhere -- the product of the means happens to be the mean of the products for independent variables.
(And I ought to have just remembered that from basic stats...but oh well, I proved what I ought to have known...)

However, when I try that for division -- by simulation -- (see graphs above!) -- the same can't be said! In practical terms, this ratio approach causes major defects in the robustness of the approximations:

eg: If I were to, for example, want to multiply a value of 0(1), times 17(0), I know the result is very predictable by numerical simulation -- (It's a true Gaussian EVEN!) but the mainstream multiplication approximation will fail, because computing the result requires a division by a zero...

And, Division approximations are *apparently* even worse...
(I have simulations, not analytical expressions so I don't wish to over-state the case.)

I'll spend a little time thinking about your idea of numerical integration, and re-read the entire thread -- perhaps I can re-focus, and state the problem with some more clarification.
Please, do ask questions, those are very helpful.

Stephen Tashi · Jul 24, 2012

andrewr said:

Hmm, well -- in the plot I showed, where zero was included -- the numerical simulation never hit zero in over a million trials. ( I doubt NIST even attempts that many actual measurements to produce a single result! ) My Graph still had a well defined shape and although one might question the precision of the sigma value, it has a very real robust virtue: It is so large as to be easily identified by eye as an unreliable result. This happens to be very uniformly true in problems where zero becomes a realistic value in the division...

That's laudable, but I don't really see that I have a choice in the real world.

I think what you mean is that you perceive a bureacratic requirement to say that you are using Gaussians. However, you have no choice except not to use Gaussians if you rely on simulations. The method of representing numbers used by a computer limits their magnitude and it also makes all probability distributions discrete insetad of continuous. So you have truncated the Gaussian whether you wanted to or not. ( In reality, it is impossible to sample from any conitnuous distribution, even the uniform distribution on [0,1], either by use of a computer or by use of a physical apparatus since the measurements from a physical apparatus have finite precision. )

We should get the facts about the ratio of two Gaussians straight. The Wikipedia has an interesting article which reveals some of them. http://en.wikipedia.org/wiki/Ratio_distribution

In the case of the ratio of two independent Gaussians X and Y , each with mean zero, the distribution of X/Y is a Cauchy distribuiton whose mean and variance do not exist. http://en.wikipedia.org/wiki/Cauchy_distribution

There may be some hope in the case where mean of Y is not zero, but I haven't found any overt statement of the facts. The paper by Marsaglia "Ratios Of Normal Variables" (PDF: http://www.google.com/url?sa=t&rct=...sg=AFQjCNEgO1dvktreWiL-rt-ZPcS3K1FmYQ&cad=rja) may contain the answer if you can decipher what happens in the special case when the two "jointly normal" variables he deals with are independent. In his general case, the mean and variance apparently don't exist. However, he has suggestions for practical applications that involve assuming bounds on the denominator (i.e. truncating it !).

I might eventually get around to reading that paper carefully, but it wouldn't disappoint me at all if you or haruspex do that first!

haruspex · Jul 26, 2012

haruspex said:

The N-1 divisor is derivable from the assumption that the whole population is effectively infinite. I.e. large compared with the sample. As the sample size grows towards the full population size, so the unbiased estimate divisor tends to N, but I'm not aware of any actual formula for that.

Fwiw, I've derived the general formula for the unbiased estimator of the variance of a population size P based on a sample size N. If the variance of a sample size N in isolation (i.e. mean square of differences from sample mean) is [itex]\hat{\sigma}^2[/itex] then the unbiased estimate of the variance of the whole population of P is [itex]\hat{\sigma}^2\frac{(NP-N)}{(NP-P)}[/itex]. Note that both at P = N and as P tends to infinity this gives the two standard results.

Stephen Tashi · Jul 26, 2012

While we're digressing, this thread dealt with sampling from a finite population without replacment: https://www.physicsforums.com/showthread.php?t=475077

andrewr · Jul 29, 2012

Stephen Tashi said:

I think what you mean is that you perceive a bureacratic requirement to say that you are using Gaussians. However, you have no choice except not to use Gaussians if you rely on simulations. The method of representing numbers used by a computer limits their magnitude and it also makes all probability distributions discrete insetad of continuous.

A bit of your philosophy?

There are two+ entangled objections in your comment.
The magnitude of the number, and the discreet nature are really different...
I presently have an arbitrary word length computer algorithm, since I can have 512 bytes / number -- that is effectively enough dynamic range to represent every single atom in the universe -- http://www.universetoday.com/36302/atoms-in-the-universe/

--- "The number of atoms in the entire observable universe is estimated to be within the range of 10**78 to 10**82. ";

Roughly 4 bits is a digit, 82 digits is then 328 bits. And I am at 4096 bits. (But could go larger...)

Precision, wise, then -- I don't think realistic problems are affected by computer precision much... I would even hold doubts that simulations based on 128 bit numbers would affect the issues we are talking about significantly. ( A discussion for another thread, which is coming up...)

So you have truncated the Gaussian whether you wanted to or not. ( In reality, it is impossible to sample from any conitnuous distribution, even the uniform distribution on [0,1], either by use of a computer or by use of a physical apparatus since the measurements from a physical apparatus have finite precision. )

I agree... but just to wave my flag of -- yeah, so what...

That depends on what continuous means... Quantum mechanically, precision is smeared...
So what is Gaussian, anyway?

We should get the facts about the ratio of two Gaussians straight. The Wikipedia has an interesting article which reveals some of them. http://en.wikipedia.org/wiki/Ratio_distribution

In the case of the ratio of two independent Gaussians X and Y , each with mean zero, the distribution of X/Y is a Cauchy distribuiton whose mean and variance do not exist. http://en.wikipedia.org/wiki/Cauchy_distribution

Yes, and that's the Cauchy 0/0. Which I mentioned before as being a serious problem. But, Considering even that distribution *is* symmetrical, it's hard to see why anyone bothers to claim the mean *isn't* zero ? Does, "Does not exist", also sometimes mean zero?

There may be some hope in the case where mean of Y is not zero, but I haven't found any overt statement of the facts. The paper by Marsaglia "Ratios Of Normal Variables"

Stephen, I don't know if you're a beer drinker -- but I'd like to send you a virtual one for that paper. Excellent choice! You can find in a matter of minutes what would take me days...

I might eventually get around to reading that paper carefully, but it wouldn't disappoint me at all if you or haruspex do that first!

I read it, several times, spent some time meditating on it -- did a WHOLE lot of calculus; and am going to ask for a review and discussion. (I am also attempting Chiro's suggestion of the Forier, but I did it another way for discussion. )

Setting the strict Cauchy aside, which I think (in the limit at least) does have a mean -- the other Gaussian ratios do have a mean. I derived a formula, and verified it as much as possible using simulations (but with a catch... simulations do fail with a predictable frequency)

Given two gaussians N(a,1) and N(b,1), the result of N(a,1) / N(b,1) gives the following result:
[tex] \mu = a \sqrt { \pi \over 2 } e ^{ - { b ^{ 2 } \over 2 } } \times erfi \left ( { b \over \sqrt { 2 } } \right )[/tex]
where
[tex] erfi(t) = \sqrt { 4 \over \pi } \int \limits _{ 0 } ^{ t } e ^{ t ^ { 2 } }[/tex]
Alternately,
[tex] \mu = a \sqrt { 2 } \int \limits _{ 0 } ^{ b } e ^{ t ^ { 2 } - b ^{ 2 } \over 2 } dt[/tex]

I'd like to actually discuss this issue in some more detail in another thread, as it will quickly overwhelm the question I am trying to answer here...
But you are right, Stephen, we need to really get our definitions straight to even hold the discussion...

I will provide approximations which are more tractable, but these are the exact results.
I would like to approach the idea of a "quasi" standard deviation in that thread as well.
I didn't find it necessary to truncate the Gaussians in order to compute the mean, but that may well not be possible with the standard deviations.

I'll have a new thread by tomorrow on the subject, and fill it out in some detail.

Thanks for the help, so far.

andrewr · Jul 30, 2012

Here's the new thread:
Discuss ratios of Gaussians here...

Oh, , I also made a typo in the previous post that I can't edit. The alternate form of the equation has the wrong limit; That ought to be b/√2, not just b.

Stephen Tashi · Jul 30, 2012

The link to the new thread that works for me: https://www.physicsforums.com/showthread.php?p=4016247

Improving propagation of error in successive calculations

Attachments

Similar threads

Graduate Expected numbers of cards of a last color remaining

Undergrad The problem of points

Graduate Probability puzzle

Undergrad The countability paradox of computable numbers

Undergrad How does axiom of foundation prevent infinite sequence of elements?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect