Estimating upper bound from measurements with uncertainties

TheBigH · Jan 21, 2014

Hello everyone,

I have a large number of measurements with associated uncertainties, and I know that the real values are bounded above by some constant. How can I estimate the value of that constant, and the uncertainty on the estimate?

Thanks

Stephen Tashi · Jan 23, 2014

You'll get better advice if you state the problem precisely - and you'll get better advice in the mathematical section if you state what you mean by "uncertainty".

In physical measurements, one scenario is to assume each measurement is an independent measurement of the same value [itex]v_0[/itex] and that the result of the measurement is [itex]v_0 + X[/itex] where [itex]X[/itex] is a Gaussian random variable with mean 0 and standard deviation [itex]\sigma[/itex]. The value of [itex]\sigma[/itex] is often called the "uncertainty" of the measurement. In some scenarios, measurements are made with equipment where the "uncertainty" is given by the manufacturer of the measuring device or a calibration lab. In other scenarios, the exact value of [itex]\sigma[/itex] is not known and what you have is only the standard deviation estimated from a specific set of data (i.e. a "sample standard deviation" rather than a "population standard deviation").

There can also be scenarios where "uncertainty" is specified by a rigid bound that is not interpreted as a standard deviation.

TheBigH · Jan 23, 2014

Okay, assume I have a number of different data points with true values [itex]v_1, v_2, ...[/itex]. What I actually measure is [itex]v_1+X_1,v_2+X_2,...[/itex], where the [itex]X_i[/itex] are Gaussian random variables with mean 0 and SD of [itex]\sigma_1, \sigma_2, ...[/itex], just like you say. It is known that there is some [itex]v_{max}[/itex] such that all [itex]v_i<v_{max}[/itex]. Many of the [itex]v_i[/itex] can be significantly less than that, and some of my measurements might actually be greater than [itex]v_{max}[/itex] because of random chance. I would like to know how I can estimate [itex]v_{max}[/itex] and get confidence intervals on that estimate. Thanks!

HallsofIvy · Jan 23, 2014

So what you are asking is how to estimate an upper bound on some function of several variables when the error of each variable is known? There is an engineer's rule of thumb: "if variables add, then errors add; if variables multiply, then the relative errors add".

That can be shown by thinking of the errors as differentials: x+ dx. If f= x+ y+ z, then df= dx+ dy+ dz. If f= xyz, then df= yzdx+ xzdy+ xydz. Since f= xyz, df/f= (yzdx+ xzdy+ xydz)/xyz= dx/x+ dy/y+ dz/z.

More generally, if f is a function of x, y, and z, then [itex]df= \frac{\partial f}{\partial x}dx+ \frac{\partial f}{\partial y}dy+ \frac{\partial f}{\partial z} dz[/itex]
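As a quick numerical check of the rule of thumb, here is a minimal Python sketch of the first-order (differential) propagation described above. The function name `propagate_linear` and the values of x, y, z and their errors are made up purely for illustration.

```python
def propagate_linear(partials, errors):
    """First-order worst-case error: sum of |df/dx_i| * dx_i."""
    return sum(abs(p) * e for p, e in zip(partials, errors))

# Hypothetical values and absolute errors (illustration only).
x, y, z = 2.0, 3.0, 4.0
dx, dy, dz = 0.1, 0.2, 0.1

# f = x + y + z: every partial derivative is 1, so the errors simply add.
df_sum = propagate_linear([1.0, 1.0, 1.0], [dx, dy, dz])

# f = x*y*z: the partials are yz, xz, xy.
f_prod = x * y * z
df_prod = propagate_linear([y * z, x * z, x * y], [dx, dy, dz])

# The relative error of the product equals the sum of the relative errors.
rel_from_differential = df_prod / f_prod
rel_from_rule = dx / x + dy / y + dz / z

print(df_sum, rel_from_differential, rel_from_rule)
```

Note that this gives a worst-case bound on the error of a computed quantity; it does not by itself estimate an unknown bound such as [itex]v_{max}[/itex].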

Stephen Tashi · Jan 24, 2014

Is this a problem where you are trying to estimate both which [itex]i[/itex] has the maximum [itex]v_i[/itex] as well as the value of [itex]v_i[/itex]? For example, if you had the measurements of distances to, say, cities, and you wanted to travel to the city that was farthest away, you would want to estimate both the distance to the city and which city was farthest.

Being a Bayesian, I think it would be nice if you could imagine that the [itex]v_i[/itex] are selected from some probability distribution and then that the measurements [itex]v_i + X_i[/itex] are performed. Does a model like that make sense for the problem?

Hornbein · Jan 24, 2014

TheBigH said:

Okay, assume I have a number of different data points with true values [itex]v_1, v_2, ...[/itex]. What I actually measure is [itex]v_1+X_1,v_2+X_2,...[/itex], where the [itex]X_i[/itex] are Gaussian random variables with mean 0 and SD of [itex]\sigma_1, \sigma_2, ...[/itex], just like you say. It is known that there is some [itex]v_{max}[/itex] such that all [itex]v_i<v_{max}[/itex]. Many of the [itex]v_i[/itex] can be significantly less than that, and some of my measurements might actually be greater than [itex]v_{max}[/itex] because of random chance. I would like to know how I can estimate [itex]v_{max}[/itex] and get confidence intervals on that estimate. Thanks!

It seems to me that you need to have some idea of what the values of those sigmas are. I also want to know how much of those sigmas is natural variation and how much is measurement error. Without some idea about these things I don't see how to proceed.

It is also worth noting that if you have an upper bound then you don't have a Gaussian. Maybe it is some sort of truncated Gaussian.

Stephen Tashi · Jan 24, 2014

As I understand TheBigH's statement, the [itex]\sigma[/itex]'s are known and the [itex]X_i[/itex] are the only source of variation.

TheBigH · Jan 26, 2014

Obviously I'm still being unclear, so I'll give an example. Imagine I'm trying to estimate the maximum possible height a tree can reach by measuring the heights of all the trees in a forest. I can't measure their heights perfectly, so there will be some uncertainty which I assume to be normal with SD known for each tree. Many trees won't be anywhere near the maximum height. In the end, I expect a few measured heights to be a little bit above my calculated maximum just due to uncertainties and chance.
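The setup can be made concrete with a small simulation. This is only a sketch under stated assumptions: the true heights are drawn uniformly between half the maximum and the maximum (a hypothetical choice, since the thread does not specify the distribution of true heights), every tree has the same measurement SD, and the names `simulate_forest`, `v_max`, and `sigma` are invented for the example.

```python
import random

def simulate_forest(v_max, n_trees, sigma, seed=0):
    """Draw true heights below v_max and add Gaussian measurement noise.

    ASSUMPTION: true heights are uniform on [0.5*v_max, v_max]; the thread
    does not say how the true heights are distributed.
    """
    rng = random.Random(seed)
    true_heights = [rng.uniform(0.5 * v_max, v_max) for _ in range(n_trees)]
    measured = [v + rng.gauss(0.0, sigma) for v in true_heights]
    return true_heights, measured

V_MAX = 50.0  # hypothetical true bound
true_h, meas_h = simulate_forest(v_max=V_MAX, n_trees=2000, sigma=2.0)

# Count how many measured heights exceed the true bound purely by chance.
n_above = sum(m > V_MAX for m in meas_h)
print(max(true_h), max(meas_h), n_above)
```

In a run like this the largest measured height typically overshoots the true bound, which is why simply taking the maximum measurement tends to overestimate [itex]v_{max}[/itex].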

Stephen Tashi · Jan 26, 2014

That's a reasonably clear example, but we need to examine the statement:

TheBigH said:

I'm trying to estimate the maximum possible height a tree can reach

1. One problem is to estimate the maximum height of a tree in the particular forest where you measured all the trees.

2. A different problem imagines that your measured forest is just one example from the population of all possible forests and you want to know the maximum height of a tree that can occur in the population of all those forests.

3. A different problem imagines that your measured forest is one example of a particular forest at a particular time and you want to know the maximum height a tree can reach as the forest changes over a long span of time.

Hornbein · Jan 27, 2014

TheBigH said:

Obviously I'm still being unclear, so I'll give an example. Imagine I'm trying to estimate the maximum possible height a tree can reach by measuring the heights of all the trees in a forest. I can't measure their heights perfectly, so there will be some uncertainty which I assume to be normal with SD known for each tree. Many trees won't be anywhere near the maximum height. In the end, I expect a few measured heights to be a little bit above my calculated maximum just due to uncertainties and chance.

Aha. So the SD should be about the same for each tree. That makes it a lot easier. You seem to have the idea that you measure each tree a number of times and get a sample SD that way. You can do that, but I might assume that the SD of the measurements was always the same and average the sample variances (NOT the SDs, that's invalid).
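To illustrate the point about averaging variances rather than SDs, here is a minimal sketch. It assumes every tree was measured the same number of times (with unequal counts you would weight each variance by its degrees of freedom), and the repeated measurements below are invented numbers.

```python
import statistics

def pooled_sd(groups):
    """Average the sample variances (equal group sizes assumed), then take the root."""
    variances = [statistics.variance(g) for g in groups]
    return (sum(variances) / len(variances)) ** 0.5

def mean_of_sds(groups):
    """Average the sample SDs directly (shown only for comparison)."""
    return sum(statistics.stdev(g) for g in groups) / len(groups)

# Hypothetical repeated height measurements of three trees.
groups = [
    [20.1, 19.8, 20.4],
    [31.0, 30.2, 30.9],
    [25.5, 25.6, 25.3],
]

print(pooled_sd(groups), mean_of_sds(groups))
```

The sample variance is an unbiased estimate of the variance, while the sample SD is not an unbiased estimate of the SD, so the averaged-SD figure comes out no larger than the pooled one and tends to run low.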

I'm not sure what to do next. If you have one outlying maximum point then you'd add 1.96SD to that. If you have a large number of points that are close to the maximum then maybe you'll have to assume some things about the underlying distribution. It matters how the distribution cuts off. The only case where a measurement could be higher than the bound is if the distribution truncates sharply so you have a large number of points near the maximum. You also could have this problem if the measurement errors are large. If you don't want to make assumptions then I don't see how to do it. There might be some sort of parametric statistic that will help.

Stephen Tashi · Jan 27, 2014

I don't know how to solve the problem and it looks like a solution would be the material for a graduate thesis (perhaps one that's already written). However, finding wild ideas to try is no problem.

The usual way to deal with N things is to first try 2 things. So let's assume we have two random variables
[itex]Y_1 = v_1 + X_1[/itex]
[itex]Y_2 = v_2 + X_2[/itex]
where [itex]v_1, v_2[/itex] are the unknown actual heights of two trees and
[itex]X_1, X_2[/itex] are independent, zero-mean, normally distributed random variables with respective standard deviations [itex]\sigma_1, \sigma_2[/itex].

Thinking like a Bayesian, my intuition is that if the prior distribution for [itex]v_1[/itex] is taken to be a uniform distribution over a very large interval, then the posterior distribution for the location of [itex]v_1[/itex] given the measurement [itex]Y_1 = y_1[/itex] is approximately a normal distribution with mean [itex]y_1[/itex] and standard deviation [itex]\sigma_1[/itex]. Similarly, the posterior distribution for [itex]v_2[/itex] given [itex]Y_2 = y_2[/itex] is approximately a normal distribution with mean [itex]v_2[/itex] and standard deviation [itex]\sigma_2[/itex]

If we can find the posterior distribution of [itex]V_{max} = max(v_1,v_2)[/itex] then we can estimate [itex]V_{max}[/itex] using maximum likelihood or some other standard method.

Stephen Tashi · Jan 28, 2014

In the previous post, I should have said posterior distributions have means [itex]y_1[/itex] and [itex]y_2[/itex] instead of saying they were the unknowns [itex]v_1[/itex] and [itex]v_2[/itex].

[itex]\max(v_1,v_2)[/itex] (given [itex]Y_1=y_1[/itex] and [itex]Y_2=y_2[/itex]) is the second "order statistic" of two independent, but not identically distributed, random variables. So we can look up the theory on it.
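For two variables the distribution can be written out directly. The following is a sketch under the approximation used in this thread, namely that the posteriors of [itex]v_1[/itex] and [itex]v_2[/itex] are independent normals with means [itex]y_1, y_2[/itex] and standard deviations [itex]\sigma_1, \sigma_2[/itex]. Here [itex]\Phi[/itex] and [itex]\varphi[/itex] denote the standard normal CDF and density (notation introduced for this example).

```latex
\begin{align}
P\big(\max(v_1,v_2)\le t \mid y_1,y_2\big)
  &= \Phi\!\left(\frac{t-y_1}{\sigma_1}\right)\Phi\!\left(\frac{t-y_2}{\sigma_2}\right),\\
p_{\max}(t)
  &= \frac{1}{\sigma_1}\,\varphi\!\left(\frac{t-y_1}{\sigma_1}\right)\Phi\!\left(\frac{t-y_2}{\sigma_2}\right)
   + \frac{1}{\sigma_2}\,\varphi\!\left(\frac{t-y_2}{\sigma_2}\right)\Phi\!\left(\frac{t-y_1}{\sigma_1}\right).
\end{align}
```

The first line holds because the maximum is at most [itex]t[/itex] exactly when both variables are, and the second line is its derivative with respect to [itex]t[/itex].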

The generalization to [itex]\max(v_1,v_2,...,v_n)[/itex], which is the [itex]n[/itex]th order statistic of a set of independent random variables, is given by the Bapat-Beg theorem. http://en.wikipedia.org/wiki/Bapat–Beg_theorem

To make progress on the practical problem, what is needed is a way to approximate the result of the Bapat-Beg theorem, or at least a way to approximate the mean and variance of the [itex]n[/itex]th order statistic of a set of independent, not identically distributed, normal random variables.
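For the maximum specifically, the full theorem is not needed: the CDF of the maximum of independent variables is just the product of the individual CDFs, so the mean and variance can be approximated numerically. The sketch below is one rough way to do that and is not a solution to the original estimation problem. The function names, grid size, and integration width are arbitrary choices made for illustration.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def max_cdf(t, means, sds):
    """P(max_i V_i <= t) for independent V_i ~ N(means[i], sds[i]^2)."""
    p = 1.0
    for m, s in zip(means, sds):
        p *= norm_cdf((t - m) / s)
    return p

def max_moments(means, sds, n_grid=20001, width=10.0):
    """Approximate mean and variance of the maximum on a uniform grid.

    Each grid cell carries probability F(t_{i+1}) - F(t_i), which is
    assigned to the cell midpoint. n_grid and width are ad hoc settings.
    """
    lo = min(m - width * s for m, s in zip(means, sds))
    hi = max(m + width * s for m, s in zip(means, sds))
    h = (hi - lo) / (n_grid - 1)
    ts = [lo + i * h for i in range(n_grid)]
    cdf = [max_cdf(t, means, sds) for t in ts]
    mean = 0.0
    second = 0.0
    for i in range(n_grid - 1):
        mass = cdf[i + 1] - cdf[i]
        mid = 0.5 * (ts[i] + ts[i + 1])
        mean += mid * mass
        second += mid * mid * mass
    return mean, second - mean * mean

# Sanity check against the known result for two iid standard normals:
# E[max] = 1/sqrt(pi), Var[max] = 1 - 1/pi.
m2, v2 = max_moments([0.0, 0.0], [1.0, 1.0])
print(m2, v2)
```

With measured values [itex]y_i[/itex] as the means and the known [itex]\sigma_i[/itex] as the SDs, `max_moments` gives the approximate posterior mean and variance of the largest true value under the flat-prior assumption discussed earlier.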
