Estimating upper bound from measurements with uncertainties

Hello everyone,

I have a large number of measurements with associated uncertainties, and I know that the real values are bounded above by some constant. How can I estimate the value of that constant, and the uncertainty on the estimate?

Thanks
 
You'll get better advice if you state the problem precisely - and you'll get better advice in the mathematical section if you state what you mean by "uncertainty".

In physical measurements, one scenario is to assume each measurement is an independent measurement of the same value v_0 and that the result of the measurement is v_0 + X, where X is a Gaussian random variable with mean 0 and standard deviation \sigma. The value of \sigma is often called the "uncertainty" of the measurement. In some scenarios, measurements are made with equipment where the "uncertainty" is given by the manufacturer of the measuring device or a calibration lab. In other scenarios, the exact value of \sigma is not known and what you have is only the standard deviation estimated from a specific set of data (i.e. a "sample standard deviation" rather than a "population standard deviation").

There can also be scenarios where "uncertainty" is specified by a rigid bound that is not interpreted as a standard deviation.
 
Okay, assume I have a number of different data points with true values v_1, v_2, .... What I actually measure is v_1 + X_1, v_2 + X_2, ..., where the X_i are Gaussian random variables with mean 0 and SD of \sigma_1, \sigma_2, ..., just like you say. It is known that there is some v_{max} such that all v_i < v_{max}. Many of the v_i can be significantly less than that, and some of my measurements might actually be greater than v_{max} because of random chance. I would like to know how I can estimate v_{max} and get confidence intervals on that estimate. Thanks!
 
So what you are asking is how to estimate an upper bound on some function of several variables if the error of each variable is known? There is an engineer's rule of thumb that "if variables add, then errors add; if variables multiply, then the relative errors add".

That can be shown by thinking of the errors as differentials: x + dx. If f = x + y + z, then df = dx + dy + dz. If f = xyz, then df = yz dx + xz dy + xy dz. Since f = xyz, df/f = (yz dx + xz dy + xy dz)/(xyz) = dx/x + dy/y + dz/z.

More generally, if f is a function of x, y, and z, then df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz.
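As a rough illustration of the differential rule, here is a minimal Python sketch. The function name `propagate_worst_case`, the finite-difference step, and the sample numbers are all made up for this example. It takes absolute values of the partial derivatives, which is an added assumption that turns the signed differential into a worst-case bound.

```python
def propagate_worst_case(f, values, errors, h=1e-6):
    """First-order bound: sum over i of |df/dx_i| * dx_i.

    Partial derivatives are approximated with central differences.
    Absolute values are an assumption of this sketch (worst-case bound).
    """
    total = 0.0
    for i, (x, dx) in enumerate(zip(values, errors)):
        step = h * max(1.0, abs(x))
        up = list(values)
        down = list(values)
        up[i] = x + step
        down[i] = x - step
        partial = (f(*up) - f(*down)) / (2.0 * step)
        total += abs(partial) * dx
    return total


# Hypothetical values and absolute errors for x, y, z.
values = [2.0, 3.0, 5.0]
errors = [0.1, 0.2, 0.3]

# Sum: the absolute errors add.
sum_err = propagate_worst_case(lambda x, y, z: x + y + z, values, errors)

# Product: the relative errors add.
prod_err = propagate_worst_case(lambda x, y, z: x * y * z, values, errors)
prod_rel = prod_err / (values[0] * values[1] * values[2])

print(sum_err, prod_rel)
```

For the sum the result matches dx + dy + dz, and for the product the relative error matches dx/x + dy/y + dz/z, up to floating-point rounding.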
 
Is this a problem where you are trying to estimate both which i has the maximum v_i as well as the value of v_i? For example, if you had the measurements of distances to, say, cities, and you wanted to travel to the city that was farthest away, you would want to estimate both the distance to the city and which city was farthest.

Being a Bayesian, I think it would be nice if you could imagine that the v_i are selected from some probability distribution and then that the measurements v_i + X_i are performed. Does a model like that make sense for the problem?
 
TheBigH said:
Okay, assume I have a number of different data points with true values v_1, v_2, .... What I actually measure is v_1 + X_1, v_2 + X_2, ..., where the X_i are Gaussian random variables with mean 0 and SD of \sigma_1, \sigma_2, ..., just like you say. It is known that there is some v_{max} such that all v_i < v_{max}. Many of the v_i can be significantly less than that, and some of my measurements might actually be greater than v_{max} because of random chance. I would like to know how I can estimate v_{max} and get confidence intervals on that estimate. Thanks!


It seems to me that you need to have some idea of what the values of those sigmas are. I also want to know how much of those sigmas is natural variation and how much is measurement error. Without some idea about these things I don't see how to proceed.

It is also worth noting that if you have an upper bound then you don't have a Gaussian. Maybe it is some sort of truncated Gaussian.
 
As I understand TheBigH's statement, the \sigma's are known and the X_i are the only source of variation.
 
Obviously I'm still being unclear, so I'll give an example. Imagine I'm trying to estimate the maximum possible height a tree can reach by measuring the heights of all the trees in a forest. I can't measure their heights perfectly, so there will be some uncertainty which I assume to be normal with SD known for each tree. Many trees won't be anywhere near the maximum height. In the end, I expect a few measured heights to be a little bit above my calculated maximum just due to uncertainties and chance.
 
That's a reasonably clear example, but we need to examine the statement:

TheBigH said:
I'm trying to estimate the maximum possible height a tree can reach

1. One problem is to estimate the maximum height of a tree in the particular forest where you measured all the trees.

2. A different problem imagines that your measured forest is just one example from the population of all possible forests and you want to know the maximum height of a tree that can occur in the population of all those forests.

3. A different problem imagines that your measured forest is one example of a particular forest at a particular time and you want to know the maximum height a tree can reach as the forest changes over a long span of time.
 
  • #10
TheBigH said:
Obviously I'm still being unclear, so I'll give an example. Imagine I'm trying to estimate the maximum possible height a tree can reach by measuring the heights of all the trees in a forest. I can't measure their heights perfectly, so there will be some uncertainty which I assume to be normal with SD known for each tree. Many trees won't be anywhere near the maximum height. In the end, I expect a few measured heights to be a little bit above my calculated maximum just due to uncertainties and chance.

Aha. So the SD should be about the same for each tree. That makes it a lot easier. You seem to have the idea that you measure each tree a number of times and get a sample SD that way. You can do that, but I might assume that the SD of the measurements was always the same and average the sample variances (NOT the SDs, that's invalid).
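A small Python sketch of the "average the variances, not the SDs" point. The repeated measurements below are invented, and the function name `pooled_sd` is just a label for this example. Each variance is weighted by its degrees of freedom, which reduces to a plain average of the sample variances when every tree is measured the same number of times.

```python
import statistics


def pooled_sd(groups):
    """Pooled measurement SD from repeated measurements of several trees.

    Each inner list holds repeated measurements of one tree. Sample
    variances are combined with weights (n_i - 1), then the square root
    is taken once at the end.
    """
    weighted_sum = 0.0
    dof = 0
    for g in groups:
        if len(g) < 2:
            continue  # a single measurement carries no variance information
        weighted_sum += (len(g) - 1) * statistics.variance(g)
        dof += len(g) - 1
    return (weighted_sum / dof) ** 0.5


# Hypothetical repeated height measurements (metres) for three trees.
groups = [
    [10.1, 9.9, 10.3],
    [20.4, 19.8, 20.1],
    [15.0, 15.5, 14.8],
]

pooled = pooled_sd(groups)
mean_of_sds = statistics.mean(statistics.stdev(g) for g in groups)

print(pooled, mean_of_sds)
```

With this data the averaged SDs come out smaller than the pooled SD, which is the direction of the bias you get from averaging SDs directly.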

I'm not sure what to do next. If you have one outlying maximum point then you'd add 1.96 SD to that. If you have a large number of points that are close to the maximum then maybe you'll have to assume some things about the underlying distribution. It matters how the distribution cuts off. The only case where a measurement could be higher than the bound is if the distribution truncates sharply so you have a large number of points near the maximum. You also could have this problem if the measurement errors are large. If you don't want to make assumptions then I don't see how to do it. There might be some sort of parametric statistic that will help.
 
  • #11
I don't know how to solve the problem and it looks like a solution would be the material for a graduate thesis (perhaps one that's already written). However, finding wild ideas to try is no problem.

The way to deal with N things, is usually to first try 2 things. So let's assume we have two random variables
Y_1 = v_1 + X_1
Y_2 = v_2 + X_2
where v_1, v_2 are the unknown actual heights of two trees and
X_1, X_2 are independent zero-mean normally distributed random variables with respective standard deviations \sigma_1, \sigma_2.

Thinking like a Bayesian, my intuition is that if the prior distribution for v_1 is taken to be a uniform distribution over a very large interval, then the posterior distribution for the location of v_1 given the measurement Y_1 = y_1 is approximately a normal distribution with mean y_1 and standard deviation \sigma_1. Similarly, the posterior distribution for v_2 given Y_2 = y_2 is approximately a normal distribution with mean v_2 and standard deviation \sigma_2

If we can find the posterior distribution of V_{max} = max(v_1,v_2) then we can estimate V_{max} using maximum likelihood or some other standard method.
 
  • #12
In the previous post, I should have said posterior distributions have means y_1 and y_2 instead of saying they were the unknowns v_1 and v_2.

max(v_1, v_2) (given Y_1 = y_1 and Y_2 = y_2) is the second "order statistic" of two independent, but not identically distributed, random variables. So we can look up the theory on it.
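For the two-tree case there are standard closed-form expressions for the mean and variance of the maximum of two independent normal variables (often attributed to Clark). The sketch below is only an illustration: it assumes the approximate posteriors v_i ~ Normal(y_i, \sigma_i) described above, and the function name and the example heights are hypothetical.

```python
import math


def _pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_of_two_normals(mu1, s1, mu2, s2):
    """Mean and SD of max(V1, V2) for independent V_i ~ Normal(mu_i, s_i)."""
    theta = math.hypot(s1, s2)      # SD of V1 - V2 under independence
    alpha = (mu1 - mu2) / theta
    m1 = mu1 * _cdf(alpha) + mu2 * _cdf(-alpha) + theta * _pdf(alpha)
    m2 = ((mu1 ** 2 + s1 ** 2) * _cdf(alpha)
          + (mu2 ** 2 + s2 ** 2) * _cdf(-alpha)
          + (mu1 + mu2) * theta * _pdf(alpha))
    return m1, math.sqrt(m2 - m1 * m1)


# Hypothetical measurements y_1 = 30.0 (sigma 1.0) and y_2 = 31.0 (sigma 2.0).
mean_max, sd_max = max_of_two_normals(30.0, 1.0, 31.0, 2.0)
print(mean_max, sd_max)
```

Note that the mean of the maximum sits above both y_1 and y_2 whenever the two posteriors overlap, so simply taking the larger measurement is not the same thing.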

The generalization to max(v_1, v_2, ..., v_n), which is the n-th order statistic of a set of independent random variables, is given by the Bapat-Beg theorem. http://en.wikipedia.org/wiki/Bapat–Beg_theorem

To make progress on the practical problem, what is needed is a way to approximate the result of the Bapat-Beg theorem or at least a way to approximate the mean and variance of the n-th order statistic of a set of independent, not identically distributed, normal random variables.
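One way to get such an approximation numerically, sketched here in Python: for the maximum specifically, independence means its CDF is just the product of the individual CDFs, so the mean, SD and quantiles can be read off a grid. This assumes the posteriors v_i ~ Normal(y_i, \sigma_i) from the earlier posts. The function names, the grid settings and the example data are hypothetical. It describes the tallest of the measured trees, not a bound for trees that were never measured.

```python
import math


def norm_cdf(x, mu, sigma):
    """CDF of Normal(mu, sigma) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def max_cdf(x, means, sigmas):
    """P(max_i V_i <= x) for independent V_i ~ Normal(means[i], sigmas[i])."""
    p = 1.0
    for mu, s in zip(means, sigmas):
        p *= norm_cdf(x, mu, s)
    return p


def max_summary(means, sigmas, grid=20001, width=8.0):
    """Return (mean, sd, 2.5% quantile, 97.5% quantile) of the maximum."""
    lo = min(m - width * s for m, s in zip(means, sigmas))
    hi = max(m + width * s for m, s in zip(means, sigmas))
    dx = (hi - lo) / (grid - 1)
    xs = [lo + i * dx for i in range(grid)]
    cdf = [max_cdf(x, means, sigmas) for x in xs]

    # Moments from the probability mass in each grid cell (midpoint rule).
    mean = 0.0
    second = 0.0
    for i in range(1, grid):
        mass = cdf[i] - cdf[i - 1]
        mid = 0.5 * (xs[i] + xs[i - 1])
        mean += mid * mass
        second += mid * mid * mass
    sd = math.sqrt(max(second - mean * mean, 0.0))

    def quantile(q):
        # Bisection works because the CDF is monotone in x.
        a, b = lo, hi
        for _ in range(200):
            c = 0.5 * (a + b)
            if max_cdf(c, means, sigmas) < q:
                a = c
            else:
                b = c
        return 0.5 * (a + b)

    return mean, sd, quantile(0.025), quantile(0.975)


# Hypothetical measured heights y_i and known sigmas for five trees.
ys = [28.5, 30.2, 31.0, 25.4, 30.8]
sigmas = [1.0, 0.5, 2.0, 1.5, 0.8]
print(max_summary(ys, sigmas))
```

The cost grows linearly with the number of trees, so this stays practical for large samples, whereas a direct evaluation of the general Bapat-Beg formula involves matrix permanents.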
 