# Estimating upper bound from measurements with uncertainties

1. Jan 21, 2014

### TheBigH

Hello everyone,

I have a large number of measurements with associated uncertainties, and I know that the real values are bounded above by some constant. How can I estimate the value of that constant, and the uncertainty on the estimate?

Thanks

2. Jan 23, 2014

### Stephen Tashi

You'll get better advice if you state the problem precisely - and you'll get better advice in the mathematical section if you state what you mean by "uncertainty".

In physical measurements, one scenario is to assume each measurement is an independent measurement of the same value $v_0$ and that the result of the measurement is $v_0 + X$, where $X$ is a Gaussian random variable with mean 0 and standard deviation $\sigma$. The value of $\sigma$ is often called the "uncertainty" of the measurement. In some scenarios, measurements are made with equipment whose "uncertainty" is given by the manufacturer of the measuring device or by a calibration lab. In other scenarios, the exact value of $\sigma$ is not known, and what you have is only the standard deviation estimated from a specific set of data (i.e. a "sample standard deviation" rather than a "population standard deviation").
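A minimal sketch of the single-value model $v_0 + X$, assuming illustrative values $v_0 = 10$ and $\sigma = 0.5$ (not from the thread). It shows that the sample SD computed from the data is only an estimate of the population $\sigma$. The function name `simulate_measurements` is hypothetical.

```python
import random
import statistics

def simulate_measurements(v0, sigma, n, seed=0):
    """Return n simulated readings v0 + X with X ~ N(0, sigma), all independent."""
    rng = random.Random(seed)
    return [v0 + rng.gauss(0.0, sigma) for _ in range(n)]

# Illustrative values only: true value 10.0, population SD 0.5.
readings = simulate_measurements(v0=10.0, sigma=0.5, n=1000)
sample_mean = statistics.mean(readings)
sample_sd = statistics.stdev(readings)  # sample SD: an estimate of sigma, not sigma itself
print(sample_mean, sample_sd)
```

Both printed values land close to, but not exactly on, 10.0 and 0.5; that gap is the distinction between a sample and a population standard deviation.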

There can also be scenarios where "uncertainty" is specified by a rigid bound that is not interpreted as a standard deviation.

3. Jan 23, 2014

### TheBigH

Okay, assume I have a number of different data points with true values $v_1, v_2, \ldots$. What I actually measure is $v_1 + X_1, v_2 + X_2, \ldots$, where the $X_i$ are Gaussian random variables with mean 0 and SDs $\sigma_1, \sigma_2, \ldots$, just like you say. It is known that there is some $v_{max}$ such that all $v_i < v_{max}$. Many of the $v_i$ can be significantly less than that, and some of my measurements might actually be greater than $v_{max}$ because of random chance. I would like to know how I can estimate $v_{max}$ and get confidence intervals on that estimate. Thanks!
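To make the model concrete, here is a hypothetical simulation of it. The distribution of the true values (uniform below $v_{max}$) and the range of the $\sigma_i$ are assumptions made purely for illustration; the thread does not specify them. The point is only that noise alone pushes some measurements above $v_{max}$ even though every true value is at or below it.

```python
import random

def simulate_bounded(n, v_max, seed=1):
    """Hypothetical generator for the bounded-measurement model.

    Assumption (not from the thread): true values v_i are uniform on
    [0.5 * v_max, v_max) and each sigma_i is uniform on [0.02, 0.08] * v_max.
    Returns a list of (measured y_i, known sigma_i, true v_i).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        v = rng.uniform(0.5 * v_max, v_max)  # true value, at most v_max
        s = rng.uniform(0.02, 0.08) * v_max  # known per-point SD
        y = v + rng.gauss(0.0, s)            # what is actually measured
        data.append((y, s, v))
    return data

data = simulate_bounded(n=500, v_max=30.0)
above = sum(1 for y, _, _ in data if y > 30.0)
print(above)  # count of measurements above v_max caused purely by noise
```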

4. Jan 23, 2014

### HallsofIvy

Staff Emeritus
So what you are asking is how to estimate an upper bound on some function of several variables if the error of each variable is known? There is an engineer's rule of thumb: "if variables add, then errors add; if variables multiply, then the relative errors add".

That can be shown by thinking of the errors as differentials: $x + dx$. If $f = x + y + z$, then $df = dx + dy + dz$. If $f = xyz$, then $df = yz\,dx + xz\,dy + xy\,dz$. Since $f = xyz$,
$$\frac{df}{f} = \frac{yz\,dx + xz\,dy + xy\,dz}{xyz} = \frac{dx}{x} + \frac{dy}{y} + \frac{dz}{z}.$$

More generally, if $f$ is a function of $x$, $y$, and $z$, then $df= \frac{\partial f}{\partial x}dx+ \frac{\partial f}{\partial y}dy+ \frac{\partial f}{\partial z} dz$.
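A quick numerical check of the differential rule for $f = xyz$, using made-up values and small signed errors. It compares the first-order change with the exact change, confirms the relative form $df/f = dx/x + dy/y + dz/z$, and shows the first-order worst case when only error magnitudes are known (the "errors add" reading of the rule of thumb).

```python
def f(x, y, z):
    return x * y * z

# Illustrative nominal values and small signed errors (made-up numbers).
x, y, z = 2.0, 3.0, 5.0
dx, dy, dz = 0.01, -0.02, 0.03

# First-order (differential) change: df = yz dx + xz dy + xy dz
df_linear = y * z * dx + x * z * dy + x * y * dz
# Exact change, for comparison with the linear approximation
df_exact = f(x + dx, y + dy, z + dz) - f(x, y, z)
# Relative form: df/f = dx/x + dy/y + dz/z
rel_linear = dx / x + dy / y + dz / z
# First-order worst case when only the error magnitudes are known
df_bound = abs(y * z) * abs(dx) + abs(x * z) * abs(dy) + abs(x * y) * abs(dz)

print(df_linear, df_exact, rel_linear, df_bound)
```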

5. Jan 24, 2014

### Stephen Tashi

Is this a problem where you are trying to estimate both which $i$ has the maximum $v_i$ as well as the value of $v_i$? For example, if you had the measurements of distances to, say, cities, and you wanted to travel to the city that was farthest away, you would want to estimate both the distance to the city and which city was farthest.

Being a Bayesian, I think it would be nice if you could imagine that the $v_i$ are selected from some probability distribution and that the measurements $v_i + X_i$ are then performed. Does a model like that make sense for the problem?

6. Jan 24, 2014

### Hornbein

It seems to me that you need to have some idea of what the values of those sigmas are. I also want to know how much of those sigmas is natural variation and how much is measurement error. Without some idea about these things I don't see how to proceed.

It is also worth noting that if you have an upper bound then you don't have a Gaussian. Maybe it is some sort of truncated Gaussian.
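If a truncated Gaussian were used as a model for the true values, its density is the ordinary normal density cut off at the bound and rescaled so it still integrates to 1. A minimal sketch using Python's `statistics.NormalDist`; the parameters `mu`, `sigma`, `upper` are illustrative assumptions, not values from the thread.

```python
from statistics import NormalDist

def truncated_normal_pdf(x, mu, sigma, upper):
    """Density of N(mu, sigma) truncated to x <= upper and renormalised.

    Hypothetical model for the true values v_i (not a result from the thread).
    """
    if x > upper:
        return 0.0
    base = NormalDist(mu, sigma)
    return base.pdf(x) / base.cdf(upper)

# Check that the truncated density still integrates to about 1 (midpoint rule).
mu, sigma, upper = 25.0, 3.0, 30.0
h = 0.001
total = sum(truncated_normal_pdf(upper - (k + 0.5) * h, mu, sigma, upper) * h
            for k in range(40_000))
print(total)
```

Note that in the thread's setup the measurement errors are still Gaussian, so readings can exceed `upper` even when the true values cannot.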

7. Jan 24, 2014

### Stephen Tashi

As I understand TheBigH's statement, the $\sigma$'s are known and the $X_i$ are the only source of variation.

8. Jan 26, 2014

### TheBigH

Obviously I'm still being unclear, so I'll give an example. Imagine I'm trying to estimate the maximum possible height a tree can reach by measuring the heights of all the trees in a forest. I can't measure their heights perfectly, so there will be some uncertainty which I assume to be normal with SD known for each tree. Many trees won't be anywhere near the maximum height. In the end, I expect a few measured heights to be a little bit above my calculated maximum just due to uncertainties and chance.

9. Jan 26, 2014

### Stephen Tashi

That's a reasonably clear example, but we need to examine the statement:

1. One problem is to estimate the maximum height of a tree in the particular forest where you measured all the trees.

2. A different problem imagines that your measured forest is just one example from the population of all possible forests and you want to know the maximum height of a tree that can occur in the population of all those forests.

3. A different problem imagines that your measured forest is one example of a particular forest at a particular time and you want to know the maximum height a tree can reach as the forest changes over a long span of time.

10. Jan 27, 2014

### Hornbein

Aha. So the SD should be about the same for each tree. That makes it a lot easier. You seem to have the idea that you measure each tree a number of times and get a sample SD that way. You can do that, but I might assume that the SD of the measurements was always the same and average the sample variances (NOT the SDs, that's invalid).
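A minimal sketch of pooling, assuming each tree was measured several times and all trees share one common measurement SD. Variances are averaged, weighted by degrees of freedom, before taking the square root. The plain average of the SDs never exceeds the pooled value, so averaging SDs systematically understates the common SD. The readings are made up and the function names are hypothetical.

```python
import math
import statistics

def pooled_sd(groups):
    """Pooled SD from repeat readings of several trees, assuming one common SD.

    Each group holds repeat measurements of one tree. Sample variances are
    averaged, weighted by degrees of freedom (n_i - 1); with equal group sizes
    this is the plain mean of the variances.
    """
    num = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    den = sum(len(g) - 1 for g in groups)
    return math.sqrt(num / den)

def mean_of_sds(groups):
    """Averaging the sample SDs directly (the approach the post warns against)."""
    return statistics.mean(statistics.stdev(g) for g in groups)

# Made-up repeat readings of three trees (three readings each).
groups = [[10.0, 10.4, 9.8], [20.1, 19.7, 20.6], [15.0, 15.2, 14.9]]
print(pooled_sd(groups), mean_of_sds(groups))
```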

I'm not sure what to do next. If you have one outlying maximum point, then you'd add 1.96 SD to that. If you have a large number of points that are close to the maximum, then maybe you'll have to assume some things about the underlying distribution. It matters how the distribution cuts off. The only case where a measurement could be higher than the bound is if the distribution truncates sharply, so that you have a large number of points near the maximum. You could also have this problem if the measurement errors are large. If you don't want to make assumptions, then I don't see how to do it. There might be some sort of parametric statistic that will help.
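The dependence on how many points sit near the cutoff can be seen with a small Monte Carlo run. In this hypothetical scenario (not from data), `n_at_bound` trees are exactly at $v_{max}$ and each is measured once with SD `sigma`; the function reports how far the largest measurement lands above $v_{max}$ on average. With one tree the overshoot averages about zero, but it grows as more trees crowd the bound.

```python
import random
import statistics

def mean_overshoot(n_at_bound, sigma, v_max=30.0, trials=2000, seed=2):
    """Average of (largest measurement - v_max) when n_at_bound trees are
    exactly at v_max and each is measured once with N(0, sigma) error.

    Hypothetical scenario chosen for illustration; not from any data.
    """
    rng = random.Random(seed)
    overshoots = []
    for _ in range(trials):
        largest = max(v_max + rng.gauss(0.0, sigma) for _ in range(n_at_bound))
        overshoots.append(largest - v_max)
    return statistics.mean(overshoots)

for n in (1, 10, 100):
    print(n, round(mean_overshoot(n, sigma=0.5), 3))
```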

Last edited: Jan 27, 2014
11. Jan 27, 2014

### Stephen Tashi

I don't know how to solve the problem and it looks like a solution would be the material for a graduate thesis (perhaps one that's already written). However, finding wild ideas to try is no problem.

The way to deal with $N$ things is usually to first try 2 things. So let's assume we have two random variables
$Y_1 = v_1 + X_1$
$Y_2 = v_2 + X_2$
where $v_1, v_2$ are the unknown actual heights of two trees and
$X_1, X_2$ are independent, zero-mean, normally distributed random variables with respective standard deviations $\sigma_1, \sigma_2$.

Thinking like a Bayesian, my intuition is that if the prior distribution for $v_1$ is taken to be a uniform distribution over a very large interval, then the posterior distribution for the location of $v_1$ given the measurement $Y_1 = y_1$ is approximately a normal distribution with mean $y_1$ and standard deviation $\sigma_1$. Similarly, the posterior distribution for $v_2$ given $Y_2 = y_2$ is approximately a normal distribution with mean $v_2$ and standard deviation $\sigma_2$

If we can find the posterior distribution of $V_{max} = \max(v_1, v_2)$, then we can estimate $V_{max}$ using maximum likelihood or some other standard method.
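A minimal sketch of that step, assuming (as in the flat-prior intuition) independent posteriors $N(y_i, \sigma_i)$ centred at the measured values. For independent variables, $P(\max \le t)$ is the product of the individual CDFs. Instead of maximum likelihood, this sketch reads off the posterior median and a 95% upper quantile by bisection; that choice, the helper names, and the numbers are ours, not from the thread.

```python
from statistics import NormalDist

def max_cdf(t, posteriors):
    """P(max(v_1, ..., v_n) <= t) for independent posteriors N(y_i, sigma_i)."""
    p = 1.0
    for dist in posteriors:
        p *= dist.cdf(t)
    return p

def max_quantile(q, posteriors, lo=-1e6, hi=1e6, iters=200):
    """Solve max_cdf(t) = q by bisection (max_cdf is increasing in t)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if max_cdf(mid, posteriors) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Two trees: measured heights y_1, y_2 with known sigmas (made-up numbers).
post = [NormalDist(28.0, 1.0), NormalDist(29.0, 2.0)]
median = max_quantile(0.5, post)
t95 = max_quantile(0.95, post)
print(median, t95)
```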

12. Jan 28, 2014

### Stephen Tashi

In the previous post, I should have said posterior distributions have means $y_1$ and $y_2$ instead of saying they were the unknowns $v_1$ and $v_2$.

$\max(v_1, v_2)$ (given $Y_1 = y_1$ and $Y_2 = y_2$) is the second "order statistic" of two independent, but not identically distributed, random variables. So we can look up the theory on it.
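For two independent normals the mean of the maximum has a known closed form, usually attributed to C. E. Clark (1961): with $\theta = \sqrt{\sigma_1^2 + \sigma_2^2}$ and $\alpha = (\mu_1 - \mu_2)/\theta$, $E[\max] = \mu_1\Phi(\alpha) + \mu_2\Phi(-\alpha) + \theta\,\phi(\alpha)$. A minimal sketch, checked against Monte Carlo; here $\mu_i$ would be the posterior means $y_i$, and the numbers are made up.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: Phi = STD.cdf, phi = STD.pdf

def mean_max_two(mu1, s1, mu2, s2):
    """Exact E[max(A, B)] for independent A ~ N(mu1, s1), B ~ N(mu2, s2)."""
    theta = math.hypot(s1, s2)  # SD of A - B when A and B are independent
    alpha = (mu1 - mu2) / theta
    return mu1 * STD.cdf(alpha) + mu2 * STD.cdf(-alpha) + theta * STD.pdf(alpha)

def mean_max_two_mc(mu1, s1, mu2, s2, n=200_000, seed=3):
    """Monte Carlo estimate of the same expectation, for comparison."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += max(rng.gauss(mu1, s1), rng.gauss(mu2, s2))
    return total / n

print(mean_max_two(28.0, 1.0, 29.0, 2.0), mean_max_two_mc(28.0, 1.0, 29.0, 2.0))
```

As a sanity check, two standard normals give $E[\max] = 1/\sqrt{\pi}$.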

The generalization to $\max(v_1, v_2, \ldots, v_n)$, which is the $n$-th order statistic of a set of independent random variables, is given by the Bapat-Beg theorem. http://en.wikipedia.org/wiki/Bapat–Beg_theorem

To make progress on the practical problem, what is needed is a way to approximate the result of the Bapat-Beg theorem, or at least a way to approximate the mean and variance of the $n$-th order statistic of a set of independent, not identically distributed, normal random variables.
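For the maximum specifically, independence gives the CDF directly as $F(t) = \prod_i \Phi\big((t - \mu_i)/\sigma_i\big)$, so its mean and variance can be obtained by brute-force numerical integration rather than an approximation formula. A minimal sketch; the integration `width` and `steps` are arbitrary choices of ours, and it uses $E[M-a] = \int_a^b (1 - F)\,dt$ and $E[(M-a)^2] = 2\int_a^b (t-a)(1-F)\,dt$ with $a$, $b$ far outside the bulk of the distribution.

```python
import math
from statistics import NormalDist

def max_moments(mus, sigmas, steps=20_000, width=10.0):
    """Mean and variance of M = max(Z_1..Z_n), Z_i ~ N(mus[i], sigmas[i]) independent.

    Uses F(t) = prod_i Phi((t - mu_i) / sigma_i) and, for a below (and b above)
    essentially all the probability mass,
        E[M - a]     = integral_a^b (1 - F(t)) dt
        E[(M - a)^2] = 2 * integral_a^b (t - a) (1 - F(t)) dt,
    evaluated with the midpoint rule.
    """
    dists = [NormalDist(m, s) for m, s in zip(mus, sigmas)]
    a = min(m - width * s for m, s in zip(mus, sigmas))
    b = max(m + width * s for m, s in zip(mus, sigmas))
    h = (b - a) / steps
    m1 = 0.0  # E[M - a]
    m2 = 0.0  # E[(M - a)^2]
    for k in range(steps):
        t = a + (k + 0.5) * h
        tail = 1.0 - math.prod(d.cdf(t) for d in dists)
        m1 += tail * h
        m2 += 2.0 * (t - a) * tail * h
    return a + m1, m2 - m1 ** 2  # shifting by a does not change the variance

# Three trees with posterior means y_i and known sigmas (made-up numbers).
print(max_moments([28.0, 29.0, 27.5], [1.0, 2.0, 0.5]))
```

For two standard normals this should reproduce mean $1/\sqrt{\pi}$ and variance $1 - 1/\pi$.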

Last edited: Jan 28, 2014