Estimating upper bound from measurements with uncertainties

This thread discusses how to estimate an upper bound on the true values underlying a set of measurements with associated uncertainties. It explores what "uncertainty" can mean in different scenarios and touches on the use of probability distributions and models in solving the problem.
  • #1
TheBigH
Hello everyone,

I have a large number of measurements with associated uncertainties, and I know that the real values are bounded above by some constant. How can I estimate the value of that constant, and the uncertainty on the estimate?

Thanks
 
  • #2
You'll get better advice if you state the problem precisely - and you'll get better advice in the mathematical section if you state what you mean by "uncertainty".

In physical measurements, one scenario is to assume each measurement is an independent measurement of the same value [itex] v_0 [/itex] and that the result of the measurement is [itex] v_0 + X [/itex] where [itex] X [/itex] is a Gaussian random variable with mean 0 and standard deviation [itex] \sigma [/itex]. The value of [itex] \sigma [/itex] is often called the "uncertainty" of the measurement. In some scenarios, measurements are made with equipment where the "uncertainty" is given by the manufacturer of the measuring device or a calibration lab. In other scenarios, the exact value of [itex] \sigma [/itex] is not known and what you have is only the standard deviation estimated from a specific set of data (i.e. a "sample standard deviation" rather than a "population standard deviation").

There can also be scenarios where "uncertainty" is specified by a rigid bound that is not interpreted as a standard deviation.
 
  • #3
Okay, assume I have a number of different data points with true values [itex]v_1, v_2, ...[/itex]. What I actually measure is [itex]v_1+X_1,v_2+X_2,...[/itex], where the [itex]X_i[/itex] are Gaussian random variables with mean 0 and SD of [itex]\sigma_1, \sigma_2, ...[/itex], just like you say. It is known that there is some [itex]v_{max}[/itex] such that all [itex]v_i<v_{max}[/itex]. Many of the [itex]v_i[/itex] can be significantly less than that, and some of my measurements might actually be greater than [itex]v_{max}[/itex] because of random chance. I would like to know how I can estimate [itex]v_{max}[/itex] and get confidence intervals on that estimate. Thanks!
 
  • #4
So you are asking how to estimate an upper bound on some function of several variables if the error of each variable is known? There is an engineer's rule of thumb that "if variables add, then errors add; if variables multiply, then the relative errors add".

That can be shown by thinking of the errors as differentials: x + dx. If f = x + y + z, then df = dx + dy + dz. If f = xyz, then df = yz dx + xz dy + xy dz. Since f = xyz, df/f = (yz dx + xz dy + xy dz)/(xyz) = dx/x + dy/y + dz/z.

More generally, if f is a function of x, y, and z, then [itex]df= \frac{\partial f}{\partial x}dx+ \frac{\partial f}{\partial y}dy+ \frac{\partial f}{\partial z} dz[/itex]
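
As a small numerical check of the rule of thumb above, here is a minimal Python sketch. The helper name `propagate_linear` and the sample values are made up for illustration, and taking absolute values of the partial derivatives (to get a worst-case error) is an assumption that goes slightly beyond the bare differential formula.

```python
def propagate_linear(partials, errors):
    """First-order worst-case error: df = sum(|df/dx_i| * dx_i)."""
    return sum(abs(p) * e for p, e in zip(partials, errors))

# Hypothetical values and errors, chosen only for illustration.
x, y, z = 2.0, 3.0, 4.0
dx, dy, dz = 0.1, 0.2, 0.1

# f = x + y + z: every partial derivative is 1, so the errors simply add.
df_sum = propagate_linear([1.0, 1.0, 1.0], [dx, dy, dz])

# f = x*y*z: partials are yz, xz, xy, so the relative errors add.
f_prod = x * y * z
df_prod = propagate_linear([y * z, x * z, x * y], [dx, dy, dz])
rel_prod = df_prod / f_prod
rel_sum_of_parts = dx / x + dy / y + dz / z

print("error of sum:              ", df_sum)
print("relative error of product: ", rel_prod)
print("sum of relative errors:    ", rel_sum_of_parts)
```

The last two printed numbers agree, which is the "relative errors add" statement. Note that this propagates errors through a known function; it does not by itself estimate an unknown bound such as [itex]v_{max}[/itex].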
 
  • #5
Is this a problem where you are trying to estimate both which [itex] i [/itex] has the maximum [itex] v_i [/itex] as well as the value of [itex] v_i [/itex]? For example, if you had the measurements of distances to, say, cities, and you wanted to travel to the city that was farthest away, you would want to estimate both the distance to the city and which city was farthest.

Being a Bayesian, I think it would be nice if you could imagine that the [itex] v_i [/itex] are selected from some probability distribution and then that the measurements [itex] v_i + X_i [/itex] are performed. Does a model like that make sense for the problem?
 
  • #6
TheBigH said:
Okay, assume I have a number of different data points with true values [itex]v_1, v_2, ...[/itex]. What I actually measure is [itex]v_1+X_1,v_2+X_2,...[/itex], where the [itex]X_i[/itex] are Gaussian random variables with mean 0 and SD of [itex]\sigma_1, \sigma_2, ...[/itex], just like you say. It is known that there is some [itex]v_{max}[/itex] such that all [itex]v_i<v_{max}[/itex]. Many of the [itex]v_i[/itex] can be significantly less than that, and some of my measurements might actually be greater than [itex]v_{max}[/itex] because of random chance. I would like to know how I can estimate [itex]v_{max}[/itex] and get confidence intervals on that estimate. Thanks!


It seems to me that you need to have some idea of what the values of those sigmas are. I also want to know how much of those sigmas is natural variation and how much is measurement error. Without some idea about these things I don't see how to proceed.

It is also worth noting that if you have an upper bound then you don't have a Gaussian. Maybe it is some sort of truncated Gaussian.
 
  • #7
As I understand TheBigH's statement, the [itex] \sigma[/itex]'s are known and the [itex] X_i [/itex] are the only source of variation.
 
  • #8
Obviously I'm still being unclear, so I'll give an example. Imagine I'm trying to estimate the maximum possible height a tree can reach by measuring the heights of all the trees in a forest. I can't measure their heights perfectly, so there will be some uncertainty which I assume to be normal with SD known for each tree. Many trees won't be anywhere near the maximum height. In the end, I expect a few measured heights to be a little bit above my calculated maximum just due to uncertainties and chance.
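
To make the tree example concrete, here is a small simulation sketch in Python. Everything numeric in it is hypothetical: the bound `V_MAX`, the number of trees `N`, the range of the SDs, and in particular the choice of a uniform distribution for the true heights, which the thread does not specify.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

V_MAX = 30.0  # hypothetical true upper bound on tree height (metres)
N = 1000      # hypothetical number of trees in the forest

# Assumed for illustration only: true heights spread uniformly below V_MAX.
true_heights = [random.uniform(5.0, V_MAX) for _ in range(N)]

# Known per-tree measurement SDs (hypothetical range).
sigmas = [random.uniform(0.2, 1.0) for _ in range(N)]

# Each measurement is the true height plus zero-mean Gaussian noise.
measured = [v + random.gauss(0.0, s) for v, s in zip(true_heights, sigmas)]

n_above = sum(1 for y in measured if y > V_MAX)
print("largest true height:      ", max(true_heights))
print("largest measured height:  ", max(measured))
print("measurements above V_MAX: ", n_above)
```

Running this shows how the largest measured height relates to the true bound under these assumptions, and how many measurements land above it purely because of noise.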
 
  • #9
That's a reasonably clear example, but we need to examine the statement:

TheBigH said:
I'm trying to estimate the maximum possible height a tree can reach

1. One problem is to estimate the maximum height of a tree in the particular forest where you measured all the trees.

2. A different problem imagines that your measured forest is just one example from the population of all possible forests and you want to know the maximum height of a tree that can occur in the population of all those forests.

3. A different problem imagines that your measured forest is one example of a particular forest at a particular time and you want to know the maximum height a tree can reach as the forest changes over a long span of time.
 
  • #10
TheBigH said:
Obviously I'm still being unclear, so I'll give an example. Imagine I'm trying to estimate the maximum possible height a tree can reach by measuring the heights of all the trees in a forest. I can't measure their heights perfectly, so there will be some uncertainty which I assume to be normal with SD known for each tree. Many trees won't be anywhere near the maximum height. In the end, I expect a few measured heights to be a little bit above my calculated maximum just due to uncertainties and chance.

Aha. So the SD should be about the same for each tree. That makes it a lot easier. You seem to have the idea that you measure each tree a number of times and get a sample SD that way. You can do that, but I might assume that the SD of the measurements was always the same and average the sample variances (NOT the SDs, that's invalid).

I'm not sure what to do next. If you have one outlying maximum point then you'd add 1.96 SD to that. If you have a large number of points that are close to the maximum then maybe you'll have to assume some things about the underlying distribution. It matters how the distribution cuts off. The only case where a measurement could be higher than the bound is if the distribution truncates sharply, so that you have a large number of points near the maximum. You could also have this problem if the measurement errors are large. If you don't want to make assumptions then I don't see how to do it. There might be some sort of parametric statistic that will help.
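
The advice above about averaging sample variances rather than SDs can be sketched as follows. This is only an illustration: the function names and the repeated-measurement data are invented, and it assumes every tree was measured the same number of times, so a plain average of the variances is the pooled variance.

```python
import math
import statistics

def pooled_sd(samples):
    """Average the per-tree sample variances, then take the square root."""
    variances = [statistics.variance(s) for s in samples]
    return math.sqrt(sum(variances) / len(variances))

def mean_of_sds(samples):
    """Average the per-tree sample SDs directly (shown only for comparison)."""
    return sum(statistics.stdev(s) for s in samples) / len(samples)

# Hypothetical repeated height measurements for three trees (metres).
samples = [
    [10.0, 10.2, 9.8],
    [20.1, 19.9, 20.0],
    [15.0, 15.6, 14.4],
]

print("SD from averaged variances:", pooled_sd(samples))
print("average of the SDs:        ", mean_of_sds(samples))
```

The average of the SDs can never exceed the square root of the averaged variances, so averaging SDs tends to understate the measurement error.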
 
  • #11
I don't know how to solve the problem and it looks like a solution would be the material for a graduate thesis (perhaps one that's already written). However, finding wild ideas to try is no problem.

The way to deal with N things is usually to first try 2 things. So let's assume we have two random variables
[itex] Y_1 = v_1 + X_1 [/itex]
[itex] Y_2 = v_2 + X_2 [/itex]
where [itex] v_1, v_2 [/itex] are the unknown actual heights of two trees and
[itex] X_1, X_2 [/itex] are independent zero-mean normally distributed random variables with respective standard deviations [itex] \sigma_1, \sigma_2 [/itex].

Thinking like a Bayesian, my intuition is that if the prior distribution for [itex] v_1 [/itex] is taken to be a uniform distribution over a very large interval, then the posterior distribution for the location of [itex] v_1 [/itex] given the measurement [itex] Y_1 = y_1 [/itex] is approximately a normal distribution with mean [itex] y_1 [/itex] and standard deviation [itex] \sigma_1 [/itex]. Similarly, the posterior distribution for [itex] v_2 [/itex] given [itex] Y_2 = y_2 [/itex] is approximately a normal distribution with mean [itex] v_2 [/itex] and standard deviation [itex] \sigma_2 [/itex]

If we can find the posterior distribution of [itex] V_{max} = max(v_1,v_2) [/itex] then we can estimate [itex] V_{max} [/itex] using maximum likelihood or some other standard method.
 
  • #12
In the previous post, I should have said posterior distributions have means [itex] y_1 [/itex] and [itex] y_2 [/itex] instead of saying they were the unknowns [itex] v_1 [/itex] and [itex] v_2 [/itex].

[itex] \max(v_1,v_2) [/itex] (given [itex] Y_1=y_1 [/itex] and [itex] Y_2=y_2 [/itex]) is the second "order statistic" of two independent, but not identically distributed, random variables. So we can look up the theory on it.

The generalization to [itex] \max(v_1,v_2,\ldots,v_n) [/itex], which is the n-th order statistic of a set of independent random variables, is covered by the Bapat-Beg theorem. http://en.wikipedia.org/wiki/Bapat–Beg_theorem

To make progress on the practical problem, what is needed is a way to approximate the result of the Bapat-Beg theorem, or at least a way to approximate the mean and variance of the n-th order statistic of a set of independent, not identically distributed, normal random variables.
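
For the maximum specifically, the full theorem is not needed: if the variables are independent, the probability that all of them are below x is the product of the individual CDFs. The Python sketch below uses that fact to approximate the mean and variance of the maximum by numerical integration on a grid. The function names, the grid size, and the integration width are arbitrary choices for illustration. It treats each [itex] v_i [/itex] as normal with mean [itex] y_i [/itex] and SD [itex] \sigma_i [/itex], which relies on the flat-prior assumption from the previous post.

```python
import math

def norm_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and SD sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def max_cdf(x, mus, sigmas):
    """P(max_i V_i <= x) for independent normals: product of the CDFs."""
    p = 1.0
    for mu, s in zip(mus, sigmas):
        p *= norm_cdf(x, mu, s)
    return p

def max_mean_var(mus, sigmas, n_grid=20001, width=10.0):
    """Approximate mean and variance of the maximum on a uniform grid."""
    lo = min(m - width * s for m, s in zip(mus, sigmas))
    hi = max(m + width * s for m, s in zip(mus, sigmas))
    h = (hi - lo) / (n_grid - 1)
    xs = [lo + i * h for i in range(n_grid)]
    cdf = [max_cdf(x, mus, sigmas) for x in xs]
    m1 = m2 = 0.0
    for i in range(n_grid - 1):
        mass = cdf[i + 1] - cdf[i]      # probability inside this cell
        mid = 0.5 * (xs[i] + xs[i + 1])  # place that mass at the midpoint
        m1 += mass * mid
        m2 += mass * mid * mid
    return m1, m2 - m1 * m1

# Hypothetical measured heights y_i and known SDs sigma_i.
mean, var = max_mean_var([28.0, 29.5, 30.2], [0.5, 1.0, 0.8])
print("approx. mean of max:", mean)
print("approx. SD of max:  ", math.sqrt(var))
```

This describes the largest true value among the measured items, not a bound on a wider population, so it corresponds only to the first of the three interpretations listed in post #9.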
 

1. What is the purpose of estimating an upper bound from measurements with uncertainties?

The purpose of estimating an upper bound from measurements with uncertainties is to determine the maximum possible value for a measured quantity, taking into account the uncertainties or errors in the measurement. This allows scientists to have a more accurate understanding of the true range of values for a particular measurement.

2. How do you calculate an upper bound from measurements with uncertainties?

A simple, conservative approach is to add a multiple of each measurement's uncertainty to its measured value and take the largest of these results. This gives only a rough bound, not a confidence interval. A more careful estimate, as discussed in the thread, requires a model for the measurement errors and possibly for the distribution of the true values.

3. What are some common sources of uncertainties in measurements?

Some common sources of uncertainties in measurements include limitations of the measuring instrument, human error in recording measurements, and natural variations in the quantity being measured. Other factors such as environmental conditions and experimental procedures can also contribute to uncertainties.

4. How can you reduce uncertainties in measurements?

To reduce uncertainties in measurements, scientists can use more precise measuring instruments, carefully calibrate equipment, and repeat measurements multiple times to account for natural variations. It is also important to follow standardized procedures and minimize external factors that could affect the measurement.

5. Why is it important to consider uncertainties in measurements?

Considering uncertainties in measurements is important because it allows scientists to have a more accurate understanding of the true range of values for a particular measurement. This helps to avoid making false conclusions or assumptions based on a single measurement and promotes more reliable and accurate scientific results.
