Optimal linear combination of independent estimates for minimizing variance

  • Thread starter: Andre
  • Tags: Statistics
AI Thread Summary
The discussion revolves around determining the optimal average age of an event based on various independent estimates with different methods and associated uncertainties. The user attempts to combine these estimates using normal distributions and calculates standard deviations based on provided data. They arrive at an average of 13220 years with a standard deviation of 20, refining it to 13218 ± 14 years through a least squares method. The user questions the validity of the approach and whether to inform the author of one record about the potential outlier nature of their method. The conversation emphasizes the importance of understanding linear combinations of independent estimates to minimize variance in statistical analysis.
Andre

I posted this here in an attempt to follow the rules for independent study. My knowledge of statistics is very rudimentary, so I would like to know whether my approach makes any sense or not.

Homework Statement



The question is to determine the age of a certain event based on a series of binomially/normally distributed datasets, all looking for the same event age but with different methods.

Record 1 was counted 10 times by different people, leading to an average of 13200 years with σ = 20

Record 2 is given as 12985 certain years with 150 uncertain layers that may or may not be years. So the authors split the difference and conclude: 13260 years with an absolute error of 75 years

Record 3 is reported as 13215 counted years with a 1% error

Record 4 is calculated, giving a result of 13195 years with σ = 35

So what would be a realistic average value with realistic σ?

Homework Equations



Normal distribution.

The Attempt at a Solution



I wondered if it would work to treat all four records as normal distributions and then multiply them together at each data point. To get σ's for records 2 and 3 I took the quoted absolute error as the 3σ range, getting values of 25 and 44 years respectively.

Then I created this spreadsheet, using 5 year intervals, which is ample resolution in this field.

https://dl.dropboxusercontent.com/u/22026080/numbers-crunching.xlsx

I just multiplied all data in the series of the four records (column I) and then normalized it to get a sum of 1 under the graph (column F). Columns J-K-L are just a help to find the average and σ, which turn out to be 13220 ± 20.
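The spreadsheet procedure described above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet itself: the record values and σ's are the ones quoted in the post, while the grid range (13000 to 13500 years) and all variable names are assumptions made for the sketch, so its output need not match the spreadsheet's figures.

```python
import math

# (mean, sigma) for records 1-4 as quoted in the post; the sigmas for
# records 2 and 3 are the post's "error = 3 sigma" values (25 and 44).
records = [(13200.0, 20.0), (13260.0, 25.0), (13215.0, 44.0), (13195.0, 35.0)]

def normal_pdf(x, mean, sigma):
    """Density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# 5-year grid; the range 13000..13500 is an assumption of this sketch.
ages = [13000.0 + 5.0 * i for i in range(101)]

# Multiply the four densities at every grid point, then normalize to sum 1.
product = [math.prod(normal_pdf(a, m, s) for m, s in records) for a in ages]
total = sum(product)
weights = [p / total for p in product]

# Mean and standard deviation of the normalized product.
grid_mean = sum(w * a for w, a in zip(weights, ages))
grid_sigma = math.sqrt(sum(w * (a - grid_mean) ** 2 for w, a in zip(weights, ages)))

print(f"{grid_mean:.1f} +/- {grid_sigma:.1f}")
```

A product of normal densities is again proportional to a normal density, so the grid is only a numerical convenience here; the 5 year spacing mainly limits how finely the curve is sampled.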



Does this make sense?

The result is close to the 2σ boundary with record 2. Therefore I made a refinement tool (column G) to find the least squares (manual - trial and error) which turned out to be 13218 ± 14(?) years.

Does that make sense too, since I'm working with 5 year intervals?
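For comparison, a least-squares estimate does not have to be found by trial and error on a grid. The sketch below assumes the quantity being minimized is the sum of squared deviations weighted by 1/σ² (the post does not say how column G is built, so this is an assumption), in which case the minimum has a closed form. The function name `combine` and the `pulls` list are hypothetical names introduced for illustration.

```python
import math

def combine(values, sigmas):
    """Minimize sum((v - mu)**2 / s**2) over mu; return (mu, sigma_mu).

    Assumes independent, unbiased estimates with known sigmas.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)

# Values and sigmas as quoted in the post (25 and 44 are the 3-sigma guesses).
values = [13200.0, 13260.0, 13215.0, 13195.0]
sigmas = [20.0, 25.0, 44.0, 35.0]

best_age, best_sigma = combine(values, sigmas)

# How far each record sits from the combined value, in units of its own sigma.
pulls = [(v - best_age) / s for v, s in zip(values, sigmas)]

print(f"{best_age:.1f} +/- {best_sigma:.1f}")
print([round(p, 2) for p in pulls])
```

Because the result is computed directly from the four values, it does not depend on the 5 year grid spacing. The `pulls` list gives a rough way to see how far record 2 lies from the combined value under these σ assumptions.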

Finally, does it make sense to mail the author of record 2 telling him that his method of splitting the difference between certain and uncertain years makes his result an outlier?
 
Here is a simple theoretical problem which points to one possible approach.

Suppose ##X_1## and ##X_2## are two independent estimates of a random variable ##X##, with standard deviations ##\sigma_1## and ##\sigma_2##, respectively. Consider a linear combination of ##X_1## and ##X_2##, i.e.
$$Y = \lambda_1 X_1 + \lambda_2 X_2$$
where ##0 \leq \lambda_i \leq 1## and ##\lambda_1 + \lambda_2 = 1##.

If we assume that ##X_1## and ##X_2## are unbiased estimators of ##X##, i.e. ##E(X_i) = E(X)##, then you should find it easy to show that the mean of ##Y## is equal to the mean of ##X##.

What choice of ##\lambda_1## and ##\lambda_2## minimizes the variance of ##Y##?

With just a tiny bit of statistics and calculus, you should be able to solve this problem. And then maybe you can see a way to apply the result to your original problem.
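For readers who want to check their answer, here is one way the exercise can be worked. It is a sketch that assumes only what is stated above: independence, unbiasedness and known standard deviations.

Independence gives
$$\operatorname{Var}(Y) = \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2 .$$
Substituting ##\lambda_2 = 1 - \lambda_1## and setting the derivative with respect to ##\lambda_1## to zero:
$$2\lambda_1 \sigma_1^2 - 2(1 - \lambda_1)\sigma_2^2 = 0$$
$$\lambda_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \qquad \lambda_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}$$
The second derivative is ##2(\sigma_1^2 + \sigma_2^2) > 0##, so this is a minimum, and both weights lie between 0 and 1 automatically. The resulting variance is
$$\operatorname{Var}(Y) = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \qquad \text{i.e.} \qquad \frac{1}{\sigma_Y^2} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} .$$

Each estimate is therefore weighted in proportion to ##1/\sigma_i^2##, and the same pattern extends to more than two independent estimates.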
 