Optimal linear combination of independent estimates for minimizing variance

Andre · May 15, 2013


I posted this here in an attempt to follow the rules for independent study. My knowledge of statistics is very rudimentary, so I would like to know whether my approach makes any sense or not.

Homework Statement

The question is to determine the age of a certain event from a series of binomially/normally distributed datasets, all dating the same event but with different methods.

Record 1 was counted 10 times by different people, giving an average of 13200 years with σ = 20.

Record 2 is given as 12985 certain years plus 150 uncertain layers that may or may not be years. The authors split the difference and conclude 13260 years with an absolute error of 75 years.

Record 3 is reported as 13215 counted years with a 1% error

Record 4 is calculated, etc., giving a result of 13195 years with σ = 35.

So what would be a realistic average value with realistic σ?

Homework Equations

Normal distribution.

The Attempt at a Solution

I wondered if it would work to treat all four records as normal distributions and then multiply the four probability densities together at each data point. To get σ's for records 2 and 3 I treated the absolute error as the 3σ range, getting values of 25 and 44 years respectively.

Then I created this spreadsheet, using 5 year intervals, which is ample resolution in this field.
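The multiply-and-normalize idea can be sketched outside a spreadsheet as follows. This is only an illustration under the assumptions stated in the post: every record is treated as normal, the σ values 25 and 44 come from reading the quoted errors as 3σ ranges, and the grid limits 13000 to 13450 and the names `RECORDS`, `normal_pdf` and `combine_on_grid` are hypothetical choices made for this example.

```python
import math

# (mean, sigma) for records 1-4 as assumed in the post.
# 25 and 44 are the poster's 3-sigma readings of the quoted errors.
RECORDS = [(13200.0, 20.0), (13260.0, 25.0), (13215.0, 44.0), (13195.0, 35.0)]


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def combine_on_grid(records, start=13000, stop=13450, step=5):
    """Multiply the densities point by point, normalize, return (mean, sigma)."""
    xs = list(range(start, stop + 1, step))
    # Product of all record densities at each grid point.
    products = [math.prod(normal_pdf(x, mu, s) for mu, s in records) for x in xs]
    # Rescale so the grid values sum to 1.
    total = sum(products)
    weights = [p / total for p in products]
    # Mean and standard deviation of the normalized product.
    mean = sum(w * x for w, x in zip(weights, xs))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs))
    return mean, math.sqrt(var)


grid_mean, grid_sigma = combine_on_grid(RECORDS)
print(f"{grid_mean:.1f} +/- {grid_sigma:.1f}")
```

The 5 year step matters little here as long as it is well below the σ of the combined curve.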

https://dl.dropboxusercontent.com/u/22026080/numbers-crunching.xlsx

I just multiplied the data of the four records at each point (column I) and then rescaled the result so that it sums to 1 under the graph (column F). Columns J-K-L are just a help to find the average and σ, which turn out to be 13220 ± 20.

Does this make sense?

The result is close to the 2σ boundary of record 2. Therefore I made a refinement tool (column G) to find the least squares fit (manually, by trial and error), which turned out to be 13218 ± 14(?) years.

Does that make sense too, since I'm working with 5 year intervals?
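For comparison, a least squares fit of a single age to several normal estimates has a closed form: minimizing the sum of (x − x_i)²/σ_i² gives the inverse-variance weighted mean, so no trial and error or grid is needed. The sketch below assumes the same four (mean, σ) pairs as above, including the assumed 3σ readings for records 2 and 3; the function name `inverse_variance_mean` is made up for this example.

```python
import math


def inverse_variance_mean(records):
    """Weighted least squares estimate from (mean, sigma) pairs.

    Each record is weighted by 1/sigma^2; the combined sigma is
    1/sqrt(sum of weights). Assumes independent, unbiased, normal records.
    """
    weights = [1.0 / sigma ** 2 for _, sigma in records]
    total = sum(weights)
    mean = sum(w * mu for w, (mu, _) in zip(weights, records)) / total
    return mean, math.sqrt(1.0 / total)


# Records 1-4 with the sigma values assumed in the post.
records = [(13200.0, 20.0), (13260.0, 25.0), (13215.0, 44.0), (13195.0, 35.0)]
best_age, best_sigma = inverse_variance_mean(records)
print(f"{best_age:.1f} +/- {best_sigma:.1f}")
```

With these assumed inputs the result comes out near 13218 years with σ near 14, in line with the trial-and-error figure.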

Finally, does it make sense to mail the author of record 2 telling him that his method of splitting the difference between certain and uncertain years makes his result an outlier?

awkward · May 15, 2013

Here is a simple theoretical problem which points to one possible approach.

Suppose [itex]X_1[/itex] and [itex]X_2[/itex] are two independent estimates of a random variable [itex]X[/itex], with standard deviations [itex]\sigma_1[/itex] and [itex]\sigma_2[/itex], respectively. Consider a linear combination of [itex]X_1[/itex] and [itex]X_2[/itex], i.e.
[tex]Y = \lambda_1 X_1 + \lambda_2 X_2[/tex] where
[tex]0 \leq \lambda_i \leq 1[/tex] and [tex]\lambda_1 + \lambda_2 = 1[/tex]

If we assume that [itex]X_1[/itex] and [itex]X_2[/itex] are unbiased estimators of [itex]X[/itex], i.e. [itex]E(X_i) = E(X)[/itex], then you should find it easy to show that the mean of [itex]Y[/itex] is equal to the mean of [itex]X[/itex].

What choice of [itex]\lambda_1[/itex] and [itex]\lambda_2[/itex] minimizes the variance of [itex]Y[/itex]?

With just a tiny bit of statistics and calculus, you should be able to solve this problem. And then maybe you can see a way to apply the result to your original problem.
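One possible way to work it through, as a sketch that relies on the stated independence (so the covariance term vanishes) and substitutes [itex]\lambda_2 = 1 - \lambda_1[/itex]:

[tex]\operatorname{Var}(Y) = \lambda_1^2 \sigma_1^2 + (1 - \lambda_1)^2 \sigma_2^2[/tex]
[tex]\frac{d}{d\lambda_1} \operatorname{Var}(Y) = 2 \lambda_1 \sigma_1^2 - 2 (1 - \lambda_1) \sigma_2^2 = 0[/tex]
[tex]\lambda_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \qquad \lambda_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}[/tex]
[tex]\operatorname{Var}(Y)_{\min} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}[/tex]

The second derivative, [itex]2(\sigma_1^2 + \sigma_2^2)[/itex], is positive, so this is a minimum, and both weights lie between 0 and 1 as required. Each estimate ends up weighted in proportion to [itex]1/\sigma_i^2[/itex].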
