Optimal linear combination of independent estimates for minimizing variance

Andre · May 15, 2013


I posted this here in an attempt to follow the rules for independent study. My knowledge of statistics is very rudimentary, so I would like to know whether my approach makes any sense or not.

Homework Statement

The question is to determine the age of a certain event from a series of binomially/normally distributed datasets, all targeting the same event age but using different methods.

Record 1 was counted 10 times by different people, giving an average of 13200 years with σ = 20

Record 2 is given as 12985 certain years plus 150 uncertain layers that may or may not be years. The authors split the difference and conclude: 13260 years with an absolute error of 75 years

Record 3 is reported as 13215 counted years with a 1% error

Record 4 is calculated, giving a result of 13195 years with σ = 35

So what would be a realistic average value, with a realistic σ?

Homework Equations

Normal distribution.

The Attempt at a Solution

I wondered if it would work to treat all four records as normal distributions and multiply them together at each data point. To get σ values for records 2 and 3, I took the absolute error as the 3σ range, giving 25 and 44 years respectively (75/3 and 1% of 13215 divided by 3).

Then I created this spreadsheet, using 5-year intervals, which is ample resolution for this field.

https://dl.dropboxusercontent.com/u/22026080/numbers-crunching.xlsx

I just multiplied the values of the four records at each data point (column I) and then rescaled the product so that it sums to 1 under the graph (column F). Columns J-K-L are just helpers for finding the average and σ, which turn out to be 13220 ± 20.
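The spreadsheet procedure described above can be sketched in a few lines of Python. This is only an illustration, not the contents of the linked file: the function names `normal_pdf` and `combine_on_grid` and the grid limits 13000 to 13400 are assumptions made here, and the σ values for records 2 and 3 are the 3σ-based guesses from the post.

```python
import math

# (mean, sigma) per record, as quoted in the post. The sigmas for records
# 2 and 3 come from the post's "absolute error = 3 sigma" assumption.
records = [(13200, 20), (13260, 25), (13215, 44), (13195, 35)]

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def combine_on_grid(records, start=13000, stop=13400, step=5):
    """Multiply the four densities point by point, rescale the product to
    sum to 1, and return the mean and standard deviation of the result."""
    grid = list(range(start, stop + 1, step))
    product = [math.prod(normal_pdf(x, mu, s) for mu, s in records) for x in grid]
    total = sum(product)
    weights = [p / total for p in product]
    mean = sum(w * x for w, x in zip(weights, grid))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, grid))
    return mean, math.sqrt(var)

grid_mean, grid_sigma = combine_on_grid(records)
print(round(grid_mean, 1), round(grid_sigma, 1))
```

Because a product of normal densities is itself proportional to a normal density, the 5-year grid spacing matters very little as long as it is clearly smaller than the combined σ.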

Does this make sense?

The result is close to the 2σ boundary of record 2. Therefore I made a refinement tool (column G) to find the least-squares value manually, by trial and error, which turned out to be 13218 ± 14(?) years.

Does that make sense too, since I'm working with 5 year intervals?
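For comparison, the weighted least-squares estimate of a common mean from independent normal measurements has a closed form, the inverse-variance weighted mean, so no trial and error or grid is needed. The sketch below assumes the four (mean, σ) pairs used in this post, including the guessed σ values for records 2 and 3; the function name `inverse_variance_mean` is hypothetical.

```python
import math

# (mean, sigma) per record, as used in the post; sigmas for records 2 and 3
# are the poster's 3-sigma-based guesses, not published values.
records = [(13200, 20), (13260, 25), (13215, 44), (13195, 35)]

def inverse_variance_mean(records):
    """Weight each estimate by 1/sigma^2; return the combined mean and sigma."""
    weights = [1.0 / (sigma ** 2) for _, sigma in records]
    total = sum(weights)
    mean = sum(w * mu for w, (mu, _) in zip(weights, records)) / total
    sigma = math.sqrt(1.0 / total)
    return mean, sigma

mean, sigma = inverse_variance_mean(records)
print(round(mean, 1), round(sigma, 1))  # 13218.3 13.6
```

Under these assumptions the closed form gives about 13218 ± 14, independent of any interval width.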

Finally, does it make sense to mail the author of record 2 telling him that his method of splitting the difference between certain and uncertain years makes his result an outlier?

awkward · May 15, 2013

Here is a simple theoretical problem which points to one possible approach.

Suppose X_1 and X_2 are two independent estimates of a random variable X, with standard deviations \sigma_1 and \sigma_2, respectively. Consider a linear combination of X_1 and X_2, i.e.
Y = \lambda_1 X_1 + \lambda_2 X_2 where
0 \leq \lambda_i \leq 1 and \lambda_1 + \lambda_2 = 1

If we assume that X_1 and X_2 are unbiased estimators of X, i.e. E(X_i) = E(X), then you should find it easy to show that the mean of Y is equal to the mean of X.

What choice of \lambda_1 and \lambda_2 minimizes the variance of Y?

With just a tiny bit of statistics and calculus, you should be able to solve this problem. Then maybe you can see a way to apply the result to your original problem.
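One way to work through the two-estimate problem, as a sketch using only the assumptions stated above (independence and unbiasedness): substitute \lambda_2 = 1 - \lambda_1, use the fact that variances of independent terms add, and set the derivative with respect to \lambda_1 to zero.

```latex
\begin{align*}
\operatorname{Var}(Y) &= \lambda_1^2 \sigma_1^2 + (1 - \lambda_1)^2 \sigma_2^2 \\
\frac{d\,\operatorname{Var}(Y)}{d\lambda_1} &= 2\lambda_1 \sigma_1^2 - 2(1 - \lambda_1)\sigma_2^2 = 0 \\
\lambda_1 &= \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \qquad
\lambda_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \\
\operatorname{Var}(Y)_{\min} &= \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}
\end{align*}
```

The second derivative, 2\sigma_1^2 + 2\sigma_2^2, is positive, so this is a minimum. Each weight is proportional to 1/\sigma_i^2, and the result can be written as 1/Var(Y) = 1/\sigma_1^2 + 1/\sigma_2^2, which suggests how the same idea would extend to more than two estimates.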
