Optimal linear combination of independent estimates for minimizing variance

  • Thread starter: Andre
  • Tags: Statistics
SUMMARY

The discussion centers on combining independent estimates linearly so as to minimize variance, applied to dating an event from multiple datasets. Four records are analyzed, with ages of 13200, 13260, 13215, and 13195 years, each with its own stated uncertainty (a standard deviation, an absolute error, or a percentage error). The user treats each record as a normal distribution and uses a spreadsheet to calculate a combined average of 13220 ± 20 years, then refines it to 13218 ± 14 years by least squares. The user also asks whether the method used by the author of record 2 (splitting the difference between certain and uncertain layers) makes that result an outlier.

PREREQUISITES
  • Understanding of normal distribution and standard deviation
  • Familiarity with linear combinations of random variables
  • Basic knowledge of least squares methods
  • Proficiency in spreadsheet software for data analysis
NEXT STEPS
  • Learn about the properties of linear combinations of independent random variables
  • Study the method of least squares for statistical estimation
  • Explore the implications of outlier detection in statistical datasets
  • Investigate advanced techniques for combining estimates, such as Bayesian methods
USEFUL FOR

Statisticians, data analysts, and researchers involved in estimating parameters from multiple datasets, particularly in fields requiring precise age determination from statistical data.

Andre
edit done

I posted this here in an attempt to follow the rules for independent study. My knowledge of statistics is very rudimentary, so I would like to know whether my approach makes any sense or not.

Homework Statement



The question is to determine the age of a certain event from a series of binomially or normally distributed datasets, all of which target the same event age but use different methods.

Record 1 was counted 10 times by different people, giving an average of 13200 years with σ = 20.

Record 2 is given as 12985 certain years with 150 uncertain layers that may or may not be years. The authors split the difference and conclude 13260 years with an absolute error of 75 years.

Record 3 is reported as 13215 counted years with a 1% error (about 132 years).

Record 4 is calculated, etc., giving a result of 13195 years with σ = 35.

So what would be a realistic average value with realistic σ?

Homework Equations



Normal distribution.

The Attempt at a Solution



I wondered whether it would work to treat all the records as normal distributions and then multiply all four of them at each data point. To get σ values for records 2 and 3, I treated the absolute error as the 3σ range, giving 25 years (75 / 3) and 44 years (1% of 13215 ≈ 132, divided by 3), respectively.
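A minimal sketch of that conversion, assuming (as the post does) that a quoted absolute or percentage error spans 3σ. The dictionary layout and the helper name `to_sigma` are hypothetical, chosen only for illustration.

```python
# Convert each record's stated uncertainty to a 1-sigma value in years.
# Assumption (from the post): an absolute or percentage error is a 3-sigma range.
records = {
    # name: (reported age in years, stated uncertainty, kind of uncertainty)
    "record 1": (13200, 20.0, "sigma"),
    "record 2": (13260, 75.0, "absolute"),  # treated as 3 sigma
    "record 3": (13215, 0.01, "relative"),  # 1% of the age, treated as 3 sigma
    "record 4": (13195, 35.0, "sigma"),
}


def to_sigma(age, err, kind):
    """Return a 1-sigma uncertainty in years (hypothetical helper)."""
    if kind == "sigma":
        return err
    if kind == "absolute":
        return err / 3.0
    if kind == "relative":
        return err * age / 3.0
    raise ValueError(f"unknown uncertainty kind: {kind}")


sigmas = {name: to_sigma(*rec) for name, rec in records.items()}
for name, s in sigmas.items():
    print(f"{name}: sigma = {s:.2f} years")
```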

Then I created this spreadsheet using 5-year intervals, which is ample resolution in this field.

https://dl.dropboxusercontent.com/u/22026080/numbers-crunching.xlsx

I multiplied the values of the four records at each point (column I) and then normalized the result so that the area under the graph sums to 1 (column F). Columns J, K and L just help to find the average and σ, which turn out to be 13220 ± 20.
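The multiply-and-normalize procedure can be sketched in Python as below. This is a reconstruction under assumptions, not the spreadsheet itself: the grid range of 13000 to 13500 is an arbitrary choice, and the σ values for records 2 and 3 use the 3σ convention above. Because a product of normal densities is proportional to another normal density whose mean is the inverse-variance-weighted mean, the sketch also computes that closed form for comparison.

```python
import math

# (age, 1-sigma) per record; sigmas for records 2 and 3 assume a 3-sigma error range.
records = [(13200, 20.0), (13260, 25.0), (13215, 0.01 * 13215 / 3), (13195, 35.0)]


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# Candidate ages in 5-year steps (range is an assumption, wide enough for all records).
grid = list(range(13000, 13501, 5))

# Multiply the four densities at each grid point (like column I) ...
product = [math.prod(normal_pdf(x, mu, s) for mu, s in records) for x in grid]
# ... then normalize so the weights sum to 1 (like column F).
total = sum(product)
weights = [p / total for p in product]

mean = sum(w * x for w, x in zip(weights, grid))
sd = math.sqrt(sum(w * (x - mean) ** 2 for w, x in zip(weights, grid)))

# Closed form: inverse-variance weighted mean and its standard deviation.
w_iv = [1 / s**2 for _, s in records]
mu_iv = sum(wi * mu for wi, (mu, _) in zip(w_iv, records)) / sum(w_iv)
sd_iv = 1 / math.sqrt(sum(w_iv))

print(f"grid product: {mean:.1f} +/- {sd:.1f} years")
print(f"closed form:  {mu_iv:.1f} +/- {sd_iv:.1f} years")
```

With these inputs both lines come out near 13218 ± 14 years. A 5-year grid is adequate here because the combined σ spans nearly three grid steps.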



Does this make sense?

The result lies close to the 2σ boundary of record 2. Therefore I made a refinement tool (column G) to find the least-squares value (manually, by trial and error), which turned out to be 13218 ± 14(?) years.

Does that make sense too, given that I'm working with 5-year intervals?
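One way to automate the trial-and-error search is a weighted least-squares (χ²) scan. This sketch assumes "least squares" means minimizing Σ((xᵢ − μ)/σᵢ)² and uses the common Δχ² = 1 rule for a one-parameter 1σ interval; the column G method itself isn't visible, so this is a stand-in, not a reproduction. The scan step (0.1 year here) is independent of the 5-year grid used for the densities.

```python
# (age, 1-sigma) per record; sigmas for records 2 and 3 assume a 3-sigma error range.
records = [(13200, 20.0), (13260, 25.0), (13215, 0.01 * 13215 / 3), (13195, 35.0)]


def chi2(mu):
    """Weighted sum of squared residuals for a candidate age mu."""
    return sum(((x - mu) / s) ** 2 for x, s in records)


# Brute-force scan from 13100 to 13300 in 0.1-year steps.
candidates = [13100 + 0.1 * k for k in range(2001)]
best = min(candidates, key=chi2)
chi2_min = chi2(best)

# 1-sigma interval: candidates where chi2 is within 1 of its minimum.
inside = [mu for mu in candidates if chi2(mu) <= chi2_min + 1.0]
low, high = inside[0], inside[-1]

print(f"least squares: {best:.1f} (1-sigma range {low:.1f} to {high:.1f})")
print(f"chi2_min = {chi2_min:.2f} with {len(records) - 1} degrees of freedom")
```

For a single location parameter, the χ² minimum coincides with the inverse-variance weighted mean, and χ²_min compared with the degrees of freedom gives a rough check that the records are mutually consistent.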

Finally, does it make sense to email the author of record 2 to tell him that his method of splitting the difference between certain and uncertain years makes his result an outlier?
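A simple, non-authoritative way to quantify "outlier" is a leave-one-out comparison: combine the other three records and express the difference from the remaining one in units of the combined uncertainty. This sketch again assumes the 3σ conversion for records 2 and 3; a |z| near 2 is borderline rather than conclusive, and the answer depends directly on the σ chosen for record 2.

```python
import math

# (age, 1-sigma) per record; sigmas for records 2 and 3 assume a 3-sigma error range.
records = {
    "record 1": (13200, 20.0),
    "record 2": (13260, 25.0),
    "record 3": (13215, 0.01 * 13215 / 3),
    "record 4": (13195, 35.0),
}


def combine(items):
    """Inverse-variance weighted mean and its 1-sigma uncertainty."""
    w = [1 / s**2 for _, s in items]
    mean = sum(wi * x for wi, (x, _) in zip(w, items)) / sum(w)
    return mean, 1 / math.sqrt(sum(w))


pulls = {}
for name, (x, s) in records.items():
    others = [v for k, v in records.items() if k != name]
    m, sm = combine(others)
    # Independent uncertainties of the record and of the rest add in quadrature.
    pulls[name] = (x - m) / math.sqrt(s**2 + sm**2)
    print(f"{name}: {x} vs others {m:.1f} +/- {sm:.1f} -> z = {pulls[name]:+.2f}")
```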
 
Here is a simple theoretical problem which points to one possible approach.

Suppose $X_1$ and $X_2$ are two independent estimates of a random variable $X$, with standard deviations $\sigma_1$ and $\sigma_2$, respectively. Consider a linear combination of $X_1$ and $X_2$, i.e.

$$Y = \lambda_1 X_1 + \lambda_2 X_2,$$

where $0 \leq \lambda_i \leq 1$ and $\lambda_1 + \lambda_2 = 1$.

If we assume that $X_1$ and $X_2$ are unbiased estimators of $X$, i.e. $E(X_i) = E(X)$, then you should find it easy to show that the mean of $Y$ is equal to the mean of $X$.
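That step follows from linearity of expectation alone (independence is not needed for this part):

$$E(Y) = \lambda_1 E(X_1) + \lambda_2 E(X_2) = (\lambda_1 + \lambda_2)\,E(X) = E(X).$$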

What choice of $\lambda_1$ and $\lambda_2$ minimizes the variance of $Y$?
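Before doing the calculus, the question can be explored numerically. This sketch assumes independence, so that Var(Y) = λ₁²σ₁² + (1 − λ₁)²σ₂², and uses σ₁ = 20 and σ₂ = 25 (records 1 and 2 from the thread) purely as illustrative inputs.

```python
# Illustrative sigmas: record 1 (20) and record 2 (25, via the 3-sigma assumption).
s1, s2 = 20.0, 25.0


def var_y(lam):
    """Variance of Y = lam*X1 + (1 - lam)*X2 for independent X1, X2."""
    return lam**2 * s1**2 + (1 - lam) ** 2 * s2**2


# Scan lambda_1 over [0, 1] in steps of 0.001.
lams = [k / 1000 for k in range(1001)]
best_lam = min(lams, key=var_y)
print(f"lambda_1 = {best_lam:.3f}, lambda_2 = {1 - best_lam:.3f}, "
      f"sd(Y) = {var_y(best_lam) ** 0.5:.2f}")
```

The minimum falls at a weight larger than 0.5 on the more precise estimate, and the resulting sd(Y) is smaller than either σ₁ or σ₂.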

With just a tiny bit of statistics and calculus, you should be able to solve this problem, and then perhaps see a way to apply the result to your original problem.
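One way to carry out that calculation: by independence the variances add with squared coefficients, and substituting $\lambda_2 = 1 - \lambda_1$ leaves a single variable to optimize.

$$\operatorname{Var}(Y) = \lambda_1^2\sigma_1^2 + \lambda_2^2\sigma_2^2 = \lambda_1^2\sigma_1^2 + (1-\lambda_1)^2\sigma_2^2$$

$$\frac{d\,\operatorname{Var}(Y)}{d\lambda_1} = 2\lambda_1\sigma_1^2 - 2(1-\lambda_1)\sigma_2^2 = 0 \quad\Longrightarrow\quad \lambda_1 = \frac{\sigma_2^2}{\sigma_1^2+\sigma_2^2}, \qquad \lambda_2 = \frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}$$

The second derivative, $2(\sigma_1^2+\sigma_2^2)$, is positive, so this is a minimum, and both weights lie in $[0, 1]$. The minimum variance is

$$\operatorname{Var}(Y)_{\min} = \frac{\sigma_1^2\sigma_2^2}{\sigma_1^2+\sigma_2^2}, \qquad \frac{1}{\operatorname{Var}(Y)_{\min}} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}.$$

The same argument for $n$ independent unbiased estimates gives the standard inverse-variance weights:

$$\lambda_i = \frac{1/\sigma_i^2}{\sum_{j=1}^{n} 1/\sigma_j^2}, \qquad \operatorname{Var}(Y)_{\min} = \frac{1}{\sum_{j=1}^{n} 1/\sigma_j^2}.$$

For normally distributed estimates, this is also the mean and variance of the normalized product of the individual densities.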
 
