Statistics not-homework

1. May 15, 2013

Andre

I posted this here in an attempt to follow the rules for independent study. My knowledge of statistics is very rudimentary, so I would like to know whether my approach makes any sense or not.

1. The problem statement, all variables and given/known data

The question is to determine the age of a certain event from a series of binomially/normally distributed datasets, all dating the same event but with different methods.

Record 1 was counted 10 times by different people, leading to an average of 13200 years with σ = 20

Record 2 is given as 12985 certain years plus 150 uncertain layers that may or may not be years. So the authors split the difference and conclude: 13260 years with an absolute error of 75 years

Record 3 is reported as 13215 counted years with a 1% error

Record 4 is calculated, etc., giving a result of 13195 years with σ = 35

So what would be a realistic average value with realistic σ?

2. Relevant equations

Normal distribution.

3. The attempt at a solution

I wondered if it would work to treat all four records as normal distributions and then multiply the four densities together at each data point. To get σ's for records 2 and 3, I took the absolute error as the 3σ range, getting values of 25 and 44 years respectively.
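The conversion from quoted error to σ can be sketched as below. This is only an illustration of the arithmetic; the helper name `sigma_from_abs_error` is made up for this example, and reading an absolute error as a 3σ range is my assumption, not something the record authors state.

```python
# Hypothetical helper: read a quoted absolute error as an n-sigma range
# (n = 3 is the assumption made in this post, not a property of the data).
def sigma_from_abs_error(abs_error, n_sigma=3.0):
    return abs_error / n_sigma

sigma_2 = sigma_from_abs_error(75.0)            # record 2: absolute error of 75 years
sigma_3 = sigma_from_abs_error(0.01 * 13215.0)  # record 3: 1% of 13215 years

print(round(sigma_2), round(sigma_3))  # 25 44
```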

Then I created this spreadsheet, using 5-year intervals, which is ample resolution in this field.

https://dl.dropboxusercontent.com/u/22026080/numbers-crunching.xlsx [Broken]

I just multiplied the data of the four records at each point (column I) and then rescaled the product so that it sums to 1 under the graph (column F). Columns J-K-L are just a help to find the average and σ, which turns out to be 13220 ± 20.
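For anyone who cannot open the spreadsheet, the same steps can be sketched in a few lines of Python. This is a rough reconstruction under assumptions, not a copy of the spreadsheet: the grid range of 13000 to 13500 years is my choice, the variable names are invented, and the σ's for records 2 and 3 are the 3σ conversions described above.

```python
import math

# (mean, sigma) in years for records 1-4; sigmas for records 2 and 3 are
# the assumed 3-sigma conversions (25 and 44), not values from the authors.
records = [(13200.0, 20.0), (13260.0, 25.0), (13215.0, 44.0), (13195.0, 35.0)]

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# 5-year grid; the range 13000..13500 is an assumption for this sketch.
grid = [13000.0 + 5.0 * i for i in range(101)]

# Multiply the four densities at each grid point.
product = [math.prod(normal_pdf(x, mu, s) for mu, s in records) for x in grid]

# Rescale so the values sum to 1 over the grid.
total = sum(product)
weights = [p / total for p in product]

# Mean and standard deviation of the rescaled product curve.
mean = sum(w * x for w, x in zip(weights, grid))
variance = sum(w * (x - mean) ** 2 for w, x in zip(weights, grid))
sigma = math.sqrt(variance)

print(f"{mean:.1f} +/- {sigma:.1f}")
```

Whatever it prints depends entirely on the assumed inputs, so it is a cross-check on the spreadsheet arithmetic rather than an independent result.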

Does this make sense?

The result is close to the 2σ boundary of record 2. Therefore I made a refinement tool (column G) to find the least squares (manually, by trial and error), which turned out to be 13218 ± 14(?) years.

Does that make sense too, since I'm working with 5 year intervals?

Finally, does it make sense to mail the author of record 2 telling him that his method of splitting the difference between certain and uncertain years makes his result an outlier?

Last edited by a moderator: May 6, 2017
2. May 15, 2013

awkward

Here is a simple theoretical problem which points to one possible approach.

Suppose $X_1$ and $X_2$ are two independent estimates of a random variable $X$, with standard deviations $\sigma_1$ and $\sigma_2$, respectively. Consider a linear combination of $X_1$ and $X_2$, i.e.
$$Y = \lambda_1 X_1 + \lambda_2 X_2$$ where
$$0 \leq \lambda_i \leq 1$$ and $$\lambda_1 + \lambda_2 = 1$$

If we assume that $X_1$ and $X_2$ are unbiased estimators of $X$, i.e. $E(X_i) = E(X)$, then you should find it easy to show that the mean of $Y$ is equal to the mean of $X$.
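For reference, a sketch of that step, using only linearity of expectation and the constraint $\lambda_1 + \lambda_2 = 1$:
$$E(Y) = \lambda_1 E(X_1) + \lambda_2 E(X_2) = (\lambda_1 + \lambda_2)\, E(X) = E(X)$$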

What choice of $\lambda_1$ and $\lambda_2$ minimizes the variance of $Y$?

With just a tiny bit of statistics and calculus, you should be able to solve this problem. And then maybe you can see a way to apply the result to your original problem.
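If you want a numerical sanity check for whatever closed form you derive, a brute-force scan over $\lambda_1$ is a minimal sketch. It assumes independence, so that $\mathrm{Var}(Y) = \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2$, and the values $\sigma_1 = 20$ and $\sigma_2 = 35$ are borrowed from records 1 and 4 purely for illustration; the variable names are invented.

```python
# Illustrative values only: sigma_1 and sigma_2 borrowed from records 1 and 4.
sigma_1, sigma_2 = 20.0, 35.0

def variance_of_y(lam_1):
    """Var(Y) for independent X_1, X_2 with lam_2 = 1 - lam_1."""
    lam_2 = 1.0 - lam_1
    return lam_1 ** 2 * sigma_1 ** 2 + lam_2 ** 2 * sigma_2 ** 2

# Scan lam_1 over [0, 1] in steps of 0.001 and keep the minimizer.
candidates = [i / 1000.0 for i in range(1001)]
best_lam_1 = min(candidates, key=variance_of_y)

# Candidate closed form to compare against: weight proportional to 1 / sigma^2.
inverse_variance_lam_1 = (1.0 / sigma_1 ** 2) / (1.0 / sigma_1 ** 2 + 1.0 / sigma_2 ** 2)

print(best_lam_1, round(inverse_variance_lam_1, 3))
```

The scan only locates the minimum to the grid resolution, so agreement to about three decimals is the most it can show.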

Last edited: May 15, 2013