
Homework Help: Statistics not-homework

  1. May 15, 2013 #1
    edit done

    I posted this here in an attempt to follow the rules for independent study. My knowledge of statistics is very rudimentary, so I would like to know whether my approach makes any sense or not.

    1. The problem statement, all variables and given/known data

    The question is how to determine the age of a certain event from a series of binomially/normally distributed datasets, all estimating the same event age but using different methods.

    Record 1 was counted 10 times by different people, giving an average of 13200 years with σ = 20 years.

    Record 2 was counted as 12985 certain years plus 150 uncertain layers that may or may not be years. So the authors split the difference and conclude 13260 years with an absolute error of 75 years.

    Record 3 is reported as 13215 counted years with a 1% error.

    Record 4 is calculated, etc., giving a result of 13195 years with σ = 35 years.

    So what would be a realistic average value, with a realistic σ?

    2. Relevant equations

    Normal distribution.

    3. The attempt at a solution

    I wondered whether it would work to treat all the records as normal distributions and then multiply all four of them at each data point. To get σ values for records 2 and 3, I treated the absolute error (75 years for record 2; 1% of 13215, about 132 years, for record 3) as the 3σ range, which gives 25 and 44 years, respectively.
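    The 3σ conversion above is simple arithmetic. A minimal Python sketch, assuming (as in the post) that each quoted error is a 3σ half-width; the helper names `sigma_from_abs_error` and `sigma_from_rel_error` are hypothetical, introduced only for illustration.

```python
# Convert the quoted errors into standard deviations, ASSUMING (as in the
# post) that each quoted error spans 3 sigma. Helper names are hypothetical.


def sigma_from_abs_error(abs_error: float) -> float:
    """Treat an absolute error as a 3-sigma half-width."""
    return abs_error / 3.0


def sigma_from_rel_error(value: float, rel_error: float) -> float:
    """Treat a relative error (e.g. 0.01 for 1%) as a 3-sigma half-width."""
    return value * rel_error / 3.0


sigma_2 = sigma_from_abs_error(75.0)           # record 2: +/- 75 years
sigma_3 = sigma_from_rel_error(13215.0, 0.01)  # record 3: 1% of 13215 years

print(f"sigma_2 = {sigma_2:.0f}, sigma_3 = {sigma_3:.0f}")  # sigma_2 = 25, sigma_3 = 44
```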

    Then I created this spreadsheet, using 5-year intervals, which is ample resolution in this field.

    https://dl.dropboxusercontent.com/u/22026080/numbers-crunching.xlsx [Broken]

    I simply multiplied the values of the four records at each point (column I) and then normalized the result so that the area under the graph sums to 1 (column F). Columns J, K and L are just helpers to find the average and σ, which turn out to be 13220 ± 20.


    Does this make sense?
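    A minimal Python sketch of the multiply-and-normalize procedure described above, under stated assumptions: σ values of 20, 25, 44 and 35 years (the middle two from the 3σ convention), independent records, and a hypothetical 5-year grid from 13000 to 13500 years. The constant factors of the normal densities are dropped because they cancel when normalizing. This is an illustration of the idea, not a reproduction of the spreadsheet's exact formulas.

```python
import math

# (mean, sigma) in years for records 1-4. The sigmas for records 2 and 3
# ASSUME the 3-sigma convention from the first post.
records = [(13200.0, 20.0), (13260.0, 25.0), (13215.0, 44.0), (13195.0, 35.0)]

# Hypothetical evaluation grid: 13000, 13005, ..., 13500 years.
step = 5.0
grid = [13000.0 + step * k for k in range(101)]


def log_product(x: float) -> float:
    """Log of the product of the four normal densities at x, up to a constant."""
    return sum(-0.5 * ((x - mu) / sd) ** 2 for mu, sd in records)


# Work in log space and subtract the peak to avoid underflow, then
# normalize so the discrete weights sum to 1 ("area under the graph").
logs = [log_product(x) for x in grid]
peak = max(logs)
weights = [math.exp(v - peak) for v in logs]
total = sum(weights)
probs = [w / total for w in weights]

# Mean and standard deviation of the normalized distribution.
mean = sum(x * p for x, p in zip(grid, probs))
sd = math.sqrt(sum((x - mean) ** 2 * p for x, p in zip(grid, probs)))
print(f"{mean:.0f} ± {sd:.0f} years")
```

    With these inputs the sketch gives roughly 13218 ± 14 years. Since the combined σ is well above the 5-year step, the grid still resolves the peak comfortably.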

    The result is close to the 2σ boundary of record 2. Therefore I made a refinement tool (column G) to find the least-squares value (manually, by trial and error), which turned out to be 13218 ± 14(?) years.

    Does that make sense too, given that I'm working with 5-year intervals?
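    The least-squares refinement can also be done by brute force instead of trial and error. A minimal sketch, assuming the quantity minimized is the σ-weighted sum of squares χ²(x) = Σ((x − μᵢ)/σᵢ)² (column G's formula is not shown, so this is an assumption), and using the common Δχ² = 1 convention for a one-parameter 1σ interval. The 0.01-year scan step and range are hypothetical choices.

```python
# (mean, sigma) in years for records 1-4, as in the first post.
records = [(13200.0, 20.0), (13260.0, 25.0), (13215.0, 44.0), (13195.0, 35.0)]


def chi_square(x: float) -> float:
    """Sigma-weighted sum of squared residuals of the records about x."""
    return sum(((x - mu) / sd) ** 2 for mu, sd in records)


# Brute-force scan in hypothetical 0.01-year steps from 13100 to 13350 years.
step = 0.01
xs = [13100.0 + step * k for k in range(25001)]
values = [chi_square(x) for x in xs]

s_min = min(values)
best = xs[values.index(s_min)]

# 1-sigma interval (Delta chi-square = 1): all x within 1 of the minimum.
inside = [x for x, s in zip(xs, values) if s <= s_min + 1.0]
half_width = (max(inside) - min(inside)) / 2.0
print(f"best = {best:.1f}, half-width = {half_width:.1f} years")
```

    Because χ²(x) is a quadratic in x, its minimum falls at the precision-weighted (1/σ²-weighted) mean of the records, and the Δχ² = 1 half-width equals 1/√(Σ 1/σᵢ²).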

    Finally, does it make sense to email the author of record 2 to tell him that his method of splitting the difference between certain and uncertain years makes his result an outlier?
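    One standard way to quantify how far each record sits from the rest is a χ² consistency check plus a per-record "pull". A minimal sketch, assuming independent, normally distributed records with the σ values used above. The pull uses the fact that, for an inverse-variance weighted mean m, Var(μᵢ − m) = σᵢ² − 1/Σ(1/σⱼ²), because each record also helped set m.

```python
import math

# (mean, sigma) in years for records 1-4, as in the first post.
records = [(13200.0, 20.0), (13260.0, 25.0), (13215.0, 44.0), (13195.0, 35.0)]

# Inverse-variance weighted mean and its variance.
weights = [1.0 / sd ** 2 for _, sd in records]
w_total = sum(weights)
mean = sum(w * mu for w, (mu, _) in zip(weights, records)) / w_total
var_mean = 1.0 / w_total

# Chi-square about the weighted mean; compare with a chi-square
# distribution on len(records) - 1 degrees of freedom.
chi2 = sum(((mu - mean) / sd) ** 2 for mu, sd in records)
dof = len(records) - 1

# Pull of each record: residual divided by the residual's own standard
# deviation, Var(mu_i - mean) = sd_i**2 - var_mean for this weighting.
pulls = [(mu - mean) / math.sqrt(sd ** 2 - var_mean) for mu, sd in records]

print(f"chi2 = {chi2:.2f} on {dof} degrees of freedom")
for i, pull in enumerate(pulls, start=1):
    print(f"record {i}: pull = {pull:+.2f}")
```

    With these inputs, record 2 has the largest pull (about +2) and χ² comes out near 4.1 on 3 degrees of freedom. How to read those numbers against a chosen significance level is a separate judgment.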
    Last edited by a moderator: May 6, 2017
  3. May 15, 2013 #2
    Here is a simple theoretical problem which points to one possible approach.

    Suppose [itex]X_1[/itex] and [itex]X_2[/itex] are two independent estimates of a random variable [itex]X[/itex], with standard deviations [itex]\sigma_1[/itex] and [itex]\sigma_2[/itex], respectively. Consider a linear combination of [itex]X_1[/itex] and [itex]X_2[/itex], i.e.
    [tex]Y = \lambda_1 X_1 + \lambda_2 X_2[/tex] where
    [tex]0 \leq \lambda_i \leq 1[/tex] and [tex]\lambda_1 + \lambda_2 = 1[/tex]

    If we assume that [itex]X_1[/itex] and [itex]X_2[/itex] are unbiased estimators of [itex]X[/itex], i.e. [itex]E(X_i) = E(X)[/itex], then you should find it easy to show that the mean of [itex]Y[/itex] is equal to the mean of [itex]X[/itex].

    What choice of [itex]\lambda_1[/itex] and [itex]\lambda_2[/itex] minimizes the variance of [itex]Y[/itex]?
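    For reference, a sketch of the calculation, assuming [itex]X_1[/itex] and [itex]X_2[/itex] are independent so that the covariance term drops out of the variance:

    [tex]E(Y) = \lambda_1 E(X_1) + \lambda_2 E(X_2) = (\lambda_1 + \lambda_2) E(X) = E(X)[/tex]
    [tex]\mathrm{Var}(Y) = \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2 = \lambda_1^2 \sigma_1^2 + (1 - \lambda_1)^2 \sigma_2^2[/tex]
    [tex]\frac{d\,\mathrm{Var}(Y)}{d\lambda_1} = 2\lambda_1 \sigma_1^2 - 2(1 - \lambda_1)\sigma_2^2 = 0 \quad\Rightarrow\quad \lambda_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \quad \lambda_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}[/tex]
    [tex]\mathrm{Var}(Y)_{\min} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \quad\text{i.e.}\quad \frac{1}{\mathrm{Var}(Y)_{\min}} = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}[/tex]

    The second derivative, [itex]2(\sigma_1^2 + \sigma_2^2) > 0[/itex], confirms this is a minimum, and both weights lie in [itex][0, 1][/itex].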

    With just a tiny bit of statistics and calculus, you should be able to solve this problem. Then maybe you can see a way to apply the result to your original problem.
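    One way to apply the result: for n independent, unbiased estimates, the same minimization gives weights λᵢ proportional to 1/σᵢ², with combined variance 1/Σ(1/σᵢ²). A minimal Python sketch applied to the four records, assuming the σ values of 20, 25, 44 and 35 years from the first post and treating the records as independent.

```python
import math

# (mean, sigma) in years for records 1-4, as in the first post.
records = [(13200.0, 20.0), (13260.0, 25.0), (13215.0, 44.0), (13195.0, 35.0)]

# For n independent, unbiased estimates the variance-minimizing weights are
# lambda_i = (1 / sigma_i^2) / sum_j (1 / sigma_j^2); they sum to 1.
precisions = [1.0 / sd ** 2 for _, sd in records]
total_precision = sum(precisions)
lambdas = [p / total_precision for p in precisions]

combined_mean = sum(lam * mu for lam, (mu, _) in zip(lambdas, records))
# Var(Y) = sum(lambda_i^2 * sigma_i^2), which simplifies to 1 / total_precision.
combined_sd = math.sqrt(sum(lam ** 2 * sd ** 2 for lam, (_, sd) in zip(lambdas, records)))
print(f"{combined_mean:.0f} ± {combined_sd:.0f} years")
```

    This prints 13218 ± 14 years. The normalized product of normal densities has exactly this mean and variance, so a product-of-distributions spreadsheet and this weighted mean should agree up to grid and rounding effects.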
    Last edited: May 15, 2013