
Probably (yea I know, hilarious) easy qs about Bin and Normal distributions

  1. Feb 6, 2012 #1

    Gonna get straight to the point. I need a 95% confidence interval for the difference between two probabilities p_1 and p_2. They're the probabilities that a cabin hook will hold under a certain force (25 kN).

    Two samples: "the originals" with n_1=107 and "cheap pirated ones" with n_2=92. y_i is the number of hooks in each sample that managed to keep their act together at 25 kN.

    y_1=84 is an observation of Y_1, which is Bin(107, p_1) ≈ N(107*p_1, sqrt(107*p_1*q_1)) where q_1=1-p_1

    y_2=12 is an observation of Y_2, which is Bin(92, p_2) ≈ N(92*p_2, sqrt(92*p_2*q_2)) where q_2=1-p_2

    now p_1 and p_2 can be estimated as follows:

    ^p_1 = y_1/107, which is an observation from ^P_1 ≈ N(p_1, sqrt((p_1*q_1)/107))

    and ^p_2 = y_2/92, which is an observation from ^P_2 ≈ N(p_2, sqrt((p_2*q_2)/92))
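    The sqrt(p*q/n) standard deviation can be checked numerically. Since ^P = Y/n and a binomial Y has variance n*p*q, dividing by n scales the variance by 1/n^2, which leaves p*q/n. The sketch below computes the exact mean and standard deviation of Y/n straight from the binomial probabilities. The value p = 0.8 and the helper name phat_mean_and_sd are just illustrative assumptions, not taken from the data.

```python
from math import comb, sqrt

def phat_mean_and_sd(n, p):
    """Exact mean and standard deviation of Y/n where Y ~ Bin(n, p)."""
    # Binomial probabilities P(Y = k) for k = 0..n
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, sqrt(var)

n, p = 107, 0.8  # p = 0.8 is an illustrative value, not an estimate
mean, sd = phat_mean_and_sd(n, p)
formula_sd = sqrt(p * (1 - p) / n)  # the sqrt(p*q/n) expression

print(mean, sd, formula_sd)
```

    The two standard deviations should agree up to floating-point rounding, and the same holds for any other p and n you plug in.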

    which, for independent samples, gives the distribution of the estimated probability difference:

    ^P_1-^P_2 ≈ N(p_1-p_2, sqrt((p_1*q_1/107)+(p_2*q_2/92)))

    which means you get the variable:

    (^P_1-^P_2-(p_1-p_2))/sqrt((^P_1*^Q_1/107)+(^P_2*^Q_2/92)) ≈ N(0,1)

    Plugging in z=z_0.975=1.96 from the normal dist. table gives me
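    On why the 97.5% quantile shows up for a 95% interval: a two-sided interval leaves α/2 = 2.5% in each tail, so the cut-off is the point with 97.5% of the mass below it. A small sketch using Python's standard library (the variable names are my own):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05               # 1 - confidence level

z_two_sided = std_normal.inv_cdf(1 - alpha / 2)  # 97.5% quantile
z_one_sided = std_normal.inv_cdf(1 - alpha)      # 95% quantile

# Probability mass between -z and +z for the two-sided choice
coverage = std_normal.cdf(z_two_sided) - std_normal.cdf(-z_two_sided)

print(round(z_two_sided, 2), round(z_one_sided, 3), round(coverage, 2))
```

    The 95% quantile (about 1.645) would be the right cut-off only if all 5% were put in one tail, i.e. for a one-sided bound.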

    INTERVAL_(p_1-p_2) = (^p_1-^p_2 ± 1.96*sqrt((^p_1*^q_1/107)+(^p_2*^q_2/92))) = (0.55, 0.76)
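    The arithmetic for this interval can be reproduced in a few lines. This is only a sketch of the formula above; the function name diff_interval is made up for illustration.

```python
from math import sqrt

def diff_interval(y1, n1, y2, n2, z=1.96):
    """Normal-approximation interval for p_1 - p_2 from two binomial samples."""
    p1, p2 = y1 / n1, y2 / n2
    # Standard error of the difference: the two variances add
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

low, high = diff_interval(84, 107, 12, 92)
print(round(low, 2), round(high, 2))  # 0.55 0.76
```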

    This solution is supposedly correct and all I need is someone to help me understand the following:

    1. When you subtract two stochastic variables you subtract the expected values from each other, which I get, but the standard deviation is different... it becomes the sqrt of the sum of the independent stochastic variables' standard deviations?

    2. Why is the standard dev of the stochastic variable ^P_1 equal to sqrt((p_1*q_1)/n_1)? Can't you just call it σ_1? Are they the same? And is that always the case? Is the probability of something happening, times 1 minus that probability, divided by the sample size, equal to σ^2?

    3. Why do you use the 97.5% quantile when the question originally stated 95%? I know you use 1-α/2 to get there, but I've never understood WHEN you can just go with 95% and when you have to use 97.5%. For F distributions it seems going with 95% is OK even with 2 samples.

    4. Why do you use a Binomial distribution for this kind of problem?

    Would be pretty much amazing if anyone could help out with any or all of these questions. I'm a donkey when it comes to math stat. Thanks
  2. Feb 13, 2012 #2
    I am working on the same kinds of questions in my sigfig thread. I'm not getting help from people more familiar with the information either.

    As to question 1.

    Take a random variable with a mean of 36 and a standard deviation of 1, e.g. 36(1).
    If two data points are selected from the same variable, we might get:
    a=36+1, b=36+1.5

    By definition "a" is a 1 sigma deviation, and "b" is a 1.5 sigma deviation.
    p(>=a) is ~= 15.87%
    p(>=b) is ~= 6.68%

    The resulting deviation of the sum is going to be 2.5 and the resulting mean would be 72.
    Hence there is a probability of at least 15.87% * 6.68% of getting a deviation of 2.5 or MORE in the result, i.e. a probability of around ~1%.
    But the chance of us *sampling* from the original variable a single data point which is 2.5 or MORE deviations away is only 0.62% (much less likely).

    Hence, the result is *more* likely to have deviations of 2.5 sigmas away from the mean than either of the original data points.

    The actual probability for the sum will be higher than I have listed, because data points having a deviation of 1.25 (for example) are much more likely than ones having a deviation of 1.5, and a deviation of 1.25 + 1.25 is still 2.5. So there are many possibilities of getting the resulting SUM that I have excluded arbitrarily.
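    If the two draws are assumed to be independent and normally distributed (my assumption, it isn't stated anywhere in the example), the actual probability can be computed rather than bounded. For independent variables the means add and the variances add, so the sum has mean 72 and standard deviation sqrt(1^2 + 1^2). A rough sketch:

```python
from math import sqrt
from statistics import NormalDist

single = NormalDist(mu=36, sigma=1)
# Sum of two independent draws: means add, variances add
total = NormalDist(mu=2 * 36, sigma=sqrt(1**2 + 1**2))

# Chance that one draw lands 2.5 or more above its mean
p_single = 1 - single.cdf(36 + 2.5)
# Chance that the sum lands 2.5 or more above its mean of 72
p_sum = 1 - total.cdf(72 + 2.5)

print(round(p_single, 4), round(p_sum, 4))
```

    Under those assumptions p_sum comes out several times larger than both the single-draw figure and the ~1% lower bound, which fits the argument that the bound leaves out many combinations.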

    I like to think of stochastic/random variables as having a constant mean, and a random variation. The square root in the addition, I think, has to do with the idea of a "random walk" and is not a perfect estimator of the new deviation in all cases of error propagation.
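    For plain addition or subtraction of independent variables, the square-root rule is easy to test by simulation. The sketch below draws two independent normal variables and compares the simulated standard deviation of their difference with sqrt(σ_x^2 + σ_y^2) and with the naive σ_x + σ_y. The means, sigmas, seed and sample size are arbitrary illustrative choices.

```python
import random
from math import sqrt
from statistics import pstdev

rng = random.Random(0)        # fixed seed so the run is repeatable
sigma_x, sigma_y = 3.0, 4.0   # illustrative standard deviations
n = 100_000

# Difference of two independent normal draws, repeated n times
diffs = [rng.gauss(10, sigma_x) - rng.gauss(5, sigma_y) for _ in range(n)]

sd_sim = pstdev(diffs)                   # simulated standard deviation
sd_rule = sqrt(sigma_x**2 + sigma_y**2)  # variances add: 5.0
sd_naive = sigma_x + sigma_y             # adding the sigmas directly: 7.0

print(round(sd_sim, 2), sd_rule, sd_naive)
```

    The simulated value should land close to 5 and nowhere near 7. It is the variances that add, and that holds for a difference as well as a sum, as long as the variables are independent.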
    eg: in multiplication instead of addition, there is a definite problem with the typical formulas for error propagation and which I am exploring now in the sigfig calculator thread.

    I am needing to do a little work on that, so I won't attempt to answer questions 2+ of your thread at this time; I need to refresh my memory on these points anyhow -- and I am not getting any more help in my sigfig thread than you are here.... (at least, yet.)

    There is Python source code in that thread that may be helpful in setting up some quick "what if" experiments for yourself to test your ideas out numerically. If you need some help getting started with Python, or getting Python (it's free), don't let that deter you from trying it out -- I and many others can surely help. You can delete the parts of the Python program which you don't need, or just not use them.