Probably (yeah, I know, hilarious) easy questions about Binomial and Normal distributions

SUMMARY

This discussion focuses on calculating a confidence interval for the difference between two binomial proportions, specifically for cabin hooks tested at 25 kN. The sample sizes are n_1=107 and n_2=92, with successful hooks y_1=84 and y_2=12, respectively. The estimated probabilities p_1 and p_2 are derived from the binomial distributions and approximated using normal distributions. Key questions raised include how standard deviations combine when stochastic variables are subtracted, the rationale behind using the 97.5% quantile for a 95% interval, and why binomial distributions are appropriate for this kind of analysis.

PREREQUISITES
  • Understanding of Binomial distributions and their properties
  • Familiarity with Normal approximation techniques for binomial distributions
  • Knowledge of confidence intervals and hypothesis testing
  • Basic statistical concepts such as mean, standard deviation, and probability
NEXT STEPS
  • Study the Central Limit Theorem and its application to binomial distributions
  • Learn about confidence intervals and how to calculate them for different distributions
  • Explore the concept of stochastic variables and their properties in statistical analysis
  • Investigate Python libraries for statistical analysis, such as SciPy and NumPy, to perform simulations
USEFUL FOR

This discussion is beneficial for statisticians, data analysts, and engineers involved in quality control and reliability testing, particularly those working with binomial data and probability assessments.

mathpariah
hey

Gonna get straight to the point. I need a 95% confidence interval for the difference between two probabilities p_1 and p_2. They're the two probabilities that a cabin hook will withstand a certain force (25 kN).

Two samples: "the originals" with n_1=107 and "cheap pirated ones" with n_2=92. y_i is the number of hooks in each sample that managed to keep their act together at 25 kN.

y_1=84, where Y_1 is Bin(107, p_1) ≈ N(107*p_1, sqrt(107*p_1*q_1)) with q_1 = 1 - p_1

and

y_2=12, where Y_2 is Bin(92, p_2) ≈ N(92*p_2, sqrt(92*p_2*q_2)) with q_2 = 1 - p_2

now p_1 and p_2 can be estimated as follows:

^p_1 = y_1/107, which is an observation from ^P_1 ≈ N(p_1, sqrt((p_1*q_1)/107))

and ^p_2 = y_2/92, an observation from ^P_2 ≈ N(p_2, sqrt((p_2*q_2)/92))

which gives us the estimated probability difference of:

^P_1-^P_2 ≈N(p_1-p_2, sqrt((p_1*q_1/107)+(p_2*q_2/92)))

which means you get the variable:

(^P_1-^P_2-(p_1-p_2))/sqrt((^P_1*^Q_1/107)+(^P_2*^Q_2/92)) ≈ N(0,1)

Plugging in z = z_0.975 = 1.96 from the normal distribution table gives me

INTERVAL_(p_1-p_2) = ^p_1 - ^p_2 ± 1.96*sqrt((^p_1*^q_1/107)+(^p_2*^q_2/92)) = (0.55, 0.76)
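The whole calculation can be reproduced in a few lines of Python (a sanity-check sketch, standard library only, using the numbers from the thread):

```python
from math import sqrt

# Sample data from the thread: hooks that held at 25 kN
n1, y1 = 107, 84   # "the originals"
n2, y2 = 92, 12    # "cheap pirated ones"

p1, p2 = y1 / n1, y2 / n2   # point estimates ^p_1, ^p_2
q1, q2 = 1 - p1, 1 - p2

# Standard error of the difference (independent samples: variances add)
se = sqrt(p1 * q1 / n1 + p2 * q2 / n2)

z = 1.96   # z_{0.975}, for a two-sided 95% interval
lo, hi = (p1 - p2) - z * se, (p1 - p2) + z * se
print(round(lo, 2), round(hi, 2))   # 0.55 0.76
```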

This solution is supposedly correct and all I need is someone to help me understand the following:


1. When you subtract two stochastic variables, you subtract the expected values from each other, which I get, but the standard deviation is different... it becomes the square root of the sum of the independent variables' variances?
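A quick simulation makes the answer to question 1 visible (my own sketch, using NumPy, with made-up means and standard deviations): means subtract, but variances add, so the standard deviation of the difference is sqrt(sd_x^2 + sd_y^2).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000

# Two independent random variables with known standard deviations
x = rng.normal(loc=10, scale=3, size=N)   # sd_x = 3
y = rng.normal(loc=4,  scale=4, size=N)   # sd_y = 4

d = x - y
print(d.mean())   # ≈ 10 - 4 = 6: the means subtract
print(d.std())    # ≈ sqrt(3**2 + 4**2) = 5: the VARIANCES add
```

Note that 3 + 4 = 7 would be wrong; the standard deviations themselves do not add, only the variances do.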

2. Why is the standard deviation of the stochastic variable ^P_1 equal to sqrt((p_1*q_1)/n_1)? Can't you just call it σ_1? Are they the same? And is it always the case that the probability of something happening, times one minus that probability, divided by the sample size, equals σ^2?
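On question 2: you can call it σ_1, and for a binomial proportion that σ_1 happens to equal sqrt(p_1*q_1/n_1), because Var(Y_1) = n_1*p_1*q_1 and dividing Y_1 by n_1 divides the variance by n_1^2. A simulation sketch (mine, using NumPy, with the thread's n_1 and a p chosen near ^p_1 = 84/107) checks this:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 107, 0.785   # n_1 from the thread; p near ^p_1 = 84/107
trials = 200_000

# Simulate many values of ^P_1 = Y_1 / n_1
phat = rng.binomial(n, p, size=trials) / n

print(phat.std())                # empirical standard deviation of ^P_1
print(np.sqrt(p * (1 - p) / n))  # theoretical sqrt(p*q/n), about 0.0397
```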

3. Why do you use the 97.5% quantile when the question originally stated 95%? I know you use 1-α/2 to get there, but I've never understood WHEN you can just go with 95% and when you have to use 97.5%. For F distributions it seems going with 95% is OK even with two samples.
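On question 3: the interval here is two-sided, so the 5% error probability is split into 2.5% in each tail, which is why the 97.5% quantile appears; a one-sided bound would put all 5% in one tail and use the 95% quantile directly. This can be checked with Python's standard library (no SciPy needed):

```python
from statistics import NormalDist

alpha = 0.05
z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)  # both tails share alpha
z_one_sided = NormalDist().inv_cdf(1 - alpha)      # all of alpha in one tail
print(round(z_two_sided, 2))   # 1.96 -- the z_{0.975} used above
print(round(z_one_sided, 2))   # 1.64
```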

4. Why do you use a Binomial distribution for this kind of problem?
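On question 4: each hook independently either holds or fails at 25 kN, which is a Bernoulli trial, and the count of holds in n independent trials is Binomial(n, p) by definition. A small simulation (my own sketch, using NumPy, with the thread's n_1 and a p near ^p_1) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(2)

# Each hook either holds (True) or fails (False) at 25 kN: a Bernoulli trial.
# The COUNT of holds among n independent hooks is Binomial(n, p) by definition.
n, p = 107, 0.785                    # roughly the "originals" sample
holds = rng.random((20_000, n)) < p  # 20,000 simulated samples of n hooks
counts = holds.sum(axis=1)           # number of survivors per sample

print(counts.mean())   # ≈ n*p ≈ 84.0, the binomial mean
print(counts.std())    # ≈ sqrt(n*p*q) ≈ 4.25, the binomial sd
```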


Would be pretty amazing if anyone could help out with any or all of these questions; I am a donkey when it comes to math stat. Thanks.
 
I am working on the same kinds of questions in my sigfig thread. I'm not getting help from people more familiar with the information either.

As to question 1.

Take a random variable with a mean of 36 and a standard deviation of 1: eg: 36(1)
If two data points are selected from the same variable, we might get:
a = 36 + 1, b = 36 + 1.5

By definition "a" is a 1 sigma deviation, and "b" is a 1.5 sigma deviation.
p(>=a) ≈ 15.87%
p(>=b) ≈ 6.68%

If we add a and b, the resulting deviation is going to be 2.5 sigma and the resulting mean would be 72.
Hence there is a probability of at least 15.87% × 6.68% ≈ 1% of getting a deviation of 2.5 sigma or MORE in the result.
But the chance of *sampling* from the original variable a single data point which is 2.5 or MORE deviations away is only 0.62% (much less likely).

Hence, the result is *more* likely to have deviations of 2.5 sigmas away from the mean than either of the original data points.

The actual probability for the sum will be higher than I have listed: data points with a deviation of, say, 1.25 are much more likely than ones of 1.5, and 1.25 + 1.25 is still 2.5, so there are many ways of reaching the resulting SUM that I have excluded arbitrarily.

I like to think of stochastic/random variables as having a constant mean and a random variation. The square root in the addition, I think, has to do with the idea of a "random walk", and it is not a perfect estimator of the new deviation in all cases of error propagation.
E.g. with multiplication instead of addition, there is a definite problem with the typical error-propagation formulas, which I am exploring now in the sigfig calculator thread.
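To make the "not a perfect estimator" point concrete: for a product of two independent quantities, the usual propagation rule says the *relative* errors combine in quadrature, and that is only an approximation (a good one when the relative errors are small). A simulation sketch (mine, with made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000_000

# Two independent quantities with small relative errors
a = rng.normal(100, 2.0, size=N)   # 2% relative sd
b = rng.normal(50, 1.5, size=N)    # 3% relative sd

prod = a * b
rel_sd = prod.std() / prod.mean()
print(rel_sd)   # ≈ sqrt(0.02**2 + 0.03**2) ≈ 0.036
```

The exact relative variance of the product is 0.02^2 + 0.03^2 + (0.02*0.03)^2, so the quadrature rule is off only by a tiny cross term here; with large relative errors the discrepancy grows.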

I need to do a little work on that, so I won't attempt to answer questions 2+ of your thread at this time; I need to refresh my memory on these points anyhow, and I am not getting any more help in my sigfig thread than you are here... (at least, yet.)

There is Python source code in that thread that may be helpful in setting up some quick "what if" experiments for yourself, to test your ideas out numerically. If you need help getting started with Python, or getting Python (it's free), don't let that deter you from trying it out; I and many others can surely help. You can delete the parts of the Python program which you don't need, or just not use them.
:smile:
 
