Continuity correction when using normal as approximation for binomial


Discussion Overview

The discussion revolves around the application of continuity correction when approximating a binomial distribution with a normal distribution, particularly in cases where the variable takes non-integer values. Participants explore various scenarios and implications of applying continuity correction in these contexts.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant questions whether the continuity correction for P(X < 1.2) should be P(X < 0.7) or P(X < 1.15), or if it should be something else entirely.
  • Another participant emphasizes the need for continuity correction due to the discrete nature of the binomial distribution when transitioning to a continuous normal distribution.
  • Participants discuss how the placement of other values (lines) affects the application of continuity correction, suggesting that intervals between values can influence the correction applied.
  • There is a consideration of whether discrete data can have irregular intervals, with some participants affirming that it is possible.
  • One participant proposes that for a specific set of discrete values, the continuity correction for P(X < 1.3) might be P(X < 1.2) and for P(X > 1.3) might be P(X > 1.35), while questioning the correction for P(X < 1.5).
  • Another participant suggests that the correctness of these continuity corrections depends on how the data is binned, indicating that variable width bins could change the application of corrections.

Areas of Agreement / Disagreement

Participants express various viewpoints on the application of continuity correction, with no clear consensus reached on the specific corrections to apply in different scenarios. The discussion remains unresolved regarding the best practices for continuity correction in cases with non-integer values.

Contextual Notes

Participants highlight the importance of understanding the nature of the data (discrete vs. continuous) and the implications of binning on the application of continuity corrections. There are unresolved questions about the assumptions underlying the placement of values and the intervals between them.

songoku
TL;DR
I have learnt how to apply the continuity correction when the random variable takes integer values: for example, P(X < 5) becomes P(X < 4.5) when the binomial distribution is approximated by a normal distribution.
What if the value is not an integer, as in P(X < 1.2)?

a) Will the continuity correction be P(X < 1.2 - 0.5) = P(X < 0.7)?

or

b) Will the continuity correction be P(X < 1.2 - 0.05) = P(X < 1.15)?

or

c) Something else?

Thanks
 
Do you understand why we have to make the continuity correction for an integer (i.e. discontinuous) variable?
 
pbuk said:
Do you understand why we have to make the continuity correction for an integer (i.e. discontinuous) variable?
Because the binomial distribution is discrete, approximating it with the normal distribution (which is continuous) requires an adjustment. I picture it as replacing the line at x = 4 (discrete) with a box whose left edge is at 3.5 and whose right edge is at 4.5, so that it touches the boxes built from the lines at x = 3 and x = 5 (making the distribution continuous).
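
The box picture can be checked numerically. The following is a minimal sketch, not something posted in the thread: the values n = 20 and p = 0.3 and the helper names `binom_cdf` and `normal_cdf` are assumptions chosen for illustration. It compares the exact binomial P(X < 5) with the normal area up to 5 (no correction) and up to 4.5 (with correction).

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for Z ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Illustrative parameters (assumed, not from the thread).
n, p = 20, 0.3
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

exact = binom_cdf(4, n, p)                # P(X < 5) = P(X <= 4) for integer X
uncorrected = normal_cdf(5.0, mu, sigma)  # normal area to the left of 5
corrected = normal_cdf(4.5, mu, sigma)    # normal area to the left of 4.5

print(f"exact binomial    : {exact:.4f}")
print(f"normal, cut at 5  : {uncorrected:.4f}")
print(f"normal, cut at 4.5: {corrected:.4f}")
```

With these assumed parameters the cut at 4.5 lands much closer to the exact value, because the box for x = 4 ends at 4.5 and the box for x = 5 is excluded entirely.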
 
So how does that apply when you have a line at 1.2?
 
pbuk said:
So how does that apply when you have a line at 1.2?
It means I have to know the location of other lines. I am trying to form a hypothetical question regarding this but I just can't think of one.

So if the other lines are at 1.1 and 1.3, P(X < 1.2) becomes P(X < 1.15) and P(X > 1.2) becomes P(X > 1.25).

But if the other lines are not at regular intervals, such as one at 1.1 and the other at 1.4, does P(X < 1.2) become P(X < 1.15) and P(X > 1.2) become P(X > 1.3)?

Thanks
 
If your data are discrete, so that it is only possible for a 'line' (whatever that is) to take values of 1.1, 1.2, 1.3, 1.4 etc., then a continuity correction may make sense, but if the data are by nature continuous and there just happen to be some gaps then it would not make sense.
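
One way a variable could be restricted to 1.1, 1.2, 1.3, ... is if it is a rescaled count. As a hedged sketch under that assumption (the variable Y = X / 10 with X ~ Binomial(40, 0.3) is hypothetical and does not come from the thread), the code below compares options (a) and (b) from the opening post against the exact probability. The correction is half the spacing between possible values, here 0.05.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P(Z <= x) for Z ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Hypothetical setup: Y = X / scale, so Y can only be 0.0, 0.1, 0.2, ...
n, p, scale = 40, 0.3, 10
mu_y = n * p / scale
sigma_y = math.sqrt(n * p * (1 - p)) / scale
half_step = 0.5 / scale                   # half the spacing of Y, i.e. 0.05

exact = binom_cdf(11, n, p)               # P(Y < 1.2) = P(X < 12) = P(X <= 11)
option_a = normal_cdf(1.2 - 0.5, mu_y, sigma_y)        # subtract 0.5
option_b = normal_cdf(1.2 - half_step, mu_y, sigma_y)  # subtract half a step
no_corr = normal_cdf(1.2, mu_y, sigma_y)               # no correction

print(f"exact             : {exact:.4f}")
print(f"(a) cut at 0.70   : {option_a:.4f}")
print(f"(b) cut at 1.15   : {option_b:.4f}")
print(f"no correction 1.2 : {no_corr:.4f}")
```

Under this assumption option (b) is the one that tracks the exact value; option (a) applies the integer-spacing correction to a variable whose spacing is 0.1.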
 
pbuk said:
If your data are discrete, so that it is only possible for a 'line' (whatever that is) to take values of 1.1, 1.2, 1.3, 1.4 etc., then a continuity correction may make sense, but if the data are by nature continuous and there just happen to be some gaps then it would not make sense.
I understand. I haven't encountered such questions. All the practice questions are about integers so my query is only due to curiosity.

Is it not possible to have discrete data with irregular intervals, such as 1.1 , 1.3 , 1.4 , 1.9?

Thanks
 
songoku said:
Is it not possible to have discrete data with irregular intervals, such as 1.1 , 1.3 , 1.4 , 1.9?
Of course it is, but the continuity correction is not about what values the data actually take, rather it is about what values the data can possibly take.
 
pbuk said:
Of course it is, but the continuity correction is not about what values the data actually take, rather it is about what values the data can possibly take.
So is it correct to say that if the data is 1.1 , 1.3 , 1.4 and 1.9 the continuity correction for P(X < 1.3) is P(X < 1.2) and for P(X > 1.3) is P(X > 1.35)?

What about P(X < 1.5)? Since the midpoint of 1.4 and 1.9 is 1.65, would the continuity correction for P(X < 1.5) be P(X < 1.65)?

Thanks
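
The midpoint rule described in this post can be written down as a small sketch. It assumes the listed values are the complete set of values the variable can possibly take, which is exactly the assumption under discussion in the thread; the function names `lower_tail_cut` and `upper_tail_cut` are hypothetical.

```python
import bisect

def lower_tail_cut(support, v):
    """Cut point for P(X < v): midpoint between the largest possible
    value below v and the next possible value at or above v."""
    i = bisect.bisect_left(support, v)   # count of values strictly below v
    if i == 0 or i == len(support):
        raise ValueError("v must have possible values on both sides")
    return (support[i - 1] + support[i]) / 2

def upper_tail_cut(support, v):
    """Cut point for P(X > v): midpoint between the smallest possible
    value above v and the previous possible value at or below v."""
    j = bisect.bisect_right(support, v)  # index of first value strictly above v
    if j == 0 or j == len(support):
        raise ValueError("v must have possible values on both sides")
    return (support[j - 1] + support[j]) / 2

# Assumed complete, sorted set of possible values (from the post).
support = [1.1, 1.3, 1.4, 1.9]

print(round(lower_tail_cut(support, 1.3), 2))  # cut for P(X < 1.3)
print(round(upper_tail_cut(support, 1.3), 2))  # cut for P(X > 1.3)
print(round(lower_tail_cut(support, 1.5), 2))  # cut for P(X < 1.5)
```

The rule keeps the direction of the inequality and only moves the cut point to the boundary between neighbouring boxes, so a "<" question stays a "<" question.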
 
songoku said:
So is it correct to say that if the data is 1.1 , 1.3 , 1.4 and 1.9 the continuity correction for P(X < 1.3) is P(X < 1.2) and for P(X > 1.3) is P(X > 1.35)?
This would only be the case if the data was binned with variable width bins including (1.1, 1.3) and (1.3, 1.4).

songoku said:
What about P(X < 1.5)? Since the midpoint of 1.4 and 1.9 is 1.65, would the continuity correction for P(X < 1.5) be P(X < 1.65)?
This would only be the case if the data was binned with variable width bins including (1.4, 1.9).

But unless there was a good reason for binning the data you would get a better result by using individual samples.

It is good to be curious, but I think you have got lost down a rabbit hole; it's time to move on.
 
Thank you very much pbuk
 
