How to work out expected frequency from normal distribution

The expected frequency for each interval of trains is calculated from the normal distribution with a continuity correction: the interval 60 to 62 is treated as (59.5, 62.5) rather than (60, 62). For the first interval this gives approximately 4.178, close to the tabulated value of 4.13. The discussion emphasizes that expected frequencies do not need to be whole numbers, even for integer-valued random variables. Results can differ slightly depending on the precision of the tools used for the computation, which explains the small discrepancies from the tabulated values.
question dude
[Attached image: attachment.php?attachmentid=455065&d=1440381068.jpg]
How is the expected frequency column worked out for each interval of trains?

2) My attempt

Take the first interval, 60 - 62, I thought about doing this:

(62 - mean) / standard deviation

(62 - 67.45) / 2.92 = - 1.866

Using z < -1.866, from the normal distribution table, I get:

1 - 0.9686 = 0.0314

0.0314*(100) = 3.14

please note 100 is the total observed frequency

As you can see, I get 3.14 instead of 4.13 as given in the expected frequency column.
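
For reference, the attempt above can be reproduced numerically. This is only a minimal sketch using Python's standard library; the helper name `normal_cdf` is made up for illustration, and the mean, standard deviation and total are the values quoted in the post. It computes P(X < 62) with no continuity correction, so it lands near 3.1 rather than the tabulated 4.13 (the table lookup gives 3.14 because z is rounded to 1.86).

```python
from math import erf, sqrt

mean, sd, total = 67.45, 2.92, 100  # values quoted in the post

def normal_cdf(z):
    """Standard normal P(Z <= z), written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Standardise the upper class limit 62 exactly as in the attempt.
z = (62 - mean) / sd

# Everything below 62, scaled by the total observed frequency.
attempt = total * normal_cdf(z)

print(round(z, 3), round(attempt, 2))
```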
 
I don't get your answers; I don't get the tabulated expected frequencies, either, but I come close to the latter.

The number of trains is integer-valued (i.e., whole numbers), but you are approximating its distribution by a continuous distribution (the normal). So, the statement {60 ≤ trains ≤ 62} is the same as {59.5 ≤ trains ≤ 62.5} for actual, physical trains. If you use the normal distribution on the interval (59.5, 62.5) you will get an expected frequency of 100 * 0.04178 ≈ 4.178, which is not that far from the tabulated value of 4.13. For the interval 63 to 65, i.e. (62.5, 65.5), I get an expected frequency of 100 * 0.2071 = 20.71, which is close to the tabulated 20.68.
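
The continuity-corrected calculation described above can be sketched as follows. This is an illustrative sketch only, not the Maple computation mentioned in this reply: the function names `normal_cdf` and `expected_frequency` are hypothetical, and the mean (67.45), standard deviation (2.92) and total (100) are taken from the original post.

```python
from math import erf, sqrt

MEAN = 67.45   # sample mean quoted in the thread
SD = 2.92      # sample standard deviation quoted in the thread
TOTAL = 100    # total observed frequency

def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ Normal(mean, sd), via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def expected_frequency(low, high, mean=MEAN, sd=SD, total=TOTAL):
    """Expected count for the integer interval low..high.

    The interval is widened by 0.5 at each end (continuity correction),
    so 60..62 is treated as (59.5, 62.5).
    """
    p = normal_cdf(high + 0.5, mean, sd) - normal_cdf(low - 0.5, mean, sd)
    return total * p

first = expected_frequency(60, 62)    # interval (59.5, 62.5)
second = expected_frequency(63, 65)   # interval (62.5, 65.5)
print(round(first, 3), round(second, 2))
```

The two printed values should come out close to the 4.178 and 20.71 quoted above; small differences in the last digit are possible depending on floating-point rounding.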

I used Maple to do accurate computations; if the tabulator used cruder tools he/she could get less accurate answers.

BTW: in goodness-of-fit tests we do NOT usually round off the "expected frequencies" to whole numbers; for the first cell we would typically leave it as 4.178 (or maybe 4.18, or maybe 4.2). There is no reason at all to assume that the expected frequencies are integers. This has nothing to do with whether the distribution is discrete (on whole numbers) or continuous (like the normal): the expected cell frequency for an integer-valued random variable can be, and usually is, a non-integer quantity.
 
