mathmari said:
I have a question. For the class intervals, is it correct that the upper bound of one interval equals the lower bound of the next, or should the next interval start at the next number?
If the next interval starts at the next number, doesn't that mean we have 'gaps' between the intervals?
Whatever we do, there must not be gaps!
The classes must cover all possible values. And yes, that means there is some ambiguity at the boundaries.
Different conventions are used here.
If we are talking about integers, it is quite common that each upper bound is 1 less than the next lower bound.
This also happens with age groups.
So we might have for instance age groups 18-24, 25-29, 30-34. Note that age is counted in completed years here, so age 24 also covers people who are 1 day before their 25th birthday. (Nerd)
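To make the age-group convention concrete, here is a minimal sketch in Python. The function name `age_group` and the choice to return `None` outside 18-34 are my own assumptions for illustration; the point is only that a fractional age is first reduced to completed years, so there are no gaps between 24 and 25.

```python
from bisect import bisect_right

def age_group(age, lower_bounds=(18, 25, 30), last_upper=34):
    """Return the label of the age group that contains `age`.

    `age` may be fractional; it is reduced to completed years first.
    Returns None for ages outside the covered range (an assumption of this sketch).
    """
    completed_years = int(age)  # truncation equals floor for non-negative ages
    if completed_years < lower_bounds[0] or completed_years > last_upper:
        return None
    # Index of the last lower bound that is <= completed_years.
    i = bisect_right(lower_bounds, completed_years) - 1
    lower = lower_bounds[i]
    # Upper bound is 1 less than the next lower bound, or the final upper bound.
    upper = lower_bounds[i + 1] - 1 if i + 1 < len(lower_bounds) else last_upper
    return f"{lower}-{upper}"
```

For example, `age_group(24.997)` lands in the 18-24 group, while `age_group(25)` lands in 25-29.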
If we are talking about real numbers, the lower boundary of each class must be equal to the upper boundary of the previous class, since otherwise there would be gaps.
Of course we have a problem now with a number that is exactly on a boundary. Which interval should it belong to? (Wondering)
Then we need to make a consistent choice to put the number either in the interval below or in the interval above.
The classes are then for instance [1.1, 2.2), [2.2, 3.3), [3.3, 4.4), [4.4, 5.5].
This is more explicit than writing 1.1-2.2, 2.2-3.3, 3.3-4.4, 4.4-5.5, which does not address the ambiguity.
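As a rough illustration of the half-open convention with a closed last class, here is a small Python sketch. The names `EDGES` and `class_index` are hypothetical, and returning `None` for values outside [1.1, 5.5] is an assumption of this sketch.

```python
from bisect import bisect_right

# Class boundaries for [1.1, 2.2), [2.2, 3.3), [3.3, 4.4), [4.4, 5.5]
EDGES = [1.1, 2.2, 3.3, 4.4, 5.5]

def class_index(x, edges=EDGES):
    """Return the 0-based index of the class containing x, or None if x is out of range."""
    if x < edges[0] or x > edges[-1]:
        return None
    if x == edges[-1]:
        # The last class is closed on the right, so the top edge belongs to it.
        return len(edges) - 2
    # A value on an interior boundary goes to the interval above it.
    return bisect_right(edges, x) - 1
```

With this choice, the boundary value 2.2 belongs to the second class, and 5.5 is still counted in the last class instead of falling off the end.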
Note that different programs use different conventions.
Excel identifies each class with the upper bound of the corresponding interval, and additionally introduces the extra class 'Larger'.
So with bins 1.1, 2.2, 3.3, 4.4, 5.5, we get the classes ($-\infty$, 1.1], (1.1, 2.2], (2.2, 3.3], (3.3, 4.4], (4.4, 5.5], and 'Larger'. (Nerd)
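For comparison, the upper-bound-inclusive convention described above can be mimicked in Python as follows. This is only a sketch of the convention, not a call into Excel; the function name `upper_inclusive_class` is hypothetical, and the label 'Larger' is taken from the description above.

```python
from bisect import bisect_left

def upper_inclusive_class(x, bins):
    """Return the upper bound that labels the class of x, or 'Larger' above the last bin.

    Classes are (-inf, bins[0]], (bins[0], bins[1]], ..., and 'Larger'.
    `bins` must be sorted in increasing order.
    """
    # First bin whose upper bound is >= x; a value on a boundary goes to the interval below it.
    i = bisect_left(bins, x)
    return bins[i] if i < len(bins) else "Larger"
```

Here the boundary value 2.2 is assigned to the class (1.1, 2.2], which is the opposite of the half-open choice [2.2, 3.3).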
Btw, if we are talking about continuous probability distributions, the probability that a value lies exactly on a boundary is zero in theory (in practice it is tiny, limited only by machine precision), so there should be no need to worry about it too much. (Whew)