An elementary confusion about discrete vs. continuous variables

SUMMARY

The discussion centers on the classification of variables as discrete or continuous, specifically student examination marks and family income. The original poster argues that student marks are discrete because of the finite scoring system, and that income is discrete because it cannot be finer than the smallest currency unit; other participants note that family income is typically treated as continuous for statistical modeling despite taking only countably many values. The prevailing view is that in a statistics context both variables should be treated as continuous to facilitate modeling and analysis, as this approach aligns with common statistical practice and simplifies inference.

PREREQUISITES
  • Understanding of discrete vs. continuous variables in statistics.
  • Familiarity with statistical modeling concepts, particularly the normal distribution.
  • Knowledge of grading systems and their implications in statistical analysis.
  • Basic principles of regression analysis and inference in statistics.
NEXT STEPS
  • Research the implications of treating variables as continuous in statistical modeling.
  • Learn about the normal distribution and its application in regression analysis.
  • Explore grading systems and their statistical interpretations in educational assessments.
  • Investigate the differences between cardinal and ordinal data in statistical contexts.
USEFUL FOR

Students of statistics, educators teaching statistical concepts, and data analysts involved in modeling and interpreting quantitative data.

ssd
The question is simply posed as "Identify the variables as discrete or continuous. 1) Mark of a student in an examination. 2) Family income."
What I think:
1) There must be a minimum gap between two consecutive marks that the examiner can assign. E.g., suppose that there are N students and we are considering percentile scores. Then the scores can take only N isolated values, 100*1/N, 100*2/N, ..., 100*N/N, implying that the scores are discrete. On the other hand, if we assume that the scores are sample observations from a merit distribution, and that distribution is continuous, then we get a different answer.
2) Family income cannot be recorded more precisely than the smallest currency unit, and hence is discrete.
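To make both discreteness arguments concrete, here is a minimal Python sketch. It assumes percentile scores are defined as 100*k/N for the student in ascending position k of N (ties broken by input order, a simplification), and that income is recorded to the cent. The function name `percentile_ranks` and the sample marks and income are hypothetical.

```python
from decimal import Decimal
from fractions import Fraction


def percentile_ranks(scores):
    """Percentile rank 100*k/N for the student in ascending position k (1..N).

    Ties are broken by input order, a simplifying assumption.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [Fraction(0)] * n
    for k, i in enumerate(order, start=1):
        ranks[i] = Fraction(100 * k, n)  # exact rational, no float rounding
    return ranks


# Hypothetical raw marks for N = 8 students.
marks = [55, 72, 61, 90, 48, 77, 83, 66]
ranks = percentile_ranks(marks)

# Every rank lies on the lattice {100*1/N, ..., 100*N/N}, so the gap is 100/N.
lattice = sorted(set(ranks))
gaps = {b - a for a, b in zip(lattice, lattice[1:])}
print([float(r) for r in lattice])  # [12.5, 25.0, 37.5, 50.0, 62.5, 75.0, 87.5, 100.0]
print(gaps)                         # {Fraction(25, 2)}

# Income recorded to the cent: every value is a multiple of 0.01.
income = Decimal("41873.2649").quantize(Decimal("0.01"))
print(income)                       # 41873.26
```

The lattice has exactly N points with spacing 100/N, which is the discreteness the argument relies on; whether that fact matters is the modeling question the rest of the thread takes up.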

My confusion is that my daughter's college professor, in their introductory statistics class, states categorically that both variables are continuous.
 
I think it's a touch ambiguous.

Consider income. This is typically thought of as a real-valued (read: not just rational) random variable or parameter in a model. Strictly speaking, every income I've heard of takes values in a countable set -- in the US, simply dollars and cents. But for statistical modeling you tend to think of it as real-valued. Having access to ##\mathbb R## makes it a bit easier to model in a regression -- no issues with taking a 2-norm or what have you. Not a totally satisfying answer, perhaps.
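A rough Python sketch of why the countable/real distinction rarely matters in a regression: fit an ordinary least-squares slope (which minimizes the squared 2-norm of the residuals) once with "real" incomes and once with the same incomes rounded to the cent. The log-normal incomes, the 0.3 slope, and the noise level are all made-up assumptions for illustration, and `ols_slope` is a hypothetical helper, not a library function.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (minimizes the squared 2-norm of residuals)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)
# Hypothetical data: log-normal incomes in dollars and a made-up linear response.
income = [rng.lognormvariate(10.5, 0.5) for _ in range(5000)]
spending = [0.3 * x + rng.gauss(0.0, 2000.0) for x in income]

# The "countable" version: the same incomes recorded to the cent.
income_cents = [round(x, 2) for x in income]

slope_real = ols_slope(income, spending)
slope_cents = ols_slope(income_cents, spending)
print(slope_real, slope_cents, abs(slope_real - slope_cents))
```

The two slopes agree to many decimal places, so treating income as real-valued costs nothing here while keeping the usual least-squares machinery available.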

For grading, I mean it depends on how the grades are given. Certainly this is discrete if no partial credit is given and tests are scored in a remotely familiar way as Correct or Incorrect (read: Bernoulli, aka a 0-1 proposition). However, partial credit is presumably allowed per question, which is interpreted as being real-valued. I don't really think you'd have irrationals for partial credit, but in principle I couldn't rule it out. And again, it makes modeling a bit easier for regression.

- - -
Edit, for the avoidance of doubt: in the no-partial-credit case, I meant 0-1 scoring per question. Most tests I've seen allow for some partial credit per question.
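A small Python sketch comparing the totals achievable under 0-1 scoring and under partial credit. The 3-question test and the half-point partial credit rule are hypothetical, and the brute-force enumeration is only meant for tiny examples (it grows exponentially with the number of questions).

```python
from fractions import Fraction
from itertools import product


def possible_totals(n_questions, per_question_scores):
    """All totals achievable when each question receives one of per_question_scores."""
    return sorted({sum(combo) for combo in product(per_question_scores, repeat=n_questions)})


# 0-1 scoring per question (Bernoulli items).
zero_one = possible_totals(3, [0, 1])
# Hypothetical partial credit in half points.
half_credit = possible_totals(3, [Fraction(0), Fraction(1, 2), Fraction(1)])

print(zero_one)                        # [0, 1, 2, 3]
print([str(t) for t in half_credit])   # ['0', '1/2', '1', '3/2', '2', '5/2', '3']
```

Partial credit refines the grid of possible totals, but any fixed rubric still yields finitely many values; calling the score real-valued is a modeling convenience rather than a literal description.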
 
I agree with you. In all advanced study or inference they are treated as continuous without any problem. But in an introductory discussion one may bring in the other aspects of the idea to give a complete picture.
 
ssd said:
1) There must be a minimum gap between two consecutive marks that the examiner can assign. E.g., suppose that there are N students and we are considering percentile scores. Then the scores can take only N isolated values, 100*1/N, 100*2/N, ..., 100*N/N, implying that the scores are discrete. On the other hand, if we assume that the scores are sample observations from a merit distribution, and that distribution is continuous, then we get a different answer.
2) Family income cannot be recorded more precisely than the smallest currency unit, and hence is discrete.
By this logic every number you could enter into a computer would be discrete. There is always a smallest difference between representable computer numbers. Similarly, any measuring device has a smallest detectable difference.
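To illustrate the point about computer numbers, this Python sketch measures the smallest gap between adjacent double-precision floats. It assumes CPython floats are IEEE 754 binary64 (true on essentially all platforms) and needs Python 3.9+ for `math.nextafter` and `math.ulp`.

```python
import math
import sys

# Gap between 1.0 and the next larger double: 2**-52 for IEEE 754 binary64.
gap_at_one = math.nextafter(1.0, 2.0) - 1.0
print(gap_at_one, gap_at_one == sys.float_info.epsilon)

# The gap (unit in the last place) grows with the magnitude of the number.
for x in (1.0, 1e6, 1e12):
    print(x, math.ulp(x))
```

Every value a computer stores is therefore on a finite grid, yet nobody calls a height or a temperature stored as a float "discrete" for that reason.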

Many things in statistics are not about some underlying reality, but about your model. They are arbitrary choices that the statistician can make to model the problem, and some choices are better than others.

In this case, both grades and income should clearly be treated as continuous. For both, it would make sense to assume that they are normally distributed rather than following some discrete distribution. Deviations from normality are likely to be due to skew rather than to discretization. Assuming continuous variables opens up a lot of statistical methods that are both relevant and powerful.
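One way to quantify how little discretization matters is a standard grouped-data result (Sheppard's correction, which is not mentioned in the thread): rounding to a grid of width h inflates the variance by roughly h**2/12. This Python sketch assumes hypothetical "true" scores drawn from a normal model with mean 65 and standard deviation 10 (not clipped to 0..100, for simplicity) and rounds them to whole marks.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical "true" scores from a normal model (mean 65, sd 10); no clipping to 0..100.
exact = [rng.gauss(65.0, 10.0) for _ in range(20000)]
# The same scores recorded as whole marks, i.e. discretized with step h = 1.
rounded = [round(x) for x in exact]

mean_shift = statistics.fmean(rounded) - statistics.fmean(exact)
var_shift = statistics.pvariance(rounded) - statistics.pvariance(exact)
print(f"mean shift: {mean_shift:.4f}")
print(f"variance shift: {var_shift:.4f} (Sheppard's h**2/12 = {1 / 12:.4f})")
print(f"variance of exact scores: {statistics.pvariance(exact):.2f}")
```

The variance shift is on the order of 1/12 against a variance of about 100, which supports the claim that discretization is a much smaller departure from the normal model than skew would typically be.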

The teacher’s choice is clearly the right choice and it is the choice that any reasonable statistician would make. It is a judgement call, and the teacher is teaching the students which is the right call in this case.
 
I was trying to go by the definition. The question is not asking "what should we assume for convenience in making inferences?" Logic, yes, is the basis of my understanding. Light exists as very small discrete packets (photons), which both solved and raised big questions; yet at one point in time, treating light as continuous waves simplified many things. For that matter, no model in statistics is really arbitrary, nor does any model in statistics exist without a connection to the underlying reality (judging by the derivations shown in textbooks up to the postgraduate level, so far as I have seen). We only make some assumptions in our models so that we can handle them mathematically. Subsequent researchers relax the assumptions one by one to move closer to reality.

Coming back to the original question, here is one example that should further explain my question. There are tests for large numbers of students in some countries, in which each question has 5 alternative answers. A student has to blacken the circle corresponding to the right answer. 1 point is given for a right answer and 0 points for a wrong one. An optical device reads the answers and shows the mark obtained. This mark is obviously discrete.
My point was that, without a proper explanation of the marking process, the mark cannot be labeled a priori as continuous.
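Under 0-1 scoring, the mark is a sum of Bernoulli items, i.e. binomial, which has a finite support. This Python sketch computes the exact pmf with rational arithmetic; the assumption that a student guesses independently on all of 10 questions with a 1-in-5 chance each is hypothetical, chosen to match the 5-alternative format.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Exact pmf of the number of correct answers out of n independent 0-1 items."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


# Assumption: a student guesses on every one of 10 questions, 1 of 5 options each.
pmf = binomial_pmf(10, Fraction(1, 5))

print(sorted(pmf))                          # support is the integers 0..10
print(sum(pmf.values()))                    # 1
print(sum(k * q for k, q in pmf.items()))   # mean n*p = 2
print(float(pmf[2]))                        # probability of exactly 2 correct
```

All probability sits on the 11 integers 0..10, which is exactly the sense in which such a mark is discrete.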
 
ssd said:
Coming back to the original question, here is one example that should further explain my question. There are tests for large numbers of students in some countries, in which each question has 5 alternative answers. A student has to blacken the circle corresponding to the right answer. 1 point is given for a right answer and 0 points for a wrong one. An optical device reads the answers and shows the mark obtained. This mark is obviously discrete.
My point was that, without a proper explanation of the marking process, the mark cannot be labeled a priori as continuous.

I really cringe at the idea of calling the sum (convolution) of a finite number of Bernoullis continuous. Bernoullis (coin tosses) are sort of the canonical example of a discrete variable. Of course, if a large number are involved, we can approximate the sum with a Gaussian.
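The Gaussian approximation of a binomial sum is the De Moivre–Laplace theorem. This Python sketch compares the exact binomial CDF with a normal CDF (using the usual 0.5 continuity correction) and reports the worst-case error over all integer cut points. The success probability p = 0.2 (random guessing among 5 options) and the chosen values of n are assumptions for illustration; `statistics.NormalDist` requires Python 3.8+.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k, n, p):
    """De Moivre-Laplace approximation of P(X <= k) with a 0.5 continuity correction."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)


def max_cdf_error(n, p):
    """Largest absolute gap between exact and approximate CDFs over k = 0..n."""
    return max(abs(binomial_cdf(k, n, p) - normal_approx_cdf(k, n, p)) for k in range(n + 1))


for n in (5, 20, 100, 400):
    print(n, round(max_cdf_error(n, 0.2), 4))
```

The error shrinks as n grows, so the approximation is good for long tests but poor for short ones, and it is worse the further p is from 1/2 (the binomial is more skewed).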

I think the key point is that this is a statistics course, not a probability course. Maybe it's better to just tweak the question: instead of asking whether something is discrete or continuous, just ask -- can we model it as continuous? If not, why not?

Btw, in an awful lot of cases the "continuous vs discrete" label is kind of a misnomer -- what people are really getting at is something closer to cardinal vs ordinal data.
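A small Python sketch of the ordinal side of that distinction: with letter grades only the ordering is meaningful, so an order statistic such as the median is well defined, while averaging numeric codes silently assumes equal spacing between grades (a cardinal assumption). The five-letter scale and the sample grades are hypothetical.

```python
# Assumed ordering of a hypothetical letter-grade scale, lowest to highest.
LETTERS = ["F", "D", "C", "B", "A"]
RANK = {g: i for i, g in enumerate(LETTERS)}


def median_grade(grades):
    """Lower median of ordinal grades: uses only the ordering, never differences."""
    ordered = sorted(grades, key=RANK.__getitem__)
    return ordered[(len(ordered) - 1) // 2]


grades = ["B", "A", "C", "B", "F", "C"]
print(median_grade(grades))  # C
```

Taking the lower median keeps the answer an actual category; averaging the codes 0..4 would instead produce something like 2.33, which only means something if the gap from F to D equals the gap from B to A.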
 
That makes a lot of sense.
 
ssd said:
There are tests for large numbers of students in some countries, in which each question has 5 alternative answers. A student has to blacken the circle corresponding to the right answer. 1 point is given for a right answer and 0 points for a wrong one. An optical device reads the answers and shows the mark obtained. This mark is obviously discrete
No, even this would be best handled as a continuous variable. You would generally model the performance of a student on the test as a normally distributed random variable, meaning it is continuous. Hence common terms like “grading on the curve”.

Furthermore, a test is often seen as an instrument that measures some underlying trait of interest. That trait is frequently continuous, despite the finite resolution of the measurement instrument.
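A Python sketch of the "instrument measuring a continuous trait" view. The item response model is an assumption swapped in for illustration: a Rasch (one-parameter logistic) model in which a continuous ability theta ~ N(0, 1) drives the chance of answering each 0-1 item correctly. The 40 item difficulties, the sample size, and the helper names `simulate` and `pearson` are hypothetical.

```python
import math
import random


def simulate(n_students, difficulties, seed=0):
    """Continuous latent trait theta ~ N(0, 1); observed score = number of items correct.

    Item responses follow a Rasch (one-parameter logistic) model, an assumed choice.
    """
    rng = random.Random(seed)
    thetas, scores = [], []
    for _ in range(n_students):
        theta = rng.gauss(0.0, 1.0)
        score = 0
        for b in difficulties:
            p_correct = 1.0 / (1.0 + math.exp(-(theta - b)))
            score += rng.random() < p_correct  # 0-1 item score
        thetas.append(theta)
        scores.append(score)
    return thetas, scores


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# 40 hypothetical items with difficulties spread evenly from -2 to 2.
difficulties = [-2.0 + 4.0 * i / 39 for i in range(40)]
thetas, scores = simulate(2000, difficulties)
print(sorted(set(scores))[:5], "...")  # observed scores are integers only
print(round(pearson(thetas, scores), 3))
```

The observed score is an integer between 0 and 40, yet it tracks the continuous trait closely; in this framing the discreteness belongs to the instrument, not to the quantity being modeled.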
 
How we may handle the data is irrelevant in the context of the question, as mentioned earlier. The question is not "what would you generally assume?" The question is 'what is what' by definition. That you 'assume' a variable to be normally distributed does not imply the variable is continuous; you have just assumed it to be continuous. You are focusing on inference or graduation. Even then, we do not "generally" get a normal variate from a binomial: the conditions of the De Moivre–Laplace limit theorem have to hold.
 
To an obsessive purist, both are discrete. But it is really a matter of convenience. It depends on whether a person wants to handle the probabilities one by one as discrete values or in ranges. If the student scores are "A, B, C, D, F", then virtually everyone would treat them as discrete, but if they are integers 0..100, then virtually everyone would treat them as continuous ranges.
 
ssd said:
How we may handle the data is irrelevant in the context of the question, as mentioned earlier. The question is not "what would you generally assume?" The question is 'what is what' by definition.
You and your daughter were trying to understand why the professor answered that they are continuous. In that case your approach is clearly wrong, since it leads to the wrong conclusion.

If you wish to try to justify your/her mistake then by all means proceed with your approach. However, the professor is teaching statistics, and is right to treat both variables as continuous in a statistics class. So if your daughter wishes to learn statistics and improve her scores on future assignments then she needs to take a different focus than what you are suggesting.

The professor is clearly not asking the question you are insisting should be discussed. You are doing your daughter a disservice by trying to get justification for your answer instead of trying to understand the professor’s answer.

ssd said:
That you 'assume' a variable to be normally distributed does not imply the variable is continuous; you have just assumed it to be continuous.
On the contrary, a variable isn’t something that exists in the real world. It is a part of your model, so if you define it to be continuous by your model’s assumption then it is in fact continuous.

The question you are getting at is whether or not the model thus defined is a good model, meaning does it accurately predict the results of experiments and is it simple. Treating test scores as continuous usually simplifies the model, and produces results that are as accurate as a discrete assumption. Thus it is the better choice, as indicated by the professor.

I am firmly in agreement with the professor on this. I wish your daughter the best of luck.
 
They are continuous. Continuous means they can take on any value within their support. Your assumption that grades take discrete values with step size 100/N isn't accurate or justifiable. Suppose you have 25 students in a class, with a maximum grade of 100. These 25 students can have any grades between 0 and 100, not only the grades 0, 4, 8, ..., 100. Probably they would follow a normal distribution with a mean that equals the average grade. If you divide the grades into categories like A, B, C, D, E, and F, then yes, it is a discrete random variable.
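A minimal Python sketch of that last step: binning a continuous score into six letter categories turns it into a discrete variable. The cutoffs (50, 60, 70, 80, 90) and the sample scores are hypothetical; `bisect_right` is used so that a score exactly on a cutoff lands in the higher category.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cutoffs: below 50 is F, [50, 60) is E, ..., 90 and above is A.
CUTOFFS = [50, 60, 70, 80, 90]
LETTERS = ["F", "E", "D", "C", "B", "A"]


def letter(score):
    """Map a continuous score in [0, 100] to one of six discrete categories."""
    return LETTERS[bisect_right(CUTOFFS, score)]


# Hypothetical scores; nothing forces them onto a grid such as 0, 4, 8, ..., 100.
scores = [73.25, 49.9, 50.0, 88.5, 91.0, 66.75, 100.0]
print([letter(s) for s in scores])  # ['C', 'F', 'E', 'B', 'A', 'D', 'A']
print(Counter(letter(s) for s in scores))
```

The underlying score can be any value in [0, 100], while the letter grade has only six possible values, which is why the same test result can be modeled either way depending on how it is reported.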
 
