An elementary confusion on discrete or continuous variable

ssd · Jul 13, 2018

The question is simply posed as "Identify the variables as discrete or continuous. 1) Mark of a student in an examination. 2) Family income."
What I think:
1) There must be a minimum gap between two possible consecutive marks that the examiner can assign. E.g., suppose that there are N students and we are considering percentile scores. Then the scores can take values on N isolated points only: 100*1/N, 100*2/N, ..., 100*N/N. This implies that scores are discrete. On the other hand, if we assume that scores are sample observations from a merit distribution, and the distribution is continuous, then we will have a different answer.
2) Family income cannot be more accurate than the smallest currency unit, and is hence discrete.

My confusion is that my daughter's college professor, in an introductory statistics class, unequivocally states that both variables are continuous.

StoneTemplePython · Jul 13, 2018

I think it's a touch ambiguous.

Consider income. This is typically thought of as a real valued (read: not just rationals) random variable or parameter in a model. Strictly speaking every income I've heard of is countable -- in the US, simply in dollars and cents. But for statistical modeling you tend to think of it as real valued. Having access to ##\mathbb R## makes it a bit easier to model in a regression -- no issues with taking a 2 norm or what have you. Not a totally satisfying answer perhaps.

For grading, I mean it depends on how the grades are given. Certainly this is discrete if no partial credit is given and tests are scored in a remotely familiar way as Correct or Incorrect (read: Bernoulli aka 0-1 proposition). However partial credit is presumably allowed per question, which is interpreted as being real valued. I don't actually really think you'd have irrationals for partial credit, but in principle I couldn't rule it out. And again, it makes modeling a bit easier for regression.

- - -
edit: for avoidance of doubt: for the no partial credit case, I meant 0-1 scoring per question. Most tests I've seen allow for some partial credit per question.

ssd · Jul 13, 2018

Agree with you. For all advanced studies or inference they are taken as continuous without problem. But, in the introductory discussion one may allow the other aspects of the idea to have a complete picture.

Dale · Jul 13, 2018

ssd said:

1) There must be a minimum gap between two possible consecutive marks that the examiner can assign. E.g., suppose that there are N students and we are considering percentile scores. Then the scores can take values on N isolated points only: 100*1/N, 100*2/N, ..., 100*N/N. This implies that scores are discrete. On the other hand, if we assume that scores are sample observations from a merit distribution, and the distribution is continuous, then we will have a different answer.
2) Family income cannot be more accurate than the smallest currency unit, and is hence discrete.

By this logic every number you could enter into a computer would be discrete. There is always a smallest difference in computer numbers. Similarly any measuring device has a smallest detectable difference.
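To illustrate the point about computer numbers, here is a minimal Python sketch. It is an illustration only, not part of the original argument, and it assumes Python 3.9 or later (for `math.ulp` and `math.nextafter`) and IEEE 754 double-precision floats, which is what CPython uses on common platforms. The value 50,000 is just a hypothetical income-sized number.

```python
import math
import sys

# Gap between 1.0 and the next representable double (machine epsilon).
gap_at_one = math.ulp(1.0)

# The gap grows with magnitude: spacing near a hypothetical income of 50,000.
gap_at_income = math.ulp(50_000.0)

print("spacing near 1.0:    ", gap_at_one)
print("spacing near 50000.0:", gap_at_income)

# The next float after 1.0 sits exactly one gap away; nothing lies in between.
print(math.nextafter(1.0, 2.0) - 1.0 == gap_at_one)
```

So every floating-point value lives on a finite grid, yet nobody calls a variable discrete just because it is stored as a float.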

Many things in statistics are not about some underlying reality, but about your model. They are arbitrary choices that the statistician can make to model the problem; some choices are better than others.

In this case, both grades and income should clearly be treated as continuous. For both it would make sense to assume that they are normally distributed rather than some discrete distribution. Deviations from normality are likely to be due to skew rather than due to discretization. Assuming continuous variables opens up a lot of statistical methods that are both relevant and powerful.

The teacher’s choice is clearly the right choice and it is the choice that any reasonable statistician would make. It is a judgement call, and the teacher is teaching the students which is the right call in this case.

ssd · Jul 13, 2018

I was trying to go by analyzing the definition. The question is not asking "what should we assume for convenience of inference making". Logic, yes, is the basis of my understanding. Photons exist as very small discrete packets, a fact that both solved and generated big questions. Yet treating light as continuous waves simplified many things at one point in time. For that matter, no model in Statistics is really arbitrary, nor does anything in Statistics (say, some model) exist without a connection to underlying reality (judging by the derivations shown in textbooks up to the postgraduate level, so far as I have seen). We only make some assumptions in our models so that we can handle them mathematically. Subsequent researchers relax the assumptions one by one to move closer to reality.

Coming back to the original question: I give one example, which will further explain my question. There are tests for a large number of students in some country, where 5 alternative answers are given to each question. A student has to blacken a circle corresponding to the right answer. 1 point is given for the right answer and 0 points for the wrong one. An optical device reads the answers and shows the mark obtained. This mark obviously is discrete.
My point was that, without a proper explanation of the marking process, it cannot be stamped a priori as continuous.

StoneTemplePython · Jul 13, 2018

ssd said:

Coming back to the original question: I give one example, which will further explain my question. There are tests for a large number of students in some country, where 5 alternative answers are given to each question. A student has to blacken a circle corresponding to the right answer. 1 point is given for the right answer and 0 points for the wrong one. An optical device reads the answers and shows the mark obtained. This mark obviously is discrete.
My point was that, without a proper explanation of the marking process, it cannot be stamped a priori as continuous.

I really cringe at the idea of calling the sum (convolution) of a finite number of Bernoullis continuous. Bernoullis (coin tosses) are sort of the canonical example of discrete. Of course, if there are a large number involved, we can approximate the sum with a Gaussian.
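A rough sketch of that Gaussian approximation, using only the Python standard library. The setup is an assumption made for illustration: 100 multiple-choice questions, each answered correctly with probability 0.2 (pure guessing among 5 alternatives), so the score is Binomial(100, 0.2). The helper names `binom_pmf` and `normal_pdf` are hypothetical.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact probability of k correct answers out of n for Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the Normal(mu, sigma^2) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Assumed setup: 100 questions, success probability 0.2 per question.
n, p = 100, 0.2
mu = n * p                           # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial

# Scores are spaced 1 apart, so the pmf at k is comparable to the density at k.
for k in (12, 16, 20, 24, 28):
    print(k, round(binom_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))

# Largest pointwise difference between the exact pmf and the Gaussian density.
max_gap = max(abs(binom_pmf(k, n, p) - normal_pdf(k, mu, sigma))
              for k in range(n + 1))
print("largest pointwise gap:", max_gap)
```

The score is still supported on the integers 0..100 only; the Gaussian is a convenient approximation to it, not a change in what the variable is.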

I think the key point is that this is a statistics course not a probability course. Maybe it's better to just tweak the question, and instead of asking whether something is discrete or continuous, just ask -- can we model it as continuous? If no, then why not?

Btw, in an awful lot of cases the "continuous vs discrete" label is kind of a misnomer -- what people are really getting at is something closer to cardinal vs ordinal data.

ssd · Jul 13, 2018

Sounds absolutely meaningful.

Dale · Jul 13, 2018

ssd said:

There are tests for a large number of students in some country, where 5 alternative answers are given to each question. A student has to blacken a circle corresponding to the right answer. 1 point is given for the right answer and 0 points for the wrong one. An optical device reads the answers and shows the mark obtained. This mark obviously is discrete.

No, even this would be best handled as a continuous variable. You would generally model the performance of a student on the test as a normally distributed random variable, meaning it is continuous. Hence common terms like “grading on the curve”.

Furthermore, tests are often seen as an instrument that measures some underlying trait of interest. That trait is frequently continuous, despite the finite resolution of the measurement instrument.

ssd · Jul 14, 2018

How we may handle the data is irrelevant in the context of the question, as mentioned earlier. The question is not "what you would generally assume". The question is 'what is what' by definition. That you 'assume' a variable to be normally distributed does not imply the variable is continuous; you have just assumed it to be continuous. You are focusing on inference making or graduation. Even then we do not get a normal variate from a binomial "generally". The conditions of the De Moivre-Laplace limit theorem have to hold.
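As a hedged illustration of that last remark, the sketch below compares the exact binomial CDF with a normal approximation (with a continuity correction of 0.5) for two assumed parameter choices. The function names and the choices n = 100 with p = 0.5 or p = 0.01 are hypothetical, picked only to contrast a case where the approximation is good with one where np is small and it is poor.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Z <= x) for Z ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def worst_cdf_error(n: int, p: float) -> float:
    """Largest gap between the binomial CDF and its normal approximation."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    # k + 0.5 is the usual continuity correction for an integer-valued variable.
    return max(abs(binom_cdf(k, n, p) - normal_cdf(k + 0.5, mu, sigma))
               for k in range(n + 1))

print("n=100, p=0.5 :", worst_cdf_error(100, 0.5))
print("n=100, p=0.01:", worst_cdf_error(100, 0.01))
```

With the same n, the error is far larger when p is close to 0, which is why a rule of thumb such as "np(1-p) should not be small" is usually attached to the approximation.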

FactChecker · Jul 14, 2018

To an obsessive purist, both are discrete. But it is really a matter of convenience. It depends on whether a person wants to handle the probabilities one-by-one as discrete values or in ranges. If the student scores are "A, B, C, D, F", then virtually everyone would treat them as discrete, but if they are integers 0..100, then virtually everyone would treat them as continuous ranges.

Dale · Jul 14, 2018

ssd said:

How we may handle the data is irrelevant in the context of the question, as mentioned earlier. The question is not "what you would generally assume". The question is 'what is what' by definition.

You and your daughter were trying to understand why the professor answered that they are continuous. In which case your approach is clearly wrong since it comes to the wrong conclusion.

If you wish to try to justify your/her mistake then by all means proceed with your approach. However, the professor is teaching statistics, and is right to treat both variables as continuous in a statistics class. So if your daughter wishes to learn statistics and improve her scores on future assignments then she needs to take a different focus than what you are suggesting.

The professor is clearly not asking the question you are insisting should be discussed. You are doing your daughter a disservice by trying to get justification for your answer instead of trying to understand the professor’s answer.

ssd said:

That you 'assume' a variable to be normally distributed does not imply the variable is continuous; you have just assumed it to be continuous.

On the contrary, a variable isn’t something that exists in the real world. It is a part of your model, so if you define it to be continuous by your model’s assumption then it is in fact continuous.

The question you are getting at is whether or not the model thus defined is a good model, meaning does it accurately predict the results of experiments and is it simple. Treating test scores as continuous usually simplifies the model, and produces results that are as accurate as a discrete assumption. Thus it is the better choice, as indicated by the professor.

I am firmly in agreement with the professor on this. I wish your daughter the best of luck.

EngWiPy · Jul 14, 2018

They are continuous. Continuous means they can take on any value within their support. Your assumption that grades take discrete values of step size 100/N isn't accurate or justifiable. Suppose you have 25 students in a class, with a total grade of 100. These 25 students can have grades anywhere between 0 and 100, not only the grades 0, 4, 8, ..., 100. Probably they would follow a normal distribution with a mean that equals the average grade. If you divide the grades into categories like A, B, C, D, E, and F, then yes, it is a discrete random variable.
