Count and Categorical Variables...

  • Context: Undergrad 
  • Thread starter: fog37
  • Tags: Count Variables

Discussion Overview

The discussion revolves around the nature of count and categorical variables, particularly in the context of statistical representation such as frequency tables and scatterplots. Participants explore the definitions, properties, and implications of using counts in various statistical contexts, including their representation in visualizations like histograms and scatterplots.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants propose that count can be considered a discrete non-negative numerical variable, while others express uncertainty about this classification.
  • There is a suggestion that count could be viewed as a summary statistic rather than a numerical variable.
  • Participants discuss whether counts can be represented on the axes of a scatterplot, with some agreeing that it is possible, while others question the appropriateness of such representation.
  • One participant argues that counts are natural numbers with arithmetic properties, while another emphasizes that they should not be classified solely as categorical due to their ordered nature and mathematical operations applicable to them.
  • The transformation of continuous variables into categorical variables through binning is debated, with some stating it creates a new categorical variable rather than changing the original variable's nature.
  • Examples are provided, such as the classification of patient age into ranges, to illustrate the distinction between continuous and categorical variables.

Areas of Agreement / Disagreement

Participants express differing views on the classification of count as a numerical variable, the representation of counts in scatterplots, and the implications of binning continuous variables into categorical ones. The discussion remains unresolved with multiple competing perspectives present.

Contextual Notes

Limitations include the potential ambiguity in definitions of numerical and categorical variables, as well as the varying interpretations of statistical representations. The discussion does not reach a consensus on these points.

fog37
Hello,
In the context of categorical variables, a frequency table gives us the count (aka frequency) for each level of the categorical variable. The count is a number telling us how many times a specific level occurs. A bar chart handles a single categorical variable (nominal or ordinal), with its levels indicated on the x-axis and the count (frequency), or relative frequency, on the y-axis.
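
As a minimal sketch of the frequency table described above (the `eye_color` data are made up for illustration and are not from this thread), the count and relative frequency of each level can be computed like this:

```python
from collections import Counter

# Hypothetical observations of one nominal categorical variable.
eye_color = ["brown", "blue", "brown", "green", "brown", "blue"]

# Frequency table: level -> count (how many times each level occurs).
counts = Counter(eye_color)

# Relative frequency: each count divided by the total number of observations.
total = sum(counts.values())
relative = {level: n / total for level, n in counts.items()}

for level, n in counts.items():
    print(f"{level:<6} count={n}  relative={relative[level]:.2f}")
```

The levels are the categorical variable; the counts are numbers computed from it, which is what a bar chart puts on the y-axis.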

My question: is count a discrete non-negative numerical variable? If not, what is it? I don't think it is a numerical variable...
Can count ever be represented as one of the axes of a scatterplot? A scatterplot is designed to accommodate two numerical variables (both discrete, both continuous, one discrete and one continuous) on its two axes.

Here is another example that confuses me: a dataset where each row represents a different country, with a variable that reports the percentage of the country's population that is religious (the count of people who responded YES to being religious divided by the total country population). That column contains % values and seems to represent a numeric variable even though it is the relative count of a categorical variable. Is that correct? See attached table:

[Attached image: 1676427604467.png]

Also, cost and profit are generally considered continuous variables but I believe they are discrete numeric variables since money is a multiple of the cent, the smallest increment. When we build a histogram, we bin the continuous variable values into intervals. Does that turn the continuous variable into an ordinal categorical variable since the data now belongs in a finite number of groups?

Thank you!
 
fog37 said:
My question: is count a discrete non-negative numerical variable?
It certainly could be.

fog37 said:
If not, what is it?
It is a summary statistic.

fog37 said:
Can count ever be represented as one of the axes of a scatterplot?
Sure. Just like you could put medians or standard deviations on one of the axes of a scatter plot you could also put counts on an axis of a scatter plot.
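
A rough sketch of that idea, assuming some made-up `(group, value)` records: each group becomes one point whose x-coordinate is its count and whose y-coordinate is its median. The resulting pairs could be passed to any plotting library; none is used here.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (group, value) records; names and numbers are illustrative only.
records = [("A", 4.0), ("A", 6.0), ("B", 1.0), ("B", 3.0), ("B", 5.0), ("C", 7.0)]

values_by_group = defaultdict(list)
for group, value in records:
    values_by_group[group].append(value)

# One point per group: x = count (a summary statistic), y = median (another one).
points = {g: (len(v), median(v)) for g, v in values_by_group.items()}

for group, (count, med) in points.items():
    print(f"group {group}: count={count}, median={med}")
```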

fog37 said:
Does that turn the continuous variable into an ordinal categorical variable since the data now belongs in a finite number of groups?
It could if you wanted to.
 
fog37 said:
My question: is count a discrete non-negative numerical variable? If not, what is it? I don't think it is a numerical variable...
It is a natural number (where 0 is included). They are ordered and have arithmetic properties.
fog37 said:
Can count ever be represented as one of the axes of a scatterplot? A scatterplot is designed to accommodate two numerical variables (both discrete, both continuous, one discrete and one continuous) on its two axes.
I wouldn't call it a "scatter plot", but you certainly can make charts where the categories are along one axis, in order of the count. It happens all the time when you are interested in what category occurs the most. From https://inferentialthinking.com/chapters/07/1/Visualizing_Categorical_Distributions.html:
[Image: Visualizing_Categorical_Distributions_21_0.png]
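
A small sketch of ordering categories by count, using invented `genres` data rather than the data behind the linked chart:

```python
from collections import Counter

# Hypothetical observations of a nominal variable.
genres = ["drama", "comedy", "drama", "action", "drama", "comedy"]

# most_common() returns (level, count) pairs sorted by count, largest first.
ranked = Counter(genres).most_common()

# Crude text bar chart: categories along one axis, ordered by their count.
for level, n in ranked:
    print(f"{level:<7} {'#' * n}")
```

The categories themselves have no inherent order here; the ordering comes entirely from the counts.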

fog37 said:
Here is another example that confuses me: a dataset where each row represents a different country, with a variable that reports the percentage of the country's population that is religious (the count of people who responded YES to being religious divided by the total country population). That column contains % values and seems to represent a numeric variable even though it is the relative count of a categorical variable. Is that correct? See attached table:

Also, cost and profit are generally considered continuous variables but I believe they are discrete numeric variables since money is a multiple of the cent, the smallest increment. When we build a histogram, we bin the continuous variable values into intervals. Does that turn the continuous variable into an ordinal categorical variable since the data now belongs in a finite number of groups?
No. The counting numbers have too many arithmetic properties to just be considered "categorical". They are ordered. You can add them. You can subtract them, although it might give a negative result. You can divide them if you allow rational numbers. The natural numbers are a subset of the integers, the rational numbers, and the real numbers and should not be considered only "categorical".
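
To illustrate the arithmetic described above, here is a short sketch with two made-up counts (`yes` and `no` are hypothetical survey tallies):

```python
from fractions import Fraction

# Hypothetical counts for two levels of a categorical variable.
yes, no = 3, 5

total = yes + no               # addition stays within the natural numbers
difference = yes - no          # subtraction can leave them: this is negative
share = Fraction(yes, total)   # division lands in the rational numbers

print(yes < no, total, difference, share)
```

None of these operations would be meaningful on the category labels themselves, only on the counts.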
 
fog37 said:
When we build a histogram, we bin the continuous variable values into intervals. Does that turn the continuous variable into an ordinal categorical variable since the data now belongs in a finite number of groups?
It doesn't "turn it into" a categorical variable, but it creates a new categorical variable. If the original continuous variable was 'income' we could call the new categorical variable 'income range'.
I occasionally work on health system patient data which, amongst hundreds of variables, has 'age' (of the patient at the time of the recorded health system interaction), a floating point variable, and 'age range', a categorical variable that classifies patients into 5-year age ranges. Although the database provides both variables, we could just read in 'age' and derive 'age range' from 'age'.
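
One possible way to derive 'age range' from 'age' is sketched below. The function name `age_range`, the label format, and the sample ages are assumptions for illustration, not the actual database's scheme, and ages are assumed to be non-negative.

```python
def age_range(age: float, width: int = 5) -> str:
    """Return a label for the bin [lower, lower + width) that contains age."""
    lower = int(age // width) * width
    # The label uses whole completed years, e.g. "30-34" covers 30.0 up to 34.99...
    return f"{lower}-{lower + width - 1}"

# Hypothetical floating point ages.
ages = [0.5, 34.9, 35.0, 67.2]
ranges = [age_range(a) for a in ages]

for a, r in zip(ages, ranges):
    print(f"age={a:<5} age range={r}")
```

The original `age` values are untouched; `ranges` is a new, ordinal categorical variable computed from them.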
 