Count and Categorical Variables...

In summary, the conversation discusses the concept of a frequency table for categorical variables, where the count or frequency of each level is shown. It also touches on the use of bar charts for single categorical variables, and whether count can be represented on a scatterplot. The conversation then moves on to the difference between count being a discrete non-negative numerical variable and a summary statistic, and whether binning a continuous variable into intervals turns it into an ordinal categorical variable. The conversation ends with an example of working with patient data and how age can be represented as both a floating point and categorical variable.
  • #1
fog37
Hello,
In the context of categorical variables, a frequency table gives us the count (aka frequency) for each level of the variable. Count is a number telling us how many times a specific level occurs. A bar chart handles a single categorical variable (nominal or ordinal), with its levels indicated on the x-axis and count (frequency), or relative frequency, on the y-axis.
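A frequency table of this kind can be sketched in a few lines. This is only an illustration: the `colors` list is made-up data, and `collections.Counter` is just one convenient way to tally the levels.

```python
from collections import Counter

# Hypothetical sample: one categorical variable with three levels.
colors = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(colors)          # count (frequency) of each level
total = sum(counts.values())      # number of observations
rel_freq = {level: n / total for level, n in counts.items()}

# Print the table, most frequent level first.
for level, n in counts.most_common():
    print(f"{level:<6} {n:>2} {rel_freq[level]:.2f}")
```

The counts are non-negative integers and the relative frequencies are fractions that sum to 1; either one could go on the y-axis of the bar chart.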

My question: is count a discrete non-negative numerical variable? If not, what is it? I don't think it is a numerical variable...
Can count ever be represented as one of the axes of a scatterplot? A scatterplot is designed to accommodate two numerical variables (both discrete, both continuous, one discrete and one continuous) on its two axes.

Here is another example that confuses me: a dataset where each row represents a different country and there is a variable that reports the percentage of the country's population who are religious (which is the count of people who responded YES to being religious divided by the total country population). That column contains % values and seems to represent a numeric variable even if it represents the relative count of a categorical variable. Is that correct? See attached table:

[Attached image: 1676427604467.png]
Also, cost and profit are generally considered continuous variables but I believe they are discrete numeric variables since money is a multiple of the cent, the smallest increment. When we build a histogram, we bin the continuous variable values into intervals. Does that turn the continuous variable into an ordinal categorical variable since the data now belongs in a finite number of groups?

Thank you!
 
  • #2
fog37 said:
My question: is count a discrete non-negative numerical variable?
It certainly could be.

fog37 said:
If not, what is it?
It is a summary statistic.

fog37 said:
Can count ever be represented as one of the axes of a scatterplot?
Sure. Just like you could put medians or standard deviations on one of the axes of a scatter plot you could also put counts on an axis of a scatter plot.

fog37 said:
Does that turn the continuous variable into an ordinal categorical variable since the data now belongs in a finite number of groups?
It could if you wanted to.
 
  • #3
fog37 said:
My question: is count a discrete non-negative numerical variable? If not, what is it? I don't think it is a numerical variable...
It is a natural number (where 0 is included). They are ordered and have arithmetic properties.
fog37 said:
Can count ever be represented as one of the axes of a scatterplot? A scatterplot is designed to accommodate two numerical variables (both discrete, both continuous, one discrete and one continuous) on its two axes.
I wouldn't call it a "scatter plot", but you certainly can make charts where the categories are along one axis, in order of the count. It happens all the time when you are interested in what category occurs the most. From https://inferentialthinking.com/chapters/07/1/Visualizing_Categorical_Distributions.html:
[Image from the linked page: Visualizing_Categorical_Distributions_21_0.png]

fog37 said:
Here is another example that confuses me: a dataset where each row represents a different country and there is a variable that reports the percentage of the country's population who are religious (which is the count of people who responded YES to being religious divided by the total country population). That column contains % values and seems to represent a numeric variable even if it represents the relative count of a categorical variable. Is that correct? See attached table:

View attachment 322281

Also, cost and profit are generally considered continuous variables but I believe they are discrete numeric variables since money is a multiple of the cent, the smallest increment. When we build a histogram, we bin the continuous variable values into intervals. Does that turn the continuous variable into an ordinal categorical variable since the data now belongs in a finite number of groups?
No. The counting numbers have too many arithmetic properties to just be considered "categorical". They are ordered. You can add them. You can subtract them, although it might give a negative result. You can divide them if you allow rational numbers. The natural numbers are a subset of the integers, the rational numbers, and the real numbers and should not be considered only "categorical".
 
  • #4
fog37 said:
When we build a histogram, we bin the continuous variable values into intervals. Does that turn the continuous variable into an ordinal categorical variable since the data now belongs in a finite number of groups?
It doesn't "turn it into" a categorical variable, but it creates a new categorical variable. If the original continuous variable was 'income' we could call the new categorical variable 'income range'.
I occasionally work on health system patient data which, amongst hundreds of variables, has 'age' (of the patient at the time of the recorded health system interaction), a floating point variable, and 'age-range', a categorical variable that classifies patients into 5-year age ranges. Although the database provides both variables, we could just read in 'age' and derive 'age-range' from 'age'.
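Deriving a 5-year 'age-range' from a floating point 'age' might look something like the sketch below. The function name, the label format and the sample ages are all assumptions made for illustration; the actual database may define its ranges differently.

```python
def age_range(age: float, width: int = 5) -> str:
    """Return a label such as '35-39' for the bin [35, 40) that contains age."""
    lower = int(age // width) * width      # lower edge of the bin
    return f"{lower}-{lower + width - 1}"  # label uses whole years

# Hypothetical patient ages in years.
ages = [3.2, 37.9, 40.0, 64.5]
ranges = [age_range(a) for a in ages]

print(ranges)  # ['0-4', '35-39', '40-44', '60-64']
```

The original 'age' values are untouched; 'age-range' is a new ordinal categorical variable computed from them.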
 

1. What is the difference between count and categorical variables?

Count variables are numerical data that represent a count or quantity, such as the number of people in a household or the number of cars in a parking lot. Categorical variables, on the other hand, represent categories or groups, such as gender or type of car. Count variables take non-negative integer values (0, 1, 2, ...) and support arithmetic, whereas categorical variables are limited to a specific set of labels that may or may not be ordered.

2. How do you identify count and categorical variables in a dataset?

To identify count variables, look for numerical data that represent a count or quantity. Categorical variables can be identified by looking for data that falls into distinct categories or groups. It also helps to check the data type of each variable: count variables are typically stored as integers, while categorical variables are typically stored as strings or factors. The data type alone is not decisive, though, because categories are sometimes stored as integer codes (for example 1, 2, 3 for three regions).

3. What are some common statistical tests used for analyzing count and categorical variables?

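For count variables, the mean, median, and mode, together with measures of variability such as the standard deviation and variance, are common summary statistics rather than tests. Count outcomes are often modelled with Poisson or negative binomial regression. For categorical variables, common tests include chi-square tests (goodness of fit and independence) and Fisher's exact test. t-tests and ANOVA do not test categorical variables on their own; they compare the mean of a numerical variable across groups defined by a categorical variable.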
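As an illustration of the chi-square test mentioned above, the Pearson statistic for a contingency table of counts can be computed by hand. The 2x2 table below is invented data, and the helper name `chi_square_statistic` is hypothetical; this sketch stops at the statistic and degrees of freedom and does not compute a p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are two groups, columns are two categories.
observed = [[10, 20], [30, 40]]
chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2, 3), dof)  # 0.794 1
```

In practice a statistics library would normally be used instead. Note that some library routines apply a continuity correction to 2x2 tables by default, so their statistic can differ from the uncorrected value computed here.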

4. Can count and categorical variables be used in regression analysis?

Yes, both count and categorical variables can be used in regression analysis. However, they may need to be transformed or recoded to suit the regression model. For example, a count predictor may be transformed using a logarithmic function, a count outcome is often modelled directly with Poisson regression, and categorical variables are usually recoded as dummy variables.
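A minimal sketch of the two recodings mentioned above, using only the standard library. The helper `dummy_code`, the `is_` column prefix and the sample data are assumptions for illustration; `math.log1p` computes log(1 + x), which is one common choice because it is defined for a count of zero.

```python
import math

def dummy_code(values, drop_first=True):
    """Recode a categorical variable as 0/1 dummy variables, one dict per row."""
    levels = sorted(set(values))
    # Dropping the first level makes it the reference category.
    kept = levels[1:] if drop_first else levels
    return [{f"is_{lvl}": int(v == lvl) for lvl in kept} for v in values]

# Hypothetical data: a categorical variable and a count variable.
car_types = ["sedan", "suv", "truck", "suv"]
visits = [0, 3, 12, 1]

dummies = dummy_code(car_types)
log_visits = [math.log1p(v) for v in visits]

print(dummies[1])     # {'is_suv': 1, 'is_truck': 0}
print(log_visits[0])  # 0.0
```

Here "sedan" is the reference level, so a row with both dummies equal to 0 is a sedan.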

5. How can I visualize the relationship between count and categorical variables?

One way to visualize the relationship between a count variable and a categorical variable is a bar chart with the categories on one axis and a summary of the count (such as its total or mean) on the other. Separate histograms or box plots of the count variable for each category show its distribution within each group. Another option is a dot or strip plot, which looks like a scatter plot with the categorical variable on the x-axis and the count variable on the y-axis. This can help show any patterns or differences between categories.
