Different results for factor vs continuous

In summary, the author found that the interaction is highly significant when age is treated as continuous, but when age is factorized into groups there is barely any interaction effect (p-value not significant).
FallenApple
So, I'm fitting an interaction model: response vs. treatment_type interacting with age, plus controls (for confounders), with age continuous, say patients ranging from 20 years old to 90 years old.

So I have two models:

y = age + treatment_type + age*treatment_type

y = factor(age) + treatment_type + factor(age)*treatment_type
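The two specifications differ in how many parameters they spend. A minimal Python sketch with hypothetical simulated data (the cut points 40/60/80 and the coefficients are made up for illustration) shows the design matrices side by side:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 90, n)     # continuous age, 20-90 as in the post
treat = rng.integers(0, 2, n)    # binary treatment indicator
y = 0.05 * age + 1.0 * treat + 0.02 * age * treat + rng.normal(0, 1, n)

# Model 1: continuous age with interaction -> 4 parameters
X1 = np.column_stack([np.ones(n), age, treat, age * treat])

# Model 2: age binned into 4 groups (hypothetical cut points) -> 8 parameters
bins = np.digitize(age, [40, 60, 80])                       # group index 0..3
dummies = (bins[:, None] == np.arange(1, 4)).astype(float)  # 3 dummy columns
X2 = np.column_stack([np.ones(n), dummies,
                      treat[:, None], dummies * treat[:, None]])

print(X1.shape[1], "vs", X2.shape[1], "parameters")
```

The factor version spends twice as many parameters here, and the gap grows with the number of groups, which matters for the significance question discussed below.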

Basically, what I got was that age is highly significant in the continuous model: slight deviations in age have a huge effect on the effect of treatment.

However, when age is factorized into groups, there is barely any interaction effect (p-value not significant). Why? What could be the reason for this? Could the reason it's so significant in the continuous model be that patients simply differ from each other so much that it seems like age has an effect, but it really doesn't?

After all, it has no effect when looking at groups. But somehow, within the groups, slight deviations give a large effect.

Just to clarify, how are your age factor levels defined?

FallenApple said:
For slight deviations in age, there is a huge effect on the effect of treatment
That seems suspicious on its own. If an effect is large then you should clearly see it when you plot the data even without doing statistics. Is that the case?

How do your regression diagnostic plots look? Do you have some high leverage or otherwise suspicious points?

FallenApple said:
However, when age is factorized into groups, there is barely any interaction effect (p-value not significant)
How many groups have you factored age into? If you have factored it into many groups then you will have a model with a large number of degrees of freedom. A good statistics package will take that into account and reduce the significance correspondingly.
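The degrees-of-freedom point can be made concrete. With a k-level age factor, the main effect uses k-1 dummy terms and the interaction another k-1, versus 1 + 1 for continuous age. A short sketch (the group counts are illustrative, not the poster's actual setup):

```python
# Parameters in the interaction model as the number of age groups k grows:
# intercept + (k-1) age dummies + treatment + (k-1) interaction terms.
def interaction_params(k_groups):
    return 1 + (k_groups - 1) + 1 + (k_groups - 1)

for k in (2, 4, 8, 16):
    print(k, "groups ->", interaction_params(k), "parameters")
```

Each extra parameter must be estimated from the data falling in its cell, so many groups spread a fixed sample thin and push p-values up.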

FactChecker said:
Just to clarify, how are your age factor levels defined?
I've split it up into 4 sections, so basically 20-40, 40-60, etc.

Dale said:
That seems suspicious on its own. If an effect is large then you should clearly see it when you plot the data even without doing statistics. Is that the case?

How do your regression diagnostic plots look? Do you have some high leverage or otherwise suspicious points?
Many. But I can't reject those because they occur due to some systematic process. I've accounted for that by using a negative binomial link.
How many groups have you factored age into? If you have factored it into many groups then you will have a model with a large number of degrees of freedom. A good statistics package will take that into account and reduce the significance correspondingly.

Just 4. But I've factored it again into many. And here's the plot.

It seems that they are maybe canceling.

Looking at your data, it looks like there is only one (solid line treatment, age (18.9, 27.4]) combination that is significantly different from the others. (Are the different lines different treatments?) Is there much data in that combination category or could it be a small-sample outlier?

I recommend that you statistically analyse the one glaring (solid line treatment, age (18.9, 27.4]) combination as one step and then look at the others in a separate statistical analysis.

FallenApple said:
And here's the plot
That doesn't look like it should be non significant. How does the data itself look. Can you see the interaction in the raw data?

Dale said:
That doesn't look like it should be non significant. How does the data itself look. Can you see the interaction in the raw data?
When I increase the number of partitions in the factor, it seems to follow the same trend (just a bunch of zigzags, with the solid one being the most prominent). I think that is why, for continuous age, it's highly significant: even one slight increment in age could send it in a certain direction.

FactChecker said:
Looking at your data, it looks like there is only one (solid line treatment, age (18.9, 27.4]) combination that is significantly different from the others. (Are the different lines different treatments?) Is there much data in that combination category or could it be a small-sample outlier?

I recommend that you statistically analyse the one glaring (solid line treatment, age (18.9, 27.4]) combination as one step and then look at the others in a separate statistical analysis.
So split the data into two different sets? It is a small sample, less than 10% of the data set. Yet there's only a small number of people under this treatment option in the first place, so every data point counts. The response is a count of relatively rare events (negative side effects), so most of the responses would be zero anyway.

FallenApple said:
So split the data into two different sets? It is a small sample, less than 10% of the data set. Yet there's only a small number of people under this treatment option in the first place, so every data point counts. The response is a count of relatively rare events (negative side effects), so most of the responses would be zero anyway.
From the looks of the data, I think that it would be very misleading to allow the extreme result from one combination of (treatment, age) to influence your conclusions about the other combinations. In fact, the other treatments show, if anything, a slight bit of the opposite trend. If you do not address that combination separately, I don't think your conclusions will have any merit.

FallenApple said:
So split the data into two different sets? It is a small sample, less than 10% of the data set. Yet there's only a small number of people under this treatment option in the first place, so every data point counts. The response is a count of relatively rare events (negative side effects), so most of the responses would be zero anyway.
Then it doesn't sound like you will have enough data points to justify a large number of degrees of freedom. That is probably driving the lack of significance somewhat.

Also, do you see this interaction when you plot the data itself (not the fit)? I think I have asked this three times now.

Dale said:
Then it doesn't sound like you will have enough data points to justify a large number of degrees of freedom. That is probably driving the lack of significance somewhat.

Also, do you see this interaction when you plot the data itself (not the fit)? I think I have asked this three times now.

I see. That makes sense. So generally, would I need to have the samples balanced?

One thing I thought of: it might not work because there is such a low count within that combination, like in the tens compared to over a thousand total.

But if age is continuous, then there is no combination sample of data; it's just the whole data set. Is this the correct way to see it?

I'm not sure what you mean. The pattern that I plotted was based on the data. It wasn't derived from a regression.

1. What is the difference between a factor and continuous variable in a scientific study?

A factor variable is a categorical variable that represents distinct groups or categories, while a continuous variable is a numerical variable that can take on any value within a range. In a scientific study, this could mean the difference between measuring something like height (a continuous variable) versus gender (a factor variable).

2. Why is it important to distinguish between factor and continuous variables in a study?

Distinguishing between factor and continuous variables is important because they require different statistical analyses and can lead to different conclusions. For example, using a t-test for a continuous variable when a factor variable is present could result in incorrect conclusions about the relationship between the variables.

3. Can a variable be both a factor and continuous?

Yes, a variable can be both a factor and continuous depending on the context of the study. For example, age could be considered a continuous variable, but it can also be categorized into different groups (e.g. young, middle-aged, elderly) to make it a factor variable.
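The age example can be sketched in a few lines; the cut points at 40 and 65 are hypothetical and would depend on the study:

```python
def age_group(age):
    """Map a continuous age to a factor level (hypothetical cut points)."""
    if age < 40:
        return "young"
    if age < 65:
        return "middle-aged"
    return "elderly"

print(age_group(30), age_group(50), age_group(70))  # young middle-aged elderly
```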

4. How do you determine if a variable should be treated as a factor or continuous?

The type of variable should be determined based on the research question and the nature of the data. If the variable represents distinct categories, it should be treated as a factor. If the variable represents a measurement or quantity, it should be treated as a continuous variable.

5. What are some common statistical tests used for factor and continuous variables?

For factor variables, common statistical tests include chi-square tests and ANOVA (analysis of variance). For continuous variables, common tests include t-tests and correlation analyses. It is important to choose the appropriate test based on the type of variable being analyzed.
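As a rough illustration of what these tests measure (simulated data, and only the underlying quantities, not the full hypothesis tests, which in practice would come from a package such as `scipy.stats`):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)            # continuous predictor
y = 0.5 * x + rng.normal(size=100)  # continuous response

# Continuous vs continuous: the correlation coefficient
r = np.corrcoef(x, y)[0, 1]

# Two-level factor: the group-mean difference a t-test formalises
group = rng.integers(0, 2, 100)
mean_diff = y[group == 1].mean() - y[group == 0].mean()

print(f"r = {r:.2f}, group mean difference = {mean_diff:.2f}")
```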
