Keeping Randomized Variable in Regression?

In summary, the text by Hosmer and Lemeshow states that in a study where people are randomized within treatment sites, the site indicator should be kept in the regression model. One reply suggests the reason is that patients choose their own site, so self-selection could make the sites differ, and that this risk is large enough to keep SITE in the model regardless of its p-value. This differs from a study with only one site, or one where a central authority randomly allocates patients to sites, because there is no self-selection in those cases. Another reply questions this, arguing that site self-selection would produce artificially high, not low, statistical significance, and leans toward leaving the variable out.
  • #1
FallenApple
So if I have a study where people are randomized within treatment sites, do I always have to include the site indicator in the regression equation?

This text (Hosmer and Lemeshow, 2nd ed.) says yes.

A_REASON_SITE.png

Here is some context:
A_Description_of_Question.png


And here is the output for the multivariable model, fitted after the univariate analyses indicated which variables to include:

A_Output_Drug_trt.png


Now the usual strategy is simply to drop variables that are not statistically significant.
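The usual p-value-driven strategy can be sketched as backward elimination with an optional set of covariates that are forced to stay in the model, which is how H&L treat SITE here. This is a minimal sketch, not H&L's own procedure: `fit` is a hypothetical callable that refits the model on a list of covariates and returns their p-values, and the covariate names and p-values in the usage lines are invented (they are not the values in the attached output).

```python
from typing import Callable, Dict, Iterable, List


def backward_eliminate(
    fit: Callable[[List[str]], Dict[str, float]],
    candidates: Iterable[str],
    forced: Iterable[str] = (),
    alpha: float = 0.05,
) -> List[str]:
    """Repeatedly drop the least significant covariate that is not forced in.

    `fit` is a hypothetical callable: given the covariates currently in the
    model, it refits the model and returns {covariate: p-value}.
    Covariates listed in `forced` (e.g. a design variable such as SITE)
    are never dropped, whatever their p-values.
    """
    forced_set = set(forced)
    model = list(candidates)
    model += [v for v in forced_set if v not in model]  # forced ones always enter
    while True:
        pvalues = fit(model)
        droppable = {v: p for v, p in pvalues.items() if v not in forced_set}
        if not droppable:
            return model
        worst = max(droppable, key=lambda v: droppable[v])
        if droppable[worst] <= alpha:
            return model  # everything left is significant at level alpha
        model.remove(worst)


# Usage with invented p-values (not the values from the attached output).
FAKE_P = {"TREATMENT": 0.01, "X1": 0.03, "SITE": 0.60, "X2": 0.40}


def fake_fit(covariates: List[str]) -> Dict[str, float]:
    # A real `fit` would refit the model; here the p-values never change.
    return {v: FAKE_P[v] for v in covariates}


start = ["TREATMENT", "X1", "SITE", "X2"]
print(backward_eliminate(fake_fit, start))                   # SITE dropped
print(backward_eliminate(fake_fit, start, forced=["SITE"]))  # SITE kept
```

With a real model the p-values shift after every refit, which is why the sketch calls `fit` again on each pass instead of ranking all covariates once.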

They said that because SITE is randomized, we cannot drop it. Why? It wasn't explained in the text.
Is it because it is a potential confounder? But clearly, from the p-value, we don't see an association with the outcome, so it doesn't seem to confound anything.

How is site in this study different from a study that simply randomizes patients into two treatment groups where there is only one site to begin with?

How is site different from a regular indicator variable describing a dichotomy? I know it has something to do with the randomization within site, but I don't get it at a gut level.
 
  • #2
FallenApple said:
How is site in this study different from a study that simply randomizes patients into two treatment groups where there is only one site to begin with?
In this experiment the choice of which Site to visit for treatment is made by the patient, so there may be self-selection going on. For instance, it may be that patients who are more likely to relapse are more attracted to Site A than to Site B, or that patients for whom the difference in effectiveness between short and long treatments is larger are more attracted to one site than the other.

In contrast, if the patients had applied for treatment to some central treatment authority and were randomly allocated by that authority to one of the two sites, there would be no self-selection.

I think what H&L are implying is that the potential for self-selection biasing the results between sites is significant enough that one should keep SITE in the model regardless of its p-value.
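The contrast between self-selection and central allocation can be illustrated with a small simulation. Everything here is an assumption made for illustration: each patient has a hypothetical baseline relapse risk drawn uniformly from [0, 1); under self-selection, higher-risk patients are more likely to pick Site A; under central allocation, sites are assigned by a fair coin. In both cases treatment (long or short) is randomized within site.

```python
import random
from statistics import mean


def simulate(n, mode, seed=0):
    """Return (site, treatment, baseline_risk) rows for n hypothetical patients."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        risk = rng.random()  # assumed baseline relapse risk, uniform on [0, 1)
        if mode == "self":
            # Assumption: the higher the risk, the more likely Site A is chosen.
            site = "A" if rng.random() < 0.2 + 0.6 * risk else "B"
        elif mode == "central":
            # A central authority allocates sites by a fair coin, ignoring risk.
            site = "A" if rng.random() < 0.5 else "B"
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Treatment is randomized within whichever site the patient ends up at.
        treatment = "long" if rng.random() < 0.5 else "short"
        rows.append((site, treatment, risk))
    return rows


def mean_risk_by_site(rows):
    return {s: mean(r for site, _, r in rows if site == s) for s in ("A", "B")}


def long_share_by_site(rows):
    return {
        s: mean(1.0 if t == "long" else 0.0 for site, t, _ in rows if site == s)
        for s in ("A", "B")
    }


for mode in ("self", "central"):
    rows = simulate(20_000, mode)
    print(mode, mean_risk_by_site(rows), long_share_by_site(rows))
```

With these made-up settings the expected mean risk is 0.6 at Site A and 0.4 at Site B under self-selection, and 0.5 at both sites under central allocation, while the share of long treatment stays near 0.5 within every site in both scenarios. Treatment comparisons within a site remain fair, but under self-selection the sites differ in who attends.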
 
  • #3
I would have thought that SITE self-selection would cause an artificially high statistical significance, not lessen it. My initial reaction is to leave those variables out of the model.
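One way to probe this intuition is a small simulation. It is a simplification under stated assumptions: a linear model instead of H&L's logistic regression (where omitting a prognostic covariate also tends to pull the odds ratio toward 1), two hypothetical sites with 200 patients each, treatment randomized so that exactly 100 patients per site are treated, a true treatment effect of 1.0, and a large hypothetical site effect of 3.0. The OLS and matrix-inversion helpers are written out only to keep the example dependency-free.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[k:] for row in m]


def ols(X, y):
    """Ordinary least squares: returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance estimate
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, se


# Hypothetical trial: 2 sites x 200 patients, treatment randomized within site.
rng = random.Random(1)
X_with, X_without, y = [], [], []
for site in (0, 1):
    arms = [1] * 100 + [0] * 100
    rng.shuffle(arms)  # randomize treatment within this site
    for t in arms:
        y.append(1.0 * t + 3.0 * site + rng.gauss(0.0, 1.0))
        X_with.append([1.0, float(t), float(site)])  # intercept, treatment, SITE
        X_without.append([1.0, float(t)])            # intercept, treatment

b_with, se_with = ols(X_with, y)
b_without, se_without = ols(X_without, y)
print(f"with SITE:    effect={b_with[1]:.3f}  SE={se_with[1]:.3f}")
print(f"without SITE: effect={b_without[1]:.3f}  SE={se_without[1]:.3f}")
```

Because treatment is exactly balanced within each site here, the treatment estimate is identical with or without SITE. Dropping SITE pushes the between-site differences into the residual variance, which inflates the standard error, so under these assumptions the treatment test becomes less significant rather than more.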
 

1. What is a randomized variable in regression?

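A randomized variable in regression is a variable whose values are assigned by a random mechanism, such as the treatment arm in a randomized trial, rather than chosen by the participants or merely observed. Random assignment balances the groups, on average, with respect to both measured and unmeasured characteristics, which reduces confounding and bias. Randomized variables are typical of experimental designs in which participants are randomly assigned to different groups.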
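Random assignment itself can be sketched in a few lines. This is a minimal sketch of complete randomization with (near-)equal group sizes; the function name, arm labels, and participant IDs are hypothetical.

```python
import random
from collections import Counter


def randomize(participants, arms=("control", "treatment"), seed=None):
    """Complete randomization with (near-)equal group sizes.

    Builds a list of arm labels in equal proportions and shuffles it, so
    every assignment pattern with those group sizes is equally likely.
    """
    rng = random.Random(seed)
    labels = [arms[i % len(arms)] for i in range(len(participants))]
    rng.shuffle(labels)
    return dict(zip(participants, labels))


allocation = randomize([f"P{i:02d}" for i in range(10)], seed=42)
print(allocation)
print(Counter(allocation.values()))  # 5 control, 5 treatment
```

Passing a fixed `seed` makes the allocation reproducible, which is convenient for auditing; in a real trial the allocation list would normally be generated and concealed before enrollment.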

2. Why is it important to keep randomized variables in regression?

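Variables that describe how the randomization was carried out, such as the site when treatment is randomized within sites, are part of the study design, so the model should reflect them rather than keep or drop them according to a p-value. Keeping them helps ensure that the estimated treatment effect and its standard error are not distorted by differences between groups (for example, between sites) that the design deliberately accounted for.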

3. How do you control for randomized variables in regression?

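Randomization itself is the main control: randomly assigning participants to groups breaks the link between the assigned variable and the participants' characteristics. When randomization is stratified, that is, carried out separately within subgroups such as sites, the stratification variable is usually included as a covariate in the regression so that the analysis matches the design. Stratified sampling, where participants are drawn from different subgroups to ensure representation of the entire population, is a related but different idea: it concerns who enters the study, not which group they are assigned to.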
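Stratified randomization within site, the design discussed in this thread, is often implemented with permuted blocks. A minimal sketch, assuming 1:1 allocation of the thread's long and short treatments, a hypothetical block size of 4, and hypothetical participant IDs; the thread does not say how the original study generated its allocation.

```python
import random
from collections import Counter


def stratified_block_randomize(participants, block_size=4, seed=None):
    """Permuted-block randomization carried out separately within each site.

    participants: (participant_id, site) pairs in enrollment order.
    Returns {participant_id: arm}. Each site draws labels from its own
    shuffled blocks, so every complete block of `block_size` enrollees at a
    site has exactly half "long" and half "short".
    """
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number for 1:1 allocation")
    rng = random.Random(seed)
    pending = {}  # site -> labels left in that site's current block
    assignment = {}
    for pid, site in participants:
        if not pending.get(site):
            half = block_size // 2
            block = ["long"] * half + ["short"] * half
            rng.shuffle(block)
            pending[site] = block
        assignment[pid] = pending[site].pop()
    return assignment


# Hypothetical enrollment: 8 patients at site A, 4 at site B, interleaved.
enrolled = [(i, "A" if i % 3 else "B") for i in range(12)]
arms = stratified_block_randomize(enrolled, seed=7)
site_of = dict(enrolled)
print(Counter((site_of[pid], arm) for pid, arm in arms.items()))
```

Keeping a separate block per site is what guarantees treatment balance within each site, and it is also why SITE becomes a design variable that the analysis is expected to include.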

4. Can you have too many randomized variables in regression?

Yes. Including too many covariates, randomized or not, relative to the sample size (in logistic regression, relative to the number of outcome events) can overfit the model and make the results difficult to interpret. It is important to choose covariates carefully, giving priority to the randomized treatment, design variables such as site, and the most relevant prognostic factors.
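For logistic regression, one common way to make "too many" concrete is the events-per-variable (EPV) ratio: the number of subjects in the smaller outcome category divided by the number of estimated coefficients, where each non-reference level of a categorical covariate counts as a coefficient. A threshold of about 10 is a widely cited rule of thumb, not a hard limit. A sketch with hypothetical counts; the function names are illustrative.

```python
def events_per_variable(n_events, n_nonevents, n_coefficients):
    """Smaller outcome group divided by the number of estimated
    coefficients (excluding the intercept)."""
    if n_coefficients <= 0:
        raise ValueError("n_coefficients must be positive")
    return min(n_events, n_nonevents) / n_coefficients


def looks_overfit(n_events, n_nonevents, n_coefficients, threshold=10.0):
    """Flag models below a rule-of-thumb events-per-variable threshold."""
    return events_per_variable(n_events, n_nonevents, n_coefficients) < threshold


# Hypothetical counts: 150 relapses among 500 patients, 8 coefficients.
print(events_per_variable(150, 350, 8))  # 18.75
print(looks_overfit(40, 460, 8))         # True: only 5 events per coefficient
```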

5. What are some common pitfalls to avoid when working with randomized variables in regression?

Common pitfalls include dropping design variables (such as site) solely because of a nonsignificant p-value, failing to account for other relevant variables, using too small a sample, and analyzing the data with methods that do not match the study design. Careful planning of both the study and the analysis helps minimize bias and produce accurate results.
