A Keeping Randomized Variable in Regression?

FallenApple · May 29, 2017

So if I have a study where people are randomized within treatment sites do I always have to have the site indicator in the regression equation?

This text (Hosmer and Lemeshow, 2nd ed.) says yes.

Here is some context provided below.

And here is the output for the multivariate model, after univariate analysis has indicated how to build it.

Now the usual strategy is simply to drop variables that are not statistically significant.

They said that because SITE is randomized, we cannot drop it. Why? It wasn't explained in the text.
Is it because it is a potential confounder? But clearly, from the p-value, it is not associated with the outcome, so it doesn't seem to confound anything.

What is different about site in this study than about a study simply randomizing something into two treatment groups where there is only one site to begin with?

What is different about site than a regular indicator variable describing a dichotomy? I know it has something to do with the randomization within site, but I don't get it at a gut level.

andrewkirk · May 29, 2017

FallenApple said:

What is different about site in this study than about a study simply randomizing something into two treatment groups where there is only one site to begin with?

In this experiment the choice of which Site to visit for treatment is made by the patient, so there may be self-selection going on. For instance, it may be that patients who are more likely to relapse are more attracted to Site A than to Site B, or that patients for whom the difference in effectiveness between short and long treatments is larger are more attracted to one site than the other.

In contrast, if the patients had applied for treatment to some central treatment authority, and were randomly allocated by that authority to one of the two sites there would be no self-selection.

I think what H&L are implying is that the potential for self-selection biasing the results between sites is significant enough that one should keep SITE in the model regardless of its p-value.
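
To make the concern about between-site differences concrete, here is a minimal sketch with made-up numbers. The relapse probabilities, the odds ratio of 0.5, and the equal site sizes are all assumptions for illustration, not values from H&L. It supposes the two sites have very different baseline relapse risks (as self-selection could produce) and that the long treatment has the same odds ratio within each site. Even with perfect 1:1 randomization inside each site, the odds ratio computed from the pooled data, ignoring SITE, is not the same as the within-site odds ratio:

```python
# Hypothetical illustration: pooled vs within-site odds ratio.
# All numbers below are assumed, not taken from the H&L data.


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


# Assumed relapse probability under the short treatment at each site.
risk_short = {"A": 0.80, "B": 0.20}

# Assumed odds ratio (long vs short treatment), identical at both sites.
within_site_or = 0.5

# Relapse probability under the long treatment implied by that odds ratio.
risk_long = {
    site: prob_from_odds(within_site_or * odds(p))
    for site, p in risk_short.items()
}

# Assume equal numbers of patients per site and 1:1 randomization within
# site, so the pooled risk in each arm is the plain average over sites.
pooled_short = sum(risk_short.values()) / len(risk_short)
pooled_long = sum(risk_long.values()) / len(risk_long)

# Odds ratio you would estimate if SITE were dropped from the model.
pooled_or = odds(pooled_long) / odds(pooled_short)

print("within-site odds ratio:", within_site_or)
print("pooled odds ratio (SITE ignored):", round(pooled_or, 3))
```

Under these assumptions the pooled odds ratio comes out closer to 1 than the within-site value of 0.5, even though treatment is perfectly balanced across sites. So a model without SITE can estimate a different treatment effect from a model with SITE, whatever the p-value on the SITE coefficient itself happens to be.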

FactChecker · May 29, 2017

I would have thought that SITE self-selection would cause an artificially high statistical significance, not lessen it. My initial reaction is to leave those variables out of the model.
