Multicollinearity and Interactions

AI Thread Summary
The discussion centers on the concepts of multicollinearity and interaction terms in multiple regression models. Multicollinearity occurs when independent variables are pairwise correlated; it affects the interpretation of the regression coefficients but not the model's predictions. Interaction terms, created by multiplying independent variables together, can be included in a model regardless of whether multicollinearity is present. The relationship between the two is subtle, since each can occur without the other. A key question raised is how correlations between variables should influence the decision to introduce interaction terms in a regression analysis.
fog37
Hello,

I understand the concept of multicollinearity: in a multiple regression model with two or more independent variables, some of the independent variables may be pairwise correlated. This does not hurt the model's predictive performance, but it does affect the regression coefficients and how we interpret the individual independent variables (IVs).
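As a minimal NumPy sketch of that point (my own illustration, not from the thread): with two nearly collinear predictors, the individual slope estimates are unstable under a tiny perturbation of the data, while the fitted values, and the *sum* of the two slopes, stay essentially fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)       # x2 almost duplicates x1
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ coef

# Refit on a slightly perturbed response: predictions barely move,
# but the split of the total effect between x1 and x2 can shift a lot.
y2 = y + 0.05 * rng.normal(size=n)
coef2, *_ = np.linalg.lstsq(X, y2, rcond=None)
yhat2 = X @ coef2

print(coef[1], coef2[1])             # individual slopes: poorly pinned down
print(coef[1] + coef[2])             # the sum is well identified, near 3
print(np.max(np.abs(yhat - yhat2)))  # fitted values remain close
```

This is exactly the pattern described above: predictions are fine, coefficient interpretation is not.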

Multiplicative interaction terms can also be included in a linear regression model. Multicollinearity and interactions are disjoint in the sense that a model with interaction terms need not exhibit multicollinearity, and vice versa (interesting things probably happen when an interaction term is itself strongly correlated with its constituent variables).
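To make the "multiplicative" part concrete, here is a small NumPy sketch (my own, with made-up coefficients): the interaction column is just the elementwise product of the two predictors, and ordinary least squares recovers all four coefficients even though ##x_1## and ##x_2## are generated uncorrelated, i.e. with no multicollinearity at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)   # independent of x1: no multicollinearity
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2 + rng.normal(scale=0.1, size=n)

# The interaction term is literally the product of the two predictors.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [1.0, 2.0, -1.0, 0.5]
```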

That said, under multicollinearity one independent variable ##X_1## affects the dependent variable ##Y##, while another independent variable ##X_2## is correlated with the first independent variable ##X_1##. Isn't that similar to what an interaction does? An interaction means that one IV changes the dependent variable, but there is another IV that modifies the effect of the first IV...

Thank you!
 
Terminology confusing: Independence implies no correlation.
 
Statistical analysis would usually treat the two situations the same way.
 
mathman said:
Terminology confusing: Independence implies no correlation.
In a multiple regression model, ##Y = a_0 +a_1 X_1 + a_2 X_2 + ... + a_n X_n + \epsilon##, the ##X_i##s are called the "independent variables" regardless of whether they are correlated. ##Y## is the dependent variable.
 
Independence (in probability theory) means no connection and implies no correlation. Your question seems to be about terminology - I am not familiar with the definitions of these terms as you are using them.
 
According to sources on the web ( including https://aarongullickson.github.io/stat_book/interaction-terms.html )
An interaction term is a variable that is constructed from two other variables by multiplying those two variables together.

With that definition, an interaction term results from a decision about the form of the function being fitted to the data. That decision need not be based on a correlation between variables. By contrast, the correlation (or lack of it) between variables in a model is usually inferred from the data, but it does not by itself cause us to introduce interaction terms (which make the fitted function nonlinear in the original variables, though still linear in the coefficients).
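A small NumPy sketch of this point (my own illustration, with made-up coefficients): the two predictors below are generated uncorrelated, yet the purely additive model fits poorly until the product term is added. The interaction is a choice about the form of the fitted function, not something read off the correlation between ##x_1## and ##x_2##.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)   # uncorrelated predictors...
y = x1 + x2 + 2.0 * x1 * x2 + rng.normal(scale=0.2, size=n)  # ...with a real interaction

def fit_rss(X, y):
    """Least-squares fit; return the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

X_plain = np.column_stack([np.ones(n), x1, x2])
X_inter = np.column_stack([np.ones(n), x1, x2, x1 * x2])

print(np.corrcoef(x1, x2)[0, 1])  # near 0: no multicollinearity here
print(fit_rss(X_plain, y))        # large: the additive model misses the interaction
print(fit_rss(X_inter, y))        # small: roughly the noise level
```

So correlation between predictors and the need for an interaction term are separate diagnostics, which is what makes the question below interesting.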

An interesting (and presumably well studied) question: How should correlations between variables influence our decision about whether to introduce interaction terms in the function we are fitting to the data?
 