What Is PRESS & Why is e_{i,-i} Its Notation?

logarithmic · Sep 6, 2008

So after studying PRESS residuals, I'm curious to know what PRESS stands for and why it is denoted e_{i,-i}. What is the significance of this particular subscript in the notation? (Not very mathematical questions, I know.)

statdad · Sep 7, 2008

The results of linear regression - estimates of the coefficients as well as anything else - are easily influenced by outliers. Even a single outlier, in either y or \mathbf{x} space, can have a drastic influence on the fit.
The residuals you are discussing provide one way of measuring just how much influence the individual observations have on the overall regression. The idea is to think about removing individual data points from your data set, one at a time, fitting the model without that data value, then seeing how well this new regression describes the eliminated value.

I'll concentrate on the data value labeled (\mathbf{x}_1, y_1) - except for notation, the idea is the same for all the others. The procedure is:

1. Eliminate (\mathbf{x}_1, y_1) from the data
2. Fit the regression using the remaining data
3. Use the new model to estimate y_1
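The three steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single predictor x instead of a vector \mathbf{x}, made-up data, and hypothetical helper names (fit_line, loo_prediction) that are not from any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def loo_prediction(xs, ys, i):
    """Predict ys[i] from a line fitted without observation i."""
    # Step 1: eliminate observation i from the data.
    xs_red = xs[:i] + xs[i + 1:]
    ys_red = ys[:i] + ys[i + 1:]
    # Step 2: fit the regression using the remaining data.
    a, b = fit_line(xs_red, ys_red)
    # Step 3: use the new model to estimate the eliminated y value.
    return a + b * xs[i]


# Made-up data: the last point sits far from the line through the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 20.0]

for i in range(len(xs)):
    y_hat = loo_prediction(xs, ys, i)
    # Actual value minus leave-one-out estimate.
    print(i, ys[i] - y_hat)
```

With these numbers, the first four points lie exactly on y = 2x, so the fit without the last point predicts 10 where the observed value is 20.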

The PRESS residual is simply the difference between the actual value of y_1 and the estimate of y_1 obtained with the reduced data set. A large value of this residual indicates that the pair (\mathbf{x}_1, y_1) makes a large contribution to the fit of the original regression.
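In symbols, and as far as I know this is the usual way to read the notation: the first subscript says which observation is being predicted, and the "-i" says that the prediction comes from the fit with observation i removed. The symbol \hat{y}_{i,-i} for that prediction is my own shorthand here, and the sign convention (actual minus estimate) is the common one but is an assumption about your text. PRESS itself is usually expanded as "prediction sum of squares" (some books say "predicted residual error sum of squares"), and it is the sum of the squared PRESS residuals.

```latex
e_{i,-i} = y_i - \hat{y}_{i,-i}, \qquad
\mathrm{PRESS} = \sum_{i=1}^{n} e_{i,-i}^{2}
```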

The same idea holds for the other data values.
It is not necessary to actually refit the regression several times, once for each of the original data values. There are rather simple ways to obtain these values from quantities calculated during the original fit, namely the ordinary residuals and the leverages (the diagonal entries of the hat matrix).
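One such shortcut, for ordinary least squares, is e_{i,-i} = e_i / (1 - h_{ii}), where e_i is the ordinary residual from the full fit and h_{ii} is the leverage of observation i. The sketch below applies it under the simplifying assumption of a single predictor with an intercept, where h_{ii} = 1/n + (x_i - \bar{x})^2 / S_{xx}. The function name and the data are made up for illustration.

```python
def press_residuals_shortcut(xs, ys):
    """PRESS residuals for y = a + b*x from a single full-data fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar

    out = []
    for x, y in zip(xs, ys):
        e = y - (a + b * x)                   # ordinary residual
        h = 1.0 / n + (x - x_bar) ** 2 / sxx  # leverage of this point
        out.append(e / (1.0 - h))             # leave-one-out residual
    return out


# Made-up data: the last point sits far from the line through the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 20.0]

res = press_residuals_shortcut(xs, ys)
press = sum(r ** 2 for r in res)  # the PRESS statistic
print(res)
print(press)
```

For the last point the ordinary residual is only 4, because the outlier drags the fitted line toward itself, but its leverage is 0.6, so the PRESS residual is about 4 / 0.4 = 10. That is the same value you get by actually deleting the point and refitting.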

As you read more on this topic you will also see discussions of internally versus externally standardized residuals. The terminology is extensive, but all of these ideas relate to the same goal: examining a large, complicated set of data to see which points exert unreasonable influence on a regression. These ideas, and others, fall into the category of regression diagnostics.

Finally, one short discussion of the ideas in your post can be found here:

http://www.sph.umich.edu/class/bio650/2001/LN_Nov05.pdf

Good luck.
