What Is PRESS and Why Is e_{i,-i} Its Notation?

  • Context: Undergrad 
  • Thread starter: logarithmic
  • Tags: Notation
SUMMARY

PRESS stands for "Prediction Sum of Squares". The notation e_{i,-i} denotes a PRESS residual: the residual for the i-th observation, computed from a regression fitted with the i-th observation removed from the dataset. The subscript therefore names both the point being predicted (i) and the point left out of the fit (-i). PRESS residuals measure how much each observation affects the regression fit, which is particularly useful in the presence of outliers. They are a standard tool in regression diagnostics and can be obtained without refitting the model once per observation.

PREREQUISITES
  • Linear regression analysis
  • Understanding of residuals and their significance
  • Familiarity with regression diagnostics
  • Basic knowledge of statistical influence measures
NEXT STEPS
  • Research "PRESS residuals in regression analysis"
  • Learn about "internally versus externally standardized residuals"
  • Explore "regression diagnostics techniques"
  • Study "influence measures in linear regression"
USEFUL FOR

Statisticians, data analysts, and researchers involved in regression modeling and diagnostics will benefit from this discussion, particularly those interested in understanding the impact of outliers on regression results.

logarithmic
So after studying PRESS residuals, I'm curious to know what PRESS stands for and why the residual is denoted [tex]e_{i,-i}[/tex]. What is the significance of this particular subscript in the notation? (Not very mathematical questions, I know.)
 
The results of linear regression, estimates of the coefficients as well as anything else, are easily influenced by outliers. Even a single outlier, in either [tex]y[/tex] or [tex]\mathbf{x}[/tex] space, can have a drastic influence on the fit.
The residuals you are discussing provide one way of measuring how much influence the individual observations have on the overall regression. The idea is to remove individual data points from your data set, one at a time, fit the model without that data value, and then see how well this new regression describes the eliminated value.

I'll concentrate on the data value labeled [tex](\mathbf{x}_1, y_1)[/tex] - except for notation, the idea is the same for all. The philosophy is

  • Eliminate [tex](\mathbf{x}_1, y_1)[/tex] from the data
  • Fit the regression using the remaining data
  • Use the new model to estimate [tex]y_1[/tex]

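The PRESS residual is simply the difference between the actual value of [tex]y_1[/tex] and the estimate of [tex]y_1[/tex] obtained with the reduced data set, conventionally written [tex]e_{1,-1} = y_1 - \hat{y}_{1,-1}[/tex]. In the subscript, the first index names the observation being predicted and the second, negative, index names the observation left out of the fit. A large absolute value of this residual indicates that the pair [tex](\mathbf{x}_1, y_1)[/tex] has a large contribution to the fitting of the original regression.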
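
The three steps above, repeated for every observation, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name press_residuals_loo is made up, the fit uses NumPy's ordinary least-squares solver, the design matrix X is expected to carry its own column of ones if an intercept is wanted, and the sign convention is actual minus predicted.

```python
import numpy as np

def press_residuals_loo(X, y):
    """Brute-force PRESS residuals: one refit per deleted observation.

    X: (n, p) design matrix (add a column of ones for an intercept).
    y: length-n response vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    e_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                    # step 1: drop observation i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)  # step 2: refit
        e_loo[i] = y[i] - X[i] @ beta               # step 3: predict y_i, take the difference
    return e_loo
```

As a quick check, take an intercept-only model with y = (1, 2, 3, 10). The leave-one-out prediction for the last point is the mean of the other three, which is 2, so its PRESS residual is 8, compared with an ordinary residual of 6.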

The same idea holds for the other data values.
It is not necessary to actually refit the regression several times, once for each of the original data values. There are rather simple ways to obtain these values from quantities already calculated during the original fit, namely the ordinary residuals and the leverages (the diagonal elements of the hat matrix).
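
One such shortcut for ordinary least squares is the standard identity [tex]e_{i,-i} = e_i / (1 - h_{ii})[/tex], where [tex]e_i[/tex] is the ordinary residual and [tex]h_{ii}[/tex] is the i-th diagonal element of the hat matrix [tex]\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T[/tex]. The PRESS statistic is then the sum of the squared [tex]e_{i,-i}[/tex]. A minimal Python sketch follows; the function names are hypothetical, and it assumes X has full column rank and that no leverage equals 1.

```python
import numpy as np

def press_residuals_hat(X, y):
    """PRESS residuals from a single fit, via e_i / (1 - h_ii)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Thin QR gives H = Q Q^T, so h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                 # ordinary residuals from the full fit
    return e / (1.0 - h)

def press_statistic(X, y):
    """Sum of squared PRESS residuals."""
    return float(np.sum(press_residuals_hat(X, y) ** 2))
```

For the intercept-only model with y = (1, 2, 3, 10), every leverage is 1/4 and the ordinary residuals are (-3, -2, -1, 6), so dividing by 3/4 gives (-4, -8/3, -4/3, 8), the same values a point-by-point refit would produce.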

As you read more on this topic you will also see discussions of internally versus externally standardized residuals. The terminology is extensive, but all of these ideas relate to the same goal: examining a large, complicated set of data to see which points exert unreasonable influence on a regression. These ideas, and others, fall into the category of regression diagnostics.
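
To make the internal/external distinction concrete: an internally standardized residual divides [tex]e_i[/tex] by [tex]s\sqrt{1 - h_{ii}}[/tex], with [tex]s[/tex] estimated from all n observations, while an externally standardized residual uses [tex]s_{(i)}[/tex], estimated with the i-th observation removed. The sketch below uses the usual textbook formulas for ordinary least squares; the function name is hypothetical, and it assumes X has full column rank, n > p + 1, and no leverage equal to 1.

```python
import numpy as np

def standardized_residuals(X, y):
    """Return (internal, external) standardized residuals for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)                    # leverages h_ii
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                            # ordinary residuals
    sse = np.sum(e**2)
    s2 = sse / (n - p)                          # variance estimate from all points
    internal = e / np.sqrt(s2 * (1.0 - h))
    # Variance estimate with point i deleted, obtained without refitting.
    s2_del = (sse - e**2 / (1.0 - h)) / (n - p - 1)
    external = e / np.sqrt(s2_del * (1.0 - h))
    return internal, external
```

With y = (1, 2, 3, 10) and an intercept-only model, the last point has an internally standardized residual of about 1.70 but an externally standardized residual of about 6.93, because the outlier inflates [tex]s[/tex] but not [tex]s_{(i)}[/tex].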

Finally, one short discussion of the ideas in your post can be found here:

http://www.sph.umich.edu/class/bio650/2001/LN_Nov05.pdf

Good luck.
 