Validating Linear Regression Trendlines: Understanding R-Squared Values

  • Context: Undergrad 
  • Thread starter: jantunes
  • Tags: Value

Discussion Overview

The discussion revolves around the validation of linear regression trendlines, specifically focusing on the interpretation and calculation of R-squared values when comparing predicted data against new data points. The scope includes statistical measures of prediction accuracy and methods for validating regression models.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant describes their process of performing linear regression on a dataset and obtaining an R-squared value of 0.96, but encounters strange R-squared values when validating the trendline with new data.
  • Another participant questions whether the R-squared is being recalculated with the new data and suggests that it may not be valid to calculate R-squared from a different sample than the one used for the initial regression.
  • A participant expresses the desire to obtain a statistical measure of prediction accuracy for the trendline against new data, emphasizing the need for a graphical representation of how well the predictions match the actual data.
  • One suggestion is made to use forecast intervals around predicted points to assess whether actual data points fall within those intervals, referencing a prediction interval concept.
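
The forecast-interval suggestion can be sketched as follows. This is a minimal illustration rather than anything posted in the thread: the function name `make_prediction_interval`, the synthetic data, and the use of a normal quantile in place of the Student t quantile (a reasonable approximation for about 200 points) are all assumptions. It also assumes the linear model and a constant error variance still hold for the new points, which time-series data such as CPU usage may well violate.

```python
import math
import random
from statistics import NormalDist, fmean


def make_prediction_interval(xs, ys, level=0.95):
    """Fit y = a + b*x by least squares and return a function x0 -> (low, high)."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    # Assumption: normal quantile instead of Student t (close for large n).
    z = NormalDist().inv_cdf(0.5 + level / 2)

    def interval(x0):
        # Interval for ONE new observation at x0; it widens away from x_bar.
        half = z * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
        centre = a + b * x0
        return centre - half, centre + half

    return interval


# Hypothetical data: 200 training points, then 1000 later points.
random.seed(0)
xs = list(range(200))
ys = [10 + 0.2 * x + random.gauss(0, 2) for x in xs]
new_xs = list(range(200, 1200))
new_ys = [10 + 0.2 * x + random.gauss(0, 2) for x in new_xs]

interval = make_prediction_interval(xs, ys)
inside = 0
for x, y in zip(new_xs, new_ys):
    low, high = interval(x)
    inside += low <= y <= high
coverage = inside / len(new_xs)
print(f"share of new points inside the 95% interval: {coverage:.3f}")
```

If far fewer new points fall inside the intervals than the nominal level suggests, that is evidence the trendline does not carry over to the new data.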

Areas of Agreement / Disagreement

Participants do not reach a consensus on the validity of calculating R-squared values from different datasets. There are competing views on how to assess the accuracy of predictions made by the regression model.

Contextual Notes

The discussion highlights potential limitations in the assumptions made about the data samples used for calculating R-squared, as well as the implications of using the same regression model for different datasets.

jantunes
wrong R-Squared value??

Hi all,

Warning: this is my very first post :)

I'm doing a linear regression to produce a trendline that can predict (more or less) some future data. The data is highly correlated (something like R = 0.98).

This is what I do:
1) get 200 data points (x is a time series; y is CPU usage)
2) do linear regression based on those 200 points, resulting in some y'=a + bx
3) get R-squared (R^2=0.96) for the y'

Then, I want to validate that trendline/prediction by comparing it with more real data:
4) get more data points, past the first 200 points (e.g. 10000)
5) get R-squared for the y' (this time against the new data)

The problem is that this new R-squared takes very strange values (depending on the equation): either < 0 (SSE/SST > 1), > 1 (SSR > SST), or near 0.99 (when in fact the trendline is not accurate).
As I said, I have already tried different ways of calculating the R-squared. They all give the same value in 3), but strange values in 5).
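
The behaviour described above can be reproduced with a small experiment. The sketch below is an illustration on invented data, not the poster's dataset: the `cpu` function, its change of slope after point 200, and the helper names `fit_line` and `r2_variants` are all assumptions. It fits the line on the first 200 points, then evaluates three common R-squared formulas for that same fixed line on both samples.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b


def r2_variants(xs, ys, a, b):
    """Three common R^2 formulas for a FIXED line y' = a + b*x."""
    preds = [a + b * x for x in xs]
    y_bar, p_bar = fmean(ys), fmean(preds)
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ssr = sum((p - y_bar) ** 2 for p in preds)
    cov = sum((y - y_bar) * (p - p_bar) for y, p in zip(ys, preds))
    var_p = sum((p - p_bar) ** 2 for p in preds)
    return {
        "1 - SSE/SST": 1 - sse / sst,
        "SSR/SST": ssr / sst,
        "corr(y, y')^2": cov ** 2 / (sst * var_p),
    }


random.seed(1)


def cpu(x):
    # Hypothetical load: grows 0.2 per step up to x = 200, then 0.05 per step.
    trend = 10 + 0.2 * x if x < 200 else 50 + 0.05 * (x - 200)
    return trend + random.gauss(0, 2)


train_x = list(range(200))
train_y = [cpu(x) for x in train_x]
new_x = list(range(200, 1200))
new_y = [cpu(x) for x in new_x]

a, b = fit_line(train_x, train_y)
in_sample = r2_variants(train_x, train_y, a, b)
out_sample = r2_variants(new_x, new_y, a, b)
for name in in_sample:
    print(f"{name:>14}  in-sample {in_sample[name]:8.3f}  new data {out_sample[name]:8.3f}")
```

On the training sample the three formulas coincide. On the new sample they split into a negative value, a value above 1, and a value close to 1, even though the fixed line overshoots the new data badly. The squared correlation stays high because correlation ignores any constant offset or rescaling between y and y'.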

Am I making some wrong assumption here? I'm pretty sure the calculations are correct... How can I validate my trendlines (linear regression models)?

Thanks in advance!
 
Are you re-estimating y' with the new data? If not, how are you calculating the R^2 with the new data?
 
No, it's the same y' (estimated from the first 200 data points). From your question I suspect I cannot calculate R-squared from a different sample than the one used for y'.
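
One way to see why the sample matters is to expand the total sum of squares. The notation below is an assumption chosen to match the thread's SSE, SSR and SST, with \bar{y} taken as the mean of whichever sample is being scored:

```latex
\begin{align*}
\mathrm{SST} &= \sum_i (y_i - \bar{y})^2, \qquad
\mathrm{SSE} = \sum_i (y_i - y'_i)^2, \qquad
\mathrm{SSR} = \sum_i (y'_i - \bar{y})^2, \\
\mathrm{SST} &= \mathrm{SSE} + \mathrm{SSR}
  + 2 \sum_i (y_i - y'_i)(y'_i - \bar{y}).
\end{align*}
```

When a and b are the least-squares estimates from the same sample (with an intercept), the cross term is exactly zero, so 1 - SSE/SST, SSR/SST and the squared correlation all agree. When a and b are held fixed and scored on a different sample, the cross term is generally nonzero, so SSE can exceed SST, SSR can exceed SST, and the different formulas no longer measure the same thing.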

What I really want is a statistical measure of the prediction accuracy (maybe R-squared?) of y' on the new data (which is actually all the data that y' is supposed to predict). It would be the numerical counterpart of plotting the new data together with its prediction (y') and seeing how well they match.
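
Several error measures are commonly used for this kind of hold-out check. The sketch below is one possible set, not something proposed in the thread: the function name `holdout_metrics`, the dictionary keys, and the choice of the training mean as the baseline for the skill score are all assumptions. The skill score equals 1 - SSE/SSE_baseline, so it is at most 1 and goes negative whenever the trendline does worse on the new data than simply predicting the training mean.

```python
import math
from statistics import fmean


def holdout_metrics(xs_new, ys_new, a, b, y_train_mean):
    """Score the FIXED line y' = a + b*x on data it was not fitted to."""
    preds = [a + b * x for x in xs_new]
    errors = [y - p for y, p in zip(ys_new, preds)]
    sse = sum(e * e for e in errors)
    # Baseline: always predict the mean of the training sample.
    sse_baseline = sum((y - y_train_mean) ** 2 for y in ys_new)
    return {
        "rmse": math.sqrt(sse / len(errors)),    # typical error size, in units of y
        "mae": fmean([abs(e) for e in errors]),  # less sensitive to outliers
        "bias": fmean(errors),                   # > 0 means the line underpredicts
        "skill": 1 - sse / sse_baseline,         # <= 1; negative if worse than baseline
    }


# Tiny hand-checkable example with a hypothetical line y' = 1 + 2x.
metrics = holdout_metrics([0, 1, 2, 3], [1, 3, 6, 9], a=1.0, b=2.0, y_train_mean=4.0)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RMSE and MAE are in the same units as y (CPU usage here), which makes them easy to compare with the scatter seen in a plot of the new data against y'.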
 