Validating Linear Regression Trendlines: Understanding R-Squared Values

  • Thread starter jantunes
In summary, the R-squared value computed against the new data takes implausible values and does not seem to match the trendline, which was fitted to the first 200 data points.
  • #1
jantunes
wrong R-Squared value??

Hi all,

Warning: this is my very first post :)

I'm doing a linear regression to produce a trendline that can predict (more or less) some future data. The data is very correlated (something like R=0.98).

This is what I do:
1) get 200 data points (x is a time series; y is CPU usage)
2) do linear regression based on those 200 points, resulting in some y'=a + bx
3) get R-squared (R^2=0.96) for the y'
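
Steps 1) to 3) can be sketched as follows. This is only an illustration: the data are synthetic stand-ins for the 200 CPU-usage samples, and the helper names `fit_line` and `r_squared` are hypothetical, not taken from the original post.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y' = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def r_squared(ys, preds):
    """R^2 = 1 - SSE/SST, with SST taken about the mean of ys."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst

# Assumption: synthetic data standing in for the 200 real samples.
random.seed(0)
xs = list(range(200))
ys = [10.0 + 0.2 * x + random.gauss(0.0, 2.0) for x in xs]

a, b = fit_line(xs, ys)
r2_in_sample = r_squared(ys, [a + b * x for x in xs])
print(f"y' = {a:.3f} + {b:.3f}x, in-sample R^2 = {r2_in_sample:.3f}")
```

On the fitting sample this value always falls between 0 and 1, because the line was chosen to minimise SSE on exactly these points.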

Then, I want to validate that trendline/prediction by comparing it with more real data:
4) get more data points, past the 200 points (eg 10000)
5) get R-squared for the y' (this time against the new data)

The problem is that this new R-squared takes very strange values depending on the equation used: either <0 (computed as 1 - SSE/SST, with SSE/SST > 1), >1 (computed as SSR/SST, with SSR > SST), or near 0.99 (when in fact the trendline is not accurate).
As I said, I have already tried different ways of calculating the R-squared. They all give the same value in 3), but strange values in 5).
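
The behaviour described here can be reproduced with a small sketch. It assumes, purely for illustration, that the real trend flattens after the first 200 points; the data and the helper names `fit_line` and `three_r2` are hypothetical. The three common R-squared formulas coincide on the fitting sample but diverge once the fixed line is scored against new data.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y' = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def three_r2(ys, preds):
    """Return (1 - SSE/SST, SSR/SST, squared correlation of ys and preds)."""
    n = len(ys)
    my, mp = sum(ys) / n, sum(preds) / n
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ssr = sum((p - my) ** 2 for p in preds)
    sst = sum((y - my) ** 2 for y in ys)
    cov = sum((y - my) * (p - mp) for y, p in zip(ys, preds))
    var_p = sum((p - mp) ** 2 for p in preds)
    return 1 - sse / sst, ssr / sst, cov ** 2 / (sst * var_p)

random.seed(1)
# Assumption: synthetic fitting sample with a steady upward trend.
train_x = list(range(200))
train_y = [10 + 0.2 * x + random.gauss(0, 2) for x in train_x]
a, b = fit_line(train_x, train_y)

# Assumption: after point 200 the growth slows down considerably.
new_x = list(range(200, 1200))
new_y = [50 + 0.02 * (x - 200) + random.gauss(0, 2) for x in new_x]

in_sample = three_r2(train_y, [a + b * x for x in train_x])
out_sample = three_r2(new_y, [a + b * x for x in new_x])
print("in-sample :", in_sample)   # three (numerically) equal values
print("new data  :", out_sample)  # negative, above 1, and a correlation-based value
```

In this setup the squared correlation stays high on the new data because the data and the prediction both rise with x, even though the predicted level is far off. That is one way a value near 0.99 can coexist with an inaccurate trendline.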

Am I making a wrong assumption here? I'm pretty sure the calculations are correct... How can I validate my trendlines (linear regression models)?

Thanks in advance!
 
  • #2
Are you re-estimating y' with the new data? If not, how are you calculating the R^2 with the new data?
 
  • #3
No, it's the same y' (estimated from the first 200 data points). From your question I suspect I cannot calculate R-squared from a different sample than the one used for y'.

What I really want is a statistical measure of the prediction accuracy (maybe R-squared?) of y' for the new data (which is actually all the data that y' is supposed to predict). It would be the numerical counterpart of plotting the new data against its prediction (y') and seeing how well they match.
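
One possible way to get such a measure, offered as a sketch rather than as the thread's answer, is to summarise the prediction errors of the fixed line on the new data directly, for example with RMSE, MAE and mean bias. The coefficients, the observations and the function name `holdout_errors` below are all hypothetical.

```python
import math

def holdout_errors(a, b, xs_new, ys_new):
    """Error summaries for the fixed line y' = a + b*x on data not used in the fit."""
    residuals = [y - (a + b * x) for x, y in zip(xs_new, ys_new)]
    n = len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / n)  # typical error size
    mae = sum(abs(r) for r in residuals) / n             # average absolute error
    bias = sum(residuals) / n                            # > 0: line under-predicts on average
    return rmse, mae, bias

# Assumption: coefficients from an earlier fit and a few new observations.
a, b = 10.0, 0.2
xs_new = [200, 201, 202, 203]
ys_new = [50.0, 51.2, 49.4, 51.6]

rmse, mae, bias = holdout_errors(a, b, xs_new, ys_new)
print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}, bias = {bias:.3f}")
```

These quantities are in the units of y (here, CPU usage), so they can be read against the plot: a bias far from zero means the line sits systematically above or below the new data.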
 

Related to Validating Linear Regression Trendlines: Understanding R-Squared Values

1. What does a low R-squared value indicate?

A low R-squared value indicates that the model explains only a small share of the variation in the dependent variable, so the fitted values track the data poorly.

2. Can a high R-squared value be misleading?

Yes, a high R-squared value can be misleading if it is not accompanied by a thorough analysis of the model and its assumptions. It is important to also consider other metrics and conduct hypothesis testing to fully evaluate the model's performance.

3. How do outliers affect the R-squared value?

Outliers can greatly influence the R-squared value, especially in linear regression models. Depending on where they lie, they can inflate the R-squared value and make the model appear to fit the data better than it actually does, or deflate it. It is important to identify and address outliers before interpreting the R-squared value.

4. Is a high R-squared value always desirable?

Not necessarily. A high R-squared value does not by itself mean that the model is a good fit for the data. It is possible to have a high R-squared value for a model that is overfitting the data or violating its assumptions. It is important to consider other metrics and conduct thorough model evaluation before relying solely on the R-squared value.

5. Can the R-squared value be negative?

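It depends on how it is computed. For a least-squares fit with an intercept, evaluated on the same data used to fit it, the R-squared value lies between 0 and 1, because it is the proportion of variation in the dependent variable that is explained by the independent variables. When R-squared is computed as 1 - SSE/SST on data the model was not fitted to (as in this thread), or for a model without an intercept, it can be negative, which means the model predicts worse than simply using the mean of that data. The adjusted R-squared value can also be negative when the model explains very little of the variation.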
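
For reference, the identity behind these bounds can be written out in the SSE/SSR/SST notation used in the thread. The symbols $\hat{y}_i$ (prediction for point $i$) and $\bar{y}$ (mean of the data being scored) are notation introduced here for the sketch.

```latex
\underbrace{\sum_i (y_i - \bar{y})^2}_{\mathrm{SST}}
  = \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\mathrm{SSE}}
  + \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\mathrm{SSR}}
  + 2 \sum_i (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})
```

The cross term vanishes for a least-squares fit with an intercept evaluated on its own fitting sample, which gives SST = SSE + SSR and hence $0 \le R^2 \le 1$ there. For predictions scored on any other data the cross term need not vanish, so $1 - \mathrm{SSE}/\mathrm{SST}$ and $\mathrm{SSR}/\mathrm{SST}$ no longer agree and neither is confined to that range.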
