- #1
jantunes
wrong R-Squared value??
Hi all,
Warning: this is my very first post :)
I'm doing a linear regression to produce a trendline that can predict (more or less) some future data. The data is highly correlated (something like R=0.98).
This is what I do:
1) get 200 data points (x is a time series; y is CPU usage)
2) do linear regression based on those 200 points, resulting in some y'=a + bx
3) get R-squared (R^2=0.96) for the y'
Then, I want to validate that trendline/prediction by comparing it with more real data:
4) get more data points, past the 200 points (e.g. 10000)
5) get R-squared for the y' (this time against the new data)
The problem is that this new R-squared takes very strange values depending on which formula I use: below 0 (when SSE/SST>1), above 1 (when SSR>SST), or near 0.99 (when in fact the trendline is not accurate).
As I said, I have already tried different ways of calculating the R-squared. They all give the same value in 3), but strange values in 5).
Am I making a wrong assumption here? I'm pretty sure the calculations are correct... How can I validate my trendlines (linear regression models)?
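To make the setup concrete, here is a minimal sketch of steps 1) to 5) in plain Python. The data is made up (a CPU-usage series that rises linearly for 200 samples and then levels off), and the helper names fit_line and r2_variants are hypothetical, not my real code. It computes two of the usual formulas, 1 - SSE/SST and SSR/SST. As far as I understand, the identity SST = SSR + SSE only holds on the points the line was fitted to (least squares with an intercept), which would explain why the formulas agree in 3) but not in 5):

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y' = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r2_variants(xs, ys, a, b):
    """Return (1 - SSE/SST, SSR/SST) for the line a + b*x scored on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    preds = [a + b * x for x in xs]
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual sum of squares
    ssr = sum((p - mean_y) ** 2 for p in preds)         # "explained" sum of squares
    sst = sum((y - mean_y) ** 2 for y in ys)            # total sum of squares
    return 1 - sse / sst, ssr / sst


# Hypothetical data: usage grows linearly for 200 samples, then stays flat.
random.seed(0)
train_x = list(range(200))
train_y = [10 + 0.2 * x + random.gauss(0, 1) for x in train_x]
new_x = list(range(200, 1200))
new_y = [50 + random.gauss(0, 1) for _ in new_x]

a, b = fit_line(train_x, train_y)                     # step 2
in_sample = r2_variants(train_x, train_y, a, b)       # step 3
out_of_sample = r2_variants(new_x, new_y, a, b)       # step 5

print("in-sample:    ", in_sample)
print("out-of-sample:", out_of_sample)
```

On the 200 fitted points both formulas give the same value. On the new points, where the made-up series no longer follows the line, 1 - SSE/SST drops below 0 and SSR/SST goes above 1, which looks like the behaviour I'm seeing.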
Thanks in advance!