Validating Linear Regression Trendlines: Understanding R-Squared Values

jantunes · Nov 14, 2007

wrong R-Squared value??

Hi all,

Warning: this is my very first post :)

I'm doing a linear regression to produce a trendline that can predict (more or less) some future data. The data is highly correlated (something like R=0.98).

This is what I do:
1) get 200 data points (x is a time series; y is CPU usage)
2) do linear regression based on those 200 points, resulting in some y'=a + bx
3) get R-squared (R^2=0.96) for the y'

Then, I want to validate that trendline/prediction by comparing it with more real data:
4) get more data points, past the 200 points (e.g. 10000)
5) get R-squared for the y' (this time against the new data)

The problem is that this new R-squared takes very strange values, depending on the formula used: either <0 (SSE/SST>1), >1 (SSR>SST), or near 0.99 (when in fact the trendline is not accurate).
As I said, I have already tried different ways of calculating the R-squared. They all give the same value in 3), but strange values in 5).

Am I making a wrong assumption here? I'm pretty sure the calculations are correct... How can I validate my trendlines (linear regression models)?

Thanks in advance!
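
The symptom described in this post can be reproduced with a minimal sketch on synthetic data. The trend shape, the noise level, and the helper name r2_variants are assumptions made for illustration; this is not the poster's data or code. For a least-squares line with an intercept, SST = SSE + SSR holds on the points used for the fit, so the usual R-squared formulas coincide there. On new data that identity no longer holds, so the formulas can disagree, and two of them can leave the [0, 1] range.

```python
import numpy as np

def r2_variants(y, y_hat):
    """Three common R^2 formulas for observations y and predictions y_hat."""
    sse = np.sum((y - y_hat) ** 2)         # sum of squared prediction errors
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares of this sample
    ssr = np.sum((y_hat - y.mean()) ** 2)  # "explained" sum of squares
    return {
        "1 - SSE/SST": 1 - sse / sst,
        "SSR/SST": ssr / sst,
        "corr^2": np.corrcoef(y, y_hat)[0, 1] ** 2,
    }

rng = np.random.default_rng(0)
x = np.arange(10_000, dtype=float)
# Hypothetical series: rises linearly at first, then levels off after x = 400.
y = 0.05 * np.minimum(x, 400) + rng.normal(0, 0.5, x.size)

# Fit y' = a + b*x on the first 200 points only.
b, a = np.polyfit(x[:200], y[:200], 1)
y_hat = a + b * x

in_sample = r2_variants(y[:200], y_hat[:200])    # same points as the fit
out_sample = r2_variants(y[200:], y_hat[200:])   # new points, same y'

print("in-sample: ", in_sample)
print("out-sample:", out_sample)
```

In this setup the fitted line keeps rising while the data flatten, so 1 - SSE/SST goes negative and SSR/SST exceeds 1 on the new points. The squared correlation always stays between 0 and 1, because it only measures linear association between y and y' and ignores any offset or scale error in the predictions.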

EnumaElish · Nov 14, 2007

Are you re-estimating y' with the new data? If not, how are you calculating the R² with the new data?

jantunes · Nov 14, 2007

No, it's the same y' (estimated from the first 200 data points). From your question I suspect I cannot calculate R-squared from a different sample than the one used for y'.

What I really want is a statistical measure of the prediction accuracy (maybe R-squared?) of y' for the new data (which is actually all the data that y' is supposed to predict). It would be the numerical counterpart of plotting the new data against its prediction (y') and seeing how well they match.
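
One possible way to put a number on "how well they match" is to summarize the prediction errors on the new data directly, in the units of y. The sketch below assumes the new observations and the predictions from the unchanged y' are available as arrays. The function name holdout_errors and the choice of RMSE, MAE, and mean error are illustrative assumptions, not something proposed in the thread.

```python
import numpy as np

def holdout_errors(y_true, y_pred):
    """Summarize prediction errors on data not used for the fit."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),  # typical error size, penalizes large misses
        "mae": float(np.mean(np.abs(err))),         # average absolute error
        "bias": float(np.mean(err)),                # > 0: trendline predicts too low on average
    }

# Tiny made-up example: two exact predictions and one that is 2 units too high.
errors = holdout_errors([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(errors)
```

Unlike R-squared, these quantities do not depend on the variance of the new sample, so they stay interpretable even when the new data cover a very different range than the 200 fitting points.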

EnumaElish · Nov 16, 2007

You can wrap a forecast interval around each forecast (predicted) point and see whether the actual data point lies within the forecast interval. See http://en.wikipedia.org/wiki/Prediction_interval
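
The suggestion above can be sketched for a simple linear regression as follows. This uses the textbook prediction interval y' ± q·s·sqrt(1 + 1/n + (x0 - mean(x))² / Sxx), where s is the residual standard error of the fit. Two things are assumptions here: the function names are hypothetical, and the Student t quantile is replaced by the normal quantile from the standard library. That replacement is a close approximation with about 200 fitting points but not exact.

```python
import numpy as np
from statistics import NormalDist

def prediction_interval(x_fit, y_fit, x_new, level=0.95):
    """Approximate prediction interval for new points from a fitted line."""
    x_fit = np.asarray(x_fit, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = x_fit.size
    b, a = np.polyfit(x_fit, y_fit, 1)              # y' = a + b*x
    resid = y_fit - (a + b * x_fit)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))       # residual standard error
    sxx = np.sum((x_fit - x_fit.mean()) ** 2)
    se_pred = s * np.sqrt(1 + 1 / n + (x_new - x_fit.mean()) ** 2 / sxx)
    q = NormalDist().inv_cdf(0.5 + level / 2)       # normal approximation to the t quantile
    y_hat = a + b * x_new
    return y_hat - q * se_pred, y_hat + q * se_pred

def coverage(y_new, lower, upper):
    """Fraction of new observations that fall inside their interval."""
    y_new = np.asarray(y_new, dtype=float)
    return float(np.mean((y_new >= lower) & (y_new <= upper)))

rng = np.random.default_rng(1)
x = np.arange(600, dtype=float)
noise = rng.normal(0, 0.5, x.size)
y_linear = 0.05 * x + noise                   # hypothetical: trend continues
y_flat = 0.05 * np.minimum(x, 250) + noise    # hypothetical: trend levels off

lo_lin, hi_lin = prediction_interval(x[:200], y_linear[:200], x[200:])
lo_flat, hi_flat = prediction_interval(x[:200], y_flat[:200], x[200:])
cov_linear = coverage(y_linear[200:], lo_lin, hi_lin)
cov_flat = coverage(y_flat[200:], lo_flat, hi_flat)
print(cov_linear, cov_flat)
```

If the trendline remains valid, the share of new points inside the interval should be roughly the nominal level. A much lower share indicates that the relationship estimated from the first 200 points does not carry over to the new data. Note that the interval widens as x moves away from the fitted range, which reflects the growing uncertainty of extrapolation.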
