# The effect of cross-validation on the correlation coefficient

1. Oct 23, 2012

### lyuriedin

I have two variables for which the fitted regression line is just a constant (the mean), so the correlation is zero. However, when I perform k-fold cross-validation (in Weka), the reported correlation becomes non-zero.

I have no idea why this is. Whatever the test set is, the regression line will always be a constant, so the correlation should be zero. Because some of the data is held out as the validation set at each fold, the mean will be different at each fold, but the correlation should still be the same no matter what. The only thing I can think of is that it is computing the correlation between the training means and the actual mean, but even then these should sum to zero.
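A minimal Python sketch of this setup (not Weka itself) may help make the question concrete. It assumes the correlation is computed once over the held-out predictions pooled from all folds, which is only a guess at what the tool reports. The data is synthetic, and the helper names `pearson` and `cv_mean_predictions` are made up for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def cv_mean_predictions(y, k):
    """Predict each held-out value by the mean of the other k-1 folds.

    Hypothetical helper: fold f holds out indices f, f+k, f+2k, ...
    """
    n = len(y)
    preds = [0.0] * n
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train = [y[i] for i in range(n) if i not in held_out]
        train_mean = sum(train) / len(train)
        for i in test_idx:
            preds[i] = train_mean  # constant within a fold, differs across folds
    return preds


random.seed(0)
# Synthetic target with no usable predictor, so the best model is a constant.
y = [random.gauss(0.0, 1.0) for _ in range(100)]

preds = cv_mean_predictions(y, k=10)
r = pearson(preds, y)
print("pooled cross-validated correlation:", r)
```

In this sketch the prediction is constant within each fold but not across folds, so the pooled predictions do have variance. With equal-sized folds, a fold whose values are above average is predicted by a training mean that is below average, so the pooled correlation comes out negative rather than zero.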

Can anybody clear this up for me?