MATLAB Different correlation coefficient results between Matlab and Excel.

AI Thread Summary
The discussion centers on discrepancies in correlation coefficient results between Matlab and Excel, with Matlab yielding 0.786 and Excel 0.801 for the same data sets. The user employed Matlab's corrcoef function and Excel's Correl function, highlighting that the two programs may use different methodologies for calculating correlation. It was noted that statistical estimators can vary, leading to different results based on the functions used. A key factor identified was that Data A was copied from Excel to Matlab with only two decimal places, while Excel calculated the correlation using four decimal places, potentially causing the difference. Rounding errors in Excel were also mentioned as a possible reason for the discrepancies. The conversation emphasizes the importance of understanding how different software handles data precision and statistical calculations.
honeysyd
Messages
3
Reaction score
0
Different correlation coefficient results between Matlab and Excel. Please Help!

I have two sets of dat as below. A problem is that MATLAB and excel returns a different value of correlation coefficient (i.e., 0.801 from Excel and 0.786 from Matlab).

For Matlab, I used the function: corrcoef(DataA, DataB). For excel, I used the function "Correl".

Would you please advise me which one is correct? I found that the excel uses different function from the one used in Matlab. I do not know how to write a function here, so I linked them as below.

http://office.microsoft.com/en-us/excel-help/correl-HP005209023.aspx
http://www.mathworks.com.au/help/techdoc/ref/corrcoef.html

Thank you in advance

Data A
0.56
0.63
0.54
0.59
0.57
0.69

Data B
1.16
1.68
1.24
1.33
1.28
1.52
 
Last edited by a moderator:
Physics news on Phys.org


The MATLAB function returns a matrix. Reveal its other entries to us.

Try using MATLAB on the numerical example given in the Excel documentation.

As to "right" vs "wrong", it is always possible for computer code to a have errors. But the difference may also be due to the fact that statistics has subjective aspects. To see that, you must understand the usual scenario for "estimation"

That scenario is:
You have a probability distributions with some unknown parameter (such as the correlation coefficient in a joint distribution of 2 random variables)

You have data.

You do a computation on the data to estimate one of the unknown parameters. The function you use is called "an estimator". For a given parameter, there can be different estimators. Since the data is random, an estimator is a random variable. Different esimatators can have different aspects that argue in favor of using them. It is a subjective decision, which estimator you want to use. The odds are that any estimator will be "wrong" (i.e. not exactly equal to the parameter you are trying to estimate).

To use proper teminology, you should distinguish between an unknown parameter and its estimator and any "standard" related statistic computed from the sample. For example, there is a "population variance", several different "estimators for the population variance" and there is the "sample variance" (which can be defined two different ways, depending on which textbook you use).
 
Last edited:


Thank you very much Stephen.

It was a great help. Beside your reply, I found another reason.

Data A appears to be two decimal numbers in excel, but actually Data A is four decimal numbers.

When I copy and paste Data A from Excel to Matlab, only two decimal numbers were copied and pasted, while Excel calculated the correlation coefficient with four decimal number of Data A.


I hope this helps others having a same problem.
 


It's hard to say exactly what is happening, but if both Matlab and Excel are calculating the same statistic, the difference may be due to rounding errors. Excel is notorious for things like this.
 
Back
Top