Different correlation coefficient results between Matlab and Excel.

Click For Summary

Discussion Overview

The thread discusses discrepancies in correlation coefficient results obtained from MATLAB and Excel when analyzing two sets of data. Participants explore potential reasons for the differences in values, focusing on the functions used in each software and the nature of statistical estimators.

Discussion Character

  • Debate/contested, Technical explanation, Exploratory

Main Points Raised

  • One participant notes a difference in correlation coefficients: 0.801 from Excel and 0.786 from MATLAB, questioning which result is correct.
  • Another participant suggests revealing the full output of the MATLAB function, indicating that MATLAB returns a matrix and may provide additional insights.
  • It is mentioned that the difference in results could stem from subjective aspects of statistics and the nature of estimators, which can vary based on the method used.
  • A participant identifies a potential issue with data precision, stating that Data A in Excel appears as two decimal numbers, while it actually contains four decimal numbers, leading to different calculations.
  • Another participant raises the possibility of rounding errors affecting the results, particularly noting Excel's reputation for such issues.

Areas of Agreement / Disagreement

Participants express differing views on the reasons for the discrepancies, with no consensus reached on which software's result is more accurate or the definitive cause of the differences.

Contextual Notes

Participants highlight the importance of understanding the definitions and functions used in each software, as well as the potential impact of data precision and rounding on statistical calculations.

honeysyd
Messages
3
Reaction score
0
Different correlation coefficient results between Matlab and Excel. Please Help!

I have two sets of dat as below. A problem is that MATLAB and excel returns a different value of correlation coefficient (i.e., 0.801 from Excel and 0.786 from Matlab).

For Matlab, I used the function: corrcoef(DataA, DataB). For excel, I used the function "Correl".

Would you please advise me which one is correct? I found that the excel uses different function from the one used in Matlab. I do not know how to write a function here, so I linked them as below.

http://office.microsoft.com/en-us/excel-help/correl-HP005209023.aspx
http://www.mathworks.com.au/help/techdoc/ref/corrcoef.html

Thank you in advance

Data A
0.56
0.63
0.54
0.59
0.57
0.69

Data B
1.16
1.68
1.24
1.33
1.28
1.52
 
Last edited by a moderator:
Physics news on Phys.org


The MATLAB function returns a matrix. Reveal its other entries to us.

Try using MATLAB on the numerical example given in the Excel documentation.

As to "right" vs "wrong", it is always possible for computer code to a have errors. But the difference may also be due to the fact that statistics has subjective aspects. To see that, you must understand the usual scenario for "estimation"

That scenario is:
You have a probability distributions with some unknown parameter (such as the correlation coefficient in a joint distribution of 2 random variables)

You have data.

You do a computation on the data to estimate one of the unknown parameters. The function you use is called "an estimator". For a given parameter, there can be different estimators. Since the data is random, an estimator is a random variable. Different esimatators can have different aspects that argue in favor of using them. It is a subjective decision, which estimator you want to use. The odds are that any estimator will be "wrong" (i.e. not exactly equal to the parameter you are trying to estimate).

To use proper teminology, you should distinguish between an unknown parameter and its estimator and any "standard" related statistic computed from the sample. For example, there is a "population variance", several different "estimators for the population variance" and there is the "sample variance" (which can be defined two different ways, depending on which textbook you use).
 
Last edited:


Thank you very much Stephen.

It was a great help. Beside your reply, I found another reason.

Data A appears to be two decimal numbers in excel, but actually Data A is four decimal numbers.

When I copy and paste Data A from Excel to Matlab, only two decimal numbers were copied and pasted, while Excel calculated the correlation coefficient with four decimal number of Data A.


I hope this helps others having a same problem.
 


It's hard to say exactly what is happening, but if both Matlab and Excel are calculating the same statistic, the difference may be due to rounding errors. Excel is notorious for things like this.