Different correlation coefficient results between Matlab and Excel.

Click For Summary
SUMMARY

The correlation coefficient results from MATLAB and Excel differ due to the use of different functions and potential data precision issues. MATLAB's corrcoef function returned a coefficient of 0.786, while Excel's CORREL function provided a value of 0.801. The discrepancy can arise from how data is represented; in this case, Data A was copied with reduced precision from Excel to MATLAB, leading to different calculations. Understanding the distinction between estimators and the impact of data precision is crucial in statistical analysis.

PREREQUISITES
  • Familiarity with MATLAB functions, specifically corrcoef
  • Understanding of Excel functions, particularly CORREL
  • Basic knowledge of statistical estimators and parameters
  • Awareness of data precision and its effects on statistical calculations
NEXT STEPS
  • Research the differences between MATLAB's corrcoef and Excel's CORREL functions
  • Learn about statistical estimators and their properties in depth
  • Explore data precision issues in numerical computing environments
  • Investigate rounding errors and their impact on statistical analysis
USEFUL FOR

Data analysts, statisticians, researchers, and anyone working with statistical software who needs to understand discrepancies in correlation coefficient calculations between MATLAB and Excel.

honeysyd
Messages
3
Reaction score
0
Different correlation coefficient results between Matlab and Excel. Please Help!

I have two sets of dat as below. A problem is that MATLAB and excel returns a different value of correlation coefficient (i.e., 0.801 from Excel and 0.786 from Matlab).

For Matlab, I used the function: corrcoef(DataA, DataB). For excel, I used the function "Correl".

Would you please advise me which one is correct? I found that the excel uses different function from the one used in Matlab. I do not know how to write a function here, so I linked them as below.

http://office.microsoft.com/en-us/excel-help/correl-HP005209023.aspx
http://www.mathworks.com.au/help/techdoc/ref/corrcoef.html

Thank you in advance

Data A
0.56
0.63
0.54
0.59
0.57
0.69

Data B
1.16
1.68
1.24
1.33
1.28
1.52
 
Last edited by a moderator:
Physics news on Phys.org


The MATLAB function returns a matrix. Reveal its other entries to us.

Try using MATLAB on the numerical example given in the Excel documentation.

As to "right" vs "wrong", it is always possible for computer code to a have errors. But the difference may also be due to the fact that statistics has subjective aspects. To see that, you must understand the usual scenario for "estimation"

That scenario is:
You have a probability distributions with some unknown parameter (such as the correlation coefficient in a joint distribution of 2 random variables)

You have data.

You do a computation on the data to estimate one of the unknown parameters. The function you use is called "an estimator". For a given parameter, there can be different estimators. Since the data is random, an estimator is a random variable. Different esimatators can have different aspects that argue in favor of using them. It is a subjective decision, which estimator you want to use. The odds are that any estimator will be "wrong" (i.e. not exactly equal to the parameter you are trying to estimate).

To use proper teminology, you should distinguish between an unknown parameter and its estimator and any "standard" related statistic computed from the sample. For example, there is a "population variance", several different "estimators for the population variance" and there is the "sample variance" (which can be defined two different ways, depending on which textbook you use).
 
Last edited:


Thank you very much Stephen.

It was a great help. Beside your reply, I found another reason.

Data A appears to be two decimal numbers in excel, but actually Data A is four decimal numbers.

When I copy and paste Data A from Excel to Matlab, only two decimal numbers were copied and pasted, while Excel calculated the correlation coefficient with four decimal number of Data A.


I hope this helps others having a same problem.
 


It's hard to say exactly what is happening, but if both Matlab and Excel are calculating the same statistic, the difference may be due to rounding errors. Excel is notorious for things like this.