1. Aug 7, 2012

### pamparana

Hello everyone,

I have a question (perhaps a very noob one as well!) regarding correlation between variables where the number of observations differs between the two sets.

So, I have some 32 responses from a survey that aims to measure a certain variable, and I want to correlate them against some company performance indicators. These performance indicators are available on a monthly basis for 63 months.

So I have 32 instances of one variable against 63 instances of another. Is it possible to do a simple correlation between these sets when the number of instances is different?

Thanks,
Luca

2. Aug 7, 2012

### Number Nine

What, precisely, would you be correlating? What is the relationship between the two variables that would allow a correlation to make sense? How do you know which observations in one measure are associated with which observations in the other?

Correlation is, fundamentally, normalized covariance, which is a characteristic of joint distributions of random variables. If you have two uneven sets of completely unrelated and unpaired data, then the notion of correlation makes no sense.
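To make the "normalized covariance" point concrete, here is a minimal sketch in plain Python. The function name `pearson_r` and the sample numbers are made up for illustration and are not from the survey in question. The sketch computes the sample covariance of paired values and divides it by the product of the two standard deviations, and it refuses to run when the two lists cannot be paired one-to-one.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sample covariance divided by the product of standard deviations."""
    if len(xs) != len(ys):
        # Covariance is defined over (x, y) pairs, so unpaired data has no correlation.
        raise ValueError("correlation needs paired observations of equal length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    # Note: this divides by zero if either variable is constant.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired data: one attitude score and one performance figure per unit.
attitude = [3.0, 4.0, 2.0, 5.0, 4.0]
performance = [10.0, 12.0, 9.0, 15.0, 11.0]
r = pearson_r(attitude, performance)
print(f"r = {r:.3f}")
```

Calling `pearson_r` with one list of 32 values and another of 63 raises the `ValueError`, which is the same objection in code form: there is no rule saying which of the 63 values goes with which of the 32.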

3. Aug 8, 2012

### haruspex

Are the 32 responses spread across the same 63 month period?

4. Aug 8, 2012

### pamparana

Hello,

Thanks for the replies. The 32 responses are from employees that have been employed over that 5 year period.

What I am trying to do is correlate employee attitudes with company performance over that time.

Thanks,
Luca

5. Aug 8, 2012

### chiro

You should be aware that correlation in its most common form (Pearson's) between two variables only makes sense when the relationship is linear, since a linear relationship is what it is trying to detect.

You need to take a look at your data and decide whether you can discern any relationship at all, and if necessary transform your data to try and get a linear-looking relationship.
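As a rough illustration of the transformation idea, the sketch below uses synthetic data (not anything from the survey) where `y` grows exponentially in `x`. The helper `pearson_r` is a hypothetical name defined here, and the choice of a log transform is an assumption that only suits this kind of curve. The raw correlation understates a relationship that is actually exact, and taking logs makes it linear.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic, perfectly deterministic but non-linear relationship: y = exp(0.5 * x).
x = [float(i) for i in range(1, 11)]
y = [math.exp(0.5 * xi) for xi in x]

r_raw = pearson_r(x, y)                            # correlation on the raw scale
r_log = pearson_r(x, [math.log(yi) for yi in y])   # correlation after a log transform

print(f"raw: {r_raw:.3f}, after log transform: {r_log:.3f}")
```

With real data you would plot it first and pick a transformation (if any) that you can justify from what the variables mean, not just one that pushes the number up.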

The most important thing, however, is to try to put context into your data: if you can describe the behaviour with a simple function that makes sense non-mathematically (i.e. you can explain what it means in plain English, without mathematics, and with reference to something specific), then this is what you should be doing.

If you are just trying to produce metrics without having a clue what's going on, you'll be setting yourself up to make a potentially bad decision.

6. Aug 8, 2012

### Number Nine

So you have performance data on a per month basis for 63 months, and survey data every month as well? If you only have survey data at one time point, I'm not sure what you would correlate, exactly.