pink.noise
I have the following problem:
Suppose there is a survey of persons with two properties each (a and b). Property a can take a relatively small number of values (say, political inclination: Conservative, Liberal, Socialist, or Other); property b can take a large number of values (say, the country the person lives in).
Now there are two datasets from annual surveys. The first one contains the numbers of Conservative, Liberal, Socialist and Other respondents between 1996 and 2010 (thus a rather short time series). The second one contains the number of respondents by country from 1996 through 2010. The shares of the political camps by country are unknown.
Is there any possible way to estimate the shares of political inclinations for specific countries?
I thought making the heroic assumptions that
1) the number of respondents in general and for any possible combination of a and b is large enough and that
2) the shares of political inclinations do not change over time and differ only by country (thus the data from 1996 through 2010 could be treated as independent observations),
a simple correlation matrix could be used. (Idea 1)
Furthermore, a linear regression might be useful, taking the time series from the first dataset (political inclinations) as independent variables and the number of individuals from a specific country as the dependent variable. (Idea 2)
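To make Idea 2 concrete, here is a minimal sketch of the regression on made-up numbers. Everything in it is an assumption for illustration: the counts are random, `true_fractions` is hypothetical, the helper names are mine, and I drop the intercept because under assumption 2) the country count would simply be a fixed fraction of each camp summed up.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols_no_intercept(rows, y):
    """Least-squares coefficients for y ~ rows (no intercept), via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

rng = random.Random(1)
n_years = 15  # 1996 through 2010

# Made-up yearly counts of Conservative, Liberal, Socialist, Other respondents.
camp_counts = [[rng.uniform(200, 400) for _ in range(4)] for _ in range(n_years)]

# Hypothetical fraction of each camp that lives in the country of interest.
true_fractions = [0.10, 0.05, 0.20, 0.02]

# Noise-free country counts implied by those fractions.
country_counts = [sum(f * c for f, c in zip(true_fractions, row)) for row in camp_counts]

estimated = ols_no_intercept(camp_counts, country_counts)
```

Without noise the fit recovers the hypothetical fractions. Note that each coefficient is the fraction of a camp living in that country, not the camp's share within the country; the latter would be the coefficient times the camp count divided by the country count. With real data, 15 observations and four strongly co-trending regressors leave very little to estimate from.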
Using Idea 1 (correlation matrix), I find a number of high correlations, deemed significant by their t-value. (To use the t-value for significance testing of correlation coefficients is right, isn't it?) They might however still be accidental nonsense correlations caused by the same but unrelated strong trend in both time series. (Meaning: if all the time series are growing over time because the sample of the survey is increased from year to year, everything should be highly correlated without representing any useful information at all, right?)
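Here is a small sketch of the nonsense-correlation worry, on made-up numbers. The two series are generated independently and share nothing but an upward trend. The t-value uses the usual formula t = r * sqrt((n - 2) / (1 - r^2)). Taking first differences is my own addition here, as one simple way to strip a common linear trend; the function names and all numbers are assumptions.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_value(r, n):
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def first_differences(x):
    """Year-over-year changes; removes a shared linear trend."""
    return [b - a for a, b in zip(x, x[1:])]

rng = random.Random(0)
n_years = 15  # 1996 through 2010

# Two unrelated made-up series that both grow because the sample grows.
series_a = [1000 + 100 * i + rng.gauss(0, 30) for i in range(n_years)]
series_b = [500 + 40 * i + rng.gauss(0, 15) for i in range(n_years)]

r_levels = pearson_r(series_a, series_b)
t_levels = t_value(r_levels, n_years)

# Same comparison after differencing (14 observations remain).
diffs_a = first_differences(series_a)
diffs_b = first_differences(series_b)
r_diffs = pearson_r(diffs_a, diffs_b)
```

On the levels the correlation comes out very high and the t-value looks hugely significant, even though the series are unrelated by construction. After differencing only the independent noise is left, so whatever correlation remains is due to chance.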
Using Idea 2 (linear regression), I cannot find anything significant (p-values rarely fall below 10%, and when they do it is usually for the intercept, not for the more interesting coefficients).
Canonical Correlation would probably suffer from the same problems as Idea 1 (correlation matrix), right?
Non-linear regression would not be helpful either, since we are dealing with numbers of persons and shares of a population, so the relationships should be linear, right?
Dividing every value in the data sets by the total number of respondents in the respective year - would this eliminate the nonsense correlations from the correlation matrix (Idea 1)? I'm not sure, but probably not completely ...
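What I mean by dividing by the yearly total, as a minimal sketch on made-up counts. The `to_shares` helper and the numbers are assumptions for illustration; the example uses a survey whose composition stays fixed while the sample size grows.

```python
def to_shares(counts_by_year):
    """Convert each year's counts into shares of that year's total."""
    shares = []
    for row in counts_by_year:
        total = sum(row)
        shares.append([value / total for value in row])
    return shares

# Made-up counts for Conservative, Liberal, Socialist, Other.
base_counts = [40, 30, 20, 10]

# The sample size grows every year, the composition does not.
counts_by_year = [[value * scale for value in base_counts] for scale in (1, 2, 3)]

shares_by_year = to_shares(counts_by_year)
```

Here the counts grow in lockstep while the shares stay constant, so the trend caused purely by sample growth is gone. The shares within a year always sum to one, though, so they cannot move independently of each other, which may be one reason this does not remove every artificial correlation.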
Is there any more elaborate method to get some information out of this dataset?
Any help is appreciated. Thanks.
PS: Since this is a rather specific question, it would possibly have been better placed in a statistics forum such as talkstats.com. However, it is not possible for me to register there - either they have disabled registration completely or they don't accept email addresses from outside the US - both of which are of course rather stupid if they want to keep the forum going.