ametisto said:
Thanks a lot for your speedy reply, chiro. Good to know I'm in the right place.
Ok, let me explain what it is I'm trying to do:
To put it simply, I am trying to see if there is a correlation between two grammatical structures belonging to two different grammatical categories. The structures in question look like this:
a. ser + Verb in the past participle (category of voice - as in the active/passive voice)
b. tener + Verb in the past participle (category of aspect - a category similar to tense)
When certain verbs are used in the a construction, the sentence becomes ungrammatical. My hypothesis is that the a construction in fact belongs to the same category as the b construction. It is well known that the b construction is subject to certain lexical restrictions, i.e. not all verbs can be used in this construction. No such restrictions exist for the a category. Therefore, if my hypothesis is correct and a belongs to the same category as b, they should be subject to the same lexical restrictions. I have therefore come up with a list of 231 sentences which are ungrammatical in a. Then I used the same verbs to form sentences with the b construction, to find out whether the verbs that make the a sentences ungrammatical also make the b sentences ungrammatical. And this seems to be the case: approx. 85% of the verbs that are ungrammatical in the a construction are also ungrammatical in the b construction.
Now somebody mentioned that I could use the chi-square distribution to analyse my data professionally and give them more credibility. I had never heard of the chi-square distribution before, and the person telling me about it is also a linguist, so I am not at all sure what exactly it is about, or if it's at all suitable for my type of data. I hope I made myself clear. If I forgot anything important, do let me know.
Again, thanks a lot for your help and patience;)
I did a quick google for chi-square analysis and I got the following link:
http://www.colby.edu/biology/BI17x/freq.html
In statistics this is known as a "goodness-of-fit" test.
Just so you know, I am training to become a statistician, so I know that the chi-square distribution (which you would be using in the goodness-of-fit test) is actually used for a variety of purposes; the goodness-of-fit test is just one application of it.
The GOF (goodness-of-fit) test works like this: you measure how far an observed distribution deviates from an expected one, which gives you a single total measure of deviation (the test statistic). You then use the chi-squared distribution with the appropriate degrees of freedom to check statistically whether the observed distribution is "different enough" from the expected distribution (think: has a high enough total deviation between the two).
So the next logical thing is to find out exactly what the expected and observed data are.
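To make the mechanics concrete, here is a minimal sketch in Python of the statistic described above: the sum of (observed - expected)^2 / expected over all categories. The counts are made up purely for illustration (they are not your data), and the function name is my own. The value 7.815 is the standard chi-square critical value for 3 degrees of freedom at the 5% level.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, for illustration only.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

stat = chi_square_statistic(observed, expected)

# Degrees of freedom for a simple goodness-of-fit test: categories minus one.
df = len(observed) - 1

# Critical value of the chi-square distribution for df = 3 at alpha = 0.05.
critical_value = 7.815

# If the statistic exceeds the critical value, the observed counts are
# "different enough" from the expected ones at the 5% level.
reject = stat > critical_value
print(f"chi-square = {stat:.2f}, df = {df}, reject = {reject}")
```

With other numbers of categories you would look up the critical value for the matching degrees of freedom in a chi-square table.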
From what you have said, you have two categories a and b that have certain linguistic constraints (with one being a lot more constrained than the other).
Now from what you have posted I'm guessing you have a set of verbs and each verb has a frequency of being "declined" in each category (a and b).
The frequency data forms the "expected" and the "observed" distributions. Your less constrained frequency data (category a) is the expected distribution, and the category b frequency data is the observed one. In other words, you are seeing how well category b fits category a, where a is the less constrained set and b is the more constrained set.
First I need to make sure that this is what you want to do: in other words, to check whether the frequency data for categories a and b are statistically "close enough" to each other.
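One practical detail if the data is set up this way: the expected counts have to add up to the same total as the observed counts, so the category a frequencies usually need rescaling first. A rough sketch, assuming per-verb frequency lists (the numbers and the names `freq_a`, `freq_b` and `scale_expected` are hypothetical, not taken from your data):

```python
def scale_expected(reference_counts, observed_counts):
    """Rescale reference counts so that they sum to the observed total."""
    total_ref = sum(reference_counts)
    total_obs = sum(observed_counts)
    return [r * total_obs / total_ref for r in reference_counts]


# Hypothetical per-verb frequencies, for illustration only.
freq_a = [50, 30, 20]   # category a: plays the role of "expected"
freq_b = [20, 20, 10]   # category b: plays the role of "observed"

expected_b = scale_expected(freq_a, freq_b)
print(expected_b)  # same proportions as freq_a, same total as freq_b
```

The rescaled list keeps the proportions of category a but matches the size of the category b sample, which is what the comparison needs.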
If this is the case, then I can tell you how to analyze your data, and give you an idea of how to interpret it and what that actually means.