# Anyone good with statistics?

## Main Question or Discussion Point


I've conducted an experiment looking at phonetic symbolism in Japanese words.
Thirty participants listened to 40 Japanese words and, for each word, chose between two possible English translations the one they thought was correct.

I'm trying to find out whether or not the participants could give the correct answer at an above-chance level.

Since they didn't know Japanese, we can assume they had a 50% chance of giving the correct answer (there were two possible answers for each word: a correct one and an incorrect one).

There were 705 correct answers out of 1200 possible correct answers (30 x 40).
This is more than half, but is it significantly more than half?
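For reference, the question "is 705 out of 1200 significantly more than half?" can be answered directly with an exact binomial test. A minimal stdlib-only sketch (the variable names are illustrative):

```python
from math import comb

n, k = 1200, 705  # total answers, correct answers observed

# Exact two-sided binomial test against chance (p = 0.5):
# probability of a split at least this lopsided in either direction.
upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
p_value = 2 * upper_tail

print(f"two-sided p-value: {p_value:.2e}")
```

The resulting p-value is far below any conventional significance threshold.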

I wasn't sure of the correct statistical test to use, but I used a Chi-square test and followed the instructions on this page: visualstatistics.net/SPSS%20workbook/chi-square_goodness_of_fit.htm

Obviously they use a different example but I'm pretty sure it's the same thing: comparing observed frequencies with expected frequencies.

The results suggested that it was massively significant. However, even if they only got 614 correct and 586 incorrect, it still reports a significant difference (Chi square = .653, Asymp. Sig. = .419).
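The chi-square statistic SPSS reports for the 614/586 split can be reproduced by hand; note that an Asymp. Sig. of .419 is well above the conventional .05 cutoff, i.e. that split is *not* significant. A quick sketch of the goodness-of-fit calculation:

```python
observed = [614, 586]
expected = [600, 600]  # 1200 trials at 50/50

# Goodness-of-fit statistic: sum of (O - E)^2 / E over the cells.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"chi-square = {chi_square:.3f}")  # matches the reported .653
```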

This doesn't seem right to me; it seems very low.
If I tossed a coin 1200 times and it landed heads 614 times, does that mean the coin is weighted??

705/1200 is only 58.75%, and that's significant?

I'm surely doing something wrong...

I'm sorry it looks like I've posted this in the wrong forum.

---

I am sure you will get a good response here as well. The way I look at it is that, assuming a 50/50 chance, "on average" your test would yield 600 correct responses. You got 705, which is 17.5% better than average (that is, (705 - 600)/600).

If you had unlimited time, I would suggest flipping a coin 1200 times and seeing whether you could get 58.75% heads or tails...

I ran a simple simulation of random heads/tails flips, 1200 per experiment. After 81 experiments I finally had one that had 701 tails and 499 heads.

I am sure someone will respond with an exact number for 1200 flips, but with 120 heads/tails flips, the probability of getting more than 70 of either heads or tails is only 5.477%.

With 1200 flips the probability of getting above 700 either heads or tails is less.
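Both tail probabilities can be computed exactly from the binomial distribution. A stdlib-only sketch (the helper name is illustrative); the 1200-flip tail turns out to be astronomically small:

```python
from math import comb

def two_sided_tail(n, k):
    """Probability of getting k or more of either face in n fair flips."""
    upper = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return 2 * upper

print(f"P(>70 of 120):   {two_sided_tail(120, 71):.4%}")
print(f"P(>700 of 1200): {two_sided_tail(1200, 701):.3e}")
```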

---

Unfortunately, even if you figure out the correct way to analyze this to determine whether or not it's statistically significant (it probably is), there is still a problem: what conclusions can you really draw from this?

So you've determined that the participants were able to do better than the 50% you'd expect if they really had no indication as to which English word was correct. But where did this indication come from? Was it that they did in fact know some Japanese? Is there a psychological trend for languages to use similar sounds for similar words? Or was it something about the way you picked the words the participants were quizzed on? (For example, did you take into account the number of cognates and pseudo-cognates in your sample? Did both of the definition choices always seem viable?)

---

As an example, let's say that I give you this word and the following choices:

**kasa**
1. home
2. umbrella

If I were to quiz a completely random group of Spanish-speaking people, I'd expect the majority to answer "home" (because of Spanish *casa*). However, if I quizzed a group of people who happen to know that there are very few similarities between Romance languages and Japanese (and who know at least a little Spanish), I'd expect them to be more likely to answer "umbrella", just because they'd think the answer would probably not be "home".

The correct answer is of course umbrella.

The problem is that a "random" group of people is not necessarily unbiased. What you have determined is that your group of people is probably slightly biased, but you still don't know where this bias comes from.

---

Have you only done the experiment once? If so, that isn't really enough to reach a firm conclusion; I would do it a few more times. Do any of the Japanese words sound remotely like their English translations?

---

uart, replying to the question above:
The test is binomial (two outcomes). For large n, the binomial distribution is very well approximated by a Normal distribution with mean np and variance np(1-p).

Taking the null hypothesis that the results are guesses (50/50), p = 0.5, so the mean is 600 and the variance 300 (standard deviation ≈ 17.3).

So yes, the result is massively significant, with the actual number of correct responses about 6 standard deviations above what is expected by chance (Z ≈ 6).
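That Z value can be checked directly from the normal approximation described above:

```python
from math import sqrt

n, p = 1200, 0.5
observed = 705

mean = n * p                 # 600 correct expected by chance
variance = n * p * (1 - p)   # 300
sd = sqrt(variance)          # about 17.32

z = (observed - mean) / sd
print(f"Z = {z:.2f}")        # about 6 standard deviations above chance
```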