# Sign Test

1. Dec 23, 2017

### tzx9633

1. The problem statement, all variables and given/known data
In the photo above,
H0: the position on the plant doesn't affect the number of seeds in the pods.
H1: the position on the plant affects the number of seeds in the pods.

3. The attempt at a solution
This is a two-tailed test, am I right? Referring to the normal distribution table, $z_{\alpha} = z_{0.05} = 1.645$ and $z_{\alpha/2} = z_{0.025} = 1.960$. In the example, it's clear that the author uses a one-tailed test. I think that's wrong. He should use $z_{\alpha/2} = z_{0.025} = 1.960$, am I right? Correct me if I am wrong.
2. Relevant equations

2. Dec 23, 2017

### Staff: Mentor

If the author intended you to use a one-tailed test, it's not very clear from the problem description.

3. Dec 23, 2017

### tzx9633

Then is it a one-tailed or a two-tailed test, in your opinion? Because H1 = position on the plant affects the number of seeds in the pods, I assume it's a two-tailed test.

4. Dec 23, 2017

### FactChecker

I think you are right. The one tail test would be appropriate for these hypotheses:
H1 = Being on top increases the number of seeds
H0 = Being on top does not increase the number of seeds

The two-tail test is appropriate for the book's stated hypotheses, which concern any difference, increase or decrease. The book's Z value is for the 90% two-tail confidence level (5% in each tail, i.e. 95% one-sided). (See the second table, "Critical values", in http://pegasus.cc.ucf.edu/~pepe/Tables )
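The critical values quoted in this thread can also be checked without a table. A minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is made up here for illustration:

```python
from statistics import NormalDist

def z_critical(alpha: float, tails: int) -> float:
    """Critical z for significance level alpha; tails is 1 or 2 (hypothetical helper)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)

print(round(z_critical(0.05, tails=1), 3))  # one-tailed, alpha = 0.05
print(round(z_critical(0.05, tails=2), 3))  # two-tailed, alpha = 0.05
print(round(z_critical(0.10, tails=2), 3))  # two-tailed, alpha = 0.10
```

The first and third calls return the same value (about 1.645), which is why a cutoff of 1.645 corresponds either to a one-tailed test at 0.05 or to a two-tailed test at 0.10; the second call gives about 1.960.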

Last edited: Dec 23, 2017
5. Dec 23, 2017

### tzx9633

Here's my notes

• pairs of observations are independent and

– the sample size is large or small and data normal then use the t-test.

– the sample size is small and the data not normal then use the Wilcoxon rank sum (Mann-Whitney U) test.

• pairs of observations are dependent and

– the sample size is large or small and data normal then use the paired t-test.

– the sample size is small and the data not normal then use the Wilcoxon signed rank test.

In the previous example, it's a dependent test (testing whether the position on the plant affects the number of seeds in the pods or not), am I right? For a dependent test, there are only 2 choices, right? Which are Mann-Whitney and the t-test, am I right? Why does the author use a z-test in the first example?

6. Dec 23, 2017

### Ray Vickson

There are many things wrong with this example.
(1) The data make little sense. How can the number of seeds be 5.2 in the top pod and 3.7 in the bottom pod of the exact same plant? Don't seeds come in integer numbers, 0,1,2,...? How can you have an "average number of seeds" for a single plant? I suppose there could be N1 pods on top and N2 pods on the bottom, with the "averages" being the average number per pod among the N1 top pods, etc. However, in that case, why bother with averages? Just look at the total number of seeds on top and on the bottom. It looks to me like a highly artificial problem using some arbitrary numbers, designed to give the illusion of a real problem. However, YOU are stuck with doing the example, whether it makes sense or not!

(2) How can the seed numbers in the top or bottom, or their difference, be Binomial(10, 0.5)? Here, the '10' looks like the number of plants tested, but the number of seeds in an individual plant will not depend on how many plants we choose to examine. The number of seeds per plant will be determined by the biology of the plant itself, and possibly by environmental factors, etc. Possibly, saying that the number of seeds is random with distribution $\text{Binom}(N,p)$ could be a good approximation, but there is no way to say a priori that $N=10$ and $p = 1/2$. I guess it is possible that the person setting the problem really meant to say that "observation shows that the distribution of seed numbers is approximately binomial with parameters $N=10$ and $p = 0.5$", but if that is the case, that is the way they should have said it. Otherwise, they are likely to maximize the confusion of the student.

(3) Since the top/bottom numbers are paired (to a single plant), using a paired-sample test (such as a paired-sample t-test) might be appropriate. Certainly for a single plant the top and bottom numbers are dependent, but with careful experimental design or appropriate data-gathering, the numbers between different plants might, possibly, be independent. Even if a paired-sample t-test is not appropriate (because of non-normality of the data), it would make sense to use a non-parametric test for $H_0: \mu=0$ vs $H_1: \mu \neq 0$ for the mean $\mu$ of a sample $X_1, X_2, \ldots, X_{10}$. Here $X = \text{top number} - \text{bottom number}$. And, a two-sided test would be the way to go.

I looked only at your first posted image; as I said before, I won't look at posted images of solutions, only at typed work.

Last edited: Dec 23, 2017
7. Dec 23, 2017

### FactChecker

It would require a separate statistical test to determine if the paired results are dependent or not. If an unpaired test is used where a paired test is possible, a lot of information is lost and the unpaired test may be much weaker. So I agree that a paired test would be better. There is no reason to assume that the paired results are independent of each other. The book answer is a paired test. Each plant result is a comparison of its pair of top versus bottom and turned into a single binomial result (top greater => +, top smaller => -). The total of the binomial results is then approximated by the normal distribution. This is the usual thing to do for a large binomial sample.
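The procedure described here (reduce each pair to a sign, then approximate the binomial count by a normal) can be sketched as follows. This is only an illustration: the differences below are invented, not the book's data, the function name `sign_test_z` is hypothetical, and no continuity correction is applied, which may differ from what the book does.

```python
from math import sqrt
from statistics import NormalDist

def sign_test_z(diffs):
    """Two-sided sign test via the normal approximation (no continuity correction)."""
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)
    n = plus + minus  # ties (d == 0) are dropped
    if n == 0:
        raise ValueError("all differences are zero")
    # Under H0, plus ~ Binomial(n, 0.5): mean n/2, standard deviation sqrt(n)/2.
    z = (plus - n / 2) / (sqrt(n) / 2)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return plus, n, z, p_two_sided

# Hypothetical top-minus-bottom differences for 10 plants (NOT the book's data).
diffs = [1.5, 0.4, -0.3, 0.8, 1.1, -0.6, 0.9, 0.2, 1.3, 0.7]
plus, n, z, p = sign_test_z(diffs)
print(plus, n, round(z, 3), round(p, 3))
```

With these made-up numbers there are 8 plus signs out of 10, so z is about 1.90. That lies between 1.645 and 1.960, so the conclusion at alpha = 0.05 would depend on whether the test is treated as one-tailed or two-tailed.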

8. Dec 23, 2017

### tzx9633

OK, thanks for your explanation. Why isn't the t-test used here? Why is the z-test used?

9. Dec 23, 2017

### tzx9633

It's stated in post #5 that pairs of observations are dependent and

– the sample size is large or small and data normal then use the paired t-test.

Last edited by a moderator: Dec 23, 2017
10. Dec 23, 2017

### FactChecker

Rigorous application of the t-test would be difficult for this problem. The paired samples t-test needs paired samples from fixed populations. In this example, each number is an average of seeds from pods. The number of pods of each plant and each top/bottom may be different and so each average may be from a different distribution. By turning each plant into one binomial sample point (top larger or top smaller), those issues disappear. Once the problem is turned into a binomial, the approximation for large n (preferably n > 20, but this is just an example problem) is the normal distribution.

11. Dec 23, 2017

### tzx9633

So the notes are wrong? When the samples are dependent, we should use the normal distribution (z-test) and not the t-test?

12. Dec 24, 2017

### FactChecker

In this case, I think so. There is more to using the t-test than just "dependent" (see https://en.wikipedia.org/wiki/Student's_t-test#Assumptions). The population distribution should also be the same normal distribution for each member of the pair. In other words, the population of the top pod numbers should be from one normal distribution and the population of the bottom pod numbers should be from another normal distribution. The two distributions (top and bottom) should have the same variances.

That being said, the Student's t-test is reasonably robust regarding violations of the required assumptions. So it may be acceptable to use. I have not studied violations of the assumptions. What the book did by approximating the binomial with a small sample of 10 with a normal distribution is also marginal. The recommendation is to have a sample size of at least 20 (see https://en.wikipedia.org/wiki/Binomial_distribution#Normal_approximation ).
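To get a feel for how marginal the normal approximation is at n = 10, one can compare it with the exact binomial tail. A rough sketch, assuming a two-sided test with p = 0.5 under H0 and no continuity correction; both function names are made up for illustration:

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_two_sided_p(k, n):
    """Exact two-sided sign-test p-value for k plus signs out of n, under p = 0.5."""
    tail = min(k, n - k)
    # Probability of a result at least as extreme in one tail, doubled and capped at 1.
    one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

def normal_two_sided_p(k, n):
    """Normal-approximation p-value without continuity correction."""
    z = (k - n / 2) / (sqrt(n) / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))

for k in range(6, 11):  # n = 10 as in the thread; the k values are illustrative
    print(k, round(exact_two_sided_p(k, 10), 4), round(normal_two_sided_p(k, 10), 4))
```

For example, with 8 plus signs out of 10 the exact two-sided p-value is 112/1024, about 0.109, while the uncorrected normal approximation gives a noticeably smaller value, so the two methods can point to different conclusions at this sample size.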

13. Dec 30, 2017

### tzx9633

Can you help me confirm again? I read several online notes, and the alpha used is 0.05, not 0.025 (alpha/2), for a 2-tailed test...

14. Dec 30, 2017

### FactChecker

1) The test should be a 2 tailed test because the null hypothesis you specified is "doesn't affect". That includes effects of higher or lower -- so 2 tails.
2) You specified a Zc value of 1.645 in the OP. You can see on the "Critical Values" table in the link that the Zc value of 1.645 is the .90 2 sided value.
3) A 2 tail test at 0.90 confidence has 0.05 on each side (tail). So the number you give is for a 90% confidence level.

If you want a 95% confidence, you need to either change the hypothesis to 1 sided "tops have more seeds than the bottoms", or change Zc value to 1.96.
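Written as rejection rules for a standard normal test statistic $z$ at significance level $\alpha$ (standard textbook notation, with $z_q$ denoting the value exceeded with probability $q$):

$$\text{two-tailed: reject } H_0 \text{ if } |z| > z_{\alpha/2}, \qquad \text{one-tailed (upper): reject } H_0 \text{ if } z > z_{\alpha}.$$

For $\alpha = 0.05$ these cutoffs are $z_{0.025} \approx 1.960$ and $z_{0.05} \approx 1.645$ respectively. The overall $\alpha$ of the two-tailed test is still 0.05; only the area in each tail is 0.025.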

15. Dec 30, 2017

### tzx9633

So your conclusion is that at alpha = 0.05, Zc should be $z_{0.025} = 1.96$, am I right?

16. Dec 30, 2017