Test of hypothesis for independent samples

In summary, the conversation discusses the basics of t-tests, including the types of t-tests and when to use them. It also touches on the importance of understanding degrees of freedom and alpha levels in relation to hypothesis testing. One issue raised is the common convention of using a 5% alpha level, and the potential drawbacks of this practice. The conversation also brings up the idea of reporting effect sizes instead of just focusing on significance.
  • #1
chwala
Homework Statement
Kindly look at the link below (the steps are pretty clear to me); I need some clarification, though.
Relevant Equations
Stats
Reference;

https://www.statisticshowto.com/probability-and-statistics/t-test/

My question is: can we just as well 'subtract each ##x## score from each ##y## score'? Thanks.
...t-tests, after all, are easy to comprehend, as long as one knows the types,
i.e.
1. independent samples test (compares means between groups),
2. paired samples test (compares means from the same group), &
3. one sample test,

then you're good to go... after that it's a matter of understanding the degrees of freedom and the alpha level, and comparing the calculated value against the critical value, to decide any given hypothesis question.
 

Attachments

  • stats1.png
  • #2
Yes, you can. See the note under step 8.
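If you want to convince yourself numerically, here is a minimal sketch (using SciPy's `ttest_rel`, with made-up scores that are not from the linked page) showing that swapping the order of subtraction only flips the sign of the t statistic, leaving the two-sided p-value unchanged:

```python
# Sketch: paired t-test with the differences taken in both orders.
from scipy import stats

x = [98, 102, 95, 110, 105, 99, 101, 97]    # hypothetical first scores
y = [101, 105, 96, 114, 104, 103, 102, 99]  # hypothetical second scores

t_xy, p_xy = stats.ttest_rel(x, y)  # uses differences x - y
t_yx, p_yx = stats.ttest_rel(y, x)  # uses differences y - x

print(f"x - y: t = {t_xy:.3f}, p = {p_xy:.3f}")
print(f"y - x: t = {t_yx:.3f}, p = {p_yx:.3f}")  # same p, opposite sign of t
```

The direction only matters for interpreting the sign of the mean difference.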
 
  • #3
Why does the author indicate that, if you do not have a specified alpha value, you should use ##5\%##? Any specific/particular reason? Why not ##2\%## or ##10\%##, for that matter? (I.e., reference step ##7##.)
 
  • #4
For the paired t-test, from https://www.statisticshowto.com/probability-and-statistics/t-test/ (my bold):

"When to Choose a Paired T Test / Paired Samples T Test / Dependent Samples T Test​


Choose the paired t-test if you have two measurements on the same item, person or thing. But you should also choose this test if you have two items that are being measured with a unique condition. For example, you might be measuring car safety performance in vehicle research and testing and subject the cars to a series of crash tests. Although the manufacturers are different, you might be subjecting them to the same conditions.

With a “regular” two sample t test, you’re comparing the means for two different samples. For example, you might test two different groups of customer service associates on a business-related test or testing students from two universities on their English skills. But if you take a random sample each group separately and they have different conditions, your samples are independent and you should run an independent samples t test (also called between-samples and unpaired-samples).The null hypothesis for the independent samples t-test is μ1 = μ2. So it assumes the means are equal. With the paired t test, the null hypothesis is that the pairwise difference between the two tests is equal (H0: µd = 0). "

An issue/point I think is interesting here is that this technique is used in classification schemes: two objects are in the same class if the variability between them is within a limited range, and in different classes otherwise. As in: how/when do we declare two dogs to be of the same breed?
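As a concrete illustration of the two null hypotheses quoted above (H0: μ1 = μ2 vs. H0: µd = 0), here is a minimal sketch running both tests on the same invented before/after measurements:

```python
# Sketch: ttest_ind treats the samples as independent (H0: mu1 = mu2);
# ttest_rel pairs them row by row (H0: mean pairwise difference = 0).
from scipy import stats

before = [12.1, 11.8, 13.0, 12.5, 11.9, 12.7]
after  = [12.6, 12.1, 13.4, 12.9, 12.3, 13.1]

t_ind, p_ind = stats.ttest_ind(before, after)  # independent-samples test
t_rel, p_rel = stats.ttest_rel(before, after)  # paired (dependent) test

print(f"independent: t = {t_ind:.2f}, p = {p_ind:.3f}")
print(f"paired:      t = {t_rel:.2f}, p = {p_rel:.3f}")
# The paired test removes the between-item variability, so with a consistent
# per-item shift it typically gives a much smaller p-value.
```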
 
  • #5
chwala said:
Why does the author indicate that, if you do not have a specified alpha value, you should use ##5\%##? Any specific/particular reason? Why not ##2\%## or ##10\%##, for that matter? (I.e., reference step ##7##.)
It's become something of a standard, for no specific reason I'm aware of. This has given rise to criticism on the basis of the choice/number being arbitrary. There's been some discussion of including effect size in such tests, in part for this reason: if, say, the difference in outcome of two medicines is significant at some level, but the effect size is minor, then you might not care as much. Meaning, if one medicine reduces the duration of a cold by 3 days (relative to leaving it untreated), while the other one reduces the duration by 4 days, then significance by itself is not of much value.
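Here is a minimal sketch of what reporting an effect size looks like in practice (invented data; Cohen's d is one common choice, computed by hand alongside SciPy's t-test):

```python
# Sketch: report Cohen's d alongside the p-value, so "significant" and
# "large enough to matter" can be judged separately.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
drug_a = rng.normal(loc=4.0, scale=1.5, size=200)  # days of cold saved, drug A
drug_b = rng.normal(loc=3.0, scale=1.5, size=200)  # days of cold saved, drug B

t, p = stats.ttest_ind(drug_a, drug_b)
pooled_sd = np.sqrt((drug_a.var(ddof=1) + drug_b.var(ddof=1)) / 2)
cohens_d = (drug_a.mean() - drug_b.mean()) / pooled_sd

print(f"t = {t:.2f}, p = {p:.2g}, Cohen's d = {cohens_d:.2f}")
# The difference is "significant", but d tells the reader how large it
# actually is, so they can judge whether one extra day matters.
```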
 
  • #6
chwala said:
Why does the author indicate that, if you do not have a specified alpha value, you should use ##5\%##? Any specific/particular reason? Why not ##2\%## or ##10\%##, for that matter? (I.e., reference step ##7##.)
It is a common convention in many fields. It's bad. The original idea sounds good: if you test one hypothesis in your study, then on average only 1 in 20 studies with no effect will falsely call something significant. In practice people rarely have just a single precisely defined hypothesis, and they don't correct their analyses properly for that. To make things worse, many journals don't want to publish null results, giving scientists an even larger incentive to dig up something they can call significant, and also making it impossible to see how many studies are done in total. As a result, we get tons of "significant" results that are just random fluctuations. You can see it in distributions of p-values: the range *just* below 0.05 is more common than we would expect, which shows up nicely in plots of published z-values (p=0.05 corresponds to z=1.96).

Significance isn't the interesting property anyway. If your sample size is large enough you'll always find a significant effect for essentially everything; that doesn't mean it's relevant. If option 1 reduces some risk by 2% ± 0.5% (p<0.001 for having an effect) and option 2 reduces the risk by 40% ± 21% (p>0.05), which option do you prefer? The second one, of course: despite the weaker significance it's far more likely to help a lot, while the first option is a certain but minimal reduction.
More studies should report effect sizes instead of focusing on arbitrary "significance" points.
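To put numbers on that comparison, here is a minimal sketch (it assumes the quoted ± values behave like normal standard errors, which is an assumption):

```python
# Sketch: compare P(effect exists) with P(effect is large) for both options,
# treating each quoted value +- sigma as a normal estimate (an assumption).
from scipy import stats

options = {"option 1": (2.0, 0.5), "option 2": (40.0, 21.0)}  # % reduction, sigma

for name, (mean, sigma) in options.items():
    p_any = stats.norm.sf(0, loc=mean, scale=sigma)    # P(reduction > 0)
    p_big = stats.norm.sf(10, loc=mean, scale=sigma)   # P(reduction > 10%)
    print(f"{name}: P(>0) = {p_any:.3f}, P(>10%) = {p_big:.3f}")

# option 1: essentially certain to help, essentially certain not to help much.
# option 2: ~97% to help at all, ~92% to cut the risk by more than 10%.
```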
 
  • #7
mfb said:
It is a common convention in many fields. It's bad. The original idea sounds good: if you test one hypothesis in your study, then on average only 1 in 20 studies with no effect will falsely call something significant. In practice people rarely have just a single precisely defined hypothesis, and they don't correct their analyses properly for that. To make things worse, many journals don't want to publish null results, giving scientists an even larger incentive to dig up something they can call significant, and also making it impossible to see how many studies are done in total. As a result, we get tons of "significant" results that are just random fluctuations. You can see it in distributions of p-values: the range *just* below 0.05 is more common than we would expect, which shows up nicely in plots of published z-values (p=0.05 corresponds to z=1.96).

Significance isn't the interesting property anyway. If your sample size is large enough you'll always find a significant effect for essentially everything; that doesn't mean it's relevant. If option 1 reduces some risk by 2% ± 0.5% (p<0.001 for having an effect) and option 2 reduces the risk by 40% ± 21% (p>0.05), which option do you prefer? The second one, of course: despite the weaker significance it's far more likely to help a lot, while the first option is a certain but minimal reduction.
More studies should report effect sizes instead of focusing on arbitrary "significance" points.
mfb, is this the main issue behind the problem of replicability/reproducibility of results in the social sciences? And is it a problem throughout all the sciences, rather than just the social sciences?
 
  • #8
I think if people focused more on effect sizes and confidence intervals, and were fine with publishing null results, we would reduce the problem and increase reproducibility a lot. Reproduction would then mean results consistent within the uncertainties.

That's another issue with a binary "significant"/"not significant" classification. If one study claims an effect is significant (odds ratio 1.35, p=0.02, 95% CI from 1.05 to 1.65) and a similar study says it's not (odds ratio 1.25, p=0.10, 95% CI from 0.95 to 1.55), do they disagree? Of course not; they are within one standard deviation of each other. But they seem to say very different things.
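A quick back-of-the-envelope check of that claim (a sketch assuming the 95% intervals are symmetric and approximately normal, so each standard error is the CI half-width divided by 1.96):

```python
# Sketch: how many standard deviations apart are the two odds ratios?
import math

or1, lo1, hi1 = 1.35, 1.05, 1.65  # study 1: "significant", p = 0.02
or2, lo2, hi2 = 1.25, 0.95, 1.55  # study 2: "not significant", p = 0.10

se1 = (hi1 - lo1) / (2 * 1.96)    # approximate standard error of study 1
se2 = (hi2 - lo2) / (2 * 1.96)    # approximate standard error of study 2
z = (or1 - or2) / math.sqrt(se1**2 + se2**2)

print(f"difference = {or1 - or2:.2f}, z = {z:.2f}")  # z ~ 0.46: well inside 1 sigma
```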

There are fields that do this much better, and particle physics is among them. Most studies are repetitions (typically, but not always, with better precision), most results of searches are null results (which get published without issue), and failing to reproduce previous measurements is very rare, even for the measurements that find a non-zero value.
 

FAQ: Test of hypothesis for independent samples

1. What is a test of hypothesis for independent samples?

A test of hypothesis for independent samples is a statistical method used to determine if there is a significant difference between two independent groups. It compares the means of two samples to see if the differences between them are due to chance or if they are statistically significant.

2. How is a test of hypothesis for independent samples conducted?

A test of hypothesis for independent samples is conducted by first defining the null hypothesis, which states that there is no significant difference between the two groups. Then a sample is collected from each group and their means are compared using a statistical test, such as the t-test or ANOVA. The results of the test are then used to either reject or fail to reject the null hypothesis.
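As an illustration, here is a minimal sketch of those steps using SciPy (the data are invented):

```python
# Sketch: H0 is mu1 = mu2; collect a sample from each group, run the test,
# and compare the p-value with the chosen alpha.
from scipy import stats

group1 = [85, 90, 78, 92, 88, 76, 81, 89]  # hypothetical test scores
group2 = [80, 74, 79, 85, 77, 72, 83, 75]

t, p = stats.ttest_ind(group1, group2)     # independent two-sample t-test
alpha = 0.05

print(f"t = {t:.2f}, p = {p:.3f}")
print("reject H0" if p < alpha else "fail to reject H0")
```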

3. What is the purpose of a test of hypothesis for independent samples?

The purpose of a test of hypothesis for independent samples is to determine if there is a significant difference between two independent groups. This can help researchers make conclusions about the relationship between variables or determine if a treatment or intervention has had a significant effect.

4. What is the difference between a one-tailed and two-tailed test of hypothesis for independent samples?

In a one-tailed test, the alternative hypothesis is directional, meaning it predicts that one group will have a higher or lower mean than the other. In a two-tailed test, the alternative hypothesis is non-directional and predicts that the two groups will have different means. The choice between a one-tailed or two-tailed test depends on the research question and the specific hypothesis being tested.
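A minimal sketch of the distinction using SciPy's `alternative` argument (available in SciPy 1.6 and later; data invented):

```python
# Sketch: the same data tested two-tailed and one-tailed.
from scipy import stats

group1 = [85, 90, 78, 92, 88, 76, 81, 89]
group2 = [80, 74, 79, 85, 77, 72, 83, 75]

# two-tailed: H1 is "the means differ"
t, p_two = stats.ttest_ind(group1, group2, alternative="two-sided")
# one-tailed: H1 is "group1's mean is greater"
t, p_one = stats.ttest_ind(group1, group2, alternative="greater")

print(f"two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
# For a t statistic in the predicted direction, the one-sided p-value is
# half the two-sided p-value.
```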

5. What are some limitations of a test of hypothesis for independent samples?

Some limitations of a test of hypothesis for independent samples include the assumption of normality and equal variances in the two groups, which may not always be met. Additionally, the results of a statistical test only provide evidence for or against the null hypothesis, and cannot prove causation. Other factors, such as sample size and measurement error, can also impact the results of the test.
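For the equal-variance assumption in particular, one common remedy is Welch's t-test, available via `equal_var=False` in SciPy; a minimal sketch with invented data:

```python
# Sketch: Student's t-test (assumes equal variances) vs. Welch's t-test
# (does not) on samples with visibly different spreads.
from scipy import stats

group1 = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]      # smaller spread
group2 = [8.0, 15.2, 6.5, 17.8, 9.9, 14.1, 7.3]   # larger spread

t_student, p_student = stats.ttest_ind(group1, group2)
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

print(f"Student: t = {t_student:.2f}, p = {p_student:.3f}")
print(f"Welch:   t = {t_welch:.2f}, p = {p_welch:.3f}")
```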
