# Problem comparing samples

Hi,

It seems to me I cannot get a grip on the maths behind the statistics, and my head is so dizzy from all the terms and definitions that I cannot make head or tail of them any more. So I would like to have someone check my approach and point me the right way if I am wrong. Thanks in advance for this.

Problem:
In clinical studies to determine the efficacy of a certain treatment I have two groups, one receiving a placebo, the other the remedy. The difference between the two groups is usually attributed to the remedy, but I have my doubts about this. Let us assume the incidence in the population is 100,000 infected people, out of which I took two samples of 100 persons each for the placebo and verum groups. The criterion that I use to determine efficacy is the number of patients that recover within, say, two weeks. Let's assume that I find 30 % of the patients in the placebo group meet this criterion. What is the percentage in the verum group that would indicate efficacy of the treatment? What if the verum group came out with, say, 45 %?

Approach 1: Pearson's Chi-squared test

My null hypothesis is that nothing did change, that is, the treatment did not have any influence on the result. As I do not know the proportion of instant healings in the population my estimate is that this equals the proportion in my placebo group. So I do the calculation

$X^2 = (0.45 - 0.30)^2 / 0.30 + (0.55 - 0.70)^2 / 0.70 = 0.107$

For a single degree of freedom this gives p = 0.90 (estimated, from graph, not exact). So I would assume that I do not have enough significance to discard my null-hypothesis.

What astonishes me, if this is a proper approach, is that the size of the groups does not affect the significance. If I had done the study with only 10 people in each group, or with as many as 1000, the resulting significance would be the same.

Approach 2: Consider size of samples

I found a formula on Wikipedia saying that a proportion can be estimated from the results of a sample to lie, with 95 % confidence, in the interval given by

$\hat{p} \pm \sqrt{0.25 / n}$ (btw: How to properly write formulas here?)

So the placebo group indicates that, with 95 % confidence, the proportion of instant healings in the population would be between 0.25 and 0.35. So 0.45 is not within this interval and therefore most probably indicates some efficacy of the treatment. If I had obtained this result with groups of only 10, this interval would be 0.14 to 0.46, which would lead to a different result.
Of course here the size of groups is too small as seen by the big interval.
But what astonishes me here is, that the number of the population does not have any influence. I would guess that the size of my groups compared to my population should influence the result. If my group is 10 % of my population of 1000 it should yield more reliable results than if it was only 0.1 % of a population of 100,000.
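The interval arithmetic in this approach can be checked with a minimal Python sketch. It uses the formula exactly as written in the post; the function name `rough_interval` and the extra sample size of 1000 are illustrative assumptions.

```python
import math

def rough_interval(p_hat, n):
    """Interval p_hat ± sqrt(0.25 / n), exactly as written in the post."""
    half_width = math.sqrt(0.25 / n)
    return p_hat - half_width, p_hat + half_width

# Placebo proportion 0.30 for several (partly hypothetical) group sizes.
for n in (10, 100, 1000):
    low, high = rough_interval(0.30, n)
    print(f"n = {n:4d}: {low:.2f} to {high:.2f}")
```

Only the sample size `n` enters the calculation; the population size does not appear anywhere. Note also that $\sqrt{0.25/n}$ is the worst-case standard error of a proportion, so a normal-approximation 95 % interval would usually be about 1.96 times wider than the one computed here.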

Approach 3: estimate of sample size

Edit after I found how to write formulas with latex:

On the Internet I found this formula for the size of samples (without any indication of its source, however):

$n \ge \frac{z^2 \, \theta \, (1 - \theta) \, N}{\Delta\theta^2 \, (N - 1) + z^2 \, \theta \, (1 - \theta)}$

with
n - size of sample
N - number of population
z - Normal distribution value for level of significance (p = 0.05 -> z = 1.96)
θ - fraction of population
Δθ - width of confidence interval

I could rearrange this formula to determine Δθ for my sample size and obtained Δθ of about 9 % for the placebo group and roughly 10 % for the verum group.
So with a probability of 95 % the confidence zones for placebo are 25.5 % to 34.5 % and verum 40 % to 50 %. They do not overlap, so I assume there has been some effect of the treatment.
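The rearrangement described above can be sketched in Python. Solving the sample-size formula (taken as an equality) for Δθ gives $\Delta\theta = \sqrt{z^2 \theta (1-\theta)(N-n) / (n (N-1))}$; the function name `delta_theta` is an assumption made for illustration.

```python
import math

def delta_theta(n, N, theta, z=1.96):
    """Solve n = z^2 θ(1-θ) N / (Δθ^2 (N-1) + z^2 θ(1-θ)) for Δθ."""
    variance_term = z**2 * theta * (1 - theta)
    return math.sqrt(variance_term * (N - n) / (n * (N - 1)))

# Sample of 100 out of a population of 100,000, as in the post.
for label, theta in (("placebo", 0.30), ("verum", 0.45)):
    print(f"{label}: delta theta = {delta_theta(100, 100_000, theta):.3f}")
```

This reproduces roughly 0.09 for the placebo group and 0.10 for the verum group. Because the source of the formula is unknown, it is not certain whether Δθ is meant as the full width of the interval (as assumed in the post) or as the half-width; in similar sample-size formulas it is often the half-width (margin of error), in which case the intervals would be about twice as wide as stated.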

So, anybody there that can point me a way out of this jungle ?

N.A


Stephen Tashi
> It seems to me I cannot get a grip on the maths behind the statistics, and my head is so dizzy from all the terms and definitions that I cannot make head or tail of them any more.

First, we should get the basic ideas straight. (My apologies if you already know them.)

Applying statistics to a real-world problem is a subjective matter. There is no guarantee that two approaches will give the same conclusion.

The type of statistics you are doing does not tell you the probability that a given idea ("hypothesis") is true or the probability that it is false. It only gives you numbers that tell you the probability of the data assuming that some hypothesis is true. In hypothesis testing there is a procedure for "rejecting" the null hypothesis but this is merely a procedure, not a proof that the null hypothesis is false and it does not tell you the probability that the null hypothesis is false.

When you estimate some parameter and obtain a (say) 95% "confidence interval" with numerical endpoints (such as 0.25 plus or minus .32), this does not imply that there is a 95% probability that the true value of the parameter is within that interval. In your post, you appear to be assuming that it does.

When a common-sense person approaches a real-world problem, he asks questions like "What is the probability that my idea is true given the data?" or "What is the probability that the true value of the parameter is in this particular interval given the data?". These questions cannot be answered unless you make more assumptions about the situation than the usual ("frequentist") type of statistics makes. You need something like Bayesian statistics to answer those questions. The impressive jargon of frequentist statistics ("significance", "confidence") misleads many people, since it sounds like they are being given answers to the common questions. However, they are not. (For example, look up the difference between a "confidence interval" and a Bayesian "credible interval".)

> Approach 1: Pearson's Chi-squared test
>
> My null hypothesis is that nothing did change, that is, the treatment did not have any influence on the result. As I do not know the proportion of instant healings in the population my estimate is that this equals the proportion in my placebo group. So I do the calculation
>
> $X^2 = (0.45 - 0.30)^2 / 0.30 + (0.55 - 0.70)^2 / 0.70 = 0.107$

My guess is that you used the current Wikipedia article on Pearson's Chi-squared test. That article incorrectly says that the test statistic uses "frequencies". If you look at other sources on the web, you will see that the test statistic uses "numbers" of things, that is, counts rather than proportions.
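To illustrate the difference that counts make, here is a minimal Python sketch. It keeps the setup of the original post (the placebo rate is treated as the known population rate, which ignores the sampling uncertainty of the placebo group) and only replaces proportions by counts. The helper names and the group sizes 20 and 1000 are assumptions for illustration; the p-value for one degree of freedom is computed with the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all categories (counts)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_1df(chi2):
    """Upper tail probability of a chi-squared distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def compare_to_placebo_rate(n, verum_rate=0.45, placebo_rate=0.30):
    """Test the verum counts against counts expected from the placebo rate."""
    observed = [verum_rate * n, (1 - verum_rate) * n]
    expected = [placebo_rate * n, (1 - placebo_rate) * n]
    chi2 = pearson_chi2(observed, expected)
    return chi2, p_value_1df(chi2)

for n in (20, 100, 1000):
    chi2, p = compare_to_placebo_rate(n)
    print(f"n = {n:4d}: chi2 = {chi2:8.3f}, p = {p:.3g}")
```

With counts, the statistic is the group size times the value 0.107 obtained from proportions, so the significance now depends on the sample size. A test that also accounts for the uncertainty in the placebo group would use a 2×2 contingency table of both groups instead.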

In your other approaches, you appear to be thinking incorrectly about confidence intervals, as I pointed out above. In some situations there is a relation between "acceptance intervals" in statistical hypothesis testing and "confidence intervals" in estimation. However, "hypothesis testing" and "estimation" are different subjects. You're trying to do hypothesis testing using formulas from the subject of estimation.

> it sounds like they are being given answers to the common questions. However they are not.

That's a pity, but it matches my feeling of being lost and looking in vain for what I need. On the Chi-squared test, I looked up the German Wikipedia, and there it is given with numbers. I will be using this one.

Thanks.

N.A

haruspex
> Approach 2: Consider size of samples
> ...
> But what astonishes me here is, that the number of the population does not have any influence. I would guess that the size of my groups compared to my population should influence the result.

Yes, that does seem surprising, but it is quite correct. The total population can be far larger than the sample - it doesn't matter. What does matter is the absolute size of the sample. You can see that by letting the total population size tend to infinity (for a fixed sample size) in the equation in your third method. The relative sizes only become important when the sample is a large fraction of the total population.
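The limit mentioned above can be written out explicitly, using the symbols of the sample-size formula from the third approach. Dividing numerator and denominator by $N$ gives

$$
n \ge \frac{z^2 \, \theta \, (1 - \theta)}{\Delta\theta^2 \left(1 - \frac{1}{N}\right) + \frac{z^2 \, \theta \, (1 - \theta)}{N}}
\;\xrightarrow{\;N \to \infty\;}\;
\frac{z^2 \, \theta \, (1 - \theta)}{\Delta\theta^2}
$$

The limiting expression contains only $z$, $\theta$, and $\Delta\theta$. For $N = 100{,}000$ the terms in $1/N$ are already negligible, so the required sample size is practically the same as for an infinite population.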

Btw, efficacy is more than just a question of being better than chance. Classical hypothesis testing is designed only to indicate whether the treatment has an effect, not how strong it is. A full Bayesian analysis requires you to plug in an a priori spectrum of probabilities for efficacy of treatment. You could compromise, something like this:
- pick a level of efficacy (enhanced probability of recovery) which makes the treatment interesting; e.g. suppose it's p without treatment and q with treatment, and your criterion is q > p+x.
- compute the probability of the outcome assuming q is in (p-x, p+x) (flat distribution);
- compute the same assuming q is in (p+x, p+3x);
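The two probabilities in the steps above could be computed as in the following Python sketch. It assumes the outcome is 45 recoveries out of 100 in the verum group, takes p = 0.30 from the placebo group as given, and uses a purely hypothetical threshold x = 0.05. The "flat distribution" is read as a uniform prior on q over the interval, and the integral is approximated with a midpoint rule; all function names are illustrative.

```python
import math

def binomial_pmf(k, n, q):
    """Probability of exactly k recoveries out of n when each recovers with probability q."""
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)

def average_likelihood(k, n, low, high, steps=10_000):
    """Probability of the outcome, averaged over q uniform on (low, high) (midpoint rule)."""
    width = (high - low) / steps
    total = sum(binomial_pmf(k, n, low + (i + 0.5) * width) for i in range(steps))
    return total / steps

p, x = 0.30, 0.05   # x is a hypothetical "interesting" improvement
k, n = 45, 100      # assumed outcome in the verum group

not_interesting = average_likelihood(k, n, p - x, p + x)
interesting = average_likelihood(k, n, p + x, p + 3 * x)
print(f"q in (p-x, p+x):  {not_interesting:.4g}")
print(f"q in (p+x, p+3x): {interesting:.4g}")
```

The sketch only produces the two numbers named in the steps; how to weigh them against each other is left open here, as it is in the post.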

Thanks haruspex.

After reading about Bayesian statistics I decided on the following course of action:

(1) find a buddy in the statistics field
(2) stand him a couple of beers and talk him into reviewing my data promising more beer
(3) solve problem
(4) have party on the promised beer

No, no kidding, this stuff is way outside of my expertise and I will have to contact somebody to get some substantial support.

Thanks all.

N.A