Hoeffding inequality for the difference of two sample means?

In summary, W. Hoeffding's 1963 paper presents a well-known inequality (equation (1)) for independent variables in the range [0,1]. He then gives a corollary (equation (2)) for the difference of two sample means, where the variables are still in [0,1] and independent. This corollary includes a term (m^{-1}+n^{-1}) in the bound, which may seem confusing, since the difference z=\bar{x}-\bar{y} itself is bounded between -1 and 1. The term arises because z is a sum of m+n independent terms whose individual ranges are 1/n and 1/m, so the squared ranges sum to m^{-1}+n^{-1}.
  • #1
JanO
In W. Hoeffding's 1963 paper* he gives the well known inequality:

[itex]P(\bar{x}-\mathrm{E}[x_i] \geq t) \leq \exp(-2t^2n) \ \ \ \ \ \ (1)[/itex],

where [itex]\bar{x} = \frac{1}{n}\sum_{i=1}^nx_i[/itex], [itex]x_i\in[0,1][/itex]. [itex]x_i[/itex]'s are independent.

Following this theorem he gives a corollary for the difference of two sample means as:

[itex]P(\bar{x}-\bar{y}-(\mathrm{E}[x_i] - \mathrm{E}[y_k]) \geq t) \leq \exp(\frac{-2t^2}{m^{-1}+n^{-1}}) \ \ \ \ \ \ (2)[/itex],

where [itex]\bar{x} = \frac{1}{n}\sum_{i=1}^nx_i[/itex], [itex]\bar{y} = \frac{1}{m}\sum_{k=1}^my_k[/itex], [itex]x_i,y_k\in[0,1][/itex]. [itex]x_i[/itex]'s and [itex]y_k[/itex]'s are independent.


My question is: How does (2) follow from (1)?

-Jan

*http://www.csee.umbc.edu/~lomonaco/f08/643/hwk643/Hoeffding.pdf (equations (2.6) and (2.7))
 
  • #2
Hey JanO and welcome to the forums.

One idea I have is to let Z = X + Y and use Z instead of X in the definition.
 
  • #3
Thanks Chiro for your response.

However, I still do not understand how the term [itex](m^{-1} + n^{-1})[/itex] comes into the bound. Isn't [itex]z=\bar{x}-\bar{y}[/itex] still bounded between [0,1]?

-Jan
 
  • #4
JanO said:
However, I still do not understand how the term [itex](m^{-1} + n^{-1})[/itex] comes into the bound. Isn't [itex]z=\bar{x}-\bar{y}[/itex] still bounded between [0,1]?

Think about what happens to the variances.
 
  • #5
It seems like bounded here means almost surely bounded. At least that's how Hoeffding's inequality seems to be given elsewhere. I guess it then means that [itex]z=\bar{x}-\bar{y}[/itex] is bounded a.s. between [itex][\mu_x-\mu_y-\frac{1}{2}\sqrt{m^{-1}+n^{-1}}, \ \mu_x-\mu_y+\frac{1}{2}\sqrt{m^{-1}+n^{-1}}][/itex]?

Thanks again for your help!
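
For completeness, the step the thread is circling around can be written out. Inequality (2) follows not from the special case (1) itself but from the general form of Hoeffding's theorem (Theorem 2 in the paper), which for independent [itex]Z_i[/itex] with [itex]a_i \le Z_i \le b_i[/itex] bounds the upper tail of their sum by [itex]\exp(-2t^2/\sum_i(b_i-a_i)^2)[/itex]. The decomposition below is our reading of how the corollary is obtained, not quoted from the paper:

```latex
% Write the difference of sample means as one sum of m+n independent
% bounded terms:
\bar{x}-\bar{y} = \sum_{i=1}^{n} \frac{x_i}{n}
                + \sum_{k=1}^{m}\left(-\frac{y_k}{m}\right),
\qquad \frac{x_i}{n} \in \left[0,\tfrac{1}{n}\right],
\qquad -\frac{y_k}{m} \in \left[-\tfrac{1}{m},\,0\right].

% Sum of squared ranges appearing in the general bound:
\sum_{i=1}^{n}\left(\tfrac{1}{n}\right)^{2}
  + \sum_{k=1}^{m}\left(\tfrac{1}{m}\right)^{2}
  = \frac{1}{n}+\frac{1}{m} = n^{-1}+m^{-1}.

% Plugging this into P(S - E[S] >= t) <= exp(-2t^2 / sum of squared ranges)
% yields (2); with the y-sample absent the sum is 1/n and (1) is recovered.
```

In particular, [itex]\bar{x}-\bar{y}[/itex] itself is only bounded in [itex][-1,1][/itex]; the [itex](m^{-1}+n^{-1})[/itex] term comes from the per-term ranges, not from a tighter almost-sure bound on the difference.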
 

1. What is the Hoeffding inequality for the difference of two sample means?

The Hoeffding inequality for the difference of two sample means is a statistical theorem that gives an upper bound on the probability that the difference between two sample means deviates from the true difference between the underlying population means by at least a given amount. It is often used in hypothesis testing and confidence interval estimation.

2. How is the Hoeffding inequality for the difference of two sample means calculated?

For observations in [0,1], the inequality states that the probability that \bar{x}-\bar{y} exceeds its expected value by at least t is at most exp(-2t^2/(m^{-1}+n^{-1})), where n and m are the two sample sizes. Nothing else is estimated from the data: the bound depends only on t and the sample sizes, not on the sample variances.
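
As a concrete check, here is a small simulation sketch (the function names are ours, not standard) comparing the empirical tail frequency for uniform [0,1] samples against the bound exp(-2t^2/(m^{-1}+n^{-1})) from equation (2):

```python
import math
import random

def hoeffding_two_sample_bound(t, n, m):
    """Right-hand side of inequality (2): exp(-2 t^2 / (1/m + 1/n))."""
    return math.exp(-2.0 * t * t / (1.0 / m + 1.0 / n))

def empirical_tail(t, n, m, trials=5000, seed=0):
    """Empirical frequency of xbar - ybar - (E[x_i] - E[y_k]) >= t for
    uniform [0,1] samples, where E[x_i] = E[y_k] = 1/2 cancels out."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        ybar = sum(rng.random() for _ in range(m)) / m
        if xbar - ybar >= t:
            hits += 1
    return hits / trials

n, m, t = 100, 100, 0.1
print("bound    =", hoeffding_two_sample_bound(t, n, m))  # exp(-1), about 0.368
print("observed =", empirical_tail(t, n, m))              # well below the bound
```

The bound is distribution-free, so it is typically loose for any particular distribution; the empirical frequency here is far smaller than exp(-1).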

3. What assumptions are needed for the Hoeffding inequality for the difference of two sample means to hold?

The Hoeffding inequality for the difference of two sample means holds under the assumption that the observations are independent and bounded, here in the range [0,1]. It does not require the observations to be identically distributed, and no distributional assumption beyond boundedness is needed; the variances are automatically finite because the variables are bounded.

4. How is the Hoeffding inequality for the difference of two sample means used in hypothesis testing?

In hypothesis testing, the Hoeffding inequality for the difference of two sample means is used to determine whether an observed difference between two sample means is statistically significant. Plugging the observed difference in as t gives an upper bound on the probability of seeing a difference that large under the null hypothesis of equal means. If this upper bound is smaller than the chosen significance level, the null hypothesis is rejected.
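
To make this concrete, the bound can be inverted for the smallest deviation t at which it drops to a chosen significance level. This is a sketch (the function name hoeffding_threshold is ours, not from the thread):

```python
import math

def hoeffding_threshold(alpha, n, m):
    """Solve exp(-2 t^2 / (1/m + 1/n)) = alpha for t: any observed
    difference of sample means of at least t is significant at level alpha."""
    return math.sqrt(0.5 * (1.0 / m + 1.0 / n) * math.log(1.0 / alpha))

# With n = m = 100 and alpha = 0.05 the critical difference is about 0.17.
print(hoeffding_threshold(0.05, 100, 100))
```

Because the bound is distribution-free, this threshold is conservative: a variance-based test would usually reject at smaller observed differences.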

5. Can the Hoeffding inequality for the difference of two sample means be applied to non-parametric data?

Yes. The Hoeffding inequality makes no normality assumption; it requires only independent, bounded observations, so it is distribution-free and applies in non-parametric settings as well. When the data are not naturally bounded, rank-based methods such as the Mann-Whitney U test or the Wilcoxon signed-rank test may be preferable.
