# Hoeffding inequality for the difference of two sample means?

In his 1963 paper*, W. Hoeffding gives the well-known inequality:

$P(\bar{x}-\mathrm{E}[x_i] \geq t) \leq \exp(-2t^2n) \ \ \ \ \ \ (1)$,

where $\bar{x} = \frac{1}{n}\sum_{i=1}^nx_i$, $x_i\in[0,1]$, and the $x_i$ are independent.

Following this theorem, he gives a corollary for the difference of two sample means:

$P(\bar{x}-\bar{y}-(\mathrm{E}[x_i] - \mathrm{E}[y_k]) \geq t) \leq \exp(\frac{-2t^2}{m^{-1}+n^{-1}}) \ \ \ \ \ \ (2)$,

where $\bar{x} = \frac{1}{n}\sum_{i=1}^nx_i$, $\bar{y} = \frac{1}{m}\sum_{k=1}^my_k$, $x_i,y_k\in[0,1]$, and the $x_i$ and $y_k$ are all independent.

My question is: How does (2) follow from (1)?

-Jan

*http://www.csee.umbc.edu/~lomonaco/f08/643/hwk643/Hoeffding.pdf (equations (2.6) and (2.7))
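
Inequality (2) can be sanity-checked numerically before looking for a proof. The following is a minimal sketch, not part of Hoeffding's paper: the function names `two_sample_hoeffding_bound` and `empirical_tail` are made up for illustration, and the choice of independent Uniform(0, 1) samples (so that $\mathrm{E}[x_i] - \mathrm{E}[y_k] = 0$) is an assumption.

```python
import math
import random


def two_sample_hoeffding_bound(t, n, m):
    """Right-hand side of (2): exp(-2 t^2 / (1/m + 1/n))."""
    return math.exp(-2.0 * t * t / (1.0 / m + 1.0 / n))


def empirical_tail(t, n, m, trials=20000, seed=0):
    """Monte Carlo estimate of P(xbar - ybar - (E[x] - E[y]) >= t).

    Assumption for illustration: all x_i and y_k are independent
    Uniform(0, 1) draws, so E[x_i] - E[y_k] = 0.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.random() for _ in range(n)) / n
        ybar = sum(rng.random() for _ in range(m)) / m
        if xbar - ybar >= t:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    t, n, m = 0.2, 20, 30
    print("empirical tail:", empirical_tail(t, n, m))
    print("bound from (2):", two_sample_hoeffding_bound(t, n, m))
```

For uniform samples the estimated tail should come out well below the bound, since (2) has to hold for every distribution on $[0,1]$ and is therefore loose for any particular one.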

chiro
Hey JanO and welcome to the forums.

One idea I have is to let Z = X + Y and use Z instead of X in the definition.

However, I still do not understand how the term $(m^{-1} + n^{-1})$ comes into the bound. Isn't $z=\bar{x}-\bar{y}$ still bounded between [0,1]?

-Jan

chiro

Think about what happens to the variances.

It seems that "bounded" here means almost surely bounded. At least that's how Hoeffding's inequality seems to be stated elsewhere. I guess it then means that $z=\bar{x}-\bar{y}$ is bounded a.s. between $[\mu_x-\mu_y-\frac{1}{2}\sqrt{m^{-1}+n^{-1}}, \ \mu_x-\mu_y+\frac{1}{2}\sqrt{m^{-1}+n^{-1}}]$?
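
One possible way to see where $m^{-1}+n^{-1}$ comes from is sketched below. It does not start from (1) itself but assumes the more general form of Hoeffding's inequality for independent summands with different ranges. The symbols $w_j$, $a_j$, $b_j$, $N$ and $S$ are introduced here only for illustration. If $w_1,\dots,w_N$ are independent with $a_j \leq w_j \leq b_j$ and $S=\sum_{j=1}^N w_j$, then for $t>0$

$P(S-\mathrm{E}[S] \geq t) \leq \exp\left(\frac{-2t^2}{\sum_{j=1}^N (b_j-a_j)^2}\right)$.

Now write the difference of the sample means as a single sum of $N=n+m$ independent terms:

$\bar{x}-\bar{y} = \sum_{i=1}^n \frac{x_i}{n} + \sum_{k=1}^m \left(-\frac{y_k}{m}\right)$.

Each $\frac{x_i}{n}$ lies in $[0,\frac{1}{n}]$ and each $-\frac{y_k}{m}$ lies in $[-\frac{1}{m},0]$, so

$\sum_{j=1}^N (b_j-a_j)^2 = n\cdot\frac{1}{n^2} + m\cdot\frac{1}{m^2} = n^{-1}+m^{-1}$,

and substituting this into the general bound gives (2). Dropping the $y_k$ terms leaves $\sum_j (b_j-a_j)^2 = n^{-1}$, which recovers (1). In this argument it is the ranges of the individual summands that enter the bound, not the range of $z=\bar{x}-\bar{y}$ itself, which is $[-1,1]$.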