# Trace distance between two probability distributions - prove

1. Dec 8, 2015

### Emil_M

1. The problem statement, all variables and given/known data
Let $\{p_x\}$ and $\{q_x\}$ be two probability distributions over the same index set $\{x\}={1,2,...,N}$. Then, the trace distance between them is given by $D(p_x,q_x):=\frac{1}{2} \sum_x |p_x-q_x|$.

Prove that $D(p_x,q,_x)=max_S |p(S)-q(S)|=max_S | \sum_{x \in S} p_x - \sum_{x \in S} q_x|$, where the maximization is over all subsets $S$ of the index set $\{x\}$.

2. Relevant equations
See above

3. The attempt at a solution
\begin{align*} \frac{1}{2} \sum_x |p_x-q_x| &\geq | \sum_{x \in S} p_x - \sum_{x \in S} q_x|\\ &= |\sum_{x \in S} p_x-q_x| \end{align*}

Then, I have tried playing around with the triangle inequality, but that didn't go anywhere...

Thanks for you help!

Last edited: Dec 8, 2015
2. Dec 8, 2015

### RUber

The key piece of information here is that all the probabilities will sum to 1.
If p(1) - q(1) = z, then the sum of q(2) to q(N) minus the sum of p(2) to p(N) will also be z.

3. Dec 8, 2015

### Ray Vickson

If you let $p_x - q_x = r_x$, the $r_x$ sum to zero. We can re-write $D(p_x,q_x)$ as $(1/2) \sum_x |r_x|$ and $p(S) - q(S) = \sum_{x \in S} r_x$.

We can write
$$\sum_{x \in S} r_x = \underbrace{\sum_{x \in S, r_x > 0} r_x }_ {\geq 0} + \underbrace{\sum_{x \in S, r_x < 0} r_x }_ {\leq 0}$$
Think about what properties $S$ must have in order that the absolute value of the above sum be maximal, say at the subset $S = S_0$.

Last edited: Dec 8, 2015
4. Dec 9, 2015

### Emil_M

Thanks for your help! I get it now