The null hypothesis H_0 is a statement, not a number, so it wouldn't make sense to write \theta = H_0. As I understand their notation, P_{H_0}[S \leq k] means "the probability that S is less than or equal to k under the assumption that H_0 is true," and P_{p_0}[S \leq k] means "the probability that S is less than or equal to k under the assumption that the probability of success is p_0." Since H_0 asserts precisely that the probability of success is p_0, both expressions refer to the same probability. (Your text didn't do a good job of defining those notations.)
In that example it is not necessary to speak of the maximum probability computed over the set of all \theta. The null hypothesis in the example is stated as \theta = p = p_0, so it deals with only a single value of \theta.
If they had instead stated the null hypothesis as \theta = p \ge p_0, we would have computed \alpha exactly as the example did, since p = p_0 is the value of p that maximizes P_p[S \leq k] among all the values of p that are allowed when the null hypothesis is true.
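To make that maximization concrete, here is a small Python sketch. The numbers n, k, and p_0 below are hypothetical (the example's actual values aren't quoted here); the point is that P_p[S \leq k] for a binomial S is decreasing in p, so over the composite null p \ge p_0 the supremum is attained at p = p_0.

```python
from math import comb

def binom_cdf(k, n, p):
    """P[S <= k] for S ~ Binomial(n, p)."""
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) for s in range(k + 1))

# Hypothetical numbers, just for illustration:
# n = 20 trials, reject H_0 when S <= 4, null success probability p0 = 0.5.
n, k, p0 = 20, 4, 0.5

alpha = binom_cdf(k, n, p0)  # the significance level computed at p = p0

# P[S <= k] shrinks as p grows past p0, so p = p0 gives the maximum
# rejection probability over all p >= p0 allowed by the composite null.
probs = [binom_cdf(k, n, p) for p in (0.5, 0.6, 0.7, 0.8)]
assert all(a >= b for a, b in zip(probs, probs[1:]))
```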
It's an interesting question whether the null hypothesis in the example should be "The new treatment has the same effectiveness as the old treatment" or "The new treatment is no more effective than the old treatment". Since the example proposes a one-tailed acceptance region, I think the second phrasing makes more sense.
Trying to prove whether a particular type of acceptance region (one-tailed, two-tailed, or even a bunch of isolated intervals) is "best" involves defining what "best" means. The only way I know to approach that topic in frequentist statistics is to compare the "power" of tests that use different acceptance regions. The power of a test is defined by a function of the true parameter, not by a single number, so comparing the power of two tests is not straightforward either.
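Here is a sketch of why comparing power is not straightforward, again with hypothetical numbers (n = 20, rejection cutoffs chosen for illustration only). The power at each p is just the rejection probability under that p; neither test dominates the other across all p:

```python
from math import comb

def binom_cdf(k, n, p):
    """P[S <= k] for S ~ Binomial(n, p)."""
    return sum(comb(n, s) * p**s * (1 - p)**(n - s) for s in range(k + 1))

n = 20  # hypothetical number of trials

def power_one_tailed(p, k=4):
    """Power of the test rejecting when S <= k (one-tailed region)."""
    return binom_cdf(k, n, p)

def power_two_tailed(p, k_lo=3, k_hi=17):
    """Power of the test rejecting when S <= k_lo or S >= k_hi."""
    return binom_cdf(k_lo, n, p) + (1 - binom_cdf(k_hi - 1, n, p))

# Neither test is uniformly more powerful: the one-tailed test wins
# when the true p is small, the two-tailed test wins when p is large.
# That is why "power" is a whole function of p, not a single number.
print(power_one_tailed(0.2), power_two_tailed(0.2))  # one-tailed larger
print(power_one_tailed(0.9), power_two_tailed(0.9))  # two-tailed larger
```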