My concern is that probability can be quite mind-bending, and when you say p 'somehow' gives the probability of aggregation... I get worried. I gave an MIT OCW link for 6.041 on the prior page. Let me suggest 6.041x, which is running right now on edX:
https://www.edx.org/course/introduction-probability-science-mitx-6-041x-2 . It started 2 weeks ago, but there is a week 0 where not much happens, so you are likely only one week behind. The lectures are lucid, the problems are challenging, and there is great forum support from MIT staff. It is perhaps my all-time favorite MOOC.
- - - -
Another interpretation here is that, try as I might to use clear notation, I did take a shortcut or two. So, to take another stab at this, consider the Erlang density shown below.
##f_{Y_k} = \lim_{\delta \to 0}\delta^{-1} \big(\text{TotalProbabilityAtLeastKArrivals}(y + \delta) - \text{TotalProbabilityAtLeastKArrivals}(y) \big)##
This holds true in general. But what if I asked for this in a conditional world where you know that at least ##K## arrivals occur by time ##y + \delta##?
Clearly such a probability must be higher than in the unconditional case. Some care is needed, but the basic idea is this: if you roll a fair six-sided die, you have a 1/6 chance of getting a 2. But if I say you are in a conditional world where you know that a number ##\leq 3## was rolled, then your chance is ##\frac{\frac{1}{6}}{\frac{1}{2}} = \frac{1}{3}##.
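If a quick sanity check helps, here is a throwaway simulation of that die example (my own sketch, not part of the original argument): conditioning on the roll being ##\leq 3## moves the relative frequency of a 2 from about 1/6 to about 1/3.

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)          # fair six-sided die

p_two = (rolls == 2).mean()                          # unconditional: ~ 1/6
p_two_given_le_3 = (rolls[rolls <= 3] == 2).mean()   # conditional:   ~ 1/3

print(p_two, p_two_given_le_3)
```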
For the avoidance of doubt: you work out the exact Erlang density first (##f_{Y_k}##), and only afterward do you work out the conditioning.
So from here, consider how we'd do the conditioning for the Erlang:
##f_{Y_k | K_{\text{ArrivalsAt}\,y+\delta}} = \frac{f_{Y_k}}{\text{TotalProbabilityAtLeastKArrivals}(y + \delta)}##
and then observe that, in the limit, the above is equivalent to
##\lim_{\delta \to 0} f_{Y_k | K_{\text{ArrivalsAt}\,y+\delta}} = \lim_{\delta \to 0} \frac{f_{Y_k}}{\text{TotalProbabilityAtLeastKArrivals}(y + \delta)} = \frac{f_{Y_k}}{\text{TotalProbabilityAtLeastKArrivals}(y)} = f_{Y_k | K_{\text{ArrivalsAt}\,y}}##
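To make the limit concrete, here is a small numerical check (my own sketch, with illustrative values of ##\lambda##, ##k##, ##y## and ##\delta## that are not from the thread). It evaluates ##\text{TotalProbabilityAtLeastKArrivals}## via the Poisson count, compares the Erlang density against the difference quotient from the definition above, and then forms the conditional density by dividing through.

```python
from scipy.stats import erlang, poisson

lam, k, y, delta = 2.0, 3, 1.2, 1e-6     # illustrative values only

def prob_at_least_k_arrivals(t):
    # P(at least k arrivals of a rate-lam Poisson process by time t)
    return poisson.sf(k - 1, lam * t)

# Erlang density of Y_k, the time of the k-th arrival
f_Yk = erlang.pdf(y, k, scale=1.0 / lam)

# Difference-quotient version from the limit definition
diff_quotient = (prob_at_least_k_arrivals(y + delta)
                 - prob_at_least_k_arrivals(y)) / delta

# Conditional density, given at least k arrivals by y + delta
f_cond = f_Yk / prob_at_least_k_arrivals(y + delta)

print(f_Yk, diff_quotient, f_cond)
```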
From here, just observe that ##(1-p) = \text{TotalProbabilityAtLeastKArrivals}(y=T_{max})##,
so your very final formula could be written as
$$= [\text{ExpectedTimeUntilFirstArrival}=0] + p \cdot T_{max} + (1-p)\cdot\Bigg(\frac{ \frac{r-1}{\lambda } \, F_{Y_{r}}(\tau)}{1-p}\Bigg)$$
$$= [\text{ExpectedTimeUntilFirstArrival}=0] + p \cdot T_{max} + \frac{r-1}{\lambda } \, F_{Y_{r}}(\tau)$$
Thus you can see the (1-p) terms cancel out.
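As a quick check on that identity (again with illustrative numbers chosen by me), ##1-p## computed from the arrival count matches ##1-p## computed from the Erlang CDF of ##Y_K## at ##T_{max}##:

```python
from scipy.stats import erlang, poisson

lam, K, T_max = 2.0, 3, 1.5                                    # illustrative values only

one_minus_p_count = poisson.sf(K - 1, lam * T_max)             # P(at least K arrivals by T_max)
one_minus_p_erlang = erlang.cdf(T_max, K, scale=1.0 / lam)     # P(Y_K <= T_max)

print(one_minus_p_count, one_minus_p_erlang)                   # the two agree
```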
- - - -
A huge part of probability comes down to partitioning arguments. The initial partition was between the cases where you time out, with probability p, and the cases where you don't, with probability (1-p). From there we just said the total expectation is
(probability of having fewer than K (or r-1, if you like) arrivals by T_max) * T_max + Something.
Note that T_max is the deterministic amount of time that kicks in given that you have fewer than K arrivals at T_max (i.e. the process terminates early). And here
Something = (probability of having at least K arrivals by T_max) * (expected waiting time given that you have at least K arrivals by T_max).
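Here is a Monte Carlo sketch of that partition (my own illustration: the parameter values are made up, and I'm assuming the quantity of interest is ##\min(Y_K, T_{max})##, i.e. you wait until the K-th arrival or the timeout, whichever comes first). The direct average should match the partitioned sum ##p \cdot T_{max} + (1-p)\cdot E[Y_K \mid \text{at least } K \text{ arrivals by } T_{max}]##.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (not from the thread): arrival rate, required
# number of arrivals, and the timeout.
lam, K, T_max = 2.0, 3, 1.5
n_trials = 1_000_000

# Time of the K-th arrival of a rate-lam Poisson process is Erlang(K, lam),
# i.e. a sum of K independent Exponential(lam) interarrival times.
Y_K = rng.exponential(1.0 / lam, size=(n_trials, K)).sum(axis=1)

# Direct estimate: average of the observed waiting time (arrival or timeout).
direct = np.minimum(Y_K, T_max).mean()

# Partitioned estimate:  p * T_max  +  (1 - p) * E[Y_K | Y_K <= T_max]
timed_out = Y_K > T_max
p = timed_out.mean()                      # P(fewer than K arrivals by T_max)
cond_mean = Y_K[~timed_out].mean()        # E[Y_K | at least K arrivals by T_max]
partitioned = p * T_max + (1 - p) * cond_mean

print(direct, partitioned)                # the two estimates should agree closely
```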
Partitioning arguments are the underpinning of the Law of Total Expectation -- the proof of which is pretty straightforward, at least for the countable case. I'd recommend spending some time with that proof.
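For reference, the countable statement is: if ##A_1, A_2, \ldots## partition the sample space with ##P(A_i) > 0## for each ##i##, then
$$E[X] = \sum_i P(A_i)\, E[X \mid A_i],$$
which follows from writing ##E[X \mid A_i] = \frac{E[X \,\mathbf{1}_{A_i}]}{P(A_i)}## and summing over ##i##, since ##\sum_i \mathbf{1}_{A_i} = 1##.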
If your questions are more fundamental than that, you really need to spend the time on something like 6.041x.
I don't think there is much more that I can say here --- I mean that with respect to the subject matter and original question, as well as time constraints.