I recently went through the exercise of using Bayesian probability to figure out the most likely probability for "heads" given that H out of N coin tosses came up heads. The derivation was enormously complicated, but the answer was very simple: p = \frac{H+1}{N+2}. In the limit of large N, this approaches the relative frequency, \frac{H}{N}, but it is actually better-behaved. Before you ever toss the first coin, with N = H = 0, the Bayesian estimate gives p = \frac{1}{2}. If you get heads on the first toss, this estimate gives p = \frac{2}{3}, rather than the relative-frequency estimate, p = 1.
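To make that concrete, here's a quick Python sketch (purely my own illustration; the particular H and N values are arbitrary) that prints both estimates side by side:

def bayes_estimate(heads, tosses):
    # Posterior-mean estimate of the heads probability: (H + 1) / (N + 2)
    return (heads + 1) / (tosses + 2)

def relative_frequency(heads, tosses):
    # Naive estimate H / N; not even defined before the first toss
    return heads / tosses if tosses > 0 else float("nan")

for heads, tosses in [(0, 0), (1, 1), (5, 10), (60, 100), (600, 1000)]:
    print(f"H={heads}, N={tosses}: Bayes={bayes_estimate(heads, tosses):.4f}, "
          f"freq={relative_frequency(heads, tosses):.4f}")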
I should probably explain what I mean by "the most likely probability". I start off assuming that each coin has a parameter--I'm going to call it B, for bias--that characterizes the coin tosses. The model is that:
P(\text{heads} | B) = B
So the bias is just the probability of heads on a single toss. But I'm treating it as a parameter of the model. As a parameter, it has a range of possible values, 0 \leq B \leq 1. If I have no idea what the value of B is, I can use the least informative prior, which is to assume that B is uniformly distributed in the range [0,1].
That's kind of an odd concept--we're talking about the probability of a probability--but let's go on.
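If it helps, here's a tiny simulation of that idea (my own sketch, standard library only): draw the bias B uniformly at random, then toss a coin with that bias. Averaged over many such randomly biased coins, heads comes up about half the time, which is just the prior expectation of B.

import random

random.seed(0)
trials = 100_000
heads = 0
for _ in range(trials):
    bias = random.random()       # B ~ Uniform(0, 1): a randomly drawn "probability"
    if random.random() < bias:   # one toss of a coin with that bias
        heads += 1
print(heads / trials)            # comes out near 0.5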
So we toss the coin N times and get H heads. Then Bayesian updating tells us the adjusted, posterior probability distribution for B, given that data. The rule is (letting E(H,N) be the fact that I got H heads when I flipped the coin N times):
P(B | E(H,N)) = \frac{P(E(H,N) | B) P(B)}{P(E(H,N))}
where P(E(H,N) | B) is the probability of E(H,N), given B, and P(B) is the prior probability density of B (which is just 1 for the least informative prior), and P(E(H,N)) is the prior probability of E(H,N), not knowing anything about B.
These can be computed readily enough:
P(B) = 1
P(E(H,N) | B) = B^H (1-B)^{N-H} \frac{N!}{H! (N-H)!}
P(E(H,N)) = \int dB P(B) P(E(H,N)|B) = \frac{N!}{H! (N-H)!} \int dB\ B^H (1-B)^{N-H}
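Before dealing with that integral, a quick sanity check of the likelihood line (not part of the derivation; N = 10, H = 7, and B = 0.3 are arbitrary test values, and I'm assuming scipy is available): the formula agrees with scipy's binomial pmf.

from math import comb
from scipy.stats import binom

N, H, B = 10, 7, 0.3
formula = comb(N, H) * B**H * (1 - B)**(N - H)   # P(E(H,N) | B) as written above
print(formula, binom.pmf(H, N, B))               # the two numbers agree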
That last integral is hard to do, but it's done here:
https://math.stackexchange.com/questions/86542/prove-binomnk-1-n1-int-01xk1-xn-kdx-for-0-leq-k-le
\int dB\ B^H (1-B)^{N-H} = \frac{H! (N-H)!}{(N+1)!}
That gives: P(E(H,N)) = \frac{1}{N+1}
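If you'd rather not work through the proof at that link, a numerical check is easy (again with arbitrary N = 10, H = 7, assuming scipy):

from math import comb, factorial
from scipy.integrate import quad

N, H = 10, 7
lhs, _ = quad(lambda b: b**H * (1 - b)**(N - H), 0.0, 1.0)   # the integral, done numerically
rhs = factorial(H) * factorial(N - H) / factorial(N + 1)     # the closed form from the link
print(lhs, rhs)                        # both ~0.000757576
print(comb(N, H) * lhs, 1 / (N + 1))   # and P(E(H,N)) is indeed 1/(N+1)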
So our posterior probability distribution for B is:
P(B|E(H,N)) = \frac{(N+1)!}{H! (N-H)!} B^H (1-B)^{N-H}
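Incidentally, this is exactly a Beta(H+1, N-H+1) density, so it can be checked against scipy's beta distribution (my observation, not something the derivation needs; N = 10, H = 7 are arbitrary):

from math import factorial
from scipy.stats import beta

N, H = 10, 7

def posterior(b):
    # P(B | E(H,N)) exactly as written above
    return factorial(N + 1) / (factorial(H) * factorial(N - H)) * b**H * (1 - b)**(N - H)

for b in (0.2, 0.5, 0.7, 0.9):
    print(posterior(b), beta.pdf(b, H + 1, N - H + 1))   # values match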
Now we compute \langle B \rangle_{E(H,N)}, the expected value of B given E(H,N)--this posterior mean is what I called "the most likely probability" at the start. The formula for expectation values is:
\langle B \rangle_{E(H,N)} = \int dB\ B\ P(B | E(H,N)) = \frac{(N+1)!}{H! (N-H)!} \int dB\ B^{H+1} (1-B)^{N-H}
This is the same integral identity as before, with H+1 in place of H and N+1 in place of N: \int dB\ B^{H+1} (1-B)^{N-H} = \int dB\ B^{H+1} (1-B)^{(N+1)-(H+1)} = \frac{(H+1)! (N-H)!}{(N+2)!}. So we can immediately write:
\langle B \rangle_{E(H,N)} = \frac{(N+1)!}{H! (N-H)!} \frac{(H+1)! (N-H)!}{(N+2)!} = \frac{H+1}{N+2}
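One last numerical check of the final answer (same arbitrary N = 10, H = 7): integrating B against the posterior really does give (H+1)/(N+2), which is 8/12 here.

from math import factorial
from scipy.integrate import quad

N, H = 10, 7
norm = factorial(N + 1) / (factorial(H) * factorial(N - H))   # posterior normalization
mean, _ = quad(lambda b: b * norm * b**H * (1 - b)**(N - H), 0.0, 1.0)
print(mean, (H + 1) / (N + 2))   # both ~0.6667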
Like I said, very simple result that is very complicated to derive.