Mentallic said:
What kind of inference for p would I make? I only have the data of shooting k successful throws out of 50 total, so all I can imagine doing with this is to falsely assume that my true average p is k/50.
Nothing false about it as an approximation: you would not be surprised to find the true value of p close to f = k/50.
Let's cook up some simple numbers: say you observe f = 0.2, so k = 10. How surprised would you be by this result if you knew p = 0.9? Well, how probable is it that you would observe X as small as 10 (i.e., X <= 10) when X~Bin(50,0.9)? How surprised would you be to observe X as large as 10 when X~Bin(50, 0.001); that is, what would be the probability of observing X >= 10 in such a case? Note that we do not necessarily ask for the probability that X = 10 exactly, since that will be fairly small for any p; it is just one of 51 separate probabilities that must all sum to 1.
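If you want the actual numbers, these tail probabilities are one-liners in, say, SciPy (a quick sketch using the n = 50, k = 10 example):

```python
from scipy.stats import binom

# Tail probabilities for the "how surprised would you be?" questions above.
n, k = 50, 10
print(binom.cdf(k, n, 0.9))       # P(X <= 10) when p = 0.9: vanishingly small
print(binom.sf(k - 1, n, 0.001))  # P(X >= 10) when p = 0.001: also tiny
print(binom.pmf(k, n, 0.2))       # P(X = 10) exactly, even at p = k/50: only about 0.14
```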
A confidence interval [a,b] is a kind of "not too surprised" interval, so that if p ∈ [a,b] the probability of observing X = 10 is not "too small". In your case you do not want to be more surprised than 5%, so you would typically choose upper and lower limits giving 2.5% probability each. So, you would choose the upper limit 'b' to give P(X <= 10) = 0.025 for X~Bin(50,b) and the lower limit 'a' to give P(X >= 10) = 0.025 for X ~ Bin(50,a). Of course, we could choose upper and lower probabilities different from 0.025, just as long as the two of them sum to 0.05, but choosing the symmetrical case is customary.
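A minimal sketch of that construction, solving the two tail equations numerically (again assuming SciPy and the n = 50, k = 10 example):

```python
from scipy.stats import binom
from scipy.optimize import brentq

# Find the limits described above: b such that P(X <= k) = 0.025 under
# Bin(n, b), and a such that P(X >= k) = 0.025 under Bin(n, a).
n, k = 50, 10
b = brentq(lambda p: binom.cdf(k, n, p) - 0.025, 1e-9, 1 - 1e-9)     # P(X <= k) = 0.025
a = brentq(lambda p: binom.sf(k - 1, n, p) - 0.025, 1e-9, 1 - 1e-9)  # P(X >= k) = 0.025
print(f"95% confidence interval: [{a:.3f}, {b:.3f}]")
```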
Modern software can handle such problems pretty easily, but you might, nevertheless, choose instead to use a normal approximation to the binomial.
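For reference, the usual normal approximation replaces the binomial tails with a symmetric interval around f:

f \pm 1.96\sqrt{\frac{f(1-f)}{n}},

which for f = 0.2 and n = 50 gives ##0.2 \pm 0.111##, i.e. roughly [0.089, 0.311].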
Formally: a 95% confidence interval [a,b] is a random interval whose chance of overlapping the true value of p is 95%. (The interval is random because it is constructed from the observed data, which is itself random.) Classical statisticians would say it is incorrect to speak of "the probability that p lies in the interval is 95%"; rather, they would insist that it is the interval itself to which the probability is attached. This concept can be tricky to grasp.
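That "random interval" business is easy to see in a simulation: draw many datasets with a known p, build an interval from each, and count how often the true p lands inside. A sketch (the values of n, p_true, and the number of trials are made up; the limits use the equivalent beta-quantile form of the exact interval above):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
n, p_true, trials = 50, 0.3, 100_000
k = rng.binomial(n, p_true, size=trials)

k_lo = np.maximum(k, 1)      # guard: the beta quantile is undefined at k = 0
k_hi = np.minimum(k, n - 1)  # guard: likewise at k = n
lo = np.where(k == 0, 0.0, beta.ppf(0.025, k_lo, n - k_lo + 1))
hi = np.where(k == n, 1.0, beta.ppf(0.975, k_hi + 1, n - k_hi))

coverage = np.mean((lo <= p_true) & (p_true <= hi))
print(f"empirical coverage: {coverage:.3f}")  # comes out at or above 0.95
                                              # (the exact interval is conservative)
```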
A Bayesian (which I almost am) would find it a lot easier: he/she would start with a "prior" probability distribution ##f_0(p)## for the parameter, and would then just look at the posterior distribution ##f_1(p)## of p after observing the data. That is,

f_1(r) = \frac{P\{r < p < r + dr \mid X = k\}}{dr} = \frac{P\{X = k \mid p = r\}\, f_0(r)}{P\{X = k\}}.

For a uniform prior ##f_0 = 1##, we get
f_1(r) = \frac{r^k (1-r)^{50-k}}{B(k+1,50-k+1)},
where ##B(u,v)## is the so-called Beta function:
B(u,v) = \int_0^1 x^{u-1}(1-x)^{v-1} \, dx.
Anyway, p would now be regarded as having probability density ##f_1##, and the confidence interval [a,b] would truly be an interval in which p is 95% likely to fall.
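In this form the endpoints are just two quantiles of the Beta(k+1, 50-k+1) posterior (a sketch, again with k = 10):

```python
from scipy.stats import beta

# Equal-tailed 95% interval from the uniform-prior posterior above.
n, k = 50, 10
a, b = beta.ppf([0.025, 0.975], k + 1, n - k + 1)
print(f"95% Bayesian interval: [{a:.3f}, {b:.3f}]")
```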
The classical and Bayesian confidence intervals would be slightly different, since the classical method looks at {X <= 10} and {X >= 10}, while the Bayesian looks at {X = 10}.