I know that Bayes' Theorem should be applied here, but I don't understand why. I am going to pretend I don't know the formula and try to solve this, but I can't seem to get it:

W = Women in age group 40
B = Has Breast cancer
P = Tested positive for breast cancer

.01W = B
(.8B)W = P
(.096[W - B])W = P

Since 1% of women in this age group have breast cancer,
80% of those with breast cancer in this age group have positive mammographies,
and 9.6% of those without breast cancer in the age group also have positive mammographies.

I solved for W, B, and P and got 9.333 (repeating), .09333 (repeating), and .0696888 (repeating) respectively.

So the probability of having breast cancer from a positive mammography would be: .09333 / .0696888, no? But obviously that isn't the correct answer since that's over 100% chance.

How does one work this out intuitively? I don't even understand how the Bayesian formula is derived for this type of problem. I understand that I have to use it and that it works, but not why I have to use it or how to apply it.

Bayesian statistics is basically a fancy way of using conditional probabilities and associated results to do things like statistical inference. Bayesian probability is based on the same idea, but it's in the context of probability rather than statistics.

In the Bayesian approach, we treat the parameters of a distribution as random variables. In classical statistics, parameters are fixed quantities (constants). In Bayesian statistics, these parameters have a distribution of their own, and Bayesian methods are how we take that into account.

With your health example, you are considering a situation where results are conditional on something. Here you are dealing with false negatives and false positives. In plain English, a false negative is a negative result given that you actually have the disease, and a false positive is a positive result given that you do not have the disease. We represent this kind of thing as a conditional probability such as P(R+ | D-), which means the probability of getting a positive result given that we don't have the disease.
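One way to see the mammography numbers intuitively is to count people rather than manipulate probabilities. The sketch below does that in Python. The population size of 10,000 and the function name `posterior_from_counts` are illustrative assumptions, while the three rates are the ones given in the question.

```python
def posterior_from_counts(population, prevalence, sensitivity, false_positive_rate):
    """Return P(disease | positive test) by counting people in each group."""
    with_disease = population * prevalence            # women who have cancer
    without_disease = population - with_disease       # women who do not
    true_positives = with_disease * sensitivity       # positive tests among the sick
    false_positives = without_disease * false_positive_rate  # positive tests among the healthy
    # Of everyone who tests positive, what fraction actually has the disease?
    return true_positives / (true_positives + false_positives)


# Hypothetical population of 10,000 women; rates taken from the question.
p = posterior_from_counts(10_000, prevalence=0.01, sensitivity=0.80,
                          false_positive_rate=0.096)
print(f"P(cancer | positive) = {p:.4f}")
```

Out of roughly 1,030 positive tests, only about 80 come from women who have cancer, so the answer is a little under 8%. The attempt in the question goes over 100% because it divides by the wrong quantity: the denominator has to include every positive test, both true and false.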

In terms of Bayes' rule, we are looking at three things: a prior, a likelihood, and the posterior we calculate from them. The prior is a distribution for some parameter, and the likelihood is the ordinary likelihood function from classical statistics. For example, consider a binomial model: the prior is a distribution for the probability parameter, specified before seeing the data; the likelihood is the standard binomial likelihood; and the posterior is a distribution for the parameter given the data that you have.
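As a rough illustration of the binomial case, here is a minimal sketch that assumes a Beta prior on the probability parameter. The Beta choice is an assumption made here because it gives a closed-form posterior; the text above does not name a specific prior. The data (7 successes in 10 trials) and the function names are hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior Beta parameters after observing binomial data.

    Assumes a Beta(alpha, beta) prior; with a binomial likelihood the
    posterior is Beta(alpha + successes, beta + failures).
    """
    failures = trials - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: uniform prior Beta(1, 1), then 7 successes in 10 trials.
prior_a, prior_b = 1.0, 1.0
post_a, post_b = beta_binomial_update(prior_a, prior_b, successes=7, trials=10)

print("prior mean:    ", beta_mean(prior_a, prior_b))
print("posterior Beta:", (post_a, post_b))
print("posterior mean:", round(beta_mean(post_a, post_b), 4))
```

The posterior is a whole distribution for the parameter rather than a single estimate, which is the point being made above: the parameter is treated as a random variable.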

When you see the above in symbols, it's a way of converting P(A|B) to P(B|A). In plain English, what this does is allow us to treat parameters as random variables.
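For reference, the conversion from P(A|B) to P(B|A) follows from writing the joint probability in two ways. The derivation below is a sketch using the standard definition of conditional probability. The last line applies it to the test example, where D+ (has the disease) and R+ (positive result) extend the R+ / D- notation used earlier and are labels assumed here.

```latex
\begin{align}
P(A \cap B) &= P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \\
P(B \mid A) &= \frac{P(A \mid B)\,P(B)}{P(A)} \\
P(D^+ \mid R^+) &= \frac{P(R^+ \mid D^+)\,P(D^+)}
                        {P(R^+ \mid D^+)\,P(D^+) + P(R^+ \mid D^-)\,P(D^-)}
\end{align}
```

The denominator in the last line is just P(R+) split into its two sources: positive results from people with the disease and positive results from people without it.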

We do this for a number of reasons. One reason is that we can generalize distributions to account for the many possible distributions that can exist. Also, if we know a particular prior or can estimate it, we can get more accurate results. This leads to things like being able to use fewer data points to make an accurate statistical inference.

In terms of theoretical statistics, you can derive many of the classical results using Bayesian methods and arrive at the same answers. In that sense, classical statistics can be viewed as a special case of Bayesian statistics.

Bayesian statistics also allows us to simulate very complex distributions that are hard or impossible to simulate through other means. It is used wherever probabilities and inferences are needed but can't be obtained by classical means. It's not just for probabilities, though: it covers the whole statistical side of hypothesis testing and inference.

There are also philosophical differences but I won't get into those.

If the above is confusing, the best things to take away are that a) classical statistics can be viewed as a special case of Bayesian statistics, b) Bayesian statistics allows us to find probabilities and distributions that we can't find by classical means, and c) it allows us to understand probability (and statistics) in a different way when we look at things from the conditional perspective.