And for my part, I don't see that either the bimodal aspect (yes or no on planet 9) or the integrated versus differential character of the probability distribution is a key element of what is wrong in the logic. Those both just seem like binning issues to me, like how coarse or fine the binning choice is. I'm thinking that the core issue is the relative fairness of the comparison being made.
A better controlled analogy might make these issues clearer. Let's say we give a 10 question true/false test to a large group of people, and the test contains detailed questions about our own personal life, things that people who don't know us at all have no idea how to answer. But we know that half the people in the group actually do know us quite well, our friends and family, and the other half don't know us from a hole in the wall. Then the first test we grade gets 9/10, and we want to answer the question, what are the chances that this test came from the cohort that doesn't know us at all? By the logic being critiqued, one might say that the probability of getting at least 9 out of 10 by random chance is 11/2^10 (that is, 11/1024), or about 1%, so that means there's a 99% chance the test came from people who know us. We all agree this is wrong, but it's not wrong because 10/10 is being binned with 9/10 as some kind of integral of "as good or better" than the test we graded, and it's not wrong because we are looking at the people as either knowing us well, or not knowing us at all. It's wrong because we have not included the fact that even people who know us might not know all our personal details, so they might do better than 5/10 on average, but might not do as well as 9/10 on average. We would need to calculate how well we expect our friends and family to do on the exam on average, and figure out what fraction would get 9/10 or better. If that fraction is, say, only 10%, then we should compare the 10% to the 1% and conclude it is ten times more likely the exam came from that cohort, but not 99% likely, more like 91%.
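For anyone who wants to check the arithmetic in the test analogy, here is a minimal Python sketch. The helper name `tail_prob` is just something made up for illustration, and the 10% figure for friends and family is the same "say, only 10%" guess used above, not a measured number.

```python
from math import comb

def tail_prob(n, k, p):
    """Probability of at least k correct out of n when each answer is right with probability p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Cohort that doesn't know us: pure guessing on true/false questions.
p_strangers = tail_prob(10, 9, 0.5)  # (10 + 1) / 2**10 = 11/1024

# Cohort that knows us: ASSUMED fraction scoring 9/10 or better (the "say, 10%" above).
p_friends = 0.10

# Half the group is in each cohort, so the prior is 50/50.
prior_friends = 0.5

# Bayes: weight each cohort's chance of producing the result by its prior.
posterior_friends = (prior_friends * p_friends) / (
    prior_friends * p_friends + (1 - prior_friends) * p_strangers
)

print(f"chance of 9+/10 by guessing: {p_strangers:.4f}")
print(f"chance the test came from someone who knows us: {posterior_friends:.3f}")
```

With the exact 11/1024 instead of the rounded 1%, the answer comes out near 90% rather than 91%, but either way it is nowhere near 99%.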
So that latter calculation seems to be the key missing element in the evidence for planet 9. If only a 0.2% fraction of solar systems without a planet 9 would show behavior as aberrant, or more aberrant, than what we see in ours, that doesn't by itself tell us how unlikely it is that random chance is the explanation. It means we need to look at anything else we think could explain the behavior, and assess how often, in a random sample, that explanation would produce that behavior. That number might be something like 1%, who knows, but the point is it must be calculated to assess the relative likelihoods. If it's only a 5 times higher fraction, then we can only say it is 5 times more likely there is a planet 9 than that there isn't (if we are ambivalent about its existence, giving it a 50/50 chance out of ignorance prior to looking at any data). And of course that's a very difficult number to determine, since we'd need to know some kind of probability distribution of possible planet 9s. I think that last part is what we are all saying in different ways: the evidence for planet 9 is not necessarily anywhere close to 99.8%, even if there seems to be only a 0.2% chance of getting the aberrant behavior we see from purely random chance.
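The same comparison for planet 9 can be sketched in a few lines of Python. The function name `posterior_from_ratio` is hypothetical, and the 1% is only the "who knows" placeholder from above, so the output illustrates the logic and is not an estimate of anything real.

```python
def posterior_from_ratio(likelihood_ratio, prior=0.5):
    """Turn a likelihood ratio and a prior probability into a posterior probability."""
    prior_odds = prior / (1 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

p_chance = 0.002   # fraction showing the aberrant behavior with no planet 9
p_planet9 = 0.01   # ASSUMED placeholder: fraction showing it when a planet 9 exists

ratio = p_planet9 / p_chance            # about 5 with these placeholder numbers
posterior = posterior_from_ratio(ratio)  # 50/50 prior, as in the text

print(f"likelihood ratio: {ratio:.1f}")
print(f"probability of a planet 9 under these assumptions: {posterior:.3f}")
```

A ratio of 5 with a 50/50 prior gives 5/6, or roughly 83%, which is a long way from 99.8%. Changing `prior` shows how much the conclusion also leans on what we believed before looking at the data.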