# Odds of dying of the same disease on the same date as another...

Recently, John McCain died of glioblastoma (brain cancer). The odds of dying on a particular date are 1 in 365. He died of the same disease as Ted Kennedy, on the very same date that Kennedy did. Glioblastoma affects 3.1 in 100,000 population. The mean life span of someone living with the disease and getting treatment for it is 14 to 18 months. Potential causes of glioblastoma are certain carcinogenic chemicals and large doses of radiation. Can one infer from the statistical odds whether the deaths were purely coincidental or the result of foul play? I see it as similar to the problem of proving whether some particular signal in the CERN data is random or meaningful. In this case, if the odds are ridiculously high, would that prove that there were other factors involved and not mere coincidence? Additionally, would one include the fact that both were powerful senators in calculating the odds, or would that be irrelevant?

Dale
Mentor
2021 Award
Glioblastoma affects 3.1 in 100,000 population.
OK, you have to be very careful with these numbers. First, it is an aggressive cancer, so the proportion of deaths due to glioblastoma is greater than the incidence of glioblastoma. Also, the risk of death by glioblastoma increases substantially with age, and perhaps with other demographic factors. So that is not going to be the right number to use.

The other thing is that this is a multiple comparisons problem. How did you select those two people for this comparison? Is that a random sample? If not then how do you account for your biased sample?

I found it odd that two powerful Senators, of whom there are very few alive, both died on the same day of the same cause, though years apart. I was simply trying to find out whether there is a way to calculate the appropriate odds of it being coincidence, and whether one could determine if other factors were provably involved. I suggested foul play. However, maybe it would be something like a particular shot that people in government get, like, say, an experimental vaccine that Congressmen get for anthrax. I am not trying to start a conspiracy theory here. I am simply trying to figure out what factors one would need to take into account (such as the ones you mentioned) in order to say, with a high degree of certainty, whether the deaths were purely coincidence or whether there was a common cause between them. The 3.1 figure is the number of deaths per year per 100,000 of glioblastoma in the population in general. And the mean survival time is given for those receiving treatment for their cancer (I know McCain was being treated and I am pretty sure Kennedy was being treated as well). Here is a link to more statistics on glioblastoma: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123811/

Dale
Mentor
2021 Award
The 3.1 figure is the number of deaths per year per 100,000 of glioblastoma in the population in general
No, 3.2 is the “incidence”, meaning the number of new cases diagnosed each year per 100,000 living population. What you want is the proportion of deaths attributed to glioblastoma, not the rate of new glioblastoma diagnoses.

StoneTemplePython
Gold Member
You may want to listen to the "More or Less" podcast... they address stuff like this in a very witty way.

The problem is you have a data snooping issue of sorts. Somewhere some rare coincidence is happening. It could have been in the UK's parliament (or the Bundestag or wherever), not in the US. And then you or your counterpart in that country would notice such a coincidence there and think something is up.

Rare stuff happens all the time if you scour the globe and the years. The fact that they died of the same disease may be of interest.

The fact that they died on the same day is completely irrelevant from any vantage I can see.

What most people do is they scour the world ignoring all the banal common stuff and then they find some rare coincidences and make up stories / speculations about them. This is a waste of time.
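The point that rare-looking coincidences become common once you search a large enough group can be illustrated with the classic birthday-problem calculation. This is only a sketch: it assumes death dates are uniform over 365 days and independent between people, and the group sizes are hypothetical choices, none of which comes from the thread.

```python
def prob_shared_date(n, days=365):
    """Probability that at least two of n people share a calendar date.

    Assumes dates are uniform over `days` and independent between people
    (a simplification, not a claim about real mortality data).
    """
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


# Hypothetical group sizes, chosen only for illustration.
for n in (2, 23, 50, 100):
    print(n, round(prob_shared_date(n), 4))
```

For one pre-chosen pair the chance is 1 in 365, but the chance that some pair in the group matches grows quickly with the size of the group being searched.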

PeroK
Homework Helper
Gold Member
2021 Award
The odds are 100% that some "coincidence" like this will happen sometime. It's inevitable.

No, 3.2 is the “incidence”, meaning the number of new cases diagnosed each year per 100,000 living population. What you want is the proportion of deaths attributed to glioblastoma, not the rate of new glioblastoma diagnoses.

Thank you for that. I did miss that fact. I did see another figure but can't find a source for it right now. It was 2.4 per 100,000 and I believe that was the death rate from glioblastoma, but don't quote me on that.

I'll have to look up that Podcast. Sounds interesting.

I don't know what most people do. Yes, the date is meaningless unless it is shown that there is a connection to people or what people do. Maybe the X-ray machine at the House is putting out far too much radiation. I mean the X-ray machine that the Congressmen, Senators and staff go through, not the ones that the general public passes through, for example. I don't know that the date is of interest, but then I don't know that it's not either.

I don't find it a waste of time because I am attempting to understand what the determining factors are for something to be considered normal and not worth wasting time over, and what is worth the time to explore. Since you know that and I obviously don't, perhaps you could share your insight. I have done some rough estimates and find, using 2015 death data for glioblastoma, that the death rate for the population over 60 should be around 1 in 4000. Since there have been a total of a little over 10,000 congressmen in our entire history, that would mean about 3 would have died of the disease since 1789. The list given in Wikipedia (provided below) shows that there are 3 today that have it (and possibly 4 more, as the type of tumor is not given). Roughly, that's a 1200-2800% higher incidence than I would expect for the 1000 or so living congressmen (those in Congress and those retired as well). Would the CDC find the 'bump' in the data interesting enough to pursue? Or would they just consider it a waste of time?

That's what I am trying to determine here. How do I get a handle on the significance (or non-significance) of this data? Where does it lie statistically? Ho-hum? Interesting? So nearly impossible that there must be something else going on that we don't know about yet?

https://en.wikipedia.org/wiki/List_of_people_with_brain_tumors

The odds are 100% that some "coincidence" like this will happen sometime. It's inevitable.

If all the atoms in a cup of coffee moved in the same direction at the same time, the coffee would appear to jump out of the cup all on its own. Since the motion of the atoms is random, it's possible that they all jump in the same direction at once, but the probability is just barely non-zero, so you are never going to see it. And if I did see it, I would suspect additional factors that I was not yet aware of rather than believe I had just watched the virtually impossible happen.

PeroK
Homework Helper
Gold Member
2021 Award
Ted Kennedy was a Democrat and John McCain a Republican. If you restrict your criteria to one party or the other, the coincidence disappears.

I think that, numbers notwithstanding, there may be another problem in terms of logical linkage here.

Why would such a coincidence, regardless of probability, suggest foul play any more than it would suggest, say, a dislike of asparagus? Another way to look at this would be: if one were to posit foul play axiomatically, why would there be a likelihood of such synchronized dates? It just seems that the coincidence, striking or not as it happens, does not directly or otherwise lead to the drawn conclusion, even in terms of a window of probability.

diogenesNY

Dale
Mentor
2021 Award
I have done some rough estimates and find, using 2015 death data for glioblastoma, that the death rate for the population over 60 should be around 1 in 4000.
OK, so that is a more reasonable sounding number. So, we would expect p=1/4000 that John McCain died of glioblastoma and p=1/4000 that Ted Kennedy died of glioblastoma and p=1/365 that they died on the same date (given that both were over 60 when they died). So naively we would expect that the overall probability is about 1.7E-10. So, if you placed a bet years ago (when the younger turned 60, for example) that John McCain and Ted Kennedy would die of glioblastoma on the same date you could have rightly asked for those extremely high odds.

But now we are not talking about a prediction, we are talking about a retrospective look at data. This data is surprising, but how surprised should we be? Well, we aren't particularly interested in specifically John and Ted, but any two powerful and public older men would be equally surprising. So say that is about 1000 men, then there are about 500,000 combinations of 2 men that we could compare. Now, glioblastoma is not the only rare cause of death, and we would have been equally surprised by any of them retrospectively. So say that there are 1000 different causes of death that we would consider surprisingly rare, then for each of the 500,000 pairs of men we would have 1000 different causes that would surprise us, so there are a total of about 500,000,000 multiple comparisons that we need to take into account. By my calculations this takes the odds from 1.7E-10 to about 0.09.

Maybe you disagree about the 1000 men and the 1000 causes of death and think that it is more like 500 men and 500 causes of death that would be surprising. That would bring the odds to about 0.01, so conceivably you could count that as "statistically significant", but this is still a more ordinary level of surprise. In any case, you have to consider the multiple comparisons issue in your analysis of surprise.
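The arithmetic in this post can be reproduced with a short sketch. It uses the figures from the thread (1 in 4000 per person, 1 in 365 for the date, 1000 or 500 men and causes), and it adds two simplifying assumptions of its own: every "surprising" cause is given the same 1-in-4000 probability, and the comparisons are treated as independent, which overlapping pairs are not. The function name `surprise_probability` is hypothetical.

```python
from math import comb, expm1, log1p

# Figures taken from the thread.
p_cause = 1 / 4000      # chance a given older man dies of the rare cause
p_same_date = 1 / 365   # chance two deaths fall on the same calendar date

# Naive probability for one pre-specified pair and one pre-specified cause.
p_single = p_cause * p_cause * p_same_date


def surprise_probability(n_people, n_causes, p=p_single):
    """Return (comparisons, Bonferroni bound, P(at least one hit)).

    Assumes every cause has the same probability and that comparisons
    are independent; both are simplifications for illustration.
    """
    n_comparisons = comb(n_people, 2) * n_causes
    bonferroni = min(1.0, n_comparisons * p)
    at_least_one = -expm1(n_comparisons * log1p(-p))
    return n_comparisons, bonferroni, at_least_one


print(p_single)
print(surprise_probability(1000, 1000))
print(surprise_probability(500, 500))
```

Under these assumptions the adjusted values land close to the roughly 0.09 and 0.01 quoted above. The exact figure matters less than the fact that it moves by many orders of magnitude once the number of possible comparisons is counted.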

I used foul play as one example of a cause that may be linked to a date coincidence. Given that the mean life expectancy is 14 to 18 months, that is a window of about 4 months, or 120 days, reducing the odds of dying on the same date from 1 in 365 to 1 in 120 if the cause of the disease was linked to a particular annual event, like a New Year's Day party or something. (Ted Kennedy died 13 months after he contracted it, or was it after detection? And McCain, 15 months.) Enter the nefarious spy who always attends those New Year's Day parties.

But it could also be due to something like an experimental vaccination they get that the general populace does not, shortly after taking office. Or as I said above, maybe an x-ray machine they pass through that is malfunctioning, putting out too high a dose of radiation, that only Congressmen go through (maybe in a back door entrance or an entrance the public in general does not normally go through.)

I am not saying that there is a nefarious cause. I am not even saying that it is not coincidental. I don't really believe there is a conspiracy there, even if it was not coincidence, as there are a lot of other factors that could have been in common between the two men, since both of them were in the Senate. Who knows, maybe a carcinogenic chemical is used to clean all the Senate desks and chairs to make sure nobody has tried to kill them with anthrax.

I am simply giving a few examples of causes that might give meaning to a coincidence of the dates if it turns out NOT to be coincidence. Again, I am not saying that there IS some connection other than coincidence.

I am trying to determine how one goes about revealing whether or not something is a coincidence. Can that be revealed by statistical analysis? Or is there too little information? I think the problem here is that the data set is too small, like a medical test on a few people not being very meaningful. Just on the face of it, using the 1 in 4000 figure as the probability of death for a person over 60 years of age, together with the coincidence of date, the odds are 1 in 1,460,000, are they not? That is not so huge, given that the odds of being struck by lightning in a given year are considered to be 1 in 700,000, and in your lifetime, 1 in 3000.

Thank you for that answer! I don't see that I need to disagree with your model. I respect it for what it is. I see your point.

But if I were the CDC wouldn't I just be looking at one particular disease at a time? Like someone initiating an investigation over some observed apparent connection, say cell phones causing brain cancer. This case would be whether or not there is an environmental factor at the House that could explain what seems to be a higher incidence of glioblastoma among members of the House than in the general population taken as a whole. There are 3 on that list as having died of glioblastoma, and 4 others for whom the actual tumor type is not listed. I suppose the thing I wonder here is what the standard deviation would be, that is, what would be a reasonable deviation. Where 1 in 4000 is the norm for the larger data set, is 3 in 1000 (1 in 333) a reasonable deviation when taking a smaller slice of the data set? If it is, then that alone could explain it, I guess.

I think I see a point here now that I didn't see before. When I was trying to calculate the odds of it happening, the date seemed to be a relevant value in calculating the odds. The date CAN be relevant only if some connection to a human cause exists, and one cannot add that factor in until the link is proven. For example, I can double the odds by simply saying both are males. Though true, it is not valid unless it turns out that being male is some sort of contributing factor. So though both men dying on a particular date may be a way to prove on what date the spy put the poison in the drink, until a link to a poison is established, you can't use the date in the probability calculation, because until such a link is proven it is truly only coincidence. Unless, of course, everyone that died of the disease died on the same date, but that would have to be more than 2 people, I would think.

So really the only valid statistic is the death rate for the whole of the populace over 60 (as most people that get this disease are older people) vs the statistics of the members of the House getting this disease. Just how far out of the norm it is. Once it is established that there is in fact a significant deviation from what would be expected, could one then look for causal links and add those factors in to help determine perhaps what the causal link was. Does that sound right?
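One rough way to put a number on "how far out of the norm" is a Poisson tail probability. The sketch below takes the thread's 1-in-4000 annual rate and the group of about 1000 living congressmen, and asks how likely it is to see at least a given count if the group were no different from the general over-60 population. The Poisson model and the helper name `poisson_tail` are assumptions made for illustration. The comparison also mixes an annual death rate with a count of cases accumulated over several years, so the one-year expected count understates what a multi-year window would give.

```python
from math import exp, factorial


def poisson_tail(k, mean):
    """P(X >= k) for X ~ Poisson(mean)."""
    return 1.0 - sum(exp(-mean) * mean**i / factorial(i) for i in range(k))


rate = 1 / 4000     # annual death rate quoted in the thread (over-60 population)
group_size = 1000   # approximate number of living congressmen, per the thread
expected = rate * group_size  # expected cases in a single year

# 3 confirmed cases, or up to 7 if the unspecified tumors are all counted.
for observed in (1, 3, 7):
    print(observed, poisson_tail(observed, expected))
```

A small tail probability here would still be subject to the multiple-comparisons caveat raised earlier, since the group and the disease were picked after the cases were noticed.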

Dale
Mentor
2021 Award
But if I were the CDC wouldn't I just be looking at one particular disease at a time?
Perhaps, but with a specific epidemiological model in mind. I don’t know the details of those models but I strongly suspect that not one of them includes how famous they are or what day of the year they die.

Mostly true. But let's look at this possible situation. People on a bus get on and off the bus a few times, and that evening everyone dies. The bus has nothing on it that would kill them, so one would conclude that somewhere they stopped is where they came into contact with the 'agent' that killed them. If it was later discovered that a particular nerve gas was the cause, and that the gas takes 4 hours to kill once someone is exposed to it, then one would know to go to the place the bus was 4 hours before they died to look for the source of the exposure. Or look at the general data on poisoning deaths for 1978, and it will probably appear similar to any other year. Choose a specific date, however, and you might notice that a lot of people died on that particular day, far outside of the daily norm, because of the Jonestown massacre. If you knew nothing about that event but did have poisoning data for that year broken down by day, you could find that anomaly and deduce that it was an unusual event with a common cause. In the case of McCain and Kennedy, etc., perhaps the actual data to look at would be not just Congressmen and Senators but all people that regularly spend time in the House, to see if the data gets further skewed from the norm or if it looks less skewed.

Dale
Mentor
2021 Award
The thing is that your examples are ones where there is a model that fits the surprising facts. So the assumption you are apparently trying to infer is that there must therefore be some unknown model that fits these surprising facts.

However, if you search any large data set you may randomly find something surprising for which there is no model because it is simply random. Those cases occur, but you didn’t include them in your examples, because they don’t fit the narrative and they are less memorable.

Here, with a rough accounting for multiple comparisons, you have something that is in the range of not significant at all to barely significant. This is not the statistical equivalent of a whole bus full of people dying.

pinball1970
Gold Member
Recently, John McCain died of glioblastoma (brain cancer). The odds of dying on a particular date are 1 in 365. He died of the same disease as Ted Kennedy, on the very same date that Kennedy did.

It's a roll of the dice in terms of genetics with many cancers. I think looking back for coincidences is always going to uncover something surprising. It is nothing more than coincidence; if we never had coincidences, THAT would be surprising.

https://en.wikipedia.org/wiki/Lincoln–Kennedy_coincidences_urban_legend

PeroK
Homework Helper
Gold Member
2021 Award
There was a news item recently about someone who won the lottery twice. Given that the odds of winning in a given week are one in ten million, the report claimed that the odds of that happening were one in a hundred million million, which is, of course, fairly meaningless.

I don't know how many people win the lottery and keep playing every week, but let's say that 100 do so.

Every week, therefore, the odds that one of those previous winners wins is about one in 100,000. And, that one of them wins in a given year about one in 2,000. And, after 20 years, it's about one in a hundred.

Given there are perhaps a hundred lotteries round the world, it's likely to happen sooner or later.

And, if previous winners buy multiple tickets each week, then it becomes inevitable sooner or later.

This is an example of where the initial odds seem staggering, but an analysis of the situation reveals that nothing particularly surprising has happened.
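The lottery figures above can be checked with a few lines. The inputs are the ones stated in the post (a one in ten million chance per week, 100 previous winners still playing one ticket each); treating each week and each player as independent is an assumption of this sketch, and the helper name `prob_at_least_one` is hypothetical.

```python
p_win = 1 / 10_000_000     # chance a single ticket wins in a given week
n_previous_winners = 100   # previous winners assumed to keep playing


def prob_at_least_one(p_single, n_trials):
    """P(at least one success in n_trials independent tries)."""
    return 1.0 - (1.0 - p_single) ** n_trials


# Some previous winner wins again in one week, one year, and twenty years.
per_week = prob_at_least_one(p_win, n_previous_winners)
per_year = prob_at_least_one(per_week, 52)
per_20_years = prob_at_least_one(per_week, 52 * 20)

for label, p in (("week", per_week), ("year", per_year), ("20 years", per_20_years)):
    print(f"{label}: about 1 in {1 / p:,.0f}")
```

The headline figure of one in a hundred million million answers a different question: the chance that one named person wins in two specific named weeks.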

I agree with all of you. That's what I was saying when I said that the coincidence of date, for example, pushes the odds much higher, but would only be relevant IF one found sufficient evidence (in some other way perhaps) that human involvement was a factor. For example, if they had died of anthrax or exposure to some radioisotope that couldn't have been accidentally encountered. Only then does the date become relevant. Without the date coincidence added in, the odds are not so far out of the norm. 1 in 4000 deaths per year is my best estimate for the general population, and this situation came to odds of 1 in 12,000. But the sample is so small that it's easy to get results far outside of the norm. And I think that that is the issue: a very small sample. I gave the cases for relevance of date only to clarify the view that the date can be relevant in some circumstances. Anyhow, I think I have my answers, and I thank you all for weighing in on it.