Is this a Poisson distribution problem?


Discussion Overview

The discussion revolves around the statistical analysis of mechanical failures in hip prostheses from two manufacturers, A and B. Participants explore how to assess the quality and failure rates of these prostheses over a 20-year period, considering various statistical approaches, including the Poisson distribution and systems of differential equations.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant questions the validity of using a Poisson distribution for failure events, suggesting that the number of failures may depend on the number of prostheses present over time.
  • Another participant proposes generating simulated data to better understand the failure rates and their implications.
  • A participant suggests that the failure rates calculated (0.5% for A and 0.33% for B) could be misleading without considering the time and number of active prostheses.
  • There is a discussion about using a system of differential equations to model the dynamics of prosthesis failures, accounting for various rates such as implant rates and patient drop-out rates.
  • One participant recommends building a Monte Carlo simulation to analyze the problem, emphasizing the complexity of theoretical analysis in this context.
  • Another participant mentions the potential usefulness of the R package deSolve for simulating differential equations related to the problem.

Areas of Agreement / Disagreement

Participants express differing views on which statistical methods are appropriate for analyzing the failure data, and no consensus is reached on the most effective way to assess the quality of the prostheses.

Contextual Notes

Participants acknowledge limitations in the available data, including the lack of detailed records on the timing and number of active prostheses, which complicates the analysis. The discussion also highlights the challenges of modeling failure rates in a non-linear system.

lavoisier
Hello,
I have been thinking about this problem for a while, but I can't decide how it should be tackled statistically. I wonder if you can help, please.
Suppose that prostheses for hip replacement are sold mainly by 2 manufacturers, A and B.
Since they started being sold 20 years ago, 100 000 prostheses from A, and 300 000 from B, were implanted in patients.
During these 20 years, mechanical failures that required removal of the prosthesis were recorded for both types: 500 failures for A and 1000 for B. We can assume that failure events were independent of one another and did not depend on the time since implant: there was simply some defect that became apparent, in some prostheses, after an essentially random time post-implant.

I don't know what kind of statistics could or should be calculated in such situation, e.g. to make a judgment on the quality of the prostheses, on the propensity of each to break down, on how effective it would be to monitor patients with one prosthesis or the other for possible failures, etc.

I could calculate a Poisson-related λ (average number of events per interval).
It would be 500/20 = 25 failures / year for A, and 1000/20 = 50 failures / year for B.
Then I could calculate the probability of a given number of failures each year (or over different periods) for each type of prosthesis.
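A quick sketch of that calculation in base R (the k values below are arbitrary examples, not figures from the problem):

```r
lambda_A <- 500 / 20     # 25 failures / year for A
lambda_B <- 1000 / 20    # 50 failures / year for B

dpois(30, lambda_A)      # P(exactly 30 failures of A in one year)
ppois(40, lambda_B)      # P(at most 40 failures of B in one year)
```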
However, I have quite a few doubts on this approach.
Isn't the number of events dependent on how many prostheses of each type are present in the population at each time? A bit like radioactive decay, but with variable mass of substance?
For instance, suppose that B was implanted mostly during the first 5 years of the 20-year period we're considering (say 50 000 / year for the first 5 years, and the remaining 50 000 at a rate of ~3300 / year for the next 15 years). Then I would expect that the number of failures was not the same each year, but varied all the time, even day by day as new implants were made and some of them failed and got replaced by a new type.
So isn't my 20-year-averaged Poisson λ ineffective in telling me how many failures I can expect in the future, if I don't consider the dependency of the number of failures on the number of prostheses?
Is there any other theory that would better account for this?
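One way to make the decay analogy precise (a sketch in standard notation, not from the thread): if each active defective prosthesis fails independently at a constant hazard ##\lambda##, the failures form a Poisson process whose rate varies with time,
$$\mu(t) = \lambda\,D(t), \qquad E[\text{failures in }(t_1,t_2)] = \int_{t_1}^{t_2} \lambda\,D(t)\,dt,$$
where ##D(t)## is the number of defective prostheses still implanted at time ##t##. A single 20-year-averaged ##\lambda## is then adequate only if ##D(t)## stays roughly constant.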

Then, concerning the quality of the prostheses: purely looking at the number of failures seems to say that A is better than B, because historically there have been fewer failures for A than for B.
However, if we divide that by the number of prostheses, we reach the opposite conclusion, because 500/100000 = 0.5% > 1000/300000 = 0.33%.
What I have a hard time figuring out is what these numbers mean - if they mean anything at all.
If I want to know the quality of a mass-produced object, I take a sample of N of them, do some measurements or tests, collect the data, and I can do all sorts of nice statistics, e.g. if n pieces out of N are defective, I can estimate what proportion of them would be defective in the whole population, with its standard error, and thus compare different manufacturers of the same object.
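For reference, that textbook fixed-sample calculation in R, using brand A's raw counts purely as an illustration (as argued next, the failure counts here are not really a fixed sample):

```r
n <- 500; N <- 100000                   # brand A's counts, for illustration only
p_hat <- n / N                          # estimated defect proportion: 0.005
se    <- sqrt(p_hat * (1 - p_hat) / N)  # standard error of the proportion
c(estimate = p_hat, std_error = se)     # ~0.005 +/- 0.00022
```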
Here instead I don't have any control on the sample I'm observing: I only know the total number of prostheses 'active' at each time, and I observe that at random times some of them fail, with each failure going to add to the count.
But indeed, these events are random. I am not taking 100 patients and measuring directly the quality of their prosthesis, to make a nice table with N and n, split by manufacturer.
So what is the meaning of 0.5% and 0.33% above? Is it an estimate of the proportion of defective prostheses of each type? But how would that make sense, considering that if I had taken the same count at a later time I would have most likely found a larger number of failures for both brands?
How can we combine the element of number of objects with the element of time and with the randomness of the observation of the failure event, into metrics of the quality of these objects and equations that allow us to predict the likelihood of future events?

If you can suggest how I should proceed, I would appreciate it.
Thanks!
L
 
I'm not clear on what data you have and don't have. Can you provide an example of the raw data you have for some of these events?
 
I can't speak much to the math, but here's how I would present the data if I had the number of failures for each company by month. The timechart uses the attached file 'failures.txt', containing randomly generated failures by month (500 for Company A and 1000 for Company B), and was created with Splunk, a machine data analytics tool.

Note 1: The failures for Company A are weighted ×3 compared to Company B.
Note 2: The obvious correlation between the two companies is an artifact of generating both example series from the same RAND() column.

[timechart.jpg: monthly failure counts for Companies A and B]
 

Thank you @stoomart, this is very interesting.
I don't actually have data, this is a theoretical problem (for now). Your approach of generating simulated data can be useful to study this more pragmatically.
My question is how to make a judgment on the quality of prosthesis A compared to B (e.g. an estimate of what percentage of each brand's prostheses is defective), based on the recorded failures, the number of prostheses that are 'active' in patients at each time, and the propensity to fail (e.g. the rate of failure per period per implant).
In fact, as the number of failures is roughly constant on average in your data, I would expect it to correspond to a situation where the number of 'active' prostheses is also more or less constant. In my example I was thinking more of a case where the number of 'active' prostheses increases over time, not necessarily linearly, and then I would expect the number of failures to increase with time as well.

Later I thought a bit more about this, and I think it may be described by a system of differential equations (or at least recurrence equations) accounting for the variation in the number of prostheses of each brand, and within each brand, defective or not defective, as a function of: rate of implant of new ones, rate of drop-out of patients (deaths etc), rate of failure (only for defective ones). This would be a non-linear system of 4 equations, which I probably shouldn't try to solve analytically. If I had simulated data, however, I could try fitting the equations.
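In symbols, one brand's pair of compartments might look like this (a sketch; ##i(t)## is the implant rate, and ##q##, ##\delta##, ##\lambda## are my labels for the defective fraction, drop-out rate, and failure hazard, none of which are specified in the thread):
$$\frac{dG}{dt} = (1-q)\,i(t) - \delta\,G, \qquad \frac{dD}{dt} = q\,i(t) - (\delta + \lambda)\,D,$$
with ##G## the non-defective and ##D## the defective active implants, and observed failure rate ##\lambda D(t)##; the same pair for the second brand gives the four equations.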
Not as easy as I thought...
 
lavoisier said:
Not as easy as I thought...
Not easy, but it seems possible. I think the key is having data that allows you to correctly calculate the company weights for each interval; something like 'number_sold' and 'sales_began' should work. I'm not sure tracking the number of 'active' prostheses will be too helpful; it seems like it would be terribly difficult, if not impossible, to maintain accurate data. Definitely an interesting problem to consider.
 
My advice is to build a Monte Carlo simulation of it. Even if you think you have a theoretical solution, you should check it against a simulation. These things quickly become too complicated for theoretical analysis. Your problem, with the changing number of remaining devices and a mixed population, will probably make the analysis difficult.
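A minimal sketch of such a simulation in R, assuming uniformly spread implant times, a fixed defective fraction, and exponential failure and drop-out times (all illustrative assumptions, not from the thread):

```r
set.seed(1)
years   <- 20
n_total <- 100000          # implants over the period (brand A's count)
q       <- 0.01            # assumed fraction defective
lambda  <- 0.25            # assumed failure hazard per year (defective only)
delta   <- 0.03            # assumed patient drop-out hazard per year

t_implant <- runif(n_total, 0, years)                       # implant times
defective <- runif(n_total) < q
t_fail    <- ifelse(defective, rexp(n_total, lambda), Inf)  # time to failure
t_drop    <- rexp(n_total, delta)                           # time to drop-out

# A failure is recorded only if it precedes drop-out and the observation end
observed <- defective & (t_fail < t_drop) & (t_implant + t_fail < years)

sum(observed)                                    # total recorded failures
hist((t_implant + t_fail)[observed], breaks = 0:years,
     main = "Recorded failures per year", xlab = "Year")
```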
 
Thank you @FactChecker; I was indeed not keen on attempting a symbolic solution. I don't even think it's possible in this case.
Funnily enough, another problem came up at work that will probably require a numerical solution of a system of differential equations in R, which I have never tried before (I have done it with other software, which essentially took suitably small dt intervals and solved the equations iteratively, as if they were recurrence relations); so it will be interesting...
As for Monte Carlo, my boss is a fan of that method; I will have to find out how it works in practice.
 
Found an excellent tool to do this in R: deSolve. Not only can it simulate the time course for many types of differential and even difference equations, given the parameters, but it can also fit experimental data (i.e. find the best estimates for the parameters). I'm going to try it asap.
There's also sysBio, but it's in development at the moment. I can write the equations, no problem.
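For reference, a minimal deSolve sketch of one brand's two compartments from the system described above (parameter names and values are illustrative assumptions):

```r
library(deSolve)

# G = non-defective active implants, D = defective active implants
model <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    dG <- (1 - q) * i - delta * G        # implants in, drop-outs out
    dD <- q * i - (delta + lambda) * D   # defective implants also fail
    list(c(dG, dD), failure_rate = lambda * D)
  })
}

parms <- c(i = 5000, q = 0.01, delta = 0.03, lambda = 0.25)  # assumed values
state <- c(G = 0, D = 0)
out <- ode(y = state, times = seq(0, 20, by = 0.1),
           func = model, parms = parms)
head(out)   # time course of G, D and the instantaneous failure rate
```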
 
