Trivariate CDF, solving for variable within CDF itself?

  • Context: Graduate
  • Thread starter: veejl
  • Tags: CDF, variable

Discussion Overview

The discussion revolves around solving for the variable Force (F) within a cumulative distribution function (CDF) related to injury probability, incorporating factors such as mass (m) and age (a). Participants explore the implications of using a normal distribution CDF and the challenges of estimating force from injury data.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant presents an equation for injury probability involving a CDF and seeks to isolate Force (F) as a function of mass and age.
  • Another participant suggests that the equation can be rearranged to express F in terms of the inverse CDF, but notes that this depends on the specific distribution used.
  • A different viewpoint emphasizes that setting the probability of injury to 1 lacks justification and may lead to overestimating the force required for injury.
  • Some participants discuss the limitations of using the CDF for the purpose of estimating force, indicating that it may not provide a unique solution due to the nature of the variables involved.
  • Concerns are raised about the validity of using Excel for calculations, with one participant suggesting that the proposed method lacks a logical basis.
  • Another participant defends the use of the probit model, asserting its validity in the context of the discussion.

Areas of Agreement / Disagreement

Participants express differing opinions on the feasibility of solving for Force within the given framework. There is no consensus on the approach or the validity of the methods proposed, indicating multiple competing views remain.

Contextual Notes

Participants note that the problem involves estimating joint distributions from incomplete data, which complicates the analysis. There are also discussions about the assumptions underlying the use of the normal distribution and the implications of using cumulative probabilities.

Who May Find This Useful

This discussion may be of interest to those studying statistical modeling, injury biomechanics, or anyone involved in data analysis related to injury probabilities and force estimations.

veejl
So I've spent the better part of the last 2 days reading your forums (awesome btw) as well as scouring Google and other sites for the past week, trying to figure out what to do.

I have this equation here:

$$p(\text{injury}) = \Phi\left(\frac{\ln(F) - 2m - 3a + b}{0.8}\right)$$

probability(injury) = cumulative distribution function of [ (ln(Force) - 2(mass) - 3(age) + constant) / 0.8 ].

I'm trying to figure out a way to solve that eqn for Force, such that I have F = ...

I am hoping to apply this to a data set containing cases of known injuries with mass/age given for each injury. I plan to set the probability(injury) to 1, since I know that an injury did occur. My output would be a force for each case.

I honestly have no clue what to do in solving the equation.
- If I hold mass/age as constants, then my eqn is pretty much useless since it only looks at Force (assuming that solving the CDF requires taking a derivative of that entire eqn).
- Initially I thought to ignore the CDF on a friend's advice that I am looking at a single point versus a cumulative probability. So I got this: F = e^(0.8*p(fracture) + 2m + 3y - b). But I think this too is incorrect and some other mathematical manipulations need to be happening.
- Tried to convert it into a PDF, but really not sure what that accomplished
- Would "point probability" be a way to go? (http://en.wikipedia.org/wiki/Cumulative_distribution_function#Point_probability)

Some other things that might be useful:
- I know that injuries happen at a minimum Force. So that could be useful as a lower limit or bound.


I am also worried that I can't justify stating probability(injury)=1, even though it is known that injury did occur. Any thoughts on this point?


any help or insight would be utterly fantastic. thanks in advance!
 
If you mean p = Phi( (ln(F)-2m-3a+b)/0.8 ), then the equation is solved by
F = exp( 0.8*Phi^{-1}(p)+2m+3a-b ), where Phi^{-1}(p) is the inverse function of the CDF, also called the quantile function. It depends on what Phi is, but if it is the normal distribution CDF then there is no simple closed-form expression for it; it can, however, be easily calculated in most computing packages, for example with NORMSINV() in Excel.
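
The inversion above can be sketched in Python with the standard library's normal quantile function. This is only an illustration: the function name `force_at_probability` and the sample values for mass, age, and the constant b are hypothetical, since the thread never gives units or the value of b. Note that the quantile function is only finite for 0 < p < 1, so p = 1 corresponds to an infinite force.

```python
import math
from statistics import NormalDist


def force_at_probability(p, mass, age, b, scale=0.8):
    """Invert p = Phi((ln(F) - 2*mass - 3*age + b) / scale) for F.

    The coefficients 2, 3 and 0.8 come from the equation in the thread;
    the argument names are illustrative.
    """
    if not 0.0 < p < 1.0:
        # Phi^{-1}(0) = -inf and Phi^{-1}(1) = +inf, so no finite force exists.
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(p)  # standard normal quantile, Phi^{-1}(p)
    return math.exp(scale * z + 2 * mass + 3 * age - b)


# Hypothetical inputs, chosen only to exercise the formula.
for p in (0.1, 0.5, 0.9, 0.999):
    print(p, force_at_probability(p, mass=1.0, age=0.5, b=2.0))
```

The printed forces grow without bound as p approaches 1, which is one way to see why plugging in p = 1 cannot give a usable number.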
 
I'll re-state the problem, as I understand it.

The data underlying the problem consists of vectors, each with 4 components. The components are (J,A,M,F) defined by:

J : 1 if there was an injury in the incident and 0 otherwise.
A: age of person in the incident
M: mass involved in the incident
F : force involved in the incident

The probability that J is 1 is a known function Phi(A,M,F).

You only have data of the form: (1,a,m,?) where the 1 indicates there was an injury in the incident, a is the age of the person in incident and m is the mass involved in the incident. The "?" indicates that the force involved in the incident is unknown.

You want "an equation for force" as a function of age and mass. Age and mass don't determine a unique force. If you could find the probability distribution of force as a function of age and mass, you could state a single number like the average force or most probable force.

Setting p(injury) = 1 and solving for F in terms of m and a can't be justified by any mathematical reasoning that I see. I think it would tend to overestimate the force involved since, intuitively, that's trying to figure out how much force is needed to cause injury with certainty.

I'm not sure this problem is solvable. If it is, I think the solution involves using the data to estimate the joint distribution of (f,m,a). This probably involves making some further assumptions about the data. This is an interesting problem. I'll continue to think about it.
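
One way to write down the "distribution of force given that an injury occurred" is Bayes' rule. The sketch below assumes a prior density \pi(F | a, m) for the force before the outcome is known; that prior is an assumption introduced here for illustration, it does not appear in the thread, and it would have to come from additional data or modelling choices.

```latex
p(F \mid J = 1, a, m)
  = \frac{\Phi\!\left(\dfrac{\ln F - 2m - 3a + b}{0.8}\right)\,\pi(F \mid a, m)}
         {\displaystyle\int_0^\infty \Phi\!\left(\dfrac{\ln f - 2m - 3a + b}{0.8}\right)\,\pi(f \mid a, m)\,df}
```

A single summary number, such as the mean or the mode of this density, could then be reported for each (a, m) pair. Without some form of \pi, the injury-only data do not pin the force down.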
 
Hm.. interesting points.

@bpet - that's correct, I'm looking at a normal distribution CDF

@Stephen Tashi - The original eqn I'm working from is a probability distribution function of injury, taking into account force, age, and mass. I have probability curves from this, plotting force vs probability of injury, at 3 different ages/masses. Perhaps that helps?
 
veejl said:
Hm.. interesting points.

@bpet - that's correct, I'm looking at a normal distribution CDF

Keep in mind that you are not using the formula for the CDF of a normal distribution as a CDF. You are using it for an entirely different purpose (to set the parameter of a Bernoulli random variable). So the theory of the normal distribution has nothing to do with your question.

@Stephen Tashi - The original eqn I'm working from is a probability distribution function of injury, taking into account force, age, and mass. I have probability curves from this, plotting force vs probability of injury, at 3 different ages/masses. Perhaps that helps?

Such plots have no information that is not already in the formula that you gave, unless they indicate the limits placed on the variables. For example, can age be greater than 100?

I don't know whether solving this problem is something that will merely be written up in a school term paper or whether you need an answer for some important practical purpose. This is not a simple problem. As best I can tell, it involves estimating the joint distribution of three variables from data that does not directly give the values of all three variables. You won't find this solved in an introductory statistics book.
 
Ok, I apprec the advice.

I've decided to tweak my original project idea a bit and work with the formula as is. So I will input injury force, and keep ramping that up until my probability of injury is >.5.

Would that be possible? And if so, any tips on tackling it?

Right now, in Excel, I have the following:
=NORM.DIST((LN(F)-2*m-3*a+b)/0.8, 0, 1, TRUE), where mass (m) and age (a) are pulling information from a specific column, and force (F) is pulling from a specific cell that I am inputting a force into.

However, I think this gives the cumulative probability from 0-->F, which is not what I want. I need to find out at a given force, what is the probability of injury.

Any tips with this route?
 
veejl said:
Ok, I apprec the advice.

I've decided to tweak my original project idea a bit and work with the formula as is. So I will input injury force, and keep ramping that up until my probability of injury is >.5.

Would that be possible? And if so, any tips on tackling it?

Perhaps the purpose of spreadsheets is for people to fool around with them without actually knowing what they are doing. In that sense, anything is possible.

The method you propose has no logical or mathematical basis. If there is some serious purpose behind your work, I suggest you hire a qualified consultant. (Apparently, you aren't going to think enough about probability theory to understand the problem yourself.) If the purpose of your work is not that serious then it's fine to enjoy doing various random calculations with Excel.
 
veejl said:
...Right now, in Excel, I have the following:
=NORM.DIST((LN(F)-2*m-3*a+b)/0.8, 0, 1, TRUE), where mass (m) and age (a) are pulling information from a specific column, and force (F) is pulling from a specific cell that I am inputting a force into.

However, I think this gives the cumulative probability from 0-->F, which is not what I want. I need to find out at a given force, what is the probability of injury.

The Excel formula does give the probability of injury as a function of force. The purpose of the CDF here is to map a number in the range (-inf, inf) to a probability in (0, 1).
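
As a rough Python counterpart to the spreadsheet formula, the sketch below evaluates the injury probability at a given force and also shows that "ramping the force up until the probability exceeds 0.5" has a closed form, because Phi(z) = 0.5 exactly when z = 0. The function names and the sample values for mass, age, and b are hypothetical.

```python
import math
from statistics import NormalDist


def injury_probability(force, mass, age, b, scale=0.8):
    """Probit model from the thread: Phi((ln(F) - 2*mass - 3*age + b) / scale)."""
    z = (math.log(force) - 2 * mass - 3 * age + b) / scale
    return NormalDist().cdf(z)


def median_injury_force(mass, age, b):
    """Force at which the injury probability is exactly 0.5.

    Phi(z) = 0.5 when z = 0, so ln(F) = 2*mass + 3*age - b.
    """
    return math.exp(2 * mass + 3 * age - b)


# Hypothetical inputs, chosen only to exercise the formulas.
mass, age, b = 1.0, 0.5, 2.0
f50 = median_injury_force(mass, age, b)
print(f50, injury_probability(f50, mass, age, b))
print(injury_probability(2 * f50, mass, age, b))  # above 0.5 for a larger force
```

Since the probability increases monotonically with force, no iterative search is needed to find the 0.5 crossing.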


Stephen Tashi said:
...The method you propose has no logical or mathematical basis...

It's a probit model, perfectly valid!
 
So you want to find some kind of functional equation for F (i.e. force)?

From what you have posted the force is based on at least 3 variables (a,m,b) and another "injury" variable.

What kind of relationship do you want to find? Do you want an expression that is generic for all general a,b,m and injury, or do you want constraints on your function (for example fixing m to be a constant)?

Also, with regard to an injury actually occurring: the fact that something occurred doesn't mean its probability is 1. Remember that probability reflects a long-run experiment, in which the number of times something happens divided by the total number of trials converges to the probability. The only way you would get a probability of 1 is if it happened every single time you tried.
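
The long-run interpretation can be illustrated with a small simulation. This is a sketch only: the injury probability of 0.3, the trial count, and the function name are made up for the example. Each simulated incident either produces an injury or not, and the observed fraction settles near the underlying probability even though every individual injured case "did occur".

```python
import random


def observed_injury_fraction(p, trials, seed=0):
    """Simulate `trials` incidents, each injuring with probability p,
    and return the fraction that resulted in an injury."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    injuries = sum(rng.random() < p for _ in range(trials))
    return injuries / trials


# Hypothetical underlying probability of injury.
for n in (10, 1_000, 100_000):
    print(n, observed_injury_fraction(0.3, n))
```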
 
It's a probit model, perfectly valid!

I didn't say the model wasn't valid (although when you see probit models, they are usually based on such crude curve fits that it's hard to take them seriously). I said what he is doing with the model (to obtain "an equation for force") has no logical or mathematical basis.
 
