Probability observed value not in range for prediction

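In summary, the poster's model shows a 92% correlation between GDP and Risk score for countries that already have the hazardous substance law. For a hypothetical country F that does not yet have the law, the model gives a Risk score of 284, and the closest country that already has the law scores 270. The open question is the probability that F's Risk score falls between 270 and 284 once it adopts the law.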
  • #1
Math33

Homework Statement


Hello all, I created a predictive model from a data set of observed values and am looking for a way to express its accuracy as a probability. Data set A (observed) and data set B (model predictions) have a correlation of 84% using linear regression. A and B are both normally distributed, and every predicted B value is mapped to the closest A value. For example, the model produces a score of 420 for a data point, and the closest A score to that is 410. Now suppose this data point can be observed in the future (call the observed value F). What is the probability that F is between 410 and 420?

Homework Equations


P(410 < F < 420). A and B are two separate normal distributions with different means and standard deviations.

The Attempt at a Solution


I found the probability of A for 410 in the first normal distribution (let's say P(A) = 0.56), then I found the probability of B for 420 in the second normal distribution (let's say P(B) = 0.67). I then subtracted P(B) - P(A) to get 0.11, and subtracted that from 1 to get 1 - 0.11 = 0.89. So the probability that F is going to be in the range 410 to 420 is 89%. I am not sure if I'm doing this right. Thanks in advance.
 
  • #2
To comment usefully on this we'd need more information. A predictive model usually takes the form of an equation with an error term, like
$$X_{B,i}=X_{A,i}+\varepsilon_i$$
where ##X_{A,i}## and ##X_{B,i}## are the ##i##th ##A## and ##B## values respectively and ##\varepsilon_i## is a random variable called the 'error term', usually independent between different values of ##i##. ##\varepsilon_i## has a known distribution - usually, but not always, normal - which is usually the same for all ##i##.

What is the equation for your model?
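
To illustrate why the error term matters here, the following is a minimal sketch rather than a definitive method. It assumes the additive form above with a zero-mean normal error term, so that a future observation F is normally distributed around the model's prediction. The function name `interval_probability` and the residual standard deviation of 25 are hypothetical placeholders; neither comes from the thread.

```python
from statistics import NormalDist

def interval_probability(prediction, low, high, resid_sd):
    """P(low < F < high), assuming F ~ Normal(prediction, resid_sd**2)."""
    if resid_sd <= 0:
        raise ValueError("resid_sd must be positive")
    dist = NormalDist(mu=prediction, sigma=resid_sd)
    # Both CDF values come from the SAME distribution, centred on the prediction.
    return dist.cdf(high) - dist.cdf(low)

# Numbers from the original post: prediction 420, interval 410 to 420.
# resid_sd=25.0 is an assumed placeholder, not a value given in the thread.
p = interval_probability(prediction=420, low=410, high=420, resid_sd=25.0)
print(round(p, 3))
```

Under these assumptions the interval lies entirely on one side of the prediction, so the result cannot exceed 0.5 however small the residual spread is. The actual answer depends on the model equation and the distribution of its error term, which is why those details are needed.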
 
  • #3
andrewkirk said:
To comment usefully on this we'd need more information. A predictive model usually takes the form of an equation with an error term, like
$$X_{B,i}=X_{A,i}+\varepsilon_i$$
where ##X_{A,i}## and ##X_{B,i}## are the ##i##th ##A## and ##B## values respectively and ##\varepsilon_i## is a random variable called the 'error term', usually independent between different values of ##i##. ##\varepsilon_i## has a known distribution - usually, but not always, normal - which is usually the same for all ##i##.

What is the equation for your model?

Hi Andrew, thank you for the response. I just got back from work, so I don't have the equation in front of me, but it is a 5th-order polynomial (non-linear) model. I fitted the model using an Excel trendline and actually found a correlation of 92%. I have already plotted the error term (residuals) for each predicted value, and for the N = 40 data points they showed a good random pattern with virtually no correlation.

My model tries to find the relationship between the adoption of a specific hazardous substance law that exists around the globe and the GDP of any given country. As background, I created a scoring metric that assigns a Risk score based on how much of this hazardous substance law a country adopts. I applied it to many countries that already have the law and generated a Risk score for each of them. My 92% correlation is between GDP and Risk score for existing countries that have the law.

What I want to do is predict what the hazardous substance law will look like for a country that might adopt it in the future. Say a country F that doesn't have this law gets a Risk score of 284 from the model, and the closest-scoring country that already has the law has a Risk score of 270. Both the predicted and actual data sets are normally distributed. So I am trying to find out, once country F adopts the law, what the probability is that its final Risk score will be between 270 and 284. Thank you for your time.
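
One way the residuals mentioned above could feed into such a probability is sketched below, under stated assumptions. The data are synthetic stand-ins (not the actual 40 countries), the helper name `residual_std_error` is hypothetical, and F's eventual score is assumed to be normally distributed around the predicted 284 with the residual spread. The only figures taken from the thread are N = 40, the 5th-order polynomial (6 fitted coefficients), and the scores 270 and 284.

```python
import math
import random
from statistics import NormalDist

def residual_std_error(observed, predicted, n_coeffs):
    """Residual standard error: sqrt(sum of squared residuals / (n - n_coeffs))."""
    n = len(observed)
    if n <= n_coeffs:
        raise ValueError("need more data points than fitted coefficients")
    ssr = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(ssr / (n - n_coeffs))

# Synthetic stand-in data (NOT the poster's): 40 model scores plus random noise.
rng = random.Random(0)
predicted = [200 + 5 * i for i in range(40)]
observed = [p + rng.gauss(0, 20) for p in predicted]

# A 5th-order polynomial has 6 fitted coefficients.
sd = residual_std_error(observed, predicted, n_coeffs=6)

# Assumption: F's score is Normal(284, sd**2) once the law is adopted.
f_dist = NormalDist(mu=284, sigma=sd)
prob = f_dist.cdf(284) - f_dist.cdf(270)
print(f"residual sd = {sd:.1f}, P(270 < F < 284) = {prob:.3f}")
```

This ignores the uncertainty in the fitted coefficients themselves, so a proper prediction interval would be somewhat wider, particularly for a high-order polynomial evaluated near or beyond the edge of the fitted GDP range.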
 
