Probability observed value not in range for prediction

Math33 · Apr 21, 2017

Homework Statement

Hello all, I created a predictive model from a data set of observed values and am looking for a probability that describes its accuracy. Data set A (observed) and data set B (predicted by the model) have a correlation of 84% using linear regression. Both A and B are normally distributed, and every predicted B value is mapped to the closest A value. Example: the model produces a score of 420 for a data point, and the closest A score is 410. Now suppose this data point can be observed in the future (call the observed value F). What is the probability that F lies between 410 and 420?

Homework Equations

##P(410 < F < 420)##. A and B are two separate normal distributions with two different means and standard deviations.

The Attempt at a Solution

I found the probability of A for 410 in the first normal distribution (let's say P(A) = 0.56), then the probability of B for 420 in the second normal distribution (let's say P(B) = 0.67). I then subtracted P(B) - P(A) to get 0.11, and subtracted that from 1 to get 1 - 0.11 = 0.89. So the probability that F is going to be in the range of 410 to 420 is 89%. I am not sure if I'm doing this right. Thanks in advance.
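For reference, the arithmetic in this attempt can be written out as a short Python sketch. It assumes that "probability of A for 410" means a cumulative probability read from a normal distribution. The mean and standard deviation passed to `cumulative_probability` are made-up placeholders, since the thread gives none. Only the values 0.56 and 0.67 come from the post. The sketch reproduces the steps as described; it does not show that they answer the question.

```python
from statistics import NormalDist


def cumulative_probability(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and sd."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)


def attempt_arithmetic(p_a, p_b):
    """Reproduce the two subtraction steps described in the post."""
    difference = p_b - p_a         # P(B) - P(A)
    complement = 1.0 - difference  # 1 - (P(B) - P(A))
    return difference, complement


# Placeholder parameters (NOT from the thread): how such a value could be read off.
example_p = cumulative_probability(410, mean=400, sd=60)

# Values quoted in the post.
difference, complement = attempt_arithmetic(0.56, 0.67)
print(round(difference, 2), round(complement, 2))  # 0.11 0.89
```

Note that the two cumulative probabilities come from two different distributions, so their difference is not the probability that a single variable falls in an interval.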

andrewkirk · Apr 21, 2017

To comment usefully on this we'd need more information. A predictive model usually takes the form of an equation with an error term, like
$$X_{B,i}=X_{A,i}+\varepsilon_i$$
where ##X_{A,i}## and ##X_{B,i}## are the ##i##th ##A## and ##B## values respectively and ##\varepsilon_i## is a random variable called the 'error term', usually independent between different values of ##i##. ##\varepsilon_i## has a known distribution - usually, but not always, normal - which is usually the same for all ##i##.
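As a rough illustration of how such a model yields an interval probability, here is a minimal Python sketch. It assumes ##\varepsilon_i## is normal with mean zero and is independent of the predicted value, so that for a given prediction the observed value is distributed normally around that prediction. The error standard deviation of 15 is a made-up number, not something from this thread.

```python
from statistics import NormalDist


def interval_probability(predicted, lower, upper, error_sd):
    """P(lower < F < upper) when F = predicted - error, error ~ N(0, error_sd).

    Assumption: the error is independent of the predicted value, so
    F ~ N(predicted, error_sd) because the error is symmetric about zero.
    """
    observed = NormalDist(mu=predicted, sigma=error_sd)
    return observed.cdf(upper) - observed.cdf(lower)


# error_sd=15 is hypothetical; it would come from the fitted model's residuals.
p = interval_probability(predicted=420, lower=410, upper=420, error_sd=15)
print(f"P(410 < F < 420) = {p:.3f}")
```

Under these assumptions the upper limit equals the centre of the distribution, so the result can never exceed 0.5, whatever the error spread.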

What is the equation for your model?

Math33 · Apr 21, 2017

Hi Andrew, thank you for the response. I just got back from work so I don't have the equation in front of me, but it is a 5th-order polynomial (non-linear) model. I fitted the model using the Excel trendline and actually found a correlation of 92%. Yes, I already plotted the error term (residuals) against each predicted value for N = 40 data points, and the plot showed a good random pattern with virtually no correlation.
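The residual check described above could be sketched in Python roughly as follows. The `observed` and `predicted` lists are made-up numbers for illustration only (the actual 40 data points are not in this thread), and the helper names are hypothetical.

```python
from math import sqrt


def residuals(observed, predicted):
    """Residual for each data point: observed value minus predicted value."""
    return [o - p for o, p in zip(observed, predicted)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up example data, NOT the data from the thread.
observed = [270, 310, 295, 350, 330, 400, 380, 420]
predicted = [284, 300, 301, 342, 338, 391, 389, 411]

res = residuals(observed, predicted)
print("correlation(observed, predicted):", pearson(observed, predicted))
print("correlation(predicted, residuals):", pearson(predicted, res))
```

A correlation between the predicted values and the residuals that is close to zero is consistent with the "random pattern" in a residual plot, although it only rules out a linear trend.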

My model is trying to find the relationship between the adoption of a specific hazardous substance law that exists around the globe and the GDP of any given country. As background, I created a scoring metric that assigns a Risk score based on how much of this hazardous substance law a country adopts. I applied it to many countries that already have this law and generated a Risk score for each of them. My 92% correlation is between GDP and the Risk score of existing countries that have the law. What I want to do is predict what the hazardous substance law is going to be for a country that might adopt it in the future. So let's say a country F that doesn't have this law gets a Risk score of 284 from the model, and the closest-scoring country that already has the law has a Risk score of 270. Both the predicted and actual data sets are normally distributed. So I am trying to find out, once country F adopts the law, the probability that its final Risk score will be between 270 and 284. Thank you for your time.


