# Probability observed value not in range for prediction

Tags:
1. Apr 21, 2017

### Math33

1. The problem statement, all variables and given/known data
Hello all, I created a predictive model from a data set of observed values and am looking for probabilities for accuracy. Data set A (observed) and data set B (predictive model) have a correlation of 84 % using linear regression. Data set A and B are both normally distributed, also for every predicted B value there is an assigned A value for prediction mapping. Ex: B model produces a score for a data point of 420 and the closest A score to that is 410. Now, let's say in the future this data point is able to be observed (let's call this F.) What is the probability that F is in between 410 and 420.

2. Relevant equations
P(410<F<420). A and B are two separate normal distributions with two different means and standard deviations.

3. The attempt at a solution
I found the probability of A for 410 in the first normal distribution (let's say P(A)=0.56) then I found the probability of B for 420 in the second normal distribution (let's say P(B)=0.67) I then subsracted P(B)- P(A) to get 0.11. Then I substrated 1-0.11 to get 0.89. So the probability that F is going to be in the range of 410 to 420 is 89%. I am not sure if I'm doing this right . Thanks in advance.

2. Apr 21, 2017

### andrewkirk

To comment usefully on this we'd need more information. A predictive model usually takes the form of an equation with an error term, like
$$X_{B,i}=X_{A,i}+\varepsilon_i$$
where $X_{A,i}$ and $X_{B,i}$ are the $i$th $A$ and $B$ values respectively and $\varepsilon_i$ is a random variable called the 'error term', usually independent between different values of $i$. $\varepsilon_i$ has a known distribution - usually, but not always, normal - which is usually the same for all $i$.

What is the equation for your model?

3. Apr 21, 2017

### Math33

Hi Andrew, thank you for the response. I actually just got back from work so I don't have the equation in front of me but it is a 5th order polynomial, non-linear model. I fitted the model using excel trendline and actually found a correlation of 92%. Yes, I already did plot the error term (residuals) for each predicted value and a very good random pattern was shown for N=40 data points with virtually no correlation.

What my model is attempting to do is trying to find the relationship between the adaptation of a specific hazardous substance law that is present throughout the globe, and the GDP of any given country. A little background is that I created a metric system that assigns a Risk score based on how much of this hazardous substance law a country adopts. I did this with many existing countries that have this law already and generated a Risk score for each of them. My 92 % correlation is between GDP and Risk score of existing countries that have the law. What I want to try to do is to predict what the hazardous substance law is going to be for a country that might adopt it in the future. So let's say a country F that doesn't have this law has a Risk score of 284 given from the model. Then the closest score country that already has the law has a Risk score of 270. Both predicted and actual data sets are normally distributed. So I am trying to find out once country F adopts the law, what is the probability the final Risk score will be between 270 and 284. Thank you for your time.