Probability observed value not in range for prediction

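In summary, the poster fitted a model relating GDP to a Risk score for countries that already have a hazardous substance law, and reports a correlation of 92%. For a country F that does not yet have the law, the model predicts a Risk score of 284, and the closest country that already has the law scores 270. The question is the probability that F's Risk score falls between 270 and 284 once it adopts the law.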
  • #1
Math33

Homework Statement


Hello all, I created a predictive model from a data set of observed values and am looking for a probability that describes its accuracy. Data set A (observed) and data set B (predictive model) have a correlation of 84% using linear regression. Data sets A and B are both normally distributed, and every predicted B value has an assigned A value for prediction mapping. For example, the B model produces a score of 420 for a data point, and the closest A score to that is 410. Now, let's say that in the future this data point can be observed (let's call the observation F). What is the probability that F is between 410 and 420?

Homework Equations


##P(410 < F < 420)##. A and B are two separate normal distributions with two different means and standard deviations.

The Attempt at a Solution


I found the probability of A for 410 in the first normal distribution (let's say P(A) = 0.56), then I found the probability of B for 420 in the second normal distribution (let's say P(B) = 0.67). I then subtracted P(B) - P(A) to get 0.11. Then I subtracted that from 1 (1 - 0.11) to get 0.89. So the probability that F is going to be in the range of 410 to 420 is 89%. I am not sure if I'm doing this right. Thanks in advance.
 
  • #2
To comment usefully on this we'd need more information. A predictive model usually takes the form of an equation with an error term, like
$$X_{B,i}=X_{A,i}+\varepsilon_i$$
where ##X_{A,i}## and ##X_{B,i}## are the ##i##th ##A## and ##B## values respectively and ##\varepsilon_i## is a random variable called the 'error term', usually independent between different values of ##i##. ##\varepsilon_i## has a known distribution - usually, but not always, normal - which is usually the same for all ##i##.

What is the equation for your model?
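
As a minimal sketch of how an error term turns a prediction into a probability: assuming the additive model above with a normally distributed, zero-mean error, the future observation for a data point predicted at 420 would be normally distributed around 420. The standard deviation `sigma` below is a made-up placeholder, not a value from this thread, and `interval_probability` is a hypothetical helper name.

```python
from statistics import NormalDist

def interval_probability(prediction, sigma, low, high):
    """P(low < F < high), assuming F ~ Normal(prediction, sigma)."""
    dist = NormalDist(mu=prediction, sigma=sigma)
    return dist.cdf(high) - dist.cdf(low)

# 410 and 420 come from the first post; sigma=15 is only a placeholder
# for the standard deviation of the model's error term.
p = interval_probability(prediction=420, sigma=15.0, low=410, high=420)
print(f"P(410 < F < 420) = {p:.3f}")
```

Under these assumptions the interval from 410 to 420 lies entirely on one side of the prediction, so the probability cannot exceed 0.5 whatever `sigma` is.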
 
  • #3
andrewkirk said:
To comment usefully on this we'd need more information. A predictive model usually takes the form of an equation with an error term, like
$$X_{B,i}=X_{A,i}+\varepsilon_i$$
where ##X_{A,i}## and ##X_{B,i}## are the ##i##th ##A## and ##B## values respectively and ##\varepsilon_i## is a random variable called the 'error term', usually independent between different values of ##i##. ##\varepsilon_i## has a known distribution - usually, but not always, normal - which is usually the same for all ##i##.

What is the equation for your model?

Hi Andrew, thank you for the response. I just got back from work so I don't have the equation in front of me, but it is a non-linear model: a 5th-order polynomial. I fitted the model using an Excel trendline and actually found a correlation of 92%. Yes, I already plotted the error terms (residuals) for each predicted value, and they showed a good random pattern for N = 40 data points, with virtually no correlation.

My model is trying to find the relationship between the adoption of a specific hazardous substance law that is present throughout the globe and the GDP of any given country. As background, I created a scoring metric that assigns a Risk score based on how much of this hazardous substance law a country adopts. I applied it to many countries that already have this law and generated a Risk score for each of them. My 92% correlation is between GDP and Risk score for existing countries that have the law. What I want to do is predict what the hazardous substance law is going to be for a country that might adopt it in the future. So let's say a country F that doesn't have this law is given a Risk score of 284 by the model, and the closest-scoring country that already has the law has a Risk score of 270. Both the predicted and actual data sets are normally distributed. So I am trying to find out, once country F adopts the law, the probability that the final Risk score will be between 270 and 284. Thank you for your time.
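
One way the residuals could feed into that probability, sketched under explicit assumptions: the residuals are treated as independent, zero-mean and normal with a common spread, and country F's eventual score is treated as normal around the model's prediction of 284. The `residuals` list is placeholder data, not the poster's 40 values, and `residual_sd` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def residual_sd(residuals, n_params):
    """Residual standard deviation with n - n_params degrees of freedom."""
    dof = len(residuals) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return math.sqrt(sum(r * r for r in residuals) / dof)

# Placeholder residuals (observed - predicted); replace with the real 40 values.
residuals = [(-1) ** i * (5 + i % 7) for i in range(40)]

# A 5th-order polynomial has 6 fitted coefficients.
sigma = residual_sd(residuals, n_params=6)

# Assumption: F's score ~ Normal(284, sigma), with 284 the model's prediction.
dist = NormalDist(mu=284, sigma=sigma)
p = dist.cdf(284) - dist.cdf(270)
print(f"sigma = {sigma:.2f}, P(270 < F < 284) = {p:.3f}")
```

This ignores the uncertainty in the fitted coefficients and the fact that F lies outside the group of countries used for fitting, so a proper prediction interval would be wider than this sketch suggests.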
 

1. What does it mean when the observed value is not within the predicted range?

When the observed value is not within the predicted range, it means that the actual outcome of an event or experiment does not match what was expected based on the predicted probability. This could be due to chance or a flaw in the prediction model.

2. How is the predicted range determined?

The predicted range is determined by calculating the probability of different outcomes based on the available data and using statistical methods to determine the most likely range of values. This can be done using mathematical equations or through data analysis using tools like regression analysis or machine learning algorithms.

3. Can an observed value falling outside the predicted range be used to improve the prediction model?

Yes, an observed value falling outside the predicted range can provide valuable information for improving the prediction model. It can indicate areas where the model may need to be adjusted or updated to better reflect the actual outcomes. This process is known as model validation and is an important part of the scientific method.

4. Is it common for observed values to fall outside the predicted range?

It depends on the accuracy of the prediction model and the complexity of the event or experiment being studied. In some cases, observed values falling outside the predicted range may be expected, especially if there are many variables at play. However, if the model is well-designed and based on reliable data, observed values should generally fall within the predicted range. Even then, some misses are expected: a well-calibrated 95% prediction interval should fail to contain about 5% of observations by construction.

5. How can the reliability of a prediction model be evaluated when observed values fall outside the predicted range?

There are several ways to evaluate the reliability of a prediction model, even when observed values fall outside the predicted range. One approach is to compare the predicted values to a set of known or historical data to see how well the model performs. Additionally, the model can be tested using different subsets of the data to see if it consistently produces accurate predictions. Another method is to use cross-validation techniques, where the model is trained on one set of data and then tested on another set to see how well it generalizes to new data.
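
The cross-validation idea can be sketched as follows. This is an illustration only: it uses leave-one-out cross-validation with a straight-line fit as a stand-in for whatever model is actually being evaluated, and the data are synthetic. The helper names `fit_line` and `loocv_rmse` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def loocv_rmse(xs, ys):
    """Leave-one-out cross-validation: root mean squared held-out error."""
    sq_errors = []
    for i in range(len(xs)):
        # Fit on every point except the i-th, then predict the i-th.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        sq_errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5

# Synthetic data for illustration only: y = 2x + 1 plus Gaussian noise (sd 1).
random.seed(0)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]
rmse = loocv_rmse(xs, ys)
print(f"LOOCV RMSE: {rmse:.2f}")
```

A held-out error close to the in-sample residual spread suggests the model generalizes; a much larger held-out error is a common sign of overfitting, which is a particular risk for high-order polynomials fitted to few points.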
