Combination probability of variables that are not independent

In summary: iambasil is struggling to model supremacy because Home Touch-downs and Away Touch-downs are not independent; chiro suggests fitting a Poisson or logistic regression within a generalized linear model.
  • #1
iambasil
Hello,

I'm hoping I might be able to get some help in creating a forecasting model (in sports) looking at 2 variables that are not independent of each other.

I'll take US Football (same applies to rugby) as an example. The specific forecast I'm interested in here is the expected supremacy between two teams at the end of a match (Team B points minus Team A points).

There's a fair bit out there others have done looking at how to forecast the most likely supremacy outcome (the 'line' which generally isn't just stats, but involves looking at prices set by the betting world).

However, what I'm most interested in is how to create a forecasted probability of supremacies that are different to the line. As an example, if it is assessed that the E(line) is -4.3, I want to work out the probability that the supremacy would actually be any of:
-10, -9, -8, ..., 0, 1, 2, ...9, 10 (etc)

There is obviously error in the line itself, which needs to be taken into account, so I looked at historic data as a guide. As you might expect, combining the expected points of team A with those of team B (both derived from the line, but treated as independent of each other) does not give a good enough fit: the resulting distribution is not negatively skewed enough and its kurtosis is too high. Team A's point haul will generally have an effect on team B's (and vice versa), so the two are not independent.

Could you please share some ideas on how to adjust for the fact that team A's and team B's scores are related when calculating supremacy? I'd really appreciate your help!

Basil
 
  • #2
You are saying that team A's haul is conditional on team B's ... so you are looking for something involving conditional probabilities and Bayes' theorem.
 
  • #3
Thanks for responding.

Well, it's not entirely conditional, but it is affected by it to varying degrees.

So what I'm struggling with is how to measure this effect between the two, and how to apply the measure to the statistics. Any ideas on that?

Many thanks.
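One simple way to put a number on the effect between the two scores is the sample correlation of the paired home and away counts. The following is only a minimal sketch: the touchdown counts are invented for illustration and `pearson_correlation` is a hypothetical helper name, not something taken from the attached data.

```python
from math import sqrt

def pearson_correlation(xs, ys):
    """Sample Pearson correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical touchdown counts per game, invented for illustration only.
home_tds = [3, 2, 4, 1, 3, 5, 2, 0]
away_tds = [2, 3, 1, 3, 2, 1, 2, 4]

r = pearson_correlation(home_tds, away_tds)
print(f"sample correlation between home and away touchdowns: {r:.3f}")
```

The correlation matters for supremacy because Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B): a model that sets the covariance to zero will get the spread of the difference wrong whenever the scores move together or against each other.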
 
  • #4
Hey iambasil and welcome to the forums.

The first thing you will have to do is to come up with a model for your regression.

If you have correlations between observations in time you will need to consider a longitudinal form of analysis.

Given that you are measuring probabilities, you will probably have to use some form of logistic regression involving a generalized linear model.

Typically in simple linear models, your estimator at a given point of your independent variables (i.e. the estimate of the response, not the predictors) has a t-distribution or a Normal distribution.

But if you have conditional distributions, then the analysis will be a lot more complicated (a lot more).
 
  • #5
Thanks very much for your response chiro.

I'll be honest - a lot of what you wrote is a bit beyond my understanding, even after looking up the terms you used.

I've attached some data and analysis I did as an example (zipped as was over 100k).

I modeled touchdowns of home and away teams based on Poisson distributions using the means over all games, and then summed up the probabilities of each supremacy over all the outcomes that produce it. This is the closest I could get to matching the observed results, but the error is still beyond tolerable levels.

This method treats the home and away team touchdowns as being independent, which they aren't fully.
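For readers who want to reproduce the independent approach described above outside a spreadsheet, here is a minimal sketch. The means 2.8 and 2.3 are invented placeholders rather than values from the attachment, supremacy is taken here as home minus away touchdowns, and counts are truncated at an assumed `max_count`.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return exp(-mean) * mean ** k / factorial(k)

def supremacy_pmf_independent(mean_home, mean_away, max_count=30):
    """P(home - away = d), ASSUMING the two counts are independent Poissons.

    Each (home, away) outcome gets probability P(home) * P(away), and the
    probabilities of all outcomes with the same difference are summed.
    """
    pmf = {}
    for h in range(max_count + 1):
        p_h = poisson_pmf(h, mean_home)
        for a in range(max_count + 1):
            d = h - a
            pmf[d] = pmf.get(d, 0.0) + p_h * poisson_pmf(a, mean_away)
    return pmf

# Hypothetical mean touchdowns per game, for illustration only.
pmf = supremacy_pmf_independent(mean_home=2.8, mean_away=2.3)
for d in range(-3, 4):
    print(d, round(pmf[d], 4))
```

Comparing this table with the observed frequencies of each supremacy shows where the independence assumption misses.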

It would be great if you could suggest how to do the regressions/distributions based on what you see in the data? Forgive me for being unfamiliar and out of touch with methods.
 

Attachments

  • Example Analysis.xls.zip
    32.5 KB · Views: 243
  • #6
So are you trying to find a model to predict supremacy in terms of Away Touch-downs and Home Touch-downs?

If so you should look into regressions involving logistic types with Poisson models for the independent variables (i.e. the away and home touchdowns).

What kind of experience do you have with statistical computational tools?

You can download R for free at http://www.r-project.org

Take a look at this:

http://www.lisa.stat.vt.edu/sites/default/files/Poisson.and_.Logistic.Regression.pdf

(Scroll down to Poisson regression)
 
  • #7
Thanks Chiro,

My thought was that if I could get the touchdowns right, I could then do it for field goals and any other scoring types similarly. By combining each of these and multiplying by points value, I could effectively evaluate the supremacy in total points.
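The combining step could look something like the sketch below. It makes several assumptions that are not from the thread: each scoring type is a Poisson count, the types are independent of each other, only touchdowns (6 points) and field goals (3 points) are included, and the means 2.5 and 1.5 are invented.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return exp(-mean) * mean ** k / factorial(k)

def scaled_pmf(mean, points_per_score, max_count=20):
    """Distribution of points from one scoring type: count * points value."""
    return {k * points_per_score: poisson_pmf(k, mean)
            for k in range(max_count + 1)}

def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two INDEPENDENT point totals."""
    out = {}
    for x, px in pmf_a.items():
        for y, py in pmf_b.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

# Hypothetical means per game; extra points and safeties are ignored here.
touchdown_points = scaled_pmf(mean=2.5, points_per_score=6)
field_goal_points = scaled_pmf(mean=1.5, points_per_score=3)
team_points = convolve(touchdown_points, field_goal_points)

expected_points = sum(pts * p for pts, p in team_points.items())
print(f"expected points: {expected_points:.2f}")
```

The same independence caveat applies between scoring types as between teams: a drive that ends in a touchdown cannot also end in a field goal, so the convolution is only a first approximation.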

A friend can help with MATLAB, and I can download R too. I used SAS, Minitab, Maple 13 years ago and don't remember much! I'm nifty on Excel and know VBA.

Thanks for the link. I'll spend some time to understand this. One thing to highlight though: although I used Poisson in the example (because it gave the best results), the actual best fit (and it's very good!) for total/team total touchdowns (as opposed to supremacy) in this case was binomial.
 
  • #8
You'll probably have to look up what kind of link functions are supported in the GLM procedure for your particular package.

If they don't have direct support, then you will probably have to code a fitting procedure and use something like the Expectation Maximization algorithm (EM) or some other similar fitting algorithm to fit the data to some parametric distribution (which will be Poisson or Binomial).

I would be surprised if R didn't already have the functionality built in and I know from experience that SAS has a lot of built in options as well.
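For the simplest case of fitting a single Poisson or Binomial distribution to a list of counts, the maximum-likelihood estimates have closed forms, so a comparison can be sketched without EM. In the sketch below the counts are invented, and `trials=12` (an upper limit on scoring opportunities per game) is purely an assumption that would have to be chosen or estimated for real data.

```python
from math import comb, exp, factorial, log

def poisson_fit(counts):
    """ML estimate of the Poisson mean and the resulting log-likelihood."""
    mean = sum(counts) / len(counts)
    ll = sum(log(exp(-mean) * mean ** k / factorial(k)) for k in counts)
    return mean, ll

def binomial_fit(counts, trials):
    """ML estimate of p for a FIXED number of trials, plus log-likelihood."""
    p = sum(counts) / (len(counts) * trials)
    ll = sum(log(comb(trials, k) * p ** k * (1 - p) ** (trials - k))
             for k in counts)
    return p, ll

# Hypothetical touchdown counts per game, for illustration only.
counts = [3, 2, 4, 1, 3, 5, 2, 0, 3, 2]

mean, ll_poisson = poisson_fit(counts)
p, ll_binomial = binomial_fit(counts, trials=12)
print(f"Poisson:  mean={mean:.3f}, log-likelihood={ll_poisson:.3f}")
print(f"Binomial: p={p:.3f}, log-likelihood={ll_binomial:.3f}")
```

The distribution with the higher log-likelihood fits the counts better, although a fair comparison should also account for how `trials` was chosen.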
 
  • #9
Thanks Chiro, sounds like I have a lot of reading and learning to do!

I'll check back into this thread with an update (and maybe another query if ok/needed?)

Thanks very much!

Basil
 

What is combination probability of variables that are not independent?

Combination probability of variables that are not independent is a statistical method used to calculate the likelihood of two or more events occurring simultaneously, when the events are not independent of each other. This means that the probability of one event happening is affected by the occurrence of the other event.

How is combination probability of variables that are not independent calculated?

For events that are not independent, the joint probability is calculated by multiplying the probability of one event by the conditional probability of the other given the first. The formula is P(A and B) = P(A) x P(B|A), where P(A) is the probability of event A and P(B|A) is the conditional probability of event B given that event A has occurred. Only when the events are independent does P(B|A) equal P(B), so that the formula reduces to P(A) x P(B).
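As a rough illustration of the formula, the sketch below estimates P(A), P(B|A) and the joint probability from paired observations and compares the joint probability with the product of the marginals. The pairs are invented (home touchdowns, away touchdowns) values, and the function names are hypothetical.

```python
from collections import Counter

def empirical_tables(pairs):
    """Empirical joint and marginal probabilities from (a, b) observations."""
    n = len(pairs)
    joint = {ab: c / n for ab, c in Counter(pairs).items()}
    p_a = {a: c / n for a, c in Counter(a for a, _ in pairs).items()}
    p_b = {b: c / n for b, c in Counter(b for _, b in pairs).items()}
    return joint, p_a, p_b

def conditional(joint, p_a, a, b):
    """P(B = b | A = a) = P(A = a and B = b) / P(A = a)."""
    return joint.get((a, b), 0.0) / p_a[a]

# Hypothetical (home, away) touchdown pairs, for illustration only.
pairs = [(3, 1), (3, 1), (1, 3), (1, 3), (2, 2), (2, 2), (3, 2), (1, 2)]
joint, p_a, p_b = empirical_tables(pairs)

p_joint = p_a[3] * conditional(joint, p_a, 3, 1)   # P(A) x P(B|A)
p_if_independent = p_a[3] * p_b[1]                 # P(A) x P(B)
print(f"P(A) x P(B|A) = {p_joint:.4f}")
print(f"P(A) x P(B)   = {p_if_independent:.4f}")
```

When the two printed values differ, as they do for these made-up pairs, treating the events as independent misstates the joint probability.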

What is the difference between combination probability and independent probability?

The main difference between combination probability and independent probability is that independent probability assumes that the occurrence of one event does not affect the occurrence of the other event, while combination probability takes into account the dependence between events. In other words, in independent probability, the probability of one event happening does not change regardless of whether the other event has occurred or not, while in combination probability, the probability of one event happening is influenced by the occurrence of the other event.

What are some real-life applications of combination probability of variables that are not independent?

Combination probability of variables that are not independent is widely used in various fields, including finance, insurance, and risk management. For example, it can be used to calculate the risk of default on a loan, where the probability of default is dependent on factors such as credit score and income. It is also used in medical research to determine the likelihood of a patient developing multiple diseases or conditions simultaneously.

What are some limitations of using combination probability of variables that are not independent?

One limitation of combination probability of variables that are not independent is that it assumes that the events being studied are the only factors affecting the outcome. In reality, there may be other confounding variables that could impact the results. Additionally, the calculation of combination probability can become complex and time-consuming when dealing with multiple dependent events. Finally, combination probability is based on past data and does not account for future changes or events that could affect the probabilities of the events being studied.
