Combination probability of variables that are not independent

Discussion Overview

The discussion revolves around creating a forecasting model for predicting the expected supremacy between two sports teams (specifically in US Football and rugby) based on two interdependent variables. Participants explore methods to account for the relationship between the teams' scoring outcomes and how to adjust probability forecasts accordingly.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant seeks to forecast the probability of various supremacy outcomes between two teams, noting that traditional independent models do not fit well due to the interdependence of team scores.
  • Another participant suggests that the relationship between team scores may involve conditional probabilities and mentions Bayes' theorem.
  • A different participant emphasizes the need for a regression model and mentions the potential use of longitudinal analysis due to correlations over time.
  • One contributor proposes using logistic regression with generalized linear models to handle the complexities of conditional distributions.
  • Another participant shares their experience with Poisson distributions for modeling touchdowns but acknowledges that their method treats team scores as independent, which is not entirely accurate.
  • There is a suggestion to explore logistic regression involving Poisson models for predicting supremacy based on touchdowns.
  • Participants discuss the use of statistical software tools like R and SAS for implementing these models, with one participant expressing a need to refresh their knowledge of statistical methods.
  • One participant mentions that while Poisson provided good results, a binomial model yielded an even better fit for total touchdowns.

Areas of Agreement / Disagreement

Participants express various viewpoints on the modeling approach, with no consensus reached on the best method to account for the interdependence of team scores. The discussion includes multiple competing ideas and remains unresolved regarding the optimal forecasting technique.

Contextual Notes

Participants highlight limitations in their current models, including the assumption of independence between team scores and the need for more sophisticated statistical methods to accurately reflect the relationship between the variables.

iambasil
Hello,

I'm hoping I might be able to get some help in creating a forecasting model (in sports) looking at 2 variables that are not independent of each other.

I'll take US Football (same applies to rugby) as an example. The specific forecast I'm interested in here is the expected supremacy between two teams at the end of a match (Team B points minus Team A points).

Others have done a fair amount of work on forecasting the most likely supremacy outcome (the 'line', which generally isn't just statistics but also involves looking at prices set by the betting world).

However, what I'm most interested in is how to create a forecasted probability of supremacies that are different to the line. As an example, if it is assessed that the E(line) is -4.3, I want to work out the probability that the supremacy would actually be any of:
-10, -9, -8, ..., 0, 1, 2, ...9, 10 (etc)

There is obviously error in the line itself, which needs to be taken into account, so I looked at historic data as a guide. As you might expect, combining the expected points haul of team A with that of team B, both based on the line but treated as independent of each other, does not give a good enough fit (not negatively skewed enough, and kurtosis too high). Team A's points haul will generally have an effect on team B's (and vice versa); they are not independent of each other.

Could you please share some ideas on how to adjust for the fact that team A's and team B's scores are related when calculating supremacy? I'd really appreciate your help!

Basil
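To put numbers on the mismatch described above (not negatively skewed enough, kurtosis too high), one option is to compute the same summary moments for the model's supremacy distribution and for the empirical frequencies of historic supremacies, then compare them. A minimal Python sketch, assuming each distribution is held as a dict from supremacy value to probability or raw count; the function name `pmf_moments` is hypothetical, not from the original post:

```python
# A minimal sketch: summary moments of a discrete supremacy distribution.
# `pmf` maps a supremacy value to a probability or a raw count (it is normalised).


def pmf_moments(pmf):
    """Return (mean, variance, skewness, excess kurtosis) of a discrete pmf."""
    total = sum(pmf.values())
    mean = sum(k * p for k, p in pmf.items()) / total

    def central(r):
        # r-th central moment about the mean
        return sum((k - mean) ** r * p for k, p in pmf.items()) / total

    var = central(2)
    skew = central(3) / var ** 1.5
    excess_kurt = central(4) / var ** 2 - 3.0
    return mean, var, skew, excess_kurt
```

For example, `pmf_moments({-1: 0.25, 0: 0.5, 1: 0.25})` gives mean 0, variance 0.5, skewness 0 and excess kurtosis -1. A model whose skewness is less negative than the empirical value reproduces the problem described in the post.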
 
You are saying that team A's haul is conditional on team B's, so you are looking for something involving conditional probabilities and Bayes' theorem.
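The conditional-probability idea can be made concrete with the chain rule P(a, b) = P(a) P(b | a), the identity Bayes' theorem rests on. The sketch below is only illustrative: it assumes touchdown counts, assumes team A's count is Poisson with mean `mu_a`, and assumes (hypothetically) that team B's mean shifts linearly with team A's actual count through a made-up parameter `gamma`. With `gamma = 0` it collapses back to the independent model.

```python
import math
from collections import defaultdict


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def conditional_supremacy_pmf(mu_a, mu_b, gamma, max_td=20):
    """pmf of supremacy (B minus A) from P(a, b) = P(a) * P(b | a).

    Hypothetical dependence: given that A scores a touchdowns, B's count is
    Poisson with mean mu_b + gamma * (a - mu_a), floored at a tiny positive
    value so it stays valid. gamma = 0 gives the independent model.
    Counts above max_td are ignored (truncation).
    """
    sup = defaultdict(float)
    for a in range(max_td + 1):
        p_a = poisson_pmf(a, mu_a)
        lam_b = max(mu_b + gamma * (a - mu_a), 1e-9)
        for b in range(max_td + 1):
            sup[b - a] += p_a * poisson_pmf(b, lam_b)
    return dict(sup)
```

`gamma` would have to be estimated from historic games; a positive value makes the two counts move together and narrows the supremacy distribution, while a negative value widens it.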
 
Thanks for responding.

Well, it's not entirely conditional, but it is affected by it to varying degrees.

So what I'm struggling with is how to measure this effect between the two, and how to apply that measure to the statistics. Any ideas on that?

Many thanks.
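One simple way to measure how strongly the two scores move together is the sample covariance (or correlation) between the paired scores from the same games. It feeds straight into the supremacy variance through Var(B - A) = Var(A) + Var(B) - 2 Cov(A, B), so it shows how far off the independent model's spread is. A sketch in plain Python; the function name and the example numbers in the usage note are hypothetical:

```python
import math


def dependence_summary(scores_a, scores_b):
    """Sample covariance and correlation of paired scores from the same games,
    plus the supremacy variance with and without that covariance."""
    n = len(scores_a)
    ma = sum(scores_a) / n
    mb = sum(scores_b) / n
    var_a = sum((a - ma) ** 2 for a in scores_a) / (n - 1)
    var_b = sum((b - mb) ** 2 for b in scores_b) / (n - 1)
    cov = sum((a - ma) * (b - mb) for a, b in zip(scores_a, scores_b)) / (n - 1)
    return {
        "covariance": cov,
        "correlation": cov / math.sqrt(var_a * var_b),
        # Var(B - A) if the two scores were independent
        "var_sup_independent": var_a + var_b,
        # Var(B - A) = Var(A) + Var(B) - 2 Cov(A, B)
        "var_sup_with_cov": var_a + var_b - 2 * cov,
    }
```

For instance, `dependence_summary([1, 2, 3, 4], [2, 2, 4, 5])` reports a positive covariance, so the independent model would overstate the spread of supremacy for those (made-up) games. Covariance only captures the linear part of the dependence, so on its own it will not explain the skewness mismatch.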
 
Hey iambasil and welcome to the forums.

The first thing you will have to do is to come up with a model for your regression.

If you have correlations between observations in time you will need to consider a longitudinal form of analyses.

Given that you are measuring probabilities, you will probably have to use some form of logistic regression involving a generalized linear model.

Typically, in simple linear models, your estimate at a given point of the independent variables (i.e. the inputs, not the variable being predicted) has a t-distribution or a Normal distribution.

But if you have conditional distributions, then the analysis will be a lot more complicated (a lot more).
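As a rough illustration of logistic regression (a generalized linear model with a logit link), here is a two-parameter fit by Newton-Raphson, which for this model is the same as iteratively reweighted least squares. The binary outcome and the single covariate are hypothetical, for example whether the home team's supremacy exceeded some threshold, against the pre-match line. In practice a package routine such as R's `glm(y ~ x, family = binomial)` does this for you.

```python
import math


def fit_logistic(x, y, iters=25):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    For logistic regression this is the same as iteratively reweighted
    least squares. Assumes the data are not perfectly separable.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)  # IRLS weight
            g0 += yi - p       # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w           # negative Hessian entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 g, with the 2x2 inverse written out
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

The fitted probability for a new covariate value `x0` is then `1 / (1 + math.exp(-(b0 + b1 * x0)))`.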
 
Thanks very much for your response chiro.

I'll be honest - a lot of what you wrote is a bit beyond my understanding, even after looking up the terms you used.

I've attached some data and analysis I did as an example (zipped, as it was over 100 KB).

I modeled touchdowns of home and away teams using Poisson distributions based on the means of all games, and then summed the probabilities of each supremacy over every possible score outcome. This was the closest I could get to matching the observed results, but the error is still beyond tolerable levels.

This method treats the home and away team touchdowns as being independent, which they aren't fully.

It would be great if you could suggest how to do the regressions/distributions based on what you see in the data. Forgive me for being unfamiliar and out of touch with these methods.
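The method described above (independent Poisson counts for home and away touchdowns, with every score pair's probability added to the matching supremacy) can be sketched as follows. The difference of two independent Poissons is the Skellam distribution, so this is a truncated Skellam pmf. Supremacy is taken here as home minus away; that sign convention, the function names and the truncation at `max_td` are assumptions.

```python
import math
from collections import defaultdict


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def independent_supremacy_pmf(mu_home, mu_away, max_td=20):
    """P(home TDs - away TDs = d), treating both counts as independent Poissons.

    Sums P(h) * P(a) over every score pair (h, a) into bucket h - a;
    counts above max_td are ignored (truncation).
    """
    sup = defaultdict(float)
    for h in range(max_td + 1):
        p_h = poisson_pmf(h, mu_home)
        for a in range(max_td + 1):
            sup[h - a] += p_h * poisson_pmf(a, mu_away)
    return dict(sup)
```

Because the two probabilities are simply multiplied, this is exactly the model that cannot represent any dependence between the teams; its variance is always `mu_home + mu_away`.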
 

So are you trying to find a model to predict supremacy in terms of away touchdowns and home touchdowns?

If so you should look into regressions involving logistic types with Poisson models for the independent variables (i.e. the away and home touchdowns).

What kind of experience do you have with statistical computational tools?

You can download R for free at http://www.r-project.org

Take a look at this:

http://www.lisa.stat.vt.edu/sites/default/files/Poisson.and_.Logistic.Regression.pdf

(Scroll down to Poisson regression)
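Poisson regression, as covered in the linked notes, models a count such as touchdowns as Poisson with log mean b0 + b1 * x. A minimal Newton-Raphson sketch with one hypothetical covariate `x` (for instance the pre-match line); R's `glm(y ~ x, family = poisson)` fits the same model. The function name is an assumption for illustration.

```python
import math


def fit_poisson_regression(x, y, iters=50):
    """Fit y ~ Poisson(exp(b0 + b1 * x)) by Newton-Raphson (log link).

    Starts from the intercept-only fit b0 = log(mean(y)), b1 = 0.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)  # fitted mean; also the IRLS weight
            g0 += yi - mu                # gradient of the log-likelihood
            g1 += (yi - mu) * xi
            h00 += mu                    # negative Hessian entries
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

The fitted mean for each game, `math.exp(b0 + b1 * x)`, can then be fed into a per-game Poisson pmf instead of one league-wide mean.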
 
Thanks Chiro,

My thought was that if I could get the touchdowns right, I could then do the same for field goals and any other scoring types. By combining each of these and multiplying by their points values, I could effectively evaluate the supremacy in total points.
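Combining scoring types amounts to scaling each count distribution by its points value and convolving. The sketch assumes a touchdown is worth 7 points (i.e. that the conversion is always made), a field goal 3, and that the per-type counts are independent, which carries over the same independence caveat discussed in the thread. The function names are hypothetical.

```python
from collections import defaultdict


def scale_pmf(pmf, points):
    """pmf over event counts -> pmf over points scored from that event type."""
    return {k * points: p for k, p in pmf.items()}


def negate_pmf(pmf):
    """pmf of -X, used to turn a sum into a difference."""
    return {-k: p for k, p in pmf.items()}


def convolve(pmf_x, pmf_y):
    """pmf of X + Y for independent discrete X and Y held as dicts."""
    out = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py
    return dict(out)
```

A team's points pmf would be `convolve(scale_pmf(td_pmf, 7), scale_pmf(fg_pmf, 3))`, and supremacy (B minus A) would be `convolve(points_b, negate_pmf(points_a))`.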

A friend can help with MATLAB, and I can download R too. I used SAS, Minitab and Maple 13 years ago and don't remember much! I'm nifty with Excel and know VBA.

Thanks for the link. I'll spend some time to understand this. One thing to highlight, though: although I used Poisson in the example (because it gave the best results), the actual best fit (and it's very good!) for total and team-total touchdowns (as opposed to supremacy) in this case was binomial.
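A Poisson forces the variance to equal the mean, while a binomial has variance n p (1 - p), below its mean, so a binomial fitting better usually signals under-dispersed counts. A quick check, as a sketch: fit the binomial by the method of moments (p = 1 - variance/mean, n = mean/p rounded but at least the largest observed count), refit p = mean/n, and compare log-likelihoods with a Poisson at the sample mean. The method-of-moments choice and the function names are assumptions; a full maximum-likelihood search over n would be more careful.

```python
import math


def _mean_var(counts):
    """Sample mean and (n - 1) sample variance."""
    m = sum(counts) / len(counts)
    v = sum((c - m) ** 2 for c in counts) / (len(counts) - 1)
    return m, v


def fit_binomial_moments(counts):
    """Method-of-moments binomial fit; only possible if variance < mean."""
    m, v = _mean_var(counts)
    if v >= m:
        raise ValueError("counts are not under-dispersed; a binomial cannot match")
    p = 1.0 - v / m
    n = max(round(m / p), max(counts))  # n must cover the largest count observed
    return n, m / n


def loglik_binomial(counts, n, p):
    return sum(math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)
               for k in counts)


def loglik_poisson(counts, lam):
    return sum(-lam + k * math.log(lam) - math.lgamma(k + 1) for k in counts)
```

If `fit_binomial_moments` raises, the counts are not under-dispersed and a binomial is unlikely to beat the Poisson.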
 
You'll probably have to look up what kind of link functions are supported in the GLM procedure for your particular package.

If they don't have direct support, then you will probably have to code a fitting procedure and use something like the Expectation Maximization algorithm (EM) or some other similar fitting algorithm to fit the data to some parametric distribution (which will be Poisson or Binomial).

I would be surprised if R didn't already have the functionality built in and I know from experience that SAS has a lot of built in options as well.
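For a single Poisson or binomial the maximum-likelihood estimates are closed-form (the sample mean, or the mean divided by n), so EM is mainly useful when there is hidden structure, for example counts that look like a blend of two kinds of game. As an illustration of EM only (the two-component mixture is an assumption, not something established in the thread), here is a fit of w * Poisson(lam1) + (1 - w) * Poisson(lam2); the function name is hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def em_poisson_mixture(counts, lam1, lam2, w=0.5, iters=200):
    """Fit w * Poisson(lam1) + (1 - w) * Poisson(lam2) by EM.

    lam1, lam2 and w are starting values; EM converges to a local maximum.
    """
    n = len(counts)
    for _ in range(iters):
        # E-step: probability that each count came from component 1
        r = []
        for k in counts:
            p1 = w * poisson_pmf(k, lam1)
            p2 = (1 - w) * poisson_pmf(k, lam2)
            r.append(p1 / (p1 + p2))
        # M-step: weighted means update the rates, mean responsibility updates w
        s = sum(r)
        w = s / n
        lam1 = sum(ri * k for ri, k in zip(r, counts)) / s
        lam2 = sum((1 - ri) * k for ri, k in zip(r, counts)) / (n - s)
    return w, lam1, lam2
```

Because EM only finds a local maximum, it is worth running it from several starting values and keeping the fit with the highest likelihood.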
 
Thanks Chiro, sounds like I have a lot of reading and learning to do!

I'll check back into this thread with an update (and maybe another query if ok/needed?)

Thanks very much!

Basil
 
