Probability distribution of horses

Discussion Overview

The discussion revolves around the probability distribution of horses in a race, specifically focusing on calculating the probability of each horse winning based on their uncertain race times, which are modeled using Gaussian distributions. Participants explore various mathematical approaches, including Monte Carlo simulations and functional approximations, to derive the winner's probability density function (pdf).

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant presents the race times for six horses and notes the uncertainty in these times, suggesting the use of Gaussian distributions to model them.
  • Another participant proposes defining a random variable for the minimum time among the horses and discusses the conditions under which a specific horse wins.
  • There are inquiries about the distribution of the minimum time and how to derive it from the individual horse distributions.
  • Some participants suggest numerical integration methods to calculate probabilities and express them as products of integrals involving Gaussian cumulative distribution functions (CDFs).
  • Concerns are raised about the correctness of certain formulas and whether they integrate to 1, indicating uncertainty in the proposed mathematical expressions.
  • Participants discuss the implications of the problem in other contexts, such as component failure statistics, and the challenges of fitting experimental data to the model.

Areas of Agreement / Disagreement

There is no consensus on the correctness of the formulas presented, and multiple competing views remain regarding the proper approach to calculating the probabilities. Participants express uncertainty about the integration and the assumptions underlying their models.

Contextual Notes

Participants highlight limitations in their understanding of the standard deviations of the horses' times, which affects their ability to derive accurate probability distributions. There are also unresolved questions about the appropriateness of certain mathematical techniques and the nature of the distributions being used.

cosmiccase
A horse race is going to take place with six runners.
The race is over 5 furlongs (1000 meters) and for each of the six contestants it is known that their probable times at this distance are:

horse 1: 57.00 sec
horse 2: 57.20 sec
horse 3: 57.35 sec
horse 4: 57.80 sec
horse 5: 58.10 sec
horse 6: 59.50 sec

But, as is always the case in horse races, these times are uncertain, so the outcome is unknown.
In fact each of the above times is accurate by plus or minus 0.50 seconds, i.e. for horse "1" there is a Gaussian distribution with mean 57.00 and standard deviation 0.5, for horse "2" there is a Gaussian with mean 57.20 and st. dev. 0.5 and so on.

What is the probability of each horse winning the race?

There is an easy (but a little slow) answer that can be derived by Monte Carlo simulation using random numbers, but that's not what I'm asking for.
Does anyone know a functional approximation for the winner's pdf?
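For reference, the Monte Carlo approach mentioned above can be sketched as follows. This is a minimal illustration only: it assumes the six race times are independent Gaussians with the means and the 0.5 s standard deviation given in the post, and the function name, trial count, and seed are arbitrary choices, not anything from the thread.

```python
import random

# Means from the post (seconds); common standard deviation of 0.5 s.
MEANS = [57.00, 57.20, 57.35, 57.80, 58.10, 59.50]
SIGMA = 0.5

def simulate_win_probabilities(means, sigma, trials=100_000, seed=0):
    """Estimate each horse's win probability from repeated simulated races.

    Assumes independent Gaussian race times (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed, chosen arbitrarily, for repeatability
    wins = [0] * len(means)
    for _ in range(trials):
        times = [rng.gauss(m, sigma) for m in means]
        wins[times.index(min(times))] += 1  # the smallest time wins
    return [w / trials for w in wins]

if __name__ == "__main__":
    for i, p in enumerate(simulate_win_probabilities(MEANS, SIGMA), start=1):
        print(f"horse {i}: {p:.3f}")
```

With N trials the statistical error on each estimate is roughly sqrt(p(1-p)/N), which is why the simulation is slow if several decimal places are wanted.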
 
Relabel horses A ... F.

Define t(min\A) = min{t(B), ..., t(F)}. t(min\A) is a random variable whose distribution can be obtained from the distributions of t(B) ... t(F).

"A" wins if t(A) < t(min\A) or 0 < t(min\A) - t(A). Let d(A) = t(min\A) - t(A). The distribution of d(A) can be obtained from those of t(min\A) and t(A). Let Fd(A) be the CDF of d(A). The probability that A would win is 1 - Fd(A)(0).
 
What is the distribution of t(min\A) then?

Simulation gives the following results for the numbers in my example:

horse 1: 0.47
horse 2: 0.29
horse 3: 0.19
horse 4: 0.04
horse 5: 0.01
horse 6: 0.00

(plus-minus 0.01)
 
Let [itex]\mathbb S[/itex] = {A, ..., F}.

Prob{t(min\A) < x} = Prob{min{t(B), ..., t(F)} < x} = Prob{not all {t(B), ..., t(F)} > x} = 1 - Prob{all {t(B), ..., t(F)} > x} = [tex]1 - \prod_{k \in (\mathbb S\backslash A)}\left[1 - \Phi_k(x)\right][/tex] where [itex]\Phi_k[/itex] is the Gaussian CDF for t(k).
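The CDF of the minimum can be evaluated directly from this product. The sketch below is only an illustration: it assumes the race times are independent, and the helper names `normal_cdf` and `min_cdf` are made up for this example.

```python
import math

def normal_cdf(x, mean, sigma):
    """Gaussian CDF Phi_k(x), written with math.erf from the standard library."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def min_cdf(x, params):
    """Prob{min of independent Gaussian times < x}.

    params is a list of (mean, sigma) pairs, one per horse other than A.
    """
    prob_all_slower = 1.0
    for mean, sigma in params:
        prob_all_slower *= 1.0 - normal_cdf(x, mean, sigma)  # Prob{t(k) > x}
    return 1.0 - prob_all_slower

# Horses B..F (horses 2 to 6 of the original post), all with sigma = 0.5.
others = [(57.20, 0.5), (57.35, 0.5), (57.80, 0.5), (58.10, 0.5), (59.50, 0.5)]

if __name__ == "__main__":
    for x in (56.5, 57.0, 57.5, 58.0):
        print(f"Prob{{t(min\\A) < {x}}} = {min_cdf(x, others):.4f}")
```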
 
Just integrate using numerical methods you will get
0.4702
0.2858
0.1888
0.0427
0.0125
1.8568*1e-6
 
Can you write this down as a product of integrals ?
Is there a functional approximation when the means are Mi and the stds Si ?
 
cosmiccase said:
Can you write this down as a product of integrals ?
Is there a functional approximation when the means are Mi and the stds Si ?
For the answer I posted, you certainly can write it as a product of integrals because each [itex]\Phi_k(x)[/itex] is an integral. There may be a functional approximation but I don't know what it would look like. If you simulate it you should be able to fit some polynomial function using regression analysis.
 
integrals

EnumaElish imports some good looking math symbols.
Can one get those from the font menu ?

Anyway is it

P(A) = integral from 0 to infinity of {erf(t,Mb,Sb) x erf(t,Mc,Sc) ... x erf(t,Mf,Sf)} ?
 
I myself like 'em symbols; aren't they cool? All one has to do is to click on a symbol or formula and read the TeX instructions. That's how I started using them in the first place.

You had asked about the distribution of t(min\A). It is Prob{t(min\A) < x} = [tex]1 - \prod_{k \in (\mathbb S\backslash A)}\left[1 - \Phi_k(x)\right][/tex] and you have to substitute the CDF formula or integral for each [itex]\Phi_k(x)[/itex] for [itex]k \in \mathbb S[/itex]\A = {B, C, D, E, F}.
 
  • #10
[tex]\mathrm{PROB}(A) = \int dt \, f_a(t) \prod \left(1 - \mathrm{erf}(\mu, \sigma, t)\right)[/tex]

That doesn't look great but I think this is it.
The integral is from 0 to infinity.
fa is the Gaussian of "horse A".
erf are the cdfs of the Gaussians of B-C-D-E-F.

Similarly for the others.
It's a hell of an integral though.
How do you tackle it using numerical methods?
I'm surprised there is no link to it, none that I can find anyway.
This problem could be encountered in component failure statistics also, couldn't it?
If you have n components of unequal age or durability, you probably want to know which ones need more frequent attention and how much.
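One possible way to tackle the integral numerically is Simpson's rule over a window around the horse's own mean, since f_a(t) is negligible elsewhere. The sketch below is an assumption-laden illustration rather than anyone's posted method: it treats the times as independent, reads "erf" in the formula as the Gaussian CDF, and the function names, the 8-sigma window, and the step count are arbitrary choices.

```python
import math

def normal_pdf(t, mean, sigma):
    """Gaussian density f(t)."""
    z = (t - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_sf(t, mean, sigma):
    """Survival function 1 - CDF, via erfc to avoid cancellation in the tail."""
    return 0.5 * math.erfc((t - mean) / (sigma * math.sqrt(2.0)))

def win_probability(i, means, sigmas, steps=4000, width=8.0):
    """Integrate f_i(t) * prod_{k != i} (1 - Phi_k(t)) dt with Simpson's rule."""
    lo = means[i] - width * sigmas[i]
    hi = means[i] + width * sigmas[i]
    h = (hi - lo) / steps  # steps must be even for Simpson's rule

    def integrand(t):
        value = normal_pdf(t, means[i], sigmas[i])
        for k in range(len(means)):
            if k != i:
                value *= normal_sf(t, means[k], sigmas[k])
        return value

    total = integrand(lo) + integrand(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * integrand(lo + j * h)
    return total * h / 3.0

if __name__ == "__main__":
    means = [57.00, 57.20, 57.35, 57.80, 58.10, 59.50]
    sigmas = [0.5] * 6
    for i in range(6):
        print(f"horse {i + 1}: {win_probability(i, means, sigmas):.6g}")
```

The output can be compared against the numerically integrated figures and the simulation results posted earlier in the thread. Unequal standard deviations only require changing the `sigmas` list.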
 
  • #11
You are saying that Prob(A wins) = Prob(Others do not cross the finish line before time t) weighted by A's frequency and integrated over t.

But, where is the probability expression that "A crosses the finish line at or before time t"?

Non-substantive observations:
1. Isn't it customary to write the dt after the integrand?
2. Have you considered a distribution defined over t > 0 only, such as the "F" distribution?
3. It can be tackled in Mathematica or similar symbolic-numeric software.
 
  • #12
I'm following your reasoning.
"A" can cross the finish line at any time from 0 to infinity.
Practically, with a mean of 57.00 secs it can be anything from 55.00 seconds (a record performance) to infinity (goes lame and stops!). In our ideal model, where lameness and jockey accidents do not exist, for all intents and purposes the pdfs expire 2-3 seconds on either side of the mean.

The runners B-C-D-E-F are finishing at times <t in the integral and are factored out.
Didn't I do it right?

Re. my use of tex you seem to do better as you have a description as well underneath your product symbol.

What is the "F" distribution ?

The true nature of this problem is that I have the mean values but I don't know what the sigmas might be (to any tolerable degree of accuracy).
So I want to fit them experimentally, if I have an approximate formula first.
 
  • #13
I guess your formula is right.

See F-distribution, http://planetmath.org/encyclopedia/FDistribution.html .
 
  • #14
  • #15
cosmiccase said:
I'm not sure the formula is right.
If it's three random variables are you sure that:

f1.(1-erf2).(1-erf3)+ f2.(1-erf1).(1-erf3)+ f3.(1-erf1).(1-erf2)

integrates to 1 ?
Nope, I am not sure. Neither am I sure that this is what your previous formula comes to. If I were you I would try to prove it one way or the other.
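One way to settle the question, sketched here under the assumptions that the three times are independent with continuous distributions and that "erf" stands for the CDF [itex]\Phi_i[/itex] with density [itex]f_i = \Phi_i'[/itex], is to notice that the sum is an exact derivative. By the product rule:

```latex
\frac{d}{dt}\left[-\prod_{i=1}^{3}\bigl(1-\Phi_i(t)\bigr)\right]
  = f_1(1-\Phi_2)(1-\Phi_3) + f_2(1-\Phi_1)(1-\Phi_3) + f_3(1-\Phi_1)(1-\Phi_2)

\int_{-\infty}^{\infty} \sum_{i=1}^{3} f_i(t) \prod_{k \neq i} \bigl(1-\Phi_k(t)\bigr)\, dt
  = \Bigl[-\prod_{i=1}^{3}\bigl(1-\Phi_i(t)\bigr)\Bigr]_{-\infty}^{\infty}
  = 0 - (-1) = 1
```

So the three win probabilities sum to 1 when the integral runs over the whole real line. Starting the integral at 0 instead changes the total only by the probability that some time is negative, which is negligible for means near 57 s and standard deviations of 0.5 s. The same argument works for any number of runners.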
 
