
Probability distribution of horses

  1. Sep 1, 2005 #1
    A horse race is going to take place with six runners.
    The race is over 5 furlongs (about 1000 meters), and for each of the six contestants the expected time at this distance is known:

    horse 1: 57.00 sec
    horse 2: 57.20 sec
    horse 3: 57.35 sec
    horse 4: 57.80 sec
    horse 5: 58.10 sec
    horse 6: 59.50 sec

    But, as is always the case in horse races, these times are uncertain, so the outcome is unknown.
    In fact, each of the above times is uncertain by plus or minus 0.50 seconds, i.e. the time of horse "1" follows a Gaussian distribution with mean 57.00 and standard deviation 0.5, the time of horse "2" follows a Gaussian with mean 57.20 and standard deviation 0.5, and so on.

    What is the probability of each horse winning the race?

    There is an easy (but somewhat slow) answer that can be obtained by Monte Carlo simulation using random numbers, but that is not what I'm asking for.
    Does anyone know a functional approximation for the winner's pdf?
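
For reference, the Monte Carlo approach mentioned above can be sketched in a few lines. This is only a minimal sketch, assuming independent Gaussian finishing times with a common standard deviation of 0.5 seconds; the function name `simulate_win_probabilities`, the number of simulated races and the seed are illustrative choices that do not come from the thread.

```python
import numpy as np

def simulate_win_probabilities(means, sigma, n_races=200_000, seed=0):
    """Estimate each runner's win probability by simulating many races."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    # One row per simulated race, one column per horse.
    times = rng.normal(loc=means, scale=sigma, size=(n_races, means.size))
    # The winner of each race is the horse with the smallest finishing time.
    winners = times.argmin(axis=1)
    counts = np.bincount(winners, minlength=means.size)
    return counts / n_races

means = [57.00, 57.20, 57.35, 57.80, 58.10, 59.50]
probs = simulate_win_probabilities(means, sigma=0.5)
for i, p in enumerate(probs, start=1):
    print(f"horse {i}: {p:.3f}")
```

The statistical error of each estimate shrinks roughly as one over the square root of `n_races`, which is why this route is simple but slow when high accuracy is wanted.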
  3. Sep 1, 2005 #2


    Science Advisor
    Homework Helper

    Relabel horses A ... F.

    Define t(min\A) = min{t(B), ..., t(F)}. t(min\A) is a random variable whose distribution can be obtained from the distributions of t(B) ... t(F).

    "A" wins if t(A) < t(min\A) or 0 < t(min\A) - t(A). Let d(A) = t(min\A) - t(A). The distribution of d(A) can be obtained from those of t(min\A) and t(A). Let Fd(A) be the CDF of d(A). The probability that A would win is 1 - Fd(A)(0).
  4. Sep 1, 2005 #3
    What is the distribution of t(min\A), then?

    Simulation gives the following results for the numbers in my example:

    horse 1: 0.47
    horse 2: 0.29
    horse 3: 0.19
    horse 4: 0.04
    horse 5: 0.01
    horse 6: 0.00

    (plus-minus 0.01)
  5. Sep 1, 2005 #4


    Science Advisor
    Homework Helper

    Let [itex]\mathbb S[/itex] = {A, ..., F}.

    Prob{t(min\A) < x} = Prob{min{t(B), ..., t(F)} < x} = Prob{not all {t(B), ..., t(F)} > x} = 1 - Prob{all {t(B), ..., t(F)} > x} = [tex]1 - \prod_{k \in (\mathbb S\backslash A)}\left[1 - \Phi_k(x)\right][/tex] where [itex]\Phi_k[/itex] is the Gaussian CDF for t(k).
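
The product formula above is straightforward to evaluate numerically. The following is a minimal sketch, assuming independent Gaussian finishing times and using SciPy's `norm.sf` for 1 - Φ_k(x); the helper name `min_time_cdf` is hypothetical, and the numbers are the means of horses 2 to 6 from the first post with standard deviation 0.5.

```python
import numpy as np
from scipy.stats import norm

def min_time_cdf(x, means, sigmas):
    """Prob{min_k t(k) < x} for independent Gaussian finishing times t(k)."""
    x = np.asarray(x, dtype=float)
    # norm.sf is the survival function 1 - Phi_k(x); the trailing axis runs over horses.
    survival = norm.sf(x[..., None], loc=means, scale=sigmas)
    # The minimum exceeds x only if every single time exceeds x.
    return 1.0 - survival.prod(axis=-1)

# Rivals of horse A (horse 1): horses B ... F.
others_means = np.array([57.20, 57.35, 57.80, 58.10, 59.50])
others_sigmas = np.full(5, 0.5)
cdf_values = min_time_cdf([56.0, 57.0, 58.0], others_means, others_sigmas)
print(cdf_values)
```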
  6. Sep 2, 2005 #5
    Just integrate using numerical methods you will get
  7. Sep 2, 2005 #6
    Can you write this down as a product of integrals?
    Is there a functional approximation when the means are M_i and the standard deviations are S_i?
  8. Sep 2, 2005 #7


    Science Advisor
    Homework Helper

    For the answer I posted, you certainly can write it as a product of integrals because each [itex]\Phi_k(x)[/itex] is an integral. There may be a functional approximation but I don't know what it would look like. If you simulate it you should be able to fit some polynomial function using regression analysis.
  9. Sep 2, 2005 #8

    EnumaElish imports some good-looking math symbols.
    Can one get those from the font menu?

    Anyway, is it

    P(A) = integral from 0 to infinity of {erf(t,Mb,Sb) x erf(t,Mc,Sc) ... x erf(t,Mf,Sf)}?
  10. Sep 2, 2005 #9


    Science Advisor
    Homework Helper

    I myself like 'em symbols; aren't they cool? All one has to do is to click on a symbol or formula and read the TeX instructions. That's how I started using them in the first place.

    You had asked about the distribution of t(min\A). It is Prob{t(min\A) < x} = [tex]1 - \prod_{k \in (\mathbb S\backslash A)}\left[1 - \Phi_k(x)\right][/tex], and you have to substitute the CDF formula or integral for each [itex]\Phi_k(x)[/itex] for [itex]k \in \mathbb S[/itex]\A = {B, C, D, E, F}.
    Last edited: Sep 2, 2005
  11. Sep 3, 2005 #10
    [tex]\mathrm{PROB}(A) = \int dt \; f_a(t) \prod \left(1 - \mathrm{erf}(\mu, \sigma, t)\right)[/tex]

    That doesn't look great, but I think this is it.
    The integral is from 0 to infinity.
    fa is the Gaussian pdf of "horse A".
    The erf terms are the cdfs of the Gaussians of B, C, D, E and F.

    Similarly for the others.
    It's a hell of an integral though.
    How do you tackle it using numerical methods?
    I'm surprised there is no link to it, or none that I can find.
    This problem could also be encountered in component failure statistics, couldn't it?
    If you have n components of unequal age or durability, you probably want to know which ones need more frequent attention and how much.
    Last edited: Sep 3, 2005
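
One way to tackle this integral numerically is plain quadrature on a grid of times. The following is a minimal sketch, assuming independent Gaussian finishing times; the function name `win_probabilities`, the grid size and the window of 8 standard deviations around the means are illustrative assumptions. The window replaces the range from 0 to infinity, which is harmless here because the Gaussians carry essentially no mass outside it.

```python
import numpy as np
from scipy.stats import norm

def win_probabilities(means, sigmas, n_grid=4001, width=8.0):
    """P(k wins) = integral of f_k(t) * prod_{j != k} [1 - Phi_j(t)] dt, by the trapezoid rule."""
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # Finite window wide enough to cover every runner's distribution.
    lo = (means - width * sigmas).min()
    hi = (means + width * sigmas).max()
    t = np.linspace(lo, hi, n_grid)
    dt = t[1] - t[0]
    pdf = norm.pdf(t[:, None], loc=means, scale=sigmas)  # f_k(t), shape (n_grid, n_horses)
    sf = norm.sf(t[:, None], loc=means, scale=sigmas)    # 1 - Phi_k(t)
    probs = np.empty(means.size)
    for k in range(means.size):
        # Probability that every rival is still running at time t.
        rivals_behind = np.delete(sf, k, axis=1).prod(axis=1)
        integrand = pdf[:, k] * rivals_behind
        # Trapezoid rule on a uniform grid.
        probs[k] = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return probs

means = [57.00, 57.20, 57.35, 57.80, 58.10, 59.50]
probs = win_probabilities(means, [0.5] * 6)
for i, p in enumerate(probs, start=1):
    print(f"horse {i}: {p:.4f}")
```

A quick sanity check is that the six probabilities sum to 1, and that for two horses the result reduces to the Gaussian CDF of the mean difference divided by the standard deviation of the difference.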
  12. Sep 3, 2005 #11


    Science Advisor
    Homework Helper

    You are saying that Prob(A wins) = Prob(Others do not cross the finish line before time t) weighted by A's frequency and integrated over t.

    But, where is the probability expression that "A crosses the finish line at or before time t"?

    Non-substantive observations:
    1. Isn't it customary to write the dt after the integrand?
    2. Have you considered a distribution defined over t > 0 only, such as the "F" distribution?
    3. It can be tackled in Mathematica or similar symbolic-numeric software.
  13. Sep 3, 2005 #12
    I'm following your reasoning.
    "A" can cross the finish line at any time from 0 to infinity.
    In practice, with a mean of 57.00 seconds, the time can range from 55.00 seconds (a record performance) to infinity (the horse goes lame and stops!). In our ideal model, where lameness and jockey accidents do not exist, for all intents and purposes the pdfs die out 2-3 seconds on either side of the mean.

    The runners B-C-D-E-F are finishing at times <t in the integral and are factored out.
    Didn't I do it right?

    Regarding my use of TeX, you seem to do better, as you also have a description underneath your product symbol.

    What is the "F" distribution?

    The true nature of this problem is that I have the mean values but I don't know what the sigmas might be (to any tolerable degree of accuracy).
    So I want to fit them experimentally, once I have an approximate formula.
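
Fitting the sigmas could look roughly like the sketch below. It rests on several assumptions that are not from the thread: a single sigma shared by all horses, a least-squares criterion, and a simple grid search. The names `win_probs_common_sigma` and `fit_common_sigma` are hypothetical. For lack of real race data, the simulated frequencies from post #3 stand in for the observed ones.

```python
import numpy as np
from scipy.stats import norm

def win_probs_common_sigma(means, sigma, n_grid=2001, width=8.0):
    """Win probability of each runner when all share the standard deviation sigma."""
    means = np.asarray(means, dtype=float)
    t = np.linspace(means.min() - width * sigma, means.max() + width * sigma, n_grid)
    dt = t[1] - t[0]
    pdf = norm.pdf(t[:, None], loc=means, scale=sigma)
    sf = norm.sf(t[:, None], loc=means, scale=sigma)
    probs = np.empty(means.size)
    for k in range(means.size):
        integrand = pdf[:, k] * np.delete(sf, k, axis=1).prod(axis=1)
        # Trapezoid rule on a uniform grid.
        probs[k] = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return probs

def fit_common_sigma(means, observed, candidates):
    """Return the candidate sigma with the smallest squared error against observed frequencies."""
    observed = np.asarray(observed, dtype=float)
    errors = [np.sum((win_probs_common_sigma(means, s) - observed) ** 2) for s in candidates]
    return candidates[int(np.argmin(errors))]

means = [57.00, 57.20, 57.35, 57.80, 58.10, 59.50]
observed = [0.47, 0.29, 0.19, 0.04, 0.01, 0.00]  # stand-in data: simulation results from post #3
candidates = np.linspace(0.10, 1.50, 141)         # assumed search range for sigma, in seconds
sigma_hat = fit_common_sigma(means, observed, candidates)
print(f"best-fitting common sigma: {sigma_hat:.2f}")
```

With real results, the observed frequencies would come from many races, and a likelihood-based fit with one sigma per horse might be more appropriate; the grid search is used here only because it is easy to check.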
  14. Sep 3, 2005 #13


    Science Advisor
    Homework Helper

    I guess your formula is right.

    See F-distribution, http://planetmath.org/encyclopedia/FDistribution.html.
    Last edited by a moderator: May 2, 2017
  15. Sep 5, 2005 #14
  16. Sep 5, 2005 #15


    Science Advisor
    Homework Helper

    Nope, I am not sure. Neither am I sure that this is what your previous formula comes to. If I were you I would try to prove it one way or the other.