# Simple conditional probability question

1. Jun 22, 2012

Hello,

I'm trying to work out a conditional probability.

I have hundreds of measurements of two variables (1) Start Time and (2) Journey time.

I've created a frequency table.

https://dl.dropbox.com/u/54057365/All/forum.JPG [Broken]

How can I work out the Journey time given a start time?

P(JT | ST) = P(JT n ST)/P(ST)

How would you work out these?

For example given 8am what would be the probable journey time?

John

Last edited by a moderator: May 6, 2017
2. Jun 22, 2012

### viraltux

Hi John,

Simply gather all the data you have for 8am and check its distribution; you can then calculate a confidence interval for the expected value from that data. That's it.

3. Jun 22, 2012

Hello,

I'm trying to understand how to work it out manually. Is this possible using the table?

The frequency table shows the number of occurrences for each variable pair in my data set. For example there were 5 journeys that started at 8am and lasted 5 minutes.

Is there a way to manually calculate the probable journey time given the start time?

For example 9am

P(JT | ST) = P(JT n ST)/P(ST)

P( JT | 9am) = ?
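For a frequency table, the formula above reduces to dividing each cell by its row total, because the overall number of journeys cancels out of P(JT ∩ ST) / P(ST). The following is a minimal sketch of that idea. The table values are hypothetical (the original image is no longer available); only the count of 5 journeys at 8am lasting 5 minutes comes from the post, and the names `freq` and `conditional_pmf` are made up for illustration.

```python
from fractions import Fraction

# Hypothetical counts: freq[start_time][journey_time_minutes] = number of journeys.
# Only the 8am / 5-minute count of 5 is taken from the thread; the rest is invented.
freq = {
    "8am": {5: 5, 10: 3, 15: 2},
    "9am": {5: 1, 10: 6, 15: 3},
}

def conditional_pmf(freq, start_time):
    """Return P(JT = t | ST = start_time) for every journey time t in that row."""
    row = freq[start_time]
    row_total = sum(row.values())  # count(ST); the grand total cancels out
    return {t: Fraction(n, row_total) for t, n in row.items()}

pmf_9am = conditional_pmf(freq, "9am")
for t, p in pmf_9am.items():
    print(f"P(JT = {t} min | ST = 9am) = {p}")
```

Each row of conditional probabilities sums to 1, which is a quick sanity check when doing the same thing by hand or in a spreadsheet.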

4. Jun 22, 2012

Hi Viraltux,

Just wondering if you had any further thoughts on this?

Regards

John

5. Jun 22, 2012

### viraltux

I think you want the expected value, not the probability.

To calculate the expected value of the journey at 8am using the table simply do this calculation:

(1*2 + 2*7 + 3*6 + 4*5 + 5*5 + ..... + 10*6 ) / ( 2 + 7 + 6 + 5 + 5 + ... + 6)

6. Jun 22, 2012

Hello viraltux,

Yes it is the expected value that I was looking to calculate, thank you.

I sort of understand the calculation.

Can I expand it a little?

Say I wanted to calculate the expected journey time given Journey Start Time and Distance travelled.

P(JT | ST, D)

I've expanded the frequency table with actual measurements.

For example, there were 25 journeys that started at 8am, were 3 miles long and had a journey time of 5 minutes.

https://dl.dropbox.com/u/54057365/All/forum.1JPG.JPG [Broken]

So to calculate the expected journey time given that the start time is 8am and the distance is 3 miles would the calculation be:

(3*3 +3*25+...+3*2) / (3+25+33+7+2) = 3 minutes

Really appreciate the help

Thanks

John

7. Jun 22, 2012

### viraltux

Hi John,

That is not right. The journey times are in 5-minute steps, so you should weight each time (not the distance) by its count:

(5*3 + 10*25 + 15*33 + 20*7 + 25*2 ) / ( 3 + 25 + 33 + 7 + 2)

In general the formula for the time expected value will be

(time1 * number1 + time2 * number2 + ...) / (number1 + number2 + ...)
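As a quick check, the general formula can be sketched in a few lines of Python. The times and counts are the ones from the calculation above; the function name `expected_time` is just an illustrative choice.

```python
def expected_time(times, counts):
    """Frequency-weighted mean: sum(time_i * count_i) / sum(count_i)."""
    if len(times) != len(counts):
        raise ValueError("times and counts must have the same length")
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    return sum(t * n for t, n in zip(times, counts)) / total

# 8am start, 3-mile journeys: journey times in minutes and their counts.
times = [5, 10, 15, 20, 25]
counts = [3, 25, 33, 7, 2]

mean_minutes = expected_time(times, counts)
print(round(mean_minutes, 2))  # 950 / 70, about 13.57
```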

8. Jun 22, 2012

Hi viraltux,

Would it not be?

(0*3 + 5*25 + 10*33 + 15*7 + 20*2 ) / ( 3 + 25 + 33 + 7 + 2)

The journey times are rounded, so the zero refers to journeys that were less than 5 minutes.

Thanks

John

9. Jun 22, 2012

### viraltux

Well, then you'd better use the midpoint of each interval to minimize the error:

(5/2*3 + (5 + 5/2)*25 + (10+5/2)*33 + (15+5/2)*7 + (20+5/2)*2 ) / ( 3 + 25 + 33 + 7 + 2)
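To see how much the choice of representative time matters, here is a small sketch comparing the lower edge, midpoint and upper edge of each bin. It assumes the journey times were rounded down into 5-minute bins starting at 0, as John describes; the helper name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, counts):
    """Frequency-weighted mean of the given representative values."""
    return sum(v * n for v, n in zip(values, counts)) / sum(counts)

width = 5                          # bin width in minutes
lower_edges = [0, 5, 10, 15, 20]   # assumed bins: [0,5), [5,10), ...
counts = [3, 25, 33, 7, 2]

lower = weighted_mean(lower_edges, counts)
mid = weighted_mean([e + width / 2 for e in lower_edges], counts)
upper = weighted_mean([e + width for e in lower_edges], counts)

print(round(lower, 2), round(mid, 2), round(upper, 2))  # 8.57 11.07 13.57
```

The midpoint estimate sits exactly halfway between the other two, so it can be off by at most half a bin width, whereas using either edge can be off by a full bin width.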

10. Jun 23, 2012

Regards

John

11. Jun 23, 2012

12. Jun 23, 2012

### viraltux

You're just right John.

13. Jun 26, 2012

Hi Viraltux,

Can I ask you what the difference is practically between a conditional expectation and conditional probability question?

Could you alternatively work this question out by dividing each cell by the total cell count and then summing up certain cells? I am not entirely sure of the theory. Would that be possible, or is that just wrong?

Thanks for the help

John

14. Jun 26, 2012

### haruspex

A couple of thoughts...
By only using the data for the specific distance you want the expected time for, you're not getting the most out of the data. You could estimate a speed for each time of day, e.g.
Presumably the distances are also rounded somehow. If they are not rounded to the nearest value, you need to make some adjustment there.

15. Jun 27, 2012

Hi Viraltux and Haruspex,

Just wondering if you had a chance to read my question above?

Is this a conditional expectation problem or a conditional probability problem? What is the difference?

and the answer was that it was a conditional probability problem? and the respondent describes a different way to calculate it.

I'm confused now as to which method is correct, and about the theory behind them.

Maybe both methods are correct?

Thanks

John

16. Jun 27, 2012

### chiro

This is an expectation problem since you are trying to find an average or a mean.

Also in the other thread in case you're wondering, to find the expectation I calculated the probability distribution first and then the mean (or average) using the probability. You don't have to do it this way and you can do it as viraltux has mentioned, but they are the same thing.

The only difference is that my method converted the frequency data to proper probabilities first whereas viraltux goes straight from frequency data to the mean (or average) in one step.
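A short sketch can confirm that the two routes give the same number. It reuses the times and counts from viraltux's calculation earlier in the thread and uses exact fractions so the comparison is not affected by floating-point rounding; the variable names are illustrative only.

```python
from fractions import Fraction

times = [5, 10, 15, 20, 25]     # journey times in minutes
counts = [3, 25, 33, 7, 2]      # journeys observed at each time
total = sum(counts)

# Route 1: go straight from frequencies to the mean.
direct = Fraction(sum(t * n for t, n in zip(times, counts)), total)

# Route 2: convert frequencies to conditional probabilities first,
# then take the probability-weighted sum.
probs = [Fraction(n, total) for n in counts]
via_probs = sum(t * p for t, p in zip(times, probs))

print(direct, via_probs, direct == via_probs)
```

The two agree because dividing every count by the total and then summing is the same as summing first and dividing once at the end.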

17. Jun 27, 2012

Hello Chiro,

Thank you for your reply - it must be late in Australia.

I understand the two problems are related now. You worked out the probabilities first and applied them to frequency data to get the expected value.

Thanks for clearing that up for me.

John

18. Jun 27, 2012

### viraltux

That's exactly so. I went straight to the point to minimize the coding in John's spreadsheet. Nonetheless, John, the difficulties you have are quite basic, so I would recommend that you follow an introductory course in statistics, so that you can work through some exercises and get a better understanding of the basics. I think that is a better strategy than learning random concepts here and there. If you do it, you'll see how everything falls into place.

19. Jun 27, 2012

### haruspex

If you determine the conditional probabilities then you have everything, and conditional expectation is just one of many numbers you can derive from that. But in many problems, such as this, you only care about the expectation, and it can be much easier to go straight to that and forget about the detailed probabilities. In short, it's a shortcut.
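To illustrate the point that the conditional distribution gives more than the expectation, the sketch below derives a few other summaries from the same counts. It assumes the midpoint convention for the 5-minute bins discussed earlier; the 15-minute threshold is an arbitrary example, not something from the thread.

```python
from fractions import Fraction

midpoints = [2.5, 7.5, 12.5, 17.5, 22.5]   # assumed bin midpoints in minutes
counts = [3, 25, 33, 7, 2]
total = sum(counts)

# Conditional distribution P(JT = t | ST = 8am, D = 3 miles).
probs = [Fraction(n, total) for n in counts]

# Expectation is just one summary of that distribution.
mean = sum(Fraction(t) * p for t, p in zip(midpoints, probs))

# Others: the most likely bin, a tail probability, and the variance.
mode = midpoints[counts.index(max(counts))]
p_at_least_15 = sum(p for t, p in zip(midpoints, probs) if t >= 15)
variance = sum(p * (Fraction(t) - mean) ** 2 for t, p in zip(midpoints, probs))

print(float(mean), mode, p_at_least_15, float(variance))
```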

20. Jul 24, 2012

Hi Guys,

Could I ask another question with regards to probability theory?

I have recorded some journey distances for cars. Here is an example. The total distance for the day is in the first column and the individual distances are in the adjacent columns. This table contains data for a total travel distance of 12 miles.

https://dl.dropbox.com/u/54057365/All/table.JPG [Broken]

I have a program that simulates a total daily travel distance and the number of journeys.

For example it could simulate a total travel distance of 12 miles and 3 journeys.

I am trying to determine the individual distances and the order of the individual journeys from my observed data.

For example it could be 3 miles + 3 miles + 6 miles = 12 miles

Could you advise me how you would do this? I believe it is a branch of probability called stick breaking construction.

I believe I would begin by drawing x1 from f(x1), then x2 from f(x2|x1), and then setting x3 = D - x1 - x2.
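One way to read that procedure is as sequential sampling from the empirical distributions in the observed table. The sketch below is only an illustration under that assumption: the `observed` rows are invented (the original table image is unavailable), the helper names are hypothetical, and this is plain empirical conditional sampling rather than a formal stick-breaking construction.

```python
import random
from collections import Counter

# Hypothetical observed days with 3 journeys totalling 12 miles: (x1, x2, x3).
observed = [(3, 3, 6), (3, 3, 6), (3, 6, 3), (4, 4, 4), (6, 3, 3), (2, 5, 5)]
TOTAL = 12

def draw(counter, rng):
    """Draw one value with probability proportional to its observed count."""
    values = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(values, weights=weights, k=1)[0]

def sample_journeys(observed, total, rng):
    # Step 1: x1 from the empirical f(x1).
    f_x1 = Counter(row[0] for row in observed)
    x1 = draw(f_x1, rng)
    # Step 2: x2 from the empirical f(x2 | x1), using only rows with that x1.
    f_x2_given_x1 = Counter(row[1] for row in observed if row[0] == x1)
    x2 = draw(f_x2_given_x1, rng)
    # Step 3: the last journey is whatever distance remains.
    x3 = total - x1 - x2
    return x1, x2, x3

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = sample_journeys(observed, TOTAL, rng)
print(sample, sum(sample))
```

With sparse data this can only reproduce journey patterns that were actually observed, so for finer-grained simulation the empirical counts would need smoothing or a fitted distribution.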

I would be grateful if perhaps you could demonstrate a quick example from the above table?

Kind Regards

John
