Calculating Consecutive Journey Probability with Distance Data

  • Context: Undergrad
  • Thread starter: bradyj7
  • Tags: Probability

Discussion Overview

The discussion centers around calculating the probability of consecutive journeys made by cars being of the same distance, specifically exploring how likely it is for two consecutive journeys to be within one mile of each other compared to being different distances. The context includes statistical analysis, Bayesian probability, and interpretations of frequency versus Bayesian frameworks.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • The original poster (OP) presents data showing that out of 14,325 journey pairs, 5,938 were within one mile of the same distance, leading to a calculation of approximately 0.41 for the probability of same distances.
  • Some participants question the interpretation of the probability and whether it can be used to assert that the next journey is four times more likely to be the same distance.
  • There is a discussion about the nature of trips and how they are paired, with some participants suggesting that the context of the journeys (e.g., work-related versus other trips) may influence the correlation.
  • One participant suggests that the relationship between consecutive journeys may not be straightforward and could depend on various factors, including the type of trips taken.
  • There is a debate over the validity of using a frequency approach versus a Bayesian approach to interpret the results, with some arguing that the Bayesian perspective allows for uncertainty in predictions.
  • Participants discuss the calculation of how much more likely one outcome is compared to another, with some proposing ratios based on the fractions of same versus different distances.

Areas of Agreement / Disagreement

Participants express differing views on the interpretation of the probability calculations and whether the proposed methods for estimating likelihood are valid. There is no consensus on how to definitively determine the likelihood of consecutive journeys being the same distance.

Contextual Notes

Participants highlight the need for more information regarding the assumptions behind the data and the potential dependencies between different journeys. The discussion also touches on the limitations of the current analysis and the implications of using different statistical frameworks.

Who May Find This Useful

This discussion may be useful for those interested in statistical analysis of travel data, Bayesian probability, and the interpretation of journey patterns in transportation studies.

bradyj7
Hi,

I have the distance for consecutive journeys made by cars, i.e. 2 journeys made 1 after another.

I'm looking to determine how much more likely two consecutive journeys are to be the same distance (plus or minus a mile) than not the same distance.

Out of 14,325 pairs of journeys, I found that 5,938 were the same distance (plus or minus a mile).

5,938/14,325 = 0.41

If someone said to me that their journey was 5 miles, could I say that their next journey is 4 times more likely to be 5 miles (plus or minus a mile) than not?

Is that correct?

Thank you
 
Let me answer your question by asking you another question. If you toss a fair coin 10 times and it comes up heads every time, what is the probability that it will come up heads the next time it is tossed?
 
So this is like a person who works 5 days a week taking 2 to 3 trips a day.

The work to / from trips are roughly the same mileage, whereas other trips to the store, doctor/dentist... or to somewhere else will have differing mileage.

For weekends the trips may vary in distance unless there's some planned activity like a soccer match for your kids.

You'd have to define what a trip is: turning on the car, driving somewhere and turning off the car, or going someplace and then returning to where you started...
 
Yes, this is like a person who works 5 days a week taking 2 to 3 trips a day.

It is weekday data only.

1 trip is a distance to a destination. The 2nd trip is from that destination to somewhere else.

I would just like to know, generally speaking: can I infer from this that if a trip is x miles long, the next journey is 4 times more likely to be x miles (plus or minus a mile) than not? Yes/No?

I'm trying to determine the hyperparameters for a prior distribution in a Bayesian probability problem.

I'm creating a prior distribution of journey distances for a trip given the previous trip. So for example if the previous journey was 5 miles, one might believe that the next trip will be approximately 5 miles give or take a mile.

So, judging by my prior belief, I would assign a hyperparameter of 4 for 5 miles and 1 for, say, 20 miles. What I am saying is that, prior to observing any data, I believe that given a trip is x miles long, the next journey is 4 times more likely to be x miles (plus or minus a mile) than not.

I don't want to use a uniform prior.
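A minimal sketch of how such weights translate into a prior, assuming a two-outcome (same / different) Beta prior. That form is an assumption, since the thread does not fix the shape of the prior, and the function names and the `strength` parameter are hypothetical. Under that assumption, weights of 4 and 1 correspond to a prior mean of 0.8 for "same", while pseudo-counts matched to the observed fraction 5,938/14,325 would look quite different.

```python
def prior_mean_same(alpha_same, alpha_diff):
    """Prior mean of P(same) under an assumed Beta(alpha_same, alpha_diff) prior."""
    return alpha_same / (alpha_same + alpha_diff)


def hyperparameters(p_same, strength):
    """Pseudo-counts whose prior mean is p_same and whose total weight is `strength`.

    `strength` is a hypothetical knob: larger values make the prior harder
    for later data to move.
    """
    return p_same * strength, (1.0 - p_same) * strength


# Weights of 4 (same) and 1 (different), as proposed in the post.
print(prior_mean_same(4, 1))  # 0.8

# Pseudo-counts matched to the observed fraction, with the same total weight of 5.
a, b = hyperparameters(5938 / 14325, strength=5)
print(round(a, 2), round(b, 2))  # 2.07 2.93
```

The total weight (5 here) is a free choice that the counts alone do not determine.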

Thanks
 
phinds said:
Let me answer your question by asking you another question. If you toss a fair coin 10 times and it comes up heads every time, what is the probability that it will come up heads the next time it is tossed?
phinds, I'm surprised at you! That's completely unfair. If you know the coin is fair then you know there's no causative correlation between consecutive results. In the OP context, it is entirely reasonable that there would be a relationship.
Ideally, of course, you would plug in an a priori joint distribution and see how it's affected by the data, but on the face of it the proposed means of estimation is reasonable.
If you presume the gathered data to be typical, the question can be phrased like this:
If I pick a datapoint at random from the set, what is the probability that the next datapoint's value will be within one mile?
That said, it might be appropriate (given enough data) to see how the strength of correlation varies as the length of journey changes.
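One way to look at how the relationship varies with journey length is to group the pairs by the length of the first journey and compute the "within one mile" fraction per group. This is only a sketch: the 5-mile bin width, the function name, and the sample pairs are all made up for illustration.

```python
from collections import defaultdict


def fraction_same_by_bin(pairs, bin_width=5.0, tolerance=1.0):
    """Fraction of 'same distance' pairs, grouped by the first journey's length.

    Returns a dict mapping the lower edge of each bin to the fraction of
    pairs in that bin whose distances differ by at most `tolerance` miles.
    """
    counts = defaultdict(lambda: [0, 0])  # bin lower edge -> [same, total]
    for first, second in pairs:
        bin_start = int(first // bin_width) * bin_width
        counts[bin_start][1] += 1
        if abs(first - second) <= tolerance:
            counts[bin_start][0] += 1
    return {b: same / total for b, (same, total) in sorted(counts.items())}


# Made-up pairs, not the OP's data.
sample_pairs = [(2, 3), (4, 3), (4, 2), (12, 12.5), (11, 30)]
result = fraction_same_by_bin(sample_pairs)
print(result)
```

With real data, bins containing only a handful of pairs would give noisy fractions, which is why this needs "enough data".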
 
We should get some more information from the OP regarding any dependencies between the different events, both those demonstrated by the data and those suggested by expert judgement.

If there is a relationship, then these things will not only provide ways to detect them but also to understand the assumptions being placed on the analysis to begin with.
 
I know this isn't a mathematical argument but more a common practice argument.

My thinking is that in the case of a day worker there is a strong correlation between pairs of trips. However, for, say, a trucker or a delivery guy, there would be a weak correlation or none at all.

Also, for trips in a car there would be a minimum distance at which someone would decide to drive rather than walk, and for longer distances the time would determine whether the trip is made. Most people have a commute time limit beyond which they decide it's too long to drive, and that could be anywhere from one to two hours.

So I would tend to use a bell curve to describe the possible trips as an assumption with the center being the trips to and from work.

How are the trips paired together for the correlation?
 
Hello,

Thanks for the replies.

The trips are paired according to their order.

For example, if the raw data was

2
3
4
3
4
2

Then the paired data is

2, 3
4, 3
4, 2
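The pairing in this example can be sketched as follows, assuming non-overlapping pairs exactly as shown (first with second, third with fourth, and so on). The function names are hypothetical, and the "same distance" test uses the plus-or-minus-one-mile tolerance from the original question.

```python
def pair_consecutive(distances):
    """Split a flat list of distances into non-overlapping (first, second) pairs.

    A trailing unpaired value, if any, is dropped by zip.
    """
    return list(zip(distances[0::2], distances[1::2]))


def fraction_same(pairs, tolerance=1.0):
    """Fraction of pairs whose two distances differ by at most `tolerance` miles."""
    same = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return same / len(pairs)


raw = [2, 3, 4, 3, 4, 2]
pairs = pair_consecutive(raw)
print(pairs)  # [(2, 3), (4, 3), (4, 2)]
print(fraction_same(pairs))  # 2 of the 3 pairs are within one mile
```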

I'm not interested in how the relationship varies with distance. At this stage I'm just looking for a rough starting point for the prior hyperparameters.

Can I confirm, just very broadly speaking, that the above calculation resulting in 0.4 implies that, given a trip is x miles long, the next journey is approximately 4 times more likely to be x miles than not?

Thanks
 
bradyj7 said:
given the above calculation resulting in 0.4, implies that given a trip is x miles long, the next journey is approximately 4 times more likely to be x miles than not to be.
I don't follow that calculation. Isn't it 40% same versus 60% different?
 
  • #10
Yes, that is my question, so 0.4 implies 40% same versus 60% different. I wasn't sure of that.

But how do you determine how much more likely it is? Twice as likely, 3 times as likely?

Thanks
 
  • #11
Is it even possible to infer how much more likely it is?
 
  • #12
Is the calculation:

(Fraction of journeys that were the same) / (Fraction of journeys that were not the same)

i.e.

5,938/14,325 = 0.415

8,387/14,325 = 0.585

Therefore

0.415 / 0.585 ≈ 0.7, so "same" is about 0.7 times as likely as "different"
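The same arithmetic in code, as a quick check. It uses only the two counts given in the thread (the variable names are mine) and rounds to three decimals. The ratio of the two fractions is the odds of "same" against "different", and the total cancels, so it equals 5,938/8,387 directly.

```python
same = 5938
total = 14325
different = total - same  # 8387

p_same = same / total        # fraction of pairs within one mile
p_diff = different / total   # fraction of pairs not within one mile
odds_same = same / different # identical to p_same / p_diff

print(round(p_same, 3))     # 0.415
print(round(p_diff, 3))     # 0.585
print(round(odds_same, 3))  # 0.708
```

A value below 1 means "same" is the less likely outcome: roughly 7 "same" pairs for every 10 "different" pairs.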
 
  • #13
haruspex said:
phinds, I'm surprised at you! That's completely unfair. If you know the coin is fair then you know there's no causative correlation between consecutive results. In the OP context, it is entirely reasonable that there would be a relationship.
Ideally, of course, you would plug in an a priori joint distribution and see how it's affected by the data, but on the face of it the proposed means of estimation is reasonable.
If you presume the gathered data to be typical, the question can be phrased like this:
If I pick a datapoint at random from the set, what is the probability that the next datapoint's value will be within one mile?
That said, it might be appropriate (given enough data) to see how the strength of correlation varies as the length of journey changes.

Yeah, my first thought after I posted that was DOH ! and I should have gone back and deleted it but got caught up in something else
 
  • #14
Regarding the OP: you're right from one point of view and not necessarily right from another :-) If your problem is viewed in the frequency framework, then OK. However, from the Bayesian viewpoint, it may be wrong.

It's like the dispute over whether the sun will rise tomorrow. According to frequentist statistics, it will with probability 1. The Bayesian probability, however, reserves some small probability that it will not.
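The sunrise comparison can be made concrete with Laplace's rule of succession, which is the posterior mean under a uniform Beta(1, 1) prior. The choice of a uniform prior and the count of 10,000 observed sunrises are assumptions for illustration only.

```python
from fractions import Fraction


def frequency_estimate(successes, trials):
    """Plain relative frequency: successes / trials."""
    return Fraction(successes, trials)


def rule_of_succession(successes, trials):
    """Posterior mean of the success probability under a uniform Beta(1, 1) prior."""
    return Fraction(successes + 1, trials + 2)


n = 10000  # hypothetical number of observed sunrises, all successes
print(frequency_estimate(n, n))  # 1
print(rule_of_succession(n, n))  # 10001/10002

# With as many observations as the journey data, the two estimates nearly coincide.
print(float(rule_of_succession(5938, 14325)))
```

With 14,325 pairs the prior has almost no effect on the estimate, so the two viewpoints differ here mainly in interpretation rather than in the number.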
 
  • #15
Okay, so if we take a frequency approach for now, is the above calculation that resulted in 0.7 correct?

Thanks
 
  • #16
I think so. You don't even need the above fractions; the ratio follows directly from 5938/8387. But be careful about its interpretation, especially if you want to talk about probability!
 
