How to analyse a sequence of vehicle states?


Discussion Overview

The discussion revolves around analyzing a dataset of real-world vehicle trajectories classified into states based on vehicle parameters and locations. Participants explore methods for improving the analysis of these sequences of vehicle states, particularly in the context of trip detection and data validity.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes the current method of analyzing vehicle states using regular expressions, noting its inefficiency and the potential for misclassification in the data.
  • Another participant suggests that understanding the purpose of the analysis is crucial for recommending an appropriate approach.
  • It is proposed that data validation is important, including the possibility of discarding or correcting questionable data items.
  • Some participants challenge the use of regular expressions, suggesting that directly checking sequences of numbers may be simpler and faster.
  • A suggestion is made to implement a parser using the state pattern in an object-oriented language, although there is uncertainty about its effectiveness in this context.
  • Concerns are raised about the complexity of detecting trips due to variations in routes taken by drivers, which may complicate the analysis of vehicle state sequences.
  • Discussion includes the potential inefficiency of certain programming languages for this type of data processing, with mentions of Python and Perl as examples.

Areas of Agreement / Disagreement

Participants express differing opinions on the effectiveness of regular expressions versus direct sequence checking, and there is no consensus on the best approach to analyze the vehicle state sequences. The discussion remains unresolved regarding the optimal method for trip detection.

Contextual Notes

Participants note that the dataset may contain misclassifications and that real-world conditions can introduce variability in vehicle states, complicating the analysis. There are also mentions of the need for data validation and the challenges posed by large datasets.

serbring
Hi all,

I have to analyse a dataset containing real-world vehicle trajectories and in particular:
1. The trajectories were classified into states as a function of certain vehicle parameters and location (urban roads, country roads, etc.). Each state is identified by an integer (e.g., 1, 2, 3), which also gives me a categorical signal called "state".
2. Portions of the vehicle trajectories were grouped whenever a certain sequence of states occurred, which corresponds to a trip starting from a parking lot, then travelling along an urban road, then a highway stretch, etc. Thus, a sequence might start with a run of 1s, followed by a run of 100 2s, followed by a run of 3s, and so on. This task was carried out by converting the sequence of numbers into a string and by setting a proper regular expression. It is a quick and dirty approach and it took a lot of time to tune the parameters of the regular expression, and it is far from perfect.

This is because there might be misclassifications in the vehicle states (step 1), and in real-world conditions operations are not always carried out in the same way: there might be extra states in between, for example because the driver took a wrong road, and the duration of such an extra state may vary. So, I need to find a better method, but I have no idea which approach to adopt. Do you have any suggestion to give me?

Thanks!
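
For reference, the string-plus-regular-expression approach described in step 2 can be sketched as follows. This is only a minimal illustration, not the actual code used: the state codes (1 = parked, 2 = urban, 3 = highway), the pattern, and the name `find_trips` are assumptions, and it only works while every state code is a single digit.

```python
import re

def find_trips(states, pattern=r"1+2{3,}3+"):
    """Return (start, end) sample indices of every run that matches the pattern."""
    # One character per sample, so match positions equal sample indices.
    # Assumption: all state codes are single digits (0-9).
    text = "".join(str(s) for s in states)
    return [m.span() for m in re.finditer(pattern, text)]

# Hypothetical signal: one full trip, then a fragment with too few urban samples.
states = [1, 1, 2, 2, 2, 2, 3, 3, 1, 2, 3]
print(find_trips(states))  # [(0, 8)]
```

Every tolerance (such as the `{3,}` minimum run length) has to be encoded in the pattern itself, which is where the tuning effort goes.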
 
It might help to tell us what the purpose is. It's hard to suggest an approach when the desired result is unknown.
 
As part of data analysis, it's good to review the data and correct fields or discard questionable data items.

Each dataset has its own rules for validity and you may need to establish them. As an example, you might discard rows that have missing data or if possible fill in the missing info with nominal values.

Say you had a dataset for trains, planes and automobiles, you could validate any speed fields by applying some speed range criteria to identify rows where speeds are too high or too low for the type of vehicle being recorded.
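
A range check of this kind might look like the sketch below. The field names (`vehicle`, `speed_kmh`), the limits, and the function `split_valid` are all made up for illustration; real limits would have to come from the dataset's own validity rules.

```python
# Hypothetical plausible speed ranges in km/h for each vehicle type.
SPEED_LIMITS = {
    "train": (0, 350),
    "plane": (0, 1000),
    "automobile": (0, 250),
}

def split_valid(rows):
    """Split rows into (valid, rejected) using the per-vehicle speed range."""
    valid, rejected = [], []
    for row in rows:
        limits = SPEED_LIMITS.get(row.get("vehicle"))
        speed = row.get("speed_kmh")
        # Reject unknown vehicle types, missing speeds, and out-of-range speeds.
        if limits is None or speed is None or not (limits[0] <= speed <= limits[1]):
            rejected.append(row)
        else:
            valid.append(row)
    return valid, rejected
```

Keeping the rejected rows instead of silently dropping them makes it possible to review them later or fill in nominal values.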
 
serbring said:
It is a quick and dirty approach and it took a lot of time
This statement seems to contradict itself

serbring said:
Do you have any suggestion to give me?
Don't use a regular expression.

Implement a parser in an object-oriented language using the state pattern.
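
A minimal sketch of what such a parser could look like in Python follows. It is an illustration under assumptions, not a tested solution for the dataset in question: the state codes, the class names, and the expected order parked, urban, highway are all hypothetical.

```python
PARKED, URBAN, HIGHWAY = 1, 2, 3  # hypothetical state codes

class State:
    """Base class: each concrete state decides which state comes next."""
    def on_sample(self, parser, index, code):
        raise NotImplementedError
    def on_end(self, parser, index):
        pass  # most states have nothing to report when the trip is cut short

class Waiting(State):
    def on_sample(self, parser, index, code):
        if code == PARKED:
            parser.start = index  # a candidate trip begins here
            return Parked()
        return self

class Parked(State):
    def on_sample(self, parser, index, code):
        if code == PARKED:
            return self
        return Urban() if code == URBAN else Waiting()

class Urban(State):
    def on_sample(self, parser, index, code):
        if code == URBAN:
            return self
        if code == HIGHWAY:
            return Highway()
        # Unexpected code: give up on this trip and re-examine the sample.
        return Waiting().on_sample(parser, index, code)

class Highway(State):
    def on_sample(self, parser, index, code):
        if code == HIGHWAY:
            return self
        self.on_end(parser, index)  # the highway run just ended: record the trip
        return Waiting().on_sample(parser, index, code)
    def on_end(self, parser, index):
        parser.trips.append((parser.start, index))

class TripParser:
    """Context object: holds the current state and the trips found so far."""
    def __init__(self):
        self.state, self.start, self.trips = Waiting(), None, []
    def parse(self, codes):
        count = 0
        for count, code in enumerate(codes, start=1):
            self.state = self.state.on_sample(self, count - 1, code)
        self.state.on_end(self, count)  # close a trip still open at end of input
        return self.trips

print(TripParser().parse([1, 1, 2, 2, 3, 3, 1, 2, 3]))  # [(0, 6), (6, 9)]
```

Each class holds the rules for one stage of the trip, so tolerance for an extra state (for example a short detour while in `Urban`) can be added in one place without touching the others.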
 
serbring said:
This task was carried out by converting the sequence of numbers into a string and by setting a proper regular expression.
This seems odd. It would seem both easier and faster just to check the sequences of numbers directly.
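
One way to check the numbers directly, sketched here with hypothetical names and thresholds, is to collapse the signal into (state, run length) pairs with `itertools.groupby` and compare those pairs against a template of expected states and minimum run lengths.

```python
from itertools import groupby

def run_lengths(states):
    """Collapse [1, 1, 2, 2, 2] into [(1, 2), (2, 3)]."""
    return [(code, sum(1 for _ in group)) for code, group in groupby(states)]

def matches_template(states, template):
    """True if the runs follow the template's codes in order, each long enough."""
    runs = run_lengths(states)
    if len(runs) != len(template):
        return False
    return all(code == want and length >= min_len
               for (code, length), (want, min_len) in zip(runs, template))

# Hypothetical template: (state code, minimum number of samples).
TEMPLATE = [(1, 1), (2, 3), (3, 1)]
print(matches_template([1, 1, 2, 2, 2, 2, 3], TEMPLATE))  # True
print(matches_template([1, 2, 3], TEMPLATE))              # False
```

This avoids the string conversion, so state codes above 9 are not a problem.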
 
I routinely used regular expressions to clean up, simplify, or filter inputs from imperfect sources before passing the inputs on to other algorithms. I considered it to be much easier and more reliable than the alternatives.
 
Thanks for your answers. I will reply to all your comments.
FactChecker said:
It might help to tell us what the purpose is. It's hard to suggest an approach when the desired result is unknown.

jedishrfu said:
As part of data analysis, it's good to review the data and correct fields or discard questionable data items.

Each dataset has its own rules for validity and you may need to establish them. As an example, you might discard rows that have missing data or if possible fill in the missing info with nominal values.

Say you had a dataset for trains, planes and automobiles, you could validate any speed fields by applying some speed range criteria to identify rows where speeds are too high or too low for the type of vehicle being recorded.
The data were checked for validity (e.g., samples with no logged vehicle position because the GPS did not get a fix), but of course misclassification may still occur, and this can never be fully solved when dealing with a large dataset of real-world data. More specifically on the task, I am doing trip analysis, where a trip can be a sequence of vehicle states: the vehicle starts from a point (e.g., the owner's house, but a different place may occur), then travels on an extra-urban road, then a highway, then an urban road, and finally back along the same path. However, the driver may choose a different road to travel back home, or decide to take a detour, and so on. Such changes may lead to different sequences of vehicle states, complicating the detection of trips. Once trips are detected, I will calculate features of the trips and analyse them. Hopefully, it is clearer now.
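
One possible way to tolerate such detours, sketched under assumptions and not proposed anywhere in this thread, is to collapse the state signal into its sequence of distinct runs and score it against a reference trip with Python's standard `difflib.SequenceMatcher`. The state codes, the template, the threshold, and the function names below are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import groupby

# Hypothetical reference trip: home (1), extra-urban (2), highway (3),
# urban (4), and back along the same path.
TEMPLATE = [1, 2, 3, 4, 3, 2, 1]

def run_codes(states):
    """Collapse consecutive repeats: [1, 1, 2, 2, 1] -> [1, 2, 1]."""
    return [code for code, _ in groupby(states)]

def similarity(states, template=TEMPLATE):
    """Score between 0 and 1; 1 means the run order equals the template."""
    return SequenceMatcher(None, run_codes(states), template).ratio()

def looks_like_trip(states, threshold=0.8):
    """Accept a candidate whose run order is close enough to the template."""
    return similarity(states) >= threshold
```

A candidate with one extra state in the middle still scores close to 1, while an unrelated sequence scores near 0. The threshold would need tuning on real data, and run durations are ignored here.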


pbuk said:
This statement seems to contradict itself


Don't use a regular expression.

Implement a parser in an object-oriented language using the state pattern.
As reported by @FactChecker, for a quick and flexible pattern, a regular expression is a very quick approach even if it is not the best in terms of efficiency. However, when considering a large collection of real-world data, things get more complicated, so I am now searching for a more advanced approach. I did not know the state pattern, and it might be interesting. I will dig into it but, as far as I understand, it is mostly a convenient way of replacing multiple if-statements, so I am not sure it will help in this case.

PeterDonis said:
This seems odd. It would seem both easier and faster just to check the sequences of numbers directly.
If you have any specific method to suggest, that would be great. Thanks.
 
serbring said:
More specifically on the task, I am doing trip analysis, where a trip can be a sequence of vehicle states: the vehicle starts from a point (e.g., the owner's house, but a different place may occur), then travels on an extra-urban road, then a highway, then an urban road, and finally back along the same path. However, the driver may choose a different road to travel back home, or decide to take a detour, and so on. Such changes may lead to different sequences of vehicle states, complicating the detection of trips. Once trips are detected, I will calculate features of the trips and analyse them. Hopefully, it is clearer now.
Are you saying that you are trying to detect round trips by the pattern of speeds, without having position data?
serbring said:
As reported by @FactChecker, for a quick and flexible pattern, a regular expression is a very quick approach even if it is not the best in terms of efficiency.
If the application fits, I think it would be hard to beat the efficiency of the built-in regular expressions. You don't say what language you are using. Python is very popular now, but it can be astonishingly slow. I used Perl a lot for such tasks and was not bothered by any lack of speed. Your job and amount of data may just require long runs. If the execution time is very long (several hours or days), you should look for ways to periodically save things so that you can monitor progress and restart the program where it left off. Things like power "glitches", unplanned system resets, unexpected data inputs that are being handled wrong, etc. can force you to restart the program from the beginning or at some intermediate stage.
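
The save-and-restart idea can be sketched roughly as below. The function name `process_all`, the JSON checkpoint file, and the assumption that the work splits into independent items (for example one file per vehicle) are illustrative only.

```python
import json
import os

def process_all(items, work, checkpoint_path):
    """Apply work() to each item, saving after every item so a rerun can resume."""
    results = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            results = json.load(fh)  # results from an earlier, interrupted run
    for item in items:
        key = str(item)  # JSON object keys are always strings
        if key in results:
            continue  # already processed before the restart
        results[key] = work(item)
        # Write to a temporary file first, then swap it in, so that a crash
        # during the write should not corrupt the existing checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(results, fh)
        os.replace(tmp_path, checkpoint_path)
    return results
```

Running the same call again after an interruption skips everything already in the checkpoint. For very many small items, saving every N items instead of every item would reduce the I/O overhead.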
 
