Correlation coefficient among trends

SUMMARY

This discussion focuses on analyzing driving styles through the correlation of engine torque, speed, and gear usage across multiple vehicles. The participants suggest using Pearson's correlation coefficient for linear relationships and the two-sample Kolmogorov-Smirnov test to compare frequency distributions of gear ratios. The conversation emphasizes the need for a statistical model of driver behavior when evaluating differences in driving style, considering factors like driver variability and driving conditions. The distinction between hypothesis testing and estimation in statistical analysis is also highlighted.

PREREQUISITES
  • Understanding of Pearson's correlation coefficient for linear relationships
  • Familiarity with the two-sample Kolmogorov-Smirnov test for distribution comparison
  • Knowledge of statistical modeling concepts, including hypothesis testing and estimation
  • Basic understanding of driving dynamics and engine performance metrics
NEXT STEPS
  • Research advanced statistical models for analyzing multiple distributions
  • Learn about the application of the Kolmogorov-Smirnov test in real-world scenarios
  • Explore methods for estimating parameters of probability distributions in driving data
  • Investigate the impact of driving conditions on gear usage and engine performance
USEFUL FOR

Data analysts, automotive engineers, and researchers interested in vehicle performance and driver behavior analysis will benefit from this discussion.

serbring
Hi all,

in several vehicles, I measured the engine torque, the engine speed and the engaged gear while the vehicle was driven for around 100km/h. For each gear, I computed the average engine speed and torque over all the time the vehicle was run in that gear, and I also computed the relative frequency with which each gear was used. So for each vehicle, I have the following plots:

https://s29.postimg.org/tnyfcdron/untitled.jpg

I have around 300 repetitions of these. I need to discover specific driving styles in the data I have. For example, drivers who use some gears less often may run the engine at a higher average torque in those gears, because they use them only when accelerating. Finding such styles means evaluating whether there is any relationship or correlation among the shapes of the curves. With scalars, I could compute Pearson's correlation coefficient to evaluate the degree of linear correlation. But is there anything similar that I can use with trends?
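For reference, the scalar case mentioned above can be sketched as follows. This is only an illustration: the helper `pearson_r` and the two lists are hypothetical, and the numbers are made up (usage share of one gear versus mean torque in that gear, one pair per repetition), not taken from the measurements.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Made-up illustrative values for ONE gear across five repetitions:
# share of time spent in the gear (%) and mean torque in that gear (% of max).
usage_pct = [5.0, 10.0, 15.0, 20.0, 25.0]
mean_torque_pct = [80.0, 70.0, 65.0, 50.0, 45.0]

r = pearson_r(usage_pct, mean_torque_pct)
print(f"r = {r:.3f}")
```

A strongly negative r for a given gear would be consistent with the "used rarely, but at high torque" pattern described above, although it only measures linear association between the two scalars.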
 
serbring said:
I have around 300 repetitions of these.

What aspects are being exactly replicated? I gather the drivers can be different. Are different vehicles involved? Are all drivers driving the same course?

If the graph of percent-time vs gear ratio can be considered as a frequency distribution (i.e. an "empirical" distribution of gear ratio), you could use the "two-sample Kolmogorov-Smirnov test" to test whether two such distributions are the same. Whether this test indicates a difference in driving style depends on the model used to represent the behavior of the drivers.
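A minimal sketch of the statistic behind that test, assuming each driver's record has been reduced to a list of engaged-gear samples (for example one sample per second). The function name and both sample lists are hypothetical, with made-up values. Two cautions: the test is designed for continuous data, so with a discrete variable like gear number the usual p-values tend to be conservative, and consecutive samples from one drive are not independent.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical
    gap between the two empirical cumulative distribution functions."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for v in sorted(set(a) | set(b)):
        ecdf_a = bisect_right(a, v) / len(a)
        ecdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d

# Made-up gear samples: driver A mostly in gears 9-11, driver B in 10-12.
driver_a = [9] * 30 + [10] * 50 + [11] * 20
driver_b = [10] * 25 + [11] * 50 + [12] * 25

d = ks_statistic(driver_a, driver_b)
print(f"D = {d:.2f}")
```

If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic together with a p-value.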
 
Stephen Tashi said:
What aspects are being exactly replicated? I gather the drivers can be different. Are different vehicles involved? Are all drivers driving the same course?
The same vehicle with different engines is involved, and for this reason the data are reported as a percentage of the maximum engine torque. There is no difference in the engine speed range among the different engines. The routes were not exactly the same, but all of them are equivalent: each was mixed driving, a mixture of urban and extra-urban conditions. Nevertheless, the gear distribution differs considerably among drivers, because each driver mostly used two or three gears (e.g. gears from the 9th up to the 11th), while another driver may use a lower range (from the 8th up to the 10th) or a higher range (from the 10th up to the 12th). For this reason, I'm thinking of correlating torque, engine speed and gear distribution together. I'm not considering the vehicle speed, because it is determined by the engine speed and the gear ratio.

If the graph of percent-time vs gear ratio can be considered as a frequency distribution (i.e. an "empirical" distribution of gear ratio), you could use the "two-sample Kolmogorov-Smirnov test" to test whether two such distributions are the same.
I got it. Is there any way to have something similar for n different distributions?
Whether this test indicates a difference in driving style depends on the model used to represent the behavior of the drivers.
What do you mean by model? Weibull, Lognormal and so on, or a kind of physical model with random parameters?
 
serbring said:
I got it. Is there any way to have something similar for n different distributions?
Is your goal to do hypothesis testing or to do estimation?

A "hypothesis test" involves some yes-or-no question like: "Do all the drivers have the same probability distribution for selecting torques when they drive the same course? Yes or no?" or "Does driver_A use the same probability distribution for selecting torques when he drives the same course twice? Yes or no?"

An "estimation" involves estimating the parameters of a probability distribution or a statistical model. For example, if we assume each driver selects torques from a probability distribution of a certain type, we could try to estimate the parameters (e.g. mean, standard deviation) of the distribution that apply to each particular driver. This approach takes for granted that distributions for different drivers may have different parameters.
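As a small illustration of the estimation side, the sketch below computes a mean and standard deviation for one driver directly from a percent-time table. It uses gear number rather than torque only because that is the table described earlier in the thread. The function name and the numbers are hypothetical, and the spread is the plain weighted (population) value, with no small-sample correction.

```python
from math import sqrt

def weighted_mean_std(values, weights):
    """Mean and standard deviation of `values`, each weighted by the
    share of time in `weights` (weights need not sum to 1 or 100)."""
    total = sum(weights)
    if len(values) != len(weights) or total <= 0:
        raise ValueError("need matching lengths and a positive total weight")
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, sqrt(variance)

# Made-up table for one driver: gear number and percent of time in that gear.
gears = [9, 10, 11]
percent_time = [30.0, 50.0, 20.0]

mean_gear, std_gear = weighted_mean_std(gears, percent_time)
print(f"mean gear = {mean_gear:.2f}, std = {std_gear:.2f}")
```

Repeating this per driver gives one pair of parameters per driver, which can then be compared or clustered.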

Applying statistics is subjective, in spite of the impressive terminology (e.g. "significance", "confidence") that it uses. Typical textbook problems involving hypothesis testing compare two distributions (i.e. they test the hypothesis that distribution_1 is the same as distribution_2). As far as I know, there is no "standard" way to solve problems that test a grand hypothesis like "All 100 distributions are the same distribution". People can tackle such a hypothesis by first doing pairwise hypothesis tests and using the results to group the distributions into pairs that test to be the same. Then, assuming the pairs that test the same are indeed the same, they compare one pair against another, and so on.
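A rough sketch of the pairwise idea, under several assumptions: each driver is a list of gear samples, pairs are compared with the two-sample Kolmogorov-Smirnov statistic against its large-sample critical value, and drivers are merged whenever a pair is not rejected. The merging is a simplification of the pair-against-pair procedure described above. All names and data are hypothetical. Running many pairwise tests inflates the chance of false rejections, and "not rejected" does not prove two distributions are equal.

```python
from bisect import bisect_right
from math import log, sqrt

def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in set(a) | set(b))

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the two-sample KS critical value."""
    return sqrt(-log(alpha / 2) / 2) * sqrt((n + m) / (n * m))

def group_similar(samples, alpha=0.05):
    """Merge drivers whose pairwise KS test does not reject equality.
    `samples` maps a driver name to a list of observations."""
    names = list(samples)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            x = parent[x]
        return x

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            a, b = samples[p], samples[q]
            if ks_statistic(a, b) <= ks_critical_value(len(a), len(b), alpha):
                parent[find(p)] = find(q)  # same group

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())

# Made-up gear samples for three drivers.
samples = {
    "driver_1": [9] * 30 + [10] * 50 + [11] * 20,
    "driver_2": [9] * 28 + [10] * 52 + [11] * 20,
    "driver_3": [10] * 25 + [11] * 50 + [12] * 25,
}
groups = group_similar(samples)
print(groups)
```

Because merging is transitive, a chain of small differences can link drivers who differ noticeably from end to end, so the resulting groups should be checked rather than trusted blindly.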

Very dignified statistical analyses will do the hypothesis testing work first and then apply estimation to groups of distributions that are judged to be identical by using the combined data from the group to do the estimation.

What do you mean by model? Weibull, Lognormal and so on, or a kind of physical model with random parameters?

Your data doesn't resemble distributions that have such orderly shapes.

I mean something more complicated. I don't claim to know much about the specifics of your problem, but I'll speculate.

A simple model is that the goal of a driver is to attain a certain speed at certain times on the course. The desired speed may be motivated by complicated sub-goals (e.g. Go fast, don't wreck, enjoy the scenery, arrive exactly at 3 PM, arrive before 3 PM, etc).

To further simplify the model, we could assume the desired speed is a function only of the location of the car on the course. (e.g. We could neglect the difference between a driver reaching mile post 2 and knowing he was "behind schedule" and the same driver reaching mile post 2 and knowing he was "making good time").

Part of a driver's "style" might consist of things that could be classified as "skill" (e.g. reaction time to change gears, once he decides to change gears). To further simplify the model, let's assume all drivers have about the same reaction times.

The model is not a specific mathematical model yet, but the conceptual form of the model says that drivers don't randomly select a torque from a probability distribution. And they don't implement a "continuous time Markov process" where each level of torque has some probability distribution of times for changing to another level of torque.

This model suggests that the way to compare drivers is to compare how they select speeds at the same places on a course if you have such data.

There can be simpler models and more complicated models. What kind of conceptual model can be invented to justify comparing frequency distributions of times-of-use of torques? (I'm assuming the labels indicating "% time" on your graphs refer to data like "183 seconds out of a total of 1236 seconds" - i.e. time as measured by the clock as opposed to "number of times a particular gear was used out of the total number of gear uses").
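The difference between the two readings of "% time" can be made concrete with a short sketch. It assumes the log has been reduced to a list of (gear, duration in seconds) segments, one per continuous stretch in a gear. The function name and the segment values are hypothetical.

```python
from collections import defaultdict

def gear_shares(segments):
    """Return (share of clock time per gear, share of engagements per gear)
    from a list of (gear, duration_s) segments."""
    if not segments:
        raise ValueError("need at least one segment")
    time_per_gear = defaultdict(float)
    uses_per_gear = defaultdict(int)
    for gear, duration_s in segments:
        time_per_gear[gear] += duration_s  # time by the clock
        uses_per_gear[gear] += 1           # number of times engaged
    total_time = sum(time_per_gear.values())
    total_uses = sum(uses_per_gear.values())
    time_share = {g: t / total_time for g, t in time_per_gear.items()}
    use_share = {g: n / total_uses for g, n in uses_per_gear.items()}
    return time_share, use_share

# Made-up drive: gear 9 is engaged twice but only briefly.
segments = [(9, 100.0), (10, 300.0), (9, 20.0), (11, 180.0)]
time_share, use_share = gear_shares(segments)
print(time_share)
print(use_share)
```

In this made-up drive, gear 9 accounts for half of the engagements but only a fifth of the clock time, so the two definitions can give quite different frequency distributions.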
 
