Correlation coefficient among trends

Discussion Overview

The discussion revolves around analyzing driving styles based on engine torque, speed, and gear usage across different vehicles. Participants explore statistical methods to evaluate correlations among trends in the data collected from various drivers under mixed driving conditions.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes measuring engine torque and speed while driving, aiming to discover specific driving styles through correlation among the shapes of the curves.
  • Another participant suggests using the two-sample Kolmogorov-Smirnov test to compare distributions of gear usage, questioning the replication of driving conditions and driver variability.
  • A clarification is made that the same vehicle with different engines is involved, and data is reported as a percentage of maximum engine torque, with a focus on correlating torque, engine speed, and gear distribution.
  • There is a query about extending the Kolmogorov-Smirnov test to n different distributions, with a distinction made between hypothesis testing and estimation in statistical analysis.
  • Participants discuss the complexity of modeling driver behavior, suggesting that drivers may have specific goals influencing their torque selection rather than randomly selecting from a probability distribution.
  • One participant speculates on the conceptual model of driver behavior, proposing that comparisons could be made based on how drivers select speeds at the same locations on a course.

Areas of Agreement / Disagreement

Participants express varying opinions on the appropriate statistical methods to analyze the data and the nature of driver behavior. There is no consensus on a single model or method to apply, and multiple competing views remain regarding the analysis of driving styles.

Contextual Notes

Participants note the limitations of their analysis, including the dependence on the specific models used to represent driver behavior and the challenges in comparing multiple distributions simultaneously.

Who May Find This Useful

This discussion may be of interest to those studying statistical methods in behavioral analysis, automotive engineering, or data analysis in driving performance contexts.

serbring
Hi all,

in several vehicles, I measured the engine torque, the engine speed and the engaged gear while the vehicle was driven for around 100km/h. For each gear, I computed the average engine speed and torque over all the times the vehicle was run in that gear, and I also computed the relative frequency with which each gear was used. So for each vehicle, I have the following plots:

https://s29.postimg.org/tnyfcdron/untitled.jpg

I have around 300 repetitions of these. I need to discover specific driving styles in the data I have. For example, drivers who use some gears less may run the engine at a higher average torque in those gears, because they use them only during accelerations. Doing that means evaluating whether any relationship/correlation exists among the shapes of the curves. With scalars, I could compute Pearson's correlation coefficient to evaluate the degree of linear correlation. But is there anything similar that I could use with trends?
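One possible way to apply Pearson's coefficient to curves rather than scalars is sketched below in Python. It treats each curve as a vector sampled at the same gears and computes two things: the correlation between two whole curves, and, gear by gear, the correlation between usage and torque across repetitions. The function names and all the numbers are made up for illustration; they are not the data from the plots.

```python
import numpy as np


def curve_correlation(curve_a, curve_b):
    """Pearson correlation between two curves sampled at the same gears."""
    a = np.asarray(curve_a, dtype=float)
    b = np.asarray(curve_b, dtype=float)
    return float(np.corrcoef(a, b)[0, 1])


def per_gear_correlation(usage, torque):
    """For each gear (column), correlate usage with torque across repetitions (rows)."""
    usage = np.asarray(usage, dtype=float)
    torque = np.asarray(torque, dtype=float)
    n_gears = usage.shape[1]
    return np.array(
        [np.corrcoef(usage[:, g], torque[:, g])[0, 1] for g in range(n_gears)]
    )


# Hypothetical data: 4 repetitions (rows) x 3 gears (columns).
usage = np.array([[0.10, 0.60, 0.30],
                  [0.30, 0.50, 0.20],
                  [0.05, 0.55, 0.40],
                  [0.25, 0.45, 0.30]])   # relative frequency of each gear
torque = np.array([[80.0, 50.0, 40.0],
                   [60.0, 52.0, 45.0],
                   [85.0, 51.0, 35.0],
                   [65.0, 55.0, 41.0]])  # average torque, % of maximum

print(curve_correlation(usage[0], usage[1]))
print(per_gear_correlation(usage, torque))
```

A negative value for a gear would be consistent with the idea that the gear is used less by drivers who load the engine more in it, but Pearson's coefficient only measures linear association, so it is a starting point rather than a test of driving style.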
 
serbring said:
I have around 300 repetitions of these.

What aspects are being exactly replicated? I gather the drivers can be different. Are different vehicles involved? Are all drivers driving the same course?

If the graph of percent-time vs gear ratio can be considered as a frequency distribution (i.e. an "empirical" distribution of gear ratio), you could use the "two-sample Kolmogorov-Smirnov test" to test whether two such distributions are the same. Whether this test indicates a difference in driving style depends on the model used to represent the behavior of the drivers.
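A minimal sketch of the two-sample Kolmogorov-Smirnov test in Python, using `scipy.stats.ks_2samp`. The two gear samples are synthetic and the probabilities are invented for illustration. Note that the test is derived for continuous distributions, so with discrete gear numbers (many ties) the reported p-value tends to be conservative.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
gears = np.arange(8, 13)  # gears 8..12

# Hypothetical gear samples, one value per logged time step, for two drivers.
driver_a = rng.choice(gears, size=500, p=[0.05, 0.30, 0.40, 0.20, 0.05])
driver_b = rng.choice(gears, size=500, p=[0.00, 0.05, 0.30, 0.40, 0.25])

# The statistic is the largest gap between the two empirical CDFs.
result = ks_2samp(driver_a, driver_b)
print(f"KS statistic = {result.statistic:.3f}, p-value = {result.pvalue:.3g}")
```

A small p-value only says the two empirical distributions differ; as noted above, whether that difference reflects driving style depends on the model of driver behavior.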
 
Stephen Tashi said:
What aspects are being exactly replicated? I gather the drivers can be different. Are different vehicles involved? Are all drivers driving the same course?
The same vehicle with different engines is involved, and for this reason data are reported as a percentage of the maximum engine torque. There is no difference in the engine speed range among the different engines. The routes were not exactly the same, but all of them are equivalent: each was mixed driving, a mixture of urban and extra-urban conditions. Nevertheless, the gear distribution differs considerably among the drivers, because each driver mostly used two or three gears (e.g. gears in the range from the 9th up to the 11th), but another driver may use a lower range (from the 8th up to the 10th) or a higher range (from the 10th up to the 12th). For this reason, I'm thinking of correlating torque, engine speed and gear distribution together. I'm not considering the vehicle speed, because it depends on the engine speed and the gear ratio.

If the graph of percent-time vs gear ratio can be considered as a frequency distribution (i.e. an "empirical" distribution of gear ratio), you could use the "two-sample Kolmogorov-Smirnov test" to test whether two such distributions are the same.
I got it. Is there any way to have something similar for n different distributions?
Whether this test indicates a difference in driving style depends on the model used to represent the behavior of the drivers.
What do you mean by model? Weibull, Lognormal and so on, or a kind of physical model with random parameters?
 
serbring said:
I got it. Is there any way to have something similar for n different distributions?
Is your goal to do hypothesis testing or to do estimation?

A "hypothesis test" involves some yes-or-no question like: "Do all the drivers have the same probability distribution for selecting torques when they drive the same course? Yes or no?" or "Does driver_A use the same probability distribution for selecting torques when he drives the same course twice? Yes or no?"

An "estimation" involves estimating the parameters of a probability distribution or a statistical model. For example, if we assume each driver selects torques from a probability distribution of a certain type, we could try to estimate the parameters ( e.g. mean, standard deviation) of the distribution that apply to each particular driver. This approach takes for granted that distributions for different drivers may have different parameters.
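As a small illustration of the estimation route, the sketch below computes a sample mean and sample standard deviation of torque for each driver. The driver names and torque values are invented, and summarizing a driver by these two parameters is itself a modeling assumption.

```python
import numpy as np


def estimate_parameters(torque_by_driver):
    """Return {driver: (sample mean, sample standard deviation)} of torque."""
    return {
        driver: (float(np.mean(values)), float(np.std(values, ddof=1)))
        for driver, values in torque_by_driver.items()
    }


# Hypothetical torque samples in % of maximum engine torque.
torque_by_driver = {
    "driver_A": [40.0, 50.0, 60.0],
    "driver_B": [70.0, 80.0, 90.0],
}

estimates = estimate_parameters(torque_by_driver)
for driver, (mean, std) in estimates.items():
    print(f"{driver}: mean = {mean:.1f} %, std = {std:.1f} %")
```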

Applying statistics is subjective, in spite of the impressive terminology (e.g. "significance", "confidence") that it uses. Typical textbook problems involving hypothesis testing compare two distributions (i.e. testing the hypothesis that distribution_1 is the same as distribution_2). As far as I know, there is no "standard" way to solve problems that test a grand hypothesis like "All 100 distributions are the same distribution". People can tackle such a hypothesis by first doing pairwise hypothesis tests and using the results to group the distributions into pairs that test to be the same. Then, assuming the pairs that test the same are indeed the same, they compare two pairs against each other, etc.
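The pairwise approach can be sketched roughly as follows. This is one possible reading of it, not a standard procedure: every pair of samples is compared with `scipy.stats.ks_2samp`, a Bonferroni-corrected threshold is used (an assumption added here), and samples are merged into a group whenever a pair is not rejected. The function name `group_by_pairwise_ks` is hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.stats import ks_2samp


def group_by_pairwise_ks(samples, alpha=0.05):
    """Group sample indices whose pairwise KS tests do not reject equality."""
    n = len(samples)
    pairs = list(combinations(range(n), 2))
    threshold = alpha / max(len(pairs), 1)  # Bonferroni correction (assumption)

    parent = list(range(n))  # union-find forest over sample indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in pairs:
        if ks_2samp(samples[i], samples[j]).pvalue >= threshold:
            parent[find(i)] = find(j)  # not rejected: merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical samples: two drawn around 0 and two drawn around 5.
rng = np.random.default_rng(1)
samples = [rng.normal(0.0, 1.0, 200), rng.normal(0.0, 1.0, 200),
           rng.normal(5.0, 1.0, 200), rng.normal(5.0, 1.0, 200)]
print(group_by_pairwise_ks(samples))
```

Failing to reject is not proof that two distributions are the same, and merging is transitive here, so A and C can end up in one group through B even if A and C themselves test as different. That is part of why the result is a judgment call.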

Very dignified statistical analyses will do the hypothesis testing work first and then apply estimation to groups of distributions that are judged to be identical by using the combined data from the group to do the estimation.

What do you mean by model? Weibull, Lognormal and so on, or a kind of physical model with random parameters?

Your data doesn't resemble distributions that have such orderly shapes.

I mean something more complicated. I don't claim to know much about the specifics of your problem, but I'll speculate.

A simple model is that the goal of a driver is to attain a certain speed at certain times on the course. The desired speed may be motivated by complicated sub-goals (e.g. Go fast, don't wreck, enjoy the scenery, arrive exactly at 3 PM, arrive before 3 PM, etc).

To further simplify the model, we could assume the desired speed is a function only of the location of the car on the course. (e.g. We could neglect the difference between a driver reaching mile post 2 and knowing he was "behind schedule" and the same driver reaching mile post 2 and knowing he was "making good time").

Part of a driver's "style" might consist of things that could be classified as "skill" (e.g. reaction time to change gears, once he decides to change gears). To further simplify the model, let's assume all drivers have about the same reaction times.

The model is not a specific mathematical model yet, but the conceptual form of the model says that drivers don't randomly select a torque from a probability distribution. And they don't implement a "continuous time Markov process" where each level of torque has some probability distribution of times for changing to another level of torque.

This model suggests that the way to compare drivers is to compare how they select speeds at the same places on a course if you have such data.
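If position-stamped speed logs were available, one rough way to make that comparison is to interpolate each driver's speed onto a common grid of positions and look at the differences. The sketch below assumes such logs exist and that positions are increasing (as `numpy.interp` requires); all numbers are invented.

```python
import numpy as np


def speed_on_grid(position_m, speed, grid_m):
    """Linearly interpolate a driver's speed onto common course positions."""
    return np.interp(grid_m, position_m, speed)


# Common positions along the course, in metres (hypothetical 200 m stretch).
grid_m = np.linspace(0.0, 200.0, 5)

# Hypothetical logs: (position, speed) pairs recorded at different places.
speed_a = speed_on_grid([0.0, 100.0, 200.0], [10.0, 20.0, 30.0], grid_m)
speed_b = speed_on_grid([0.0, 80.0, 200.0], [20.0, 20.0, 20.0], grid_m)

difference = speed_a - speed_b
rms_difference = float(np.sqrt(np.mean(difference ** 2)))
print(difference, rms_difference)
```

This only works when drivers cover the same course; with equivalent but different routes, as described earlier in the thread, the positions would not line up.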

There can be simpler models and more complicated models. What kind of conceptual model can be invented to justify comparing frequency distributions of times-of-use of torques? (I'm assuming the labels indicating "% time" on your graphs refer to data like "183 seconds out of a total of 1236 seconds" - i.e. time as measured by the clock as opposed to "number of times a particular gear was used out of the total number of gear uses").
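The two readings of "% time" can give different numbers for the same log, as the sketch below shows. It assumes the engaged gear is logged at equal time steps, so that counting samples is equivalent to measuring clock time; the gear log itself is made up.

```python
import numpy as np


def time_share(gear_signal):
    """Fraction of clock time spent in each gear (equal time steps assumed)."""
    gears, counts = np.unique(gear_signal, return_counts=True)
    return dict(zip(gears.tolist(), (counts / counts.sum()).tolist()))


def engagement_share(gear_signal):
    """Fraction of gear engagements (runs of consecutive samples) per gear."""
    g = np.asarray(gear_signal)
    # A new engagement starts at the first sample and wherever the gear changes.
    starts = np.concatenate(([True], g[1:] != g[:-1]))
    gears, counts = np.unique(g[starts], return_counts=True)
    return dict(zip(gears.tolist(), (counts / counts.sum()).tolist()))


# Hypothetical gear log, one sample per time step.
gear_log = [9, 9, 9, 9, 10, 10, 9, 9, 11, 11]
print(time_share(gear_log))        # share of time per gear
print(engagement_share(gear_log))  # share of engagements per gear
```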
 
