# A Correlation coefficient among trends

1. Jan 21, 2017

### serbring

Hi all,

On several vehicles, I measured the engine torque, the engine speed and the engaged gear while each vehicle was driven for around 100km/h. For each gear, I computed the average engine speed and torque over all the times the vehicle ran in that gear, and I also computed the relative frequency with which each gear was used. So for each vehicle, I have the following plots:

https://s29.postimg.org/tnyfcdron/untitled.jpg

I have around 300 repetitions of these. I need to discover specific driving styles in the data I have. For example, drivers who use some gears less run the engine at a higher average torque, because they use these gears only when accelerating. Doing that means evaluating whether there is any relationship/correlation among the shapes of the curves. With scalars, I could compute Pearson's correlation coefficient to evaluate the degree of linear correlation. Is there anything similar that I could use with trends?

2. Jan 22, 2017

### Stephen Tashi

Which aspects are exactly replicated? I gather the drivers can be different. Are different vehicles involved? Are all drivers driving the same course?

If the graph of percent-time vs gear ratio can be considered a frequency distribution (i.e. an "empirical" distribution of gear ratio), you could use the "two-sample Kolmogorov-Smirnov test" to test whether two such distributions are the same. Whether this test indicates a difference in driving style depends on the model used to represent the behavior of the drivers.
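To make the comparison concrete, here is a minimal sketch of the quantity the two-sample Kolmogorov-Smirnov test is built on: the largest gap between the two empirical cumulative distributions. The function name `ks_statistic` and the percent-time values are made up for illustration, not taken from the thread. Turning the statistic into a p-value needs the number of independent observations behind each curve, which a percent-time plot does not give directly, and the classical test assumes continuous data, so treat this as a distance between drivers rather than a finished test.

```python
from itertools import accumulate

def ks_statistic(pct_a, pct_b):
    """Largest gap between two empirical CDFs defined over the same ordered gears."""
    if len(pct_a) != len(pct_b):
        raise ValueError("both drivers need one value per gear")
    # Normalise so each distribution sums to 1, then accumulate into a CDF.
    cdf_a = [c / sum(pct_a) for c in accumulate(pct_a)]
    cdf_b = [c / sum(pct_b) for c in accumulate(pct_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

# Hypothetical percent-time per gear (gears 8 to 12), invented for this example.
driver_a = [0, 30, 50, 20, 0]
driver_b = [0, 0, 30, 50, 20]

d = ks_statistic(driver_a, driver_b)
print(f"KS statistic: {d:.2f}")
```

A value near 0 means the two gear-usage curves almost coincide; a value near 1 means the drivers hardly share any gears.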

3. Jan 23, 2017

### serbring

The same vehicle with different engines is involved, and for this reason the data are reported as a percentage of the maximum engine torque. There is no difference in the engine speed range among the different engines. The route was not exactly the same, but all of them are equivalent. Let's say all of them are mixed driving conditions, i.e. a mixture of urban and extra-urban driving. Nevertheless, the gear distribution is rather different among the different drivers, because each driver mostly used two or three gears (e.g. gears in the range from the 9th up to the 11th), while another driver may use a lower gear (so the range might be from the 8th up to the 10th) or an upper gear (so the range might be from the 10th up to the 12th). For this reason, I'm thinking of correlating torque, engine speed and gear distribution together. I'm not considering the vehicle speed, because it depends on the engine speed and the gear ratio.

I got it. Is there any way to have something similar for n different distributions?

What do you mean by model? Weibull, Lognormal and so on, or a kind of physical model with random parameters?

4. Jan 23, 2017

### Stephen Tashi

Is your goal to do hypothesis testing or to do estimation?

A "hypothesis test" involves some yes-or-no question like: "Do all the drivers have the same probability distribution for selecting torques when they drive the same course? Yes or no?" or "Does driver_A use the same probability distribution for selecting torques when he drives the same course twice? Yes or no?"

An "estimation" involves estimating the parameters of a probability distribution or a statistical model. For example, if we assume each driver selects torques from a probability distribution of a certain type, we could try to estimate the parameters (e.g. mean, standard deviation) of the distribution that apply to each particular driver. This approach takes for granted that distributions for different drivers may have different parameters.
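As a rough illustration of the estimation route, the sketch below computes a mean and a standard deviation for one driver from a histogram, using the percent-time values as weights. It is shown on a gear-usage histogram for simplicity; the same arithmetic applies to binned torque values. The function name `weighted_mean_std` and the numbers are hypothetical, and the standard deviation is the plain weighted (population) version with no small-sample correction.

```python
from math import sqrt

def weighted_mean_std(values, weights):
    """Mean and standard deviation of `values`, each weighted by its share of time."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, sqrt(variance)

# Hypothetical data for one driver: gears 8 to 12 and percent of time in each.
gears = [8, 9, 10, 11, 12]
pct_time = [0, 30, 50, 20, 0]

mean_gear, std_gear = weighted_mean_std(gears, pct_time)
print(f"mean gear = {mean_gear:.2f}, std = {std_gear:.2f}")
```

Each driver then reduces to a pair of numbers, and ordinary scalar tools such as Pearson's correlation can be applied across drivers. Whether two parameters are enough to describe curves like these is a separate question.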

Applying statistics is subjective, in spite of the impressive terminology (e.g. "significance", "confidence") that it uses. Typical textbook problems involving hypothesis testing compare two distributions (i.e. they test the hypothesis that distribution_1 is the same as distribution_2). As far as I know, there is no "standard" way to solve problems that test a grand hypothesis like "All 100 distributions are the same distribution". People can tackle such a hypothesis by first doing pairwise hypothesis tests and using the results to group the distributions into pairs that test to be the same. Then, assuming the pairs that test the same are indeed the same, they compare one pair against another, etc.
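One loose way to mimic this "test pairs, then merge" procedure is sketched below. It is a greedy variant, not the exact pairs-of-pairs scheme described above: each driver joins the first group whose pooled histogram is close enough, otherwise it starts a new group. The names `group_similar` and `threshold` are invented for illustration, the distance is the Kolmogorov-Smirnov statistic, and the threshold is an arbitrary cut-off, not a significance level.

```python
from itertools import accumulate

def ks_statistic(pct_a, pct_b):
    """Largest gap between two empirical CDFs defined over the same ordered gears."""
    cdf_a = [c / sum(pct_a) for c in accumulate(pct_a)]
    cdf_b = [c / sum(pct_b) for c in accumulate(pct_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

def group_similar(histograms, threshold):
    """Greedily assign each driver to the first group whose pooled histogram is close."""
    groups = []  # each entry: (list of member names, pooled histogram)
    for name, hist in histograms.items():
        for members, pooled in groups:
            if ks_statistic(hist, pooled) <= threshold:
                members.append(name)
                # Pool the data, as if the group really shared one distribution.
                pooled[:] = [p + h for p, h in zip(pooled, hist)]
                break
        else:
            groups.append(([name], list(hist)))
    return [members for members, _ in groups]

# Hypothetical percent-time per gear (gears 8 to 12) for three drivers.
drivers = {
    "A": [0, 30, 50, 20, 0],
    "B": [0, 25, 55, 20, 0],
    "C": [0, 0, 30, 50, 20],
}

result = group_similar(drivers, threshold=0.1)
print(result)
```

The outcome depends on the order in which drivers are processed and on the chosen threshold, which is one more way the subjectivity mentioned above shows up in practice.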

Very dignified statistical analyses will do the hypothesis testing work first and then apply estimation to groups of distributions that are judged to be identical by using the combined data from the group to do the estimation.

Your data doesn't resemble distributions that have such orderly shapes.

I mean something more complicated. I don't claim to know much about the specifics of your problem, but I'll speculate.

A simple model is that the goal of a driver is to attain a certain speed at certain times on the course. The desired speed may be motivated by complicated sub-goals (e.g. Go fast, don't wreck, enjoy the scenery, arrive exactly at 3 PM, arrive before 3 PM, etc).

To further simplify the model, we could assume the desired speed is a function only of the location of the car on the course. (e.g. We could neglect the difference between a driver reaching mile post 2 and knowing he was "behind schedule" and the same driver reaching mile post 2 and knowing he was "making good time").

Part of a driver's "style" might consist of things that could be classified as "skill" (e.g. reaction time to change gears, once he decides to change gears). To further simplify the model, let's assume all drivers have about the same reaction times.

The model is not a specific mathematical model yet, but the conceptual form of the model says that drivers don't randomly select a torque from a probability distribution. And they don't implement a "continuous time Markov process" where each level of torque has some probability distribution of times for changing to another level of torque.

This model suggests that the way to compare drivers is to compare how they select speeds at the same places on a course if you have such data.

There can be simpler models and more complicated models. What kind of conceptual model can be invented to justify comparing frequency distributions of times-of-use of torques? (I'm assuming the labels indicating "% time" on your graphs refer to data like "183 seconds out of a total of 1236 seconds" - i.e. time as measured by the clock as opposed to "number of times a particular gear was used out of the total number of gear uses").
