Correlation coefficient among trends

In summary: A more complicated model would add other variables and try to incorporate them into the driver's desired speed at each location (e.g. a driver's desired speed may depend on the time of day, on the weather, on traffic conditions, on having consumed alcohol, on...). A driver's style might then be thought of as consisting of a variety of preferences and skills, some of which are active under some conditions and inactive under others. In summary, the conversation discusses the process of analyzing data collected from several vehicles while driving at
  • #1
serbring
Hi all,

In several vehicles, I measured the engine torque, the engine speed and the engaged gear while each vehicle was driven for around 100km/h. For each gear, I computed the average engine speed and torque over all the time the vehicle was run in that gear, and I also computed the relative frequency of use of each gear. So for each vehicle, I have the following plots:

https://s29.postimg.org/tnyfcdron/untitled.jpg

I have around 300 repetitions of these. I need to discover specific driving styles in the data I have. For example, drivers who use some gears less run the engine at a higher average torque in those gears, because they use them only when accelerating. Finding such styles means evaluating whether there is any relationship/correlation among the shapes of the curves. With scalars, I could compute Pearson's correlation coefficient to evaluate the degree of linear correlation. But is there anything similar that I could use with trends?
 
  • #2
serbring said:
I have around 300 repetitions of these.

What aspects are being exactly replicated? I gather the drivers can be different. Are different vehicles involved? Are all drivers driving the same course?

If the graph of percent-time vs gear ratio can be considered a frequency distribution (i.e. an "empirical" distribution of gear ratio), you could use the "two-sample Kolmogorov-Smirnov test" to test whether two such distributions are the same. Whether this test indicates a difference in driving style depends on the model used to represent the behavior of the drivers.
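
As a rough illustration of the quantity behind the two-sample Kolmogorov-Smirnov test, the sketch below computes the KS statistic (the largest gap between two empirical cumulative distributions) from per-gear usage percentages. The gear range and the numbers are made up for illustration, and the function name `ks_statistic` is hypothetical. Turning the statistic into a p-value also needs the number of independent observations behind each distribution, which "% time" data does not give directly, so that step is left out here.

```python
from itertools import accumulate


def ks_statistic(freq_a, freq_b):
    """Largest absolute gap between two cumulative gear-usage distributions.

    freq_a and freq_b hold the time (or % time) spent in each gear,
    listed in the same gear order for both drivers.
    """
    if len(freq_a) != len(freq_b):
        raise ValueError("both drivers must be binned over the same gears")
    total_a, total_b = sum(freq_a), sum(freq_b)
    cdf_a = [c / total_a for c in accumulate(freq_a)]
    cdf_b = [c / total_b for c in accumulate(freq_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# Hypothetical % time in gears 8..12 for two drivers (illustrative only)
driver_a = [0, 20, 50, 30, 0]
driver_b = [10, 40, 40, 10, 0]

d = ks_statistic(driver_a, driver_b)
print(f"KS statistic: {d:.2f}")
```

If raw per-sample gear records are available rather than percentages, a library routine such as `scipy.stats.ks_2samp` can be applied to them, keeping in mind that gear number is a discrete variable and the test was designed for continuous ones.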
 
  • #3
Stephen Tashi said:
What aspects are being exactly replicated? I gather the drivers can be different. Are different vehicles involved? Are all drivers driving the same course?

The same vehicle with different engines is involved and, for this reason, the data are reported as a percentage of the maximum engine torque. There is no difference in the engine speed range among the different engines. The route was not exactly the same, but all of them are equivalent. Let's say all of them are mixed driving conditions, so it was a mixture of urban and extra-urban driving. Nevertheless, the gear distribution is rather different among the different drivers, because each driver mostly used two or three gears (e.g. gears in the range from the 9th up to the 11th), but another driver may use a lower gear (so the range might be from the 8th up to the 10th) or an upper gear (so the range might be from the 10th up to the 12th). For this reason, I'm thinking of correlating torque, engine speed and gear distribution together. I'm not considering the vehicle speed, because it is determined by the engine speed and the gear ratio.

Stephen Tashi said:
If the graph of percent-time vs gear ratio can be considered as frequency distribution (i.e. an "empirical" distribution of gear ratio), you could use the "two-sample Kolmogorov-Smirnov test" to test whether two such distributions are the same.

I got it. Is there any way to have something similar for n different distributions?

Stephen Tashi said:
Whether this test indicates a difference in driving style, depends on the model used to represent the behavior of the drivers.

What do you mean by model? Weibull, Lognormal and so on, or a kind of physical model with random parameters?
 
  • #4
serbring said:
I got it. Is there any way to have something similar for n different distributions?

Is your goal to do hypothesis testing or to do estimation?

A "hypothesis test" involves some yes-or-no question like: "Do all the drivers have the same probability distribution for selecting torques when they drive the same course? Yes or no?" or "Does driver_A use the same probability distribution for selecting torques when he drives the same course twice? Yes or no?"

An "estimation" involves estimating the parameters of a probability distribution or a statistical model. For example, if we assume each driver selects torques from a probability distribution of a certain type, we could try to estimate the parameters (e.g. mean, standard deviation) of the distribution that apply to each particular driver. This approach takes for granted that distributions for different drivers may have different parameters.
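
A minimal sketch of the estimation approach, assuming each driver's torque readings are available as a list of samples: estimate a mean and a standard deviation separately for each driver. The driver names and torque values below are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical torque samples (% of maximum engine torque) per driver
torque_samples = {
    "driver_A": [40, 50, 60, 50],
    "driver_B": [70, 80, 90, 80],
}

# One (mean, sample standard deviation) estimate per driver
estimates = {
    name: (mean(values), stdev(values))
    for name, values in torque_samples.items()
}

for name, (m, s) in estimates.items():
    print(f"{name}: mean = {m:.1f} %, sd = {s:.1f} %")
```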

Applying statistics is subjective, in spite of the impressive terminology (e.g. "significance", "confidence") that it uses. Typical textbook problems involving hypothesis testing compare two distributions (i.e. testing the hypothesis that distribution_1 is the same as distribution_2). As far as I know, there is no "standard" way to solve problems that test a grand hypothesis like "All 100 distributions are the same distribution". People can tackle such a hypothesis by first doing pairwise hypothesis tests and using the results to group the distributions into pairs that test to be the same. Then, assuming the pairs that test the same are indeed the same, they compare one pair against another, etc.
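
The pairwise-then-group idea can be sketched as follows. This is only an illustration under loud assumptions: it uses the KS distance between cumulative gear-usage distributions with an arbitrary cutoff (`threshold=0.15`) in place of a real hypothesis test, the driver data are invented, and the names `ks_distance` and `group_similar` are hypothetical. A proper analysis would replace the cutoff with a test decision and account for the large number of pairwise comparisons.

```python
from itertools import accumulate, combinations


def ks_distance(freq_a, freq_b):
    """Largest gap between two cumulative gear-usage distributions."""
    total_a, total_b = sum(freq_a), sum(freq_b)
    cdf_a = [c / total_a for c in accumulate(freq_a)]
    cdf_b = [c / total_b for c in accumulate(freq_b)]
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def group_similar(distributions, threshold):
    """Merge drivers whose pairwise distance is below the threshold.

    Groups are built transitively (A~B and B~C puts A, B, C together),
    which mirrors the "assume matched pairs are the same" step.
    """
    parent = list(range(len(distributions)))

    def find(i):
        # Follow parent links to the group representative
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(distributions)), 2):
        if ks_distance(distributions[i], distributions[j]) < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(distributions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical % time in gears 8..12 for four drivers (illustrative only)
drivers = [
    [0, 20, 50, 30, 0],
    [0, 25, 50, 25, 0],
    [30, 50, 20, 0, 0],
    [25, 50, 25, 0, 0],
]

groups = group_similar(drivers, threshold=0.15)
print(groups)
```

With these made-up numbers, the first two drivers (who favour the higher gears) end up in one group and the last two (who favour the lower gears) in another.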

Very dignified statistical analyses will do the hypothesis testing work first and then apply estimation to groups of distributions that are judged to be identical by using the combined data from the group to do the estimation.

serbring said:
What do you mean by model? Weibull, Lognormal and so on, or a kind of physical model with random parameters?

Your data doesn't resemble distributions that have such orderly shapes.

I mean something more complicated. I don't claim to know much about the specifics of your problem, but I'll speculate.

A simple model is that the goal of a driver is to attain a certain speed at certain times on the course. The desired speed may be motivated by complicated sub-goals (e.g. Go fast, don't wreck, enjoy the scenery, arrive exactly at 3 PM, arrive before 3 PM, etc).

To further simplify the model, we could assume the desired speed is a function only of the location of the car on the course. (e.g. We could neglect the difference between a driver reaching mile post 2 and knowing he was "behind schedule" and the same driver reaching mile post 2 and knowing he was "making good time").

Part of a driver's "style" might consist of things that could be classified as "skill" (e.g. reaction time to change gears, once he decides to change gears). To further simplify the model, let's assume all drivers have about the same reaction times.

The model is not a specific mathematical model yet, but the conceptual form of the model says that drivers don't randomly select a torque from a probability distribution. And they don't implement a "continuous time Markov process" where each level of torque has some probability distribution of times for changing to another level of torque.

This model suggests that the way to compare drivers is to compare how they select speeds at the same places on a course if you have such data.
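
If speed-versus-location data were available, one way to compare two drivers at the same places on a course could look like the sketch below. It assumes each driver's speed has been sampled at the same list of locations; the speeds are invented, and the function names are hypothetical. Pearson's r captures whether the two speed profiles have the same shape, while the mean difference captures whether one driver is simply faster everywhere.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_difference(xs, ys):
    """Average of (y - x) over all locations."""
    return sum(y - x for x, y in zip(xs, ys)) / len(xs)


# Hypothetical speeds (km/h) of two drivers at the same five locations
speed_a = [50, 80, 100, 60, 40]
speed_b = [60, 90, 110, 70, 50]  # same shape, 10 km/h faster everywhere

r = pearson_r(speed_a, speed_b)
offset = mean_difference(speed_a, speed_b)
print(f"shape similarity r = {r:.3f}, mean offset = {offset:.1f} km/h")
```

In this invented case the two profiles have an identical shape, so the correlation alone would not reveal that the second driver is consistently faster; that is why the offset is reported alongside it.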

There can be simpler models and more complicated models. What kind of conceptual model can be invented to justify comparing frequency distributions of times-of-use of torques? (I'm assuming the labels indicating "% time" on your graphs refer to data like "183 seconds out of a total of 1236 seconds" - i.e. time as measured by the clock as opposed to "number of times a particular gear was used out of the total number of gear uses").
 

FAQ: Correlation coefficient among trends

What is correlation coefficient among trends?

Correlation coefficient among trends is a statistical measure that quantifies the relationship between two variables. It measures the strength and direction of the relationship between two trends.

How is correlation coefficient among trends calculated?

Correlation coefficient among trends is calculated by dividing the covariance of the two variables by the product of their standard deviations. This results in a value between -1 and 1, where a value of -1 represents a perfect negative correlation, 0 represents no correlation, and 1 represents a perfect positive correlation.
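
Written out for two variables $X$ and $Y$ with $n$ paired observations $(x_i, y_i)$ and sample means $\bar{x}$ and $\bar{y}$ (symbols introduced here for illustration), the calculation described above is:

```latex
r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```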

What does a high correlation coefficient among trends indicate?

A correlation coefficient close to 1 indicates a strong positive linear relationship between the two variables. This means that as one trend increases, the other also tends to increase. A value close to -1 indicates a strong negative relationship, where one trend decreases as the other increases.

Can correlation coefficient among trends be used to determine causation?

No, correlation coefficient among trends only measures the strength and direction of the relationship between two variables. It cannot determine causation, which requires additional research and analysis.

How can correlation coefficient among trends be useful?

Correlation coefficient among trends can be useful in identifying patterns and relationships between variables, as well as in making predictions and informing decision making. It is commonly used in scientific research, data analysis, and market research.
