# Dealing with scattered data

Hi all,

I have a few measurements from a tillage machine fitted with two types of soil-engaging tools that differ in geometry (named A and B, respectively). I measured the draft needed to pull the machine and the travel speed. My aim is to quantify the difference between the two tools in terms of force or power.
Below you can see a sample plot of the test.

https://i.postimg.cc/0QZYBYgj/untitled.jpg

As you can see, the data are rather scattered and I am having difficulty drawing any conclusions. We can speculate that A tends to give a higher force than B, but I am not sure, because I do not know which regression curve to assume. There are probably a few outliers, which I cannot identify without assuming a regression curve.
What could I do? Any suggestion is appreciated.

Cheers

anorlunda
Staff Emeritus
No matter what you do, you'll never be confident with the answer.

Assuming that the bad data is because of rocks: a rock causes an impulse shock. Can you do time based filtering boogie plotting force versus speed? Reject impulses. That's just a guess.

Do you have the raw data including time stamps?
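One way to sketch the "reject impulses" idea on a time series is a running-median filter: samples that sit far from the local median are replaced by it. This is only an illustration, not the method anyone in the thread used; the function name, the `window` and `threshold` values, and the sample data are all made up.

```python
from statistics import median


def reject_impulses(force, window=5, threshold=3.0):
    """Replace samples far from the local median with that median.

    `window` (odd sample count) and `threshold` (in units of the median
    absolute deviation) are illustrative values, not tuned ones.
    """
    half = window // 2
    # Local median around each sample (the window shrinks at the edges).
    local = [median(force[max(0, i - half): i + half + 1])
             for i in range(len(force))]
    # Robust spread estimate: median absolute deviation from the local median.
    mad = median(abs(f - m) for f, m in zip(force, local))
    limit = threshold * mad
    return [m if abs(f - m) > limit else f for f, m in zip(force, local)]


# Made-up draft samples in kN with one spike at index 3.
draft_kn = [10.1, 10.3, 9.9, 25.0, 10.2, 10.0, 9.8, 10.4]
cleaned = reject_impulses(draft_kn)
```

Only the spike is replaced here; ordinary scatter is left alone, so the filter would not by itself tighten a plot whose scatter has another cause.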

Yes, I have the raw data. Here is a sample: https://i.postimg.cc/MKXgRjGb/untitled2.jpg

I would exclude rocks, since the test was carried out on soft agricultural soil.

What do you mean by "time based filtering boogie"?

anorlunda
Staff Emeritus
What do you mean for "time based filtering boogie"?

It means my keyboard auto complete failed me again.

I meant "time based filtering", no boogie.

anorlunda
Staff Emeritus
It looks like all noise and no data, as if something very wrong happened with your experiment.

Tom.G
Gold Member
In case it helps, here is a rough graph of the data.
NOTE that the Red is roughly ±15% of nominal and the Black is roughly ±25% of nominal; both trending to decrease with speed. Perhaps they float a bit with increasing speed. (not unusual with farm equipment being towed) Or if this is drawbar force, the inertia of the tiller could account for the force decreasing with speed.

For experiment verification, try it in a bed of dry sand or a well plowed field.

Cheers,
Tom

FactChecker
Gold Member
You could do some statistical analysis, but I think that the data you show will not allow you to draw a conclusion at any confidence level about any difference in the average or in the trend. You may be able to draw a conclusion about the variation of the data, since the spread of the B data seems so much greater than the spread of the A data.

I am clarifying the data analysis

Here is a draft recording: https://i.postimg.cc/MKXgRjGb/untitled2.jpg. The measurements were carried out under steady conditions, and the acquisition started and ended with the tractor stationary (that is why the draft is almost 0 kN at the beginning and the end of the acquisition). The plot shows one repetition of a test condition. The red part is the section I selected for the analysis and from which I calculated the average value. I have a similar plot for the speed, with its own red part, from which I calculated the average value as well.

Then I made a cross-plot of the average draft against the average speed, and I get the following plot.

https://i.postimg.cc/CKfkCSfv/untitled.jpg

Now the data are positive, because I have plotted the absolute value of the draft. Moreover, the plot is a bit less scattered, because the steady part was not perfectly selected in the previous script.
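The averaging step described in this post can be sketched as follows. This is a minimal sketch, assuming the steady ("red") section is defined by a start and end time and that the same window is applied to both the draft and the speed signals; the function name, window limits and sample values are hypothetical.

```python
from statistics import mean


def steady_mean(time_s, signal, t_start, t_end):
    """Mean of `signal` over the samples with t_start <= t <= t_end."""
    selected = [x for t, x in zip(time_s, signal) if t_start <= t <= t_end]
    if not selected:
        raise ValueError("no samples inside the selected window")
    return mean(selected)


# Made-up run: the tractor is stationary at the start and at the end.
time_s = [0, 1, 2, 3, 4, 5, 6, 7]
draft_kn = [0.0, -2.0, -10.0, -11.0, -9.0, -10.0, -3.0, 0.0]
speed_ms = [0.0, 0.5, 1.2, 1.3, 1.1, 1.2, 0.4, 0.0]

# One (speed, |draft|) point per run for the cross-plot.
avg_draft = steady_mean(time_s, draft_kn, 2, 5)
avg_speed = steady_mean(time_s, speed_ms, 2, 5)
point = (avg_speed, abs(avg_draft))
```

Using one shared window for every channel keeps each averaged point internally consistent; a window that includes part of the start-up ramp pulls the average toward zero and adds scatter.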

In case it helps, here is a rough graph of the data.
NOTE that the Red is roughly ±15% of nominal and the Black is roughly ±25% of nominal; both trending to decrease with speed. Perhaps they float a bit with increasing speed. (not unusual with farm equipment being towed) Or if this is drawbar force, the inertia of the tiller could account for the force decreasing with speed.

For experiment verification, try it in a bed of dry sand or a well plowed field.

Cheers,
Tom

The fact that the draft is constant or decreasing with speed is weird, since according to the theory it should increase quadratically.

You're right that the implement may float with speed, but I think that is not the case. Here you can see the working depth against the speed.

https://i.postimg.cc/ZKMmkSCF/untitled3.jpg

As you can see, it is almost constant. The small difference could be caused by a slightly different setup in the measurements.

You could do some statistical analysis, but I think that the data you show will not allow you do draw a conclusion at any confidence level about any difference in the average or in the trend. You may be able to draw a conclusion about the variation of the data, since the spread of the B data seems so much greater than the spread of the A data.

How could I statistically evaluate the spread of the data in this case? Is there any approach you would use? With categorical data, the problem is in my opinion easier to handle.

It looks like all noise and no data, as if something very wrong happened with your experiment.

Why? The draft is low at the beginning and the end of the acquisition because the tractor is still and the implement is out of the soil.

FactChecker
Gold Member
After doing linear regressions (separately for A and B), there remain residual errors (see http://www.statisticshowto.com/residual/ ), which are the differences between the actual data values and the predictions from the regression equations. Most statistics regression programs will report the residual standard deviation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/sigma.html ). The statistical properties of the residual standard error are mixed with the statistical properties of the regression line coefficients, since they are also estimates and not true values. But your case displays so much difference in sets A and B that I think it will be statistically significant.

PS. I am not experienced enough to give advice on how to handle data with sets of different variance magnitudes. Maybe someone more expert can help here. I have reached my knowledge limit.
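As a rough illustration of the suggestion above, the residual standard deviation of a straight-line fit can be computed separately for each tool and then compared. This is a sketch with made-up numbers; the helper names and the data are assumptions, and it does not by itself give a significance test.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def residual_std(x, y):
    """Residual standard deviation with n - 2 degrees of freedom."""
    slope, intercept = fit_line(x, y)
    rss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return sqrt(rss / (len(x) - 2))


# Made-up averaged points: speed in m/s, draft in kN.
speed_ms = [0.8, 0.8, 1.2, 1.2, 1.6, 1.6]
draft_a_kn = [10.0, 10.4, 11.0, 11.4, 12.0, 12.4]
draft_b_kn = [9.0, 11.0, 10.0, 12.0, 11.0, 13.0]

sd_a = residual_std(speed_ms, draft_a_kn)
sd_b = residual_std(speed_ms, draft_b_kn)
spread_ratio = sd_b / sd_a  # how much wider the B scatter is than the A scatter
```

The same two helpers work with any other regression model as long as the divisor `len(x) - 2` is changed to match the number of fitted coefficients.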

Gold Member
I'm a little confused. One plot is force vs. time and appears well behaved. Another is force vs. speed and appears to be pure noise. How are you converting from one to the other? Are you recording your speed separately and pulling out chunks of that F vs. t graph? Do you have multiple F vs. t at different speeds?

CWatters
Homework Helper
Gold Member
Have you checked the ground is uniform? Eg do several runs at a constant 1.2m/s and see if there is a lot of variability. If there is I think the answer is to make multiple runs at each speed to average down the variability.
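To put a number on "average down the variability": for independent repeats, the standard error of the mean shrinks as the sample standard deviation divided by the square root of the number of runs. The snippet below is a small sketch of that check for repeated runs at one speed; the run values are invented.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Sample mean and standard error of the mean, stdev / sqrt(n)."""
    return mean(values), stdev(values) / sqrt(len(values))


# Made-up average drafts (kN) from repeated runs at a constant 1.2 m/s.
runs_kn = [10.2, 11.5, 9.1, 12.0, 8.8, 10.4]
avg_kn, sem_kn = mean_and_sem(runs_kn)

# Under the same scatter, four times as many runs would halve the standard error.
sem_with_4x_runs = stdev(runs_kn) / sqrt(4 * len(runs_kn))
```

If the run-to-run standard deviation at a fixed speed is already comparable to the difference expected between the tools, more repeats or better control of the other variables would be needed before the means can be separated.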

Have you checked the ground is uniform? Eg do several runs at a constant 1.2m/s and see if there is a lot of variability. If there is I think the answer is to make multiple runs at each speed to average down the variability.

I made 6 runs for each speed. Probably, they were not enough.

CWatters
Homework Helper
Gold Member
In case it helps, here is a rough graph of the data.
NOTE that the Red is roughly ±15% of nominal and the Black is roughly ±25% of nominal; both trending to decrease with speed. Perhaps they float a bit with increasing speed. (not unusual with farm equipment being towed) Or if this is drawbar force, the inertia of the tiller could account for the force decreasing with speed.

Is it just me....The magnitude of the force appears to increase with speed as expected. Normally a negative sign indicates the direction.

I'm a little confused. One plot is force vs. time and appears well behaved. Another is force vs. speed and appears to be pure noise. How are you converting from one to the other? Are you recording your speed separately and pulling out chunks of that F vs. t graph? Do you have multiple F vs. t at different speeds?

I have the time-based force and speed, acquired simultaneously. Tests were carried out at different target speeds, which were 0.8 m/s, 1.2 m/s and 1.6 m/s. The force vs. speed plot shows the average value of the time-based force against the average value of the time-based speed. The average values were calculated over the red part of the time-based force, which is the only part where the tool was in the ground and where the tractor had reached the target speed.

After doing linear regressions (separately for A and B), there remain residual errors (see http://www.statisticshowto.com/residual/ ), which are the differences between the actual data values and the predictions from the regression equations. Most statistics regression programs will report the residual standard deviation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/sigma.html ). The statistical properties of the residual standard error are mixed with the statistical properties of the regression line coefficients, since they are also estimates and not true values. But your case displays so much difference in sets A and B that I think it will be statistically significant.

PS. I am not experienced enough to give advice on how to handle data with sets of different variance magnitudes. Maybe someone more expert can help here. I have reached my knowledge limit.

Since the theory says that the trend of force with respect to speed is quadratic, I could use that regression model. I am able to compute the residuals and their standard deviation, but less able to analyse the results effectively. Moreover, I remembered that I carried out more repetitions with the B tool than with the A tool. Could this explain the larger variability for the B tool?

CWatters
Homework Helper
Gold Member
Is it possible the depth of the tool wasn't consistent? I'm surprised you are getting such large variation even after averaging 6 runs. What's the variation like run to run at same speed?

Gold Member
I have the time-based force and speed, acquired simultaneously. Tests were carried out at different target speeds, which were 0.8 m/s, 1.2 m/s and 1.6 m/s. The force vs. speed plot shows the average value of the time-based force against the average value of the time-based speed. The average values were calculated over the red part of the time-based force, which is the only part where the tool was in the ground and where the tractor had reached the target speed.

The force vs. time plots you showed appeared to be pretty good. It has some scatter, but is otherwise pretty good. It clearly has some meaningful frequency content as well.

Do your speed vs. time data look similarly good?

Based on the plots you showed with the depth, I have a strong suspicion that is what is causing your scatter. You could look at the correlation function between the depth and the force as well as between the speed and the force and try to prove which one is the dominant factor. You are basically treating this system like it has only one independent variable (speed), but it seems like you are ignoring a second independent variable (depth) that is poorly controlled, and the uncertainty from that variable is propagating into your dependent variable (force) and totally washing out the information you truly want, which is force vs. speed.

Make a 3D plot of force as a function of speed and depth. I'd be willing to bet the data look much more organized.
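A quick way to try the comparison suggested above is the Pearson correlation coefficient between the averaged force and each candidate variable. The sketch below uses invented data in which depth dominates by construction; the function name and all numbers are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up per-run averages: speed in m/s, depth in m.
speed_ms = [0.8, 0.8, 1.2, 1.2, 1.6, 1.6]
depth_m = [0.10, 0.14, 0.08, 0.15, 0.11, 0.13]
# Synthetic force (kN) that depends strongly on depth and weakly on speed.
force_kn = [80.0 * d + 0.5 * s for s, d in zip(speed_ms, depth_m)]

r_depth = pearson_r(depth_m, force_kn)
r_speed = pearson_r(speed_ms, force_kn)
```

Because the coefficient is dimensionless, the units chosen for depth and speed do not affect the comparison.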

FactChecker
Gold Member
Since the theory reports that the trend of force with respect speed is quadratic, I could use that regression model, I am able to compute the residual and their standard deviation, but less able to effectively analyze their results. Moreover, I remembered that I have carried out more repetition with B tool than with A tool. Could this explain the larger variability for the B tool?
If the B tool was worn out after the early repetitions, that could certainly cause a lot of variation. Things like that are worth considering. You could put the data in chronological order to check if the spread changed over time. More repetitions on its own would not mathematically cause a greater variance.
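The chronological check mentioned above could be sketched like this: sort the regression residuals by acquisition time and compare the spread of the early half with that of the late half. The function name and the residual values are hypothetical, and a formal comparison would need a proper variance test rather than this eyeball check.

```python
from statistics import stdev


def spread_early_vs_late(residuals_in_time_order):
    """Return (stdev of the first half, stdev of the second half)."""
    mid = len(residuals_in_time_order) // 2
    early = residuals_in_time_order[:mid]
    late = residuals_in_time_order[mid:]
    return stdev(early), stdev(late)


# Made-up residuals (kN), already sorted by acquisition time.
residuals_kn = [0.2, -0.3, 0.1, -0.1, 1.5, -1.8, 2.0, -1.6]
early_sd, late_sd = spread_early_vs_late(residuals_kn)
```

Working on residuals rather than raw draft values keeps the speed trend from being mistaken for a change in spread.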

anorlunda
Staff Emeritus
I made 6 runs for each speed. Probably, they were not enough.

But those time plots only had 21 data points per curve. Is that the data from 6 runs, only 3 samples per run?

The force vs. time plots you showed appeared to be pretty good. It has some scatter, but is otherwise pretty good. It clearly has some meaningful frequency content as well.
There is a cyclic behaviour, but it is hidden by large variability.

Do your speed vs. time data look similarly good?
Here we go for speed and depth both vs. time.
https://i.postimg.cc/D0DnZpTp/untitled5.jpg

The speed is quite unsteady, because the test was carried out with a constant working depth, so the draft control was disabled. However, the tractor has a single-acting hydraulic cylinder, so there might be some variability.

Based on the plots you showed with the depth, I have a strong suspicion that is what is causing your scatter. You could look at the correlation function between the depth and the force as well as between the speed and the force and try to prove which one is the dominant factor. You are basically treating this system like it has only one independent variable (speed), but it seems like you are ignoring a second independent variable (depth) that is poorly controlled, and the uncertainty from that variable is propagating into your dependent variable (force) and totally washing out the information you truly want, which is force vs. speed.

The phenomenon is clearly a bivariate one, but I admit I am not experienced enough to analyse bivariate data effectively.

Make a 3D plot of force as a function of speed and depth. I'd be willing to bet the data look much more organized.

I have made the plot, but in my opinion it was not clear, so I have made two separate cross-plots:
https://i.postimg.cc/J0yC93kN/untitled4.jpg

If the B tool was worn out after the early repetitions, that could certainly cause a lot of variation. Things like that are worth considering. You could put the data in chronological order to check if the spread changed over time. More repetitions on its own would not mathematically cause a greater variance.

Good guess! The B tool is identical to the A tool, but it was worn after a year of use. The data were acquired at the same time. Do you have any reference material about the variability caused by wear?

Gold Member
There is a cyclic behaviour, but it is hidden by large variability.

In your time plots, it doesn't appear to be very hidden. It looks quite clear, and a Fourier analysis would quite easily tell you the important frequencies (though doesn't really help with your actual goal here; it's more of a curiosity).
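For the curious, the dominant frequency of a force recording can be picked out with a plain discrete Fourier transform. This is a minimal stdlib-only sketch on a synthetic signal; the sample rate, the signal and the function name are assumptions, and a real analysis would normally use an FFT library and a window function.

```python
import cmath
import math


def dominant_frequency(signal, sample_rate_hz):
    """Frequency (Hz) of the largest non-DC component of a real signal."""
    n = len(signal)
    avg = sum(signal) / n
    centered = [x - avg for x in signal]  # remove the mean (DC) level
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        # Direct DFT coefficient for bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n


# Synthetic draft signal: 10 kN mean, a 4 Hz component and a weaker 10 Hz one.
sample_rate_hz = 50
signal = [10.0
          + 2.0 * math.sin(2 * math.pi * 4 * i / sample_rate_hz)
          + 0.5 * math.sin(2 * math.pi * 10 * i / sample_rate_hz)
          for i in range(100)]
peak_hz = dominant_frequency(signal, sample_rate_hz)
```

The frequency resolution is the sample rate divided by the number of samples, so short steady sections give only a coarse estimate.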

Here we go for speed and depth both vs. time.
https://i.postimg.cc/D0DnZpTp/untitled5.jpg

Clearly the uncertainty from your depth is the big factor here. That looks like roughly +/- 50% variability on the mean.

The phenomenon is clearly a bivariate one, but I admit I am not experienced enough to analyse bivariate data effectively.

I have made the plot, but in my opinion it was not clear, so I have made two separate cross-plots:
https://i.postimg.cc/J0yC93kN/untitled4.jpg

I have no idea what those plots show. I was thinking of a 3D plot of the data points you started with, showing the average force vs. average speed and average depth.

In your time plots, it doesn't appear to be very hidden. It looks quite clear, and a Fourier analysis would quite easily tell you the important frequencies (though doesn't really help with your actual goal here; it's more of a curiosity).

I will make a plot for your curiosity. However, the frequency depends on the clod size and is therefore correlated with the speed: the higher the speed, the higher the main frequency.

Clearly the uncertainty from your depth is the big factor here. That looks like roughly +/- 50% variability on the mean.
You're also right.

I have no idea what those plots show. I was thinking of a 3D plot of the data points you started with, showing the average force vs. average speed and average depth.

This is the 3D plot, but it doesn't look very clear to me. https://i.postimg.cc/hhRX3t50/untitled6.jpg

I have also computed the cross-correlation; here is the plot: https://i.postimg.cc/DysFcT3B/untitled7.jpg.

From the plot I can see that there is no phase lag between speed and depth with respect to the draft. The difference in magnitude can be explained by the fact that the depth is larger in magnitude than the speed, right?

But those time plots only had 21 data points per curve. Is that the data from 6 runs, only 3 samples per run?

I made 6 runs per speed for the A tool and 9 runs per speed for the B tool. Indeed, you can see 6 red points close to each other.

Gold Member
This is the 3D plot, but it doesn't look very clear to me. https://i.postimg.cc/hhRX3t50/untitled6.jpg

Well, obviously there isn't much use without being able to rotate it and look at it. You could fit a least-squares surface to the data and see how well that works.
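The least-squares surface mentioned above can be sketched with the simplest possible surface, a plane `force = a + b*speed + c*depth`, solved through the normal equations. This is an assumption-laden illustration: the plane model, the helper names and the synthetic data are not from the thread, and a model in which force scales with depth may suit the physics better.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_plane(speed, depth, force):
    """Least-squares coefficients [a, b, c] of force = a + b*speed + c*depth."""
    rows = [[1.0, s, d] for s, d in zip(speed, depth)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * f for r, f in zip(rows, force)) for i in range(3)]
    return solve_linear(xtx, xty)


# Synthetic per-run averages generated from a known plane, for checking only.
speed_ms = [0.8, 0.8, 1.2, 1.2, 1.6, 1.6]
depth_m = [0.10, 0.14, 0.08, 0.15, 0.11, 0.13]
force_kn = [1.0 + 0.5 * s + 80.0 * d for s, d in zip(speed_ms, depth_m)]
a, b, c = fit_plane(speed_ms, depth_m, force_kn)
```

With real data, the residuals of this fit show how much scatter remains once depth has been accounted for.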

I have also computed the cross-correlation; here is the plot: https://i.postimg.cc/DysFcT3B/untitled7.jpg.

From the plot I can see that there is no phase lag between speed and depth with respect to the draft. The difference in magnitude can be explained by the fact that the depth is larger in magnitude than the speed, right?

From that plot you can see that the force is much more strongly correlated with depth than with speed, at least over the range of variation in the two that your data experiences. Your scatter is almost certainly due to scatter in the depth.

Well, obviously there isn't much use without being able to rotate it and look at it. You could fit a least-squares surface to the data and see how well that works.

From that plot you can see that the force is much more strongly correlated with depth than with speed, at least over the range of variation in the two that your data experiences. Your scatter is almost certainly due to scatter in the depth.

But the cross-correlation is the product of the two signals, so since the depth is larger than the speed, shouldn't the cross-correlation between draft and depth be higher than that between draft and speed? So depth and speed should be normalized, otherwise the cross-correlation is biased towards the variable with the larger magnitude, right?

FactChecker
Gold Member
But the cross-correlation is the product of the two signals, so since the depth is larger than the speed, shouldn't the cross-correlation between draft and depth be higher than that between draft and speed? So depth and speed should be normalized, otherwise the cross-correlation is biased towards the variable with the larger magnitude, right?
Cross correlations are normalized. So the scale and units of a variable do not matter.

Cross correlations are normalized. So the scale and units of a variable do not matter.

I have used Matlab and the default option is to not normalize the cross correlation.
"By default, xcorr computes raw correlations with no normalization" (https://it.mathworks.com/help/signal/ref/xcorr.html#bual1fd-scaleopt).

By using the option 'coeff' for "scaleopt" (which normalizes the sequence so that the autocorrelations at zero lag equal 1), I got the following plot: https://i.postimg.cc/qB9WpJm0/xcorr_norm.jpg.
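The difference between the raw and the normalized cross-correlation can be illustrated in a few lines. The sketch below is written in Python rather than MATLAB and only mimics the idea of scaling by the zero-lag autocorrelations; it is not a reimplementation of `xcorr`, and the function names and data are made up.

```python
from math import sqrt


def xcorr(x, y, lag):
    """Raw cross-correlation sum of x[i + lag] * y[i], with no mean removal."""
    n = len(x)
    return sum(x[i + lag] * y[i] for i in range(n) if 0 <= i + lag < n)


def xcorr_coeff(x, y, lag):
    """Cross-correlation scaled so that each autocorrelation at lag 0 is 1."""
    return xcorr(x, y, lag) / sqrt(xcorr(x, x, 0) * xcorr(y, y, 0))


# Made-up signals: the same depth expressed in metres and in millimetres.
draft_kn = [9.5, 11.0, 8.7, 12.1, 10.2, 9.9]
depth_m = [0.10, 0.14, 0.08, 0.15, 0.11, 0.13]
depth_mm = [1000.0 * d for d in depth_m]

raw_m = xcorr(draft_kn, depth_m, 0)
raw_mm = xcorr(draft_kn, depth_mm, 0)        # 1000 times larger than raw_m
coeff_m = xcorr_coeff(draft_kn, depth_m, 0)
coeff_mm = xcorr_coeff(draft_kn, depth_mm, 0)  # unchanged by the unit change
```

One caveat worth checking: since no mean is subtracted here, two signals with large mean values give a coefficient close to 1 almost regardless of how their fluctuations relate, so subtracting each signal's mean first may make the two peaks easier to tell apart.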

For this dataset, the draft appears to be more affected by speed than by depth, although the difference between the two peaks seems small to me. Running the calculation on different datasets, the cross-correlation with depth is sometimes larger than that with speed, so the results are quite variable.

FYI, according to the usual theory, the draft (F) can be computed as follows:

F = (A s^2 + B s + C) W D

where:
A, B, C: coefficients that depend on the machine type and the soil
s: speed
W: implement width
D: depth
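To make the role of depth in this formula concrete, here is a small sketch that evaluates it and also divides a draft value by W·D, which leaves only the quadratic in speed. The coefficient values, widths and depths are invented, and lower-case names are used for the coefficients to avoid confusion with tools A and B.

```python
def draft_force(speed, depth, width, a, b, c):
    """Draft F = (a*s**2 + b*s + c) * W * D, following the formula above."""
    return (a * speed ** 2 + b * speed + c) * width * depth


def specific_draft(force, depth, width):
    """Draft per unit of tilled cross-section, F / (W * D)."""
    return force / (width * depth)


# Hypothetical coefficients and implement width, for illustration only.
a, b, c = 2.0, 1.0, 5.0
width_m = 3.0

# Two runs at the same speed but different working depths.
shallow = draft_force(1.0, 0.10, width_m, a, b, c)
deep = draft_force(1.0, 0.15, width_m, a, b, c)

# Dividing by W * D removes the depth effect: both runs give the same value.
shallow_specific = specific_draft(shallow, 0.10, width_m)
deep_specific = specific_draft(deep, 0.15, width_m)
```

If the measured data follow this model, plotting the specific draft against speed instead of the raw draft should reduce the scatter that comes from depth variation.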
