Dealing with scattered data

  • Thread starter serbring
  • #1
269
2
Hi all,

I have a few measurements of a tillage machine running two types of soil-engaging tools that differ in geometry (named A and B, respectively). I measured the draft required to run the machine and the speed. My aim is to analyse how much the two tools differ in terms of force or power.
Below you can see a sample plot of the test.

https://i.postimg.cc/0QZYBYgj/untitled.jpg

As you can see, the data are rather scattered and I am having some difficulty drawing conclusions. We can speculate that A tends to produce a higher force than B, but I am not fully sure, because I cannot assume the right regression curve. There are probably a few outliers, which I cannot identify without assuming a regression curve.
What could I do? Any suggestion is appreciated.

Cheers
 

Answers and Replies

  • #2
anorlunda
Staff Emeritus
Insights Author
9,156
6,154
No matter what you do, you'll never be confident with the answer.

Assuming that the bad data are caused by rocks: a rock causes an impulse shock. Can you do time based filtering boogie plotting force versus speed? Reject impulses. That's just a guess.

Do you have the raw data including time stamps?
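
A minimal sketch of what rejecting impulses from the raw time trace could look like, assuming the draft is available as a plain list of samples. The function name, the window length and the threshold are all hypothetical choices for illustration, and the numbers are invented, not taken from the thread. Each sample is compared with the median of its neighbours and replaced when it sits far outside the local spread.

```python
from statistics import median

def reject_impulses(signal, window=5, threshold=5.0):
    """Replace samples that deviate strongly from the local median.

    window and threshold are illustrative values, not tuned for draft data.
    """
    half = window // 2
    cleaned = []
    for i, value in enumerate(signal):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        neighbours = signal[lo:hi]
        local_median = median(neighbours)
        # Median absolute deviation: a spread estimate a single spike barely affects
        mad = median([abs(x - local_median) for x in neighbours])
        if mad > 0 and abs(value - local_median) > threshold * mad:
            cleaned.append(local_median)  # treat the sample as an impulse
        else:
            cleaned.append(value)
    return cleaned

# Invented draft trace in kN with one rock strike at index 4
draft = [10.1, 10.3, 9.9, 10.2, 25.0, 10.0, 10.4, 9.8, 10.1]
filtered = reject_impulses(draft)
```

The averages would then be computed from the filtered trace rather than the raw one, so that a single rock strike does not shift the mean of a run.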
 
  • #4
anorlunda
Staff Emeritus
Insights Author
9,156
6,154
What do you mean by "time based filtering boogie"?
:eek:

It means my keyboard auto complete failed me again.

I meant "time based filtering" no boogie. o_O
 
  • #5
anorlunda
Staff Emeritus
Insights Author
9,156
6,154
It looks like all noise and no data, as if something very wrong happened with your experiment.
 
  • #6
Tom.G
Science Advisor
3,711
2,394
In case it helps, here is a rough graph of the data.
NOTE that the Red is roughly ±15% of nominal and the Black is roughly ±25% of nominal; both trending to decrease with speed. Perhaps they float a bit with increasing speed. (not unusual with farm equipment being towed) Or if this is drawbar force, the inertia of the tiller could account for the force decreasing with speed.

For experiment verification, try it in a bed of dry sand or a well plowed field.
scattered data.jpg


Cheers,
Tom
 


  • #7
FactChecker
Science Advisor
Gold Member
6,059
2,343
You could do some statistical analysis, but I think that the data you show will not allow you to draw a conclusion at any confidence level about any difference in the average or in the trend. You may be able to draw a conclusion about the variation of the data, since the spread of the B data seems so much greater than the spread of the A data.
 
  • #8
269
2
Let me clarify the data analysis.

Here is a plot of the draft data (https://i.postimg.cc/MKXgRjGb/untitled2.jpg). The measurements were carried out in steady conditions, and the acquisition started and ended with the tractor stationary (that is why the draft is almost 0 kN at the beginning and the end of the acquisition). The plot shows one repetition of a test condition. The red part is the part that I selected for the analysis and from which I calculated the average value. I have a similar plot for the speed, with its own red part, from which I calculated the average value as well.

Then I made a cross-plot of the average draft against the average speed and got the following plot.

https://i.postimg.cc/CKfkCSfv/untitled.jpg

Now the data are positive, because I have plotted the absolute value of the draft. Moreover, the plot is a bit less scattered, because the steady part was not perfectly selected in the previous script.
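
The averaging step described above could be sketched roughly as follows. This is only an illustration: the steady part is picked automatically as the samples whose speed lies within a band around the target speed, which is an assumption and not necessarily how the red part was selected in the original script. The function name, the 10% tolerance and the sample values are hypothetical.

```python
from statistics import mean

def steady_state_means(speed, draft, target_speed, tolerance=0.1):
    """Average speed and |draft| over samples whose speed is near the target.

    tolerance is a fraction of target_speed (0.1 means within +/-10%).
    """
    selected = [(v, abs(f)) for v, f in zip(speed, draft)
                if abs(v - target_speed) <= tolerance * target_speed]
    if not selected:
        raise ValueError("no samples within the steady-state band")
    mean_speed = mean([v for v, _ in selected])
    mean_draft = mean([f for _, f in selected])
    return mean_speed, mean_draft

# Invented trace: tractor starts and ends stationary, steady at 1.2 m/s in between
speed = [0.0, 0.4, 1.2, 1.2, 1.2, 0.5, 0.0]           # m/s
draft = [0.0, -3.0, -10.0, -12.0, -11.0, -4.0, 0.0]   # kN, negative by sign convention
mean_speed, mean_draft = steady_state_means(speed, draft, target_speed=1.2)
```

Each repetition then contributes a single (mean speed, mean draft) point to the cross-plot.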
 
  • #9
269
2
In case it helps, here is a rough graph of the data.
NOTE that the Red is roughly ±15% of nominal and the Black is roughly ±25% of nominal; both trending to decrease with speed. Perhaps they float a bit with increasing speed. (not unusual with farm equipment being towed) Or if this is drawbar force, the inertia of the tiller could account for the force decreasing with speed.

For experiment verification, try it in a bed of dry sand or a well plowed field.

Cheers,
Tom
The fact that the draft is constant or decreasing with speed is weird, since according to theory it should increase quadratically.

You're right that the implement may float with speed, but I think that is not the case here. Here you can see the working depth against speed.

https://i.postimg.cc/ZKMmkSCF/untitled3.jpg

As you can see, it is almost constant. The small difference could be caused by a slightly different setup in the measurements.


You could do some statistical analysis, but I think that the data you show will not allow you to draw a conclusion at any confidence level about any difference in the average or in the trend. You may be able to draw a conclusion about the variation of the data, since the spread of the B data seems so much greater than the spread of the A data.
How could I statistically evaluate the spread of the data in this case? Is there any approach you would use? With categorical data, the problem is in my opinion easier to handle.
 
  • #10
269
2
It looks like all noise and no data, as if something very wrong happened with your experiment.
Why? The draft is low at the beginning and the end of the acquisition because the tractor is still and the implement is out of the soil.
 
  • #11
FactChecker
Science Advisor
Gold Member
6,059
2,343
After doing linear regressions (separately for A and B), there remain residual errors (see http://www.statisticshowto.com/residual/ ), which are the differences between the actual data values and the predictions from the regression equations. Most statistics regression programs will report the residual standard deviation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/sigma.html ). The statistical properties of the residual standard error are mixed with the statistical properties of the regression line coefficients, since they are also estimates and not true values. But your case displays so much difference in sets A and B that I think it will be statistically significant.

PS. I am not experienced enough to give advice on how to handle data with sets of different variance magnitudes. Maybe someone more expert can help here. I have reached my knowledge limit.
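
As a rough illustration of the procedure described above, the sketch below fits a straight line to each tool separately and reports the residual standard deviation. The data are invented for the example, and the function name is hypothetical. It only shows the mechanics and does not test whether the difference in spread is significant.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns (a, b, residual_std)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom, because two coefficients were estimated
    residual_std = sqrt(sum(r * r for r in residuals) / (n - 2))
    return a, b, residual_std

# Invented per-run averages: speed in m/s, draft in kN
speed = [0.8, 0.8, 1.2, 1.2, 1.6, 1.6]
draft_a = [10.0, 10.4, 11.0, 11.4, 12.0, 12.4]
draft_b = [8.0, 11.0, 9.0, 12.5, 10.0, 13.5]

_, _, std_a = fit_line(speed, draft_a)
_, _, std_b = fit_line(speed, draft_b)
```

With these made-up numbers the residual standard deviation of B comes out several times larger than that of A, which is the kind of difference in spread being discussed.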
 
  • #12
boneh3ad
Science Advisor
Insights Author
Gold Member
3,204
888
I'm a little confused. One plot is force vs. time and appears well behaved. Another is force vs. speed and appears to be pure noise. How are you converting from one to the other? Are you recording your speed separately and pulling out chunks of that F vs. t graph? Do you have multiple F vs. t at different speeds?
 
  • #13
CWatters
Science Advisor
Homework Helper
Gold Member
10,532
2,298
Have you checked the ground is uniform? E.g., do several runs at a constant 1.2 m/s and see if there is a lot of variability. If there is, I think the answer is to make multiple runs at each speed to average down the variability.
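
A small sketch of how the run-to-run variability at each speed could be summarised, assuming one averaged draft value per run. The function name and the numbers are invented for illustration. The standard error is the run-to-run standard deviation divided by the square root of the number of runs, which is what averaging down the variability relies on.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def summarise_runs(runs):
    """Group (target_speed, mean_draft) pairs by speed.

    Returns {speed: (mean, run-to-run std, standard error of the mean)}.
    """
    by_speed = defaultdict(list)
    for speed, draft in runs:
        by_speed[speed].append(draft)
    summary = {}
    for speed, drafts in sorted(by_speed.items()):
        s = stdev(drafts)  # sample standard deviation, needs at least 2 runs
        summary[speed] = (mean(drafts), s, s / sqrt(len(drafts)))
    return summary

# Invented per-run averages in kN: six runs at a target speed of 1.2 m/s
runs = [(1.2, f) for f in (10.0, 12.0, 11.0, 9.0, 13.0, 11.0)]
summary = summarise_runs(runs)
run_mean, run_std, run_sem = summary[1.2]
```

Because the standard error shrinks only with the square root of the number of runs, halving it takes roughly four times as many runs.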
 
  • #14
269
2
Have you checked the ground is uniform? E.g., do several runs at a constant 1.2 m/s and see if there is a lot of variability. If there is, I think the answer is to make multiple runs at each speed to average down the variability.

I made 6 runs for each speed. Probably, they were not enough.
 
  • #15
CWatters
Science Advisor
Homework Helper
Gold Member
10,532
2,298
In case it helps, here is a rough graph of the data.
NOTE that the Red is roughly ±15% of nominal and the Black is roughly ±25% of nominal; both trending to decrease with speed. Perhaps they float a bit with increasing speed. (not unusual with farm equipment being towed) Or if this is drawbar force, the inertia of the tiller could account for the force decreasing with speed.
Is it just me? The magnitude of the force appears to increase with speed, as expected. Normally a negative sign indicates the direction.
 
  • #16
269
2
I'm a little confused. One plot is force vs. time and appears well behaved. Another is force vs. speed and appears to be pure noise. How are you converting from one to the other? Are you recording your speed separately and pulling out chunks of that F vs. t graph? Do you have multiple F vs. t at different speeds?
I have the time-based force and speed, acquired simultaneously. Tests were carried out at different target speeds: 0.8 m/s, 1.2 m/s and 1.6 m/s. The force vs. speed plot shows the average of the time-based force against the average of the time-based speed. The averages were calculated over the red part of the time-based force, which is the only part where the tool was in the ground and the tractor had reached the target speed.
 
  • #17
269
2
After doing linear regressions (separately for A and B), there remain residual errors (see http://www.statisticshowto.com/residual/ ), which are the differences between the actual data values and the predictions from the regression equations. Most statistics regression programs will report the residual standard deviation (see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/sigma.html ). The statistical properties of the residual standard error are mixed with the statistical properties of the regression line coefficients, since they are also estimates and not true values. But your case displays so much difference in sets A and B that I think it will be statistically significant.

PS. I am not experienced enough to give advice on how to handle data with sets of different variance magnitudes. Maybe someone more expert can help here. I have reached my knowledge limit.
Since the theory reports that the trend of force with respect to speed is quadratic, I could use that regression model. I am able to compute the residuals and their standard deviation, but less able to effectively analyse their results. Moreover, I remembered that I have carried out more repetitions with the B tool than with the A tool. Could this explain the larger variability for the B tool?
 
  • #18
CWatters
Science Advisor
Homework Helper
Gold Member
10,532
2,298
Is it possible the depth of the tool wasn't consistent? I'm surprised you are getting such large variation even after averaging 6 runs. What's the variation like run to run at the same speed?
 
  • #19
boneh3ad
Science Advisor
Insights Author
Gold Member
3,204
888
I have the time-based force and speed, acquired simultaneously. Tests were carried out at different target speeds: 0.8 m/s, 1.2 m/s and 1.6 m/s. The force vs. speed plot shows the average of the time-based force against the average of the time-based speed. The averages were calculated over the red part of the time-based force, which is the only part where the tool was in the ground and the tractor had reached the target speed.
The force vs. time plots you showed appeared to be pretty good. They have some scatter, but are otherwise pretty good. They clearly have some meaningful frequency content as well.

Do your speed vs. time data look similarly good?

Based on the plots you showed with the depth, I have a strong suspicion that is what is causing your scatter. You could look at the correlation function between the depth and the force as well as between the speed and the force and try to prove which one is the dominant factor. You are basically treating this system like it has only one independent variable (speed), but it seems like you are ignoring a second independent variable (depth) that is poorly controlled, and the uncertainty from that variable is propagating into your dependent variable (force) and totally washing out the information you truly want, which is force vs. speed.

Make a 3D plot of force as a function of speed and depth. I'd be willing to bet the data look much more organized.
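
To make the comparison of the two candidate factors concrete, here is a minimal sketch that computes the Pearson correlation of force with speed and with depth from per-run averages. The numbers are invented so that depth dominates. With real data the outcome may of course differ, and the function name is hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented per-run averages: force tracks depth closely and speed only weakly
speed = [0.8, 0.8, 1.2, 1.2, 1.6, 1.6]          # m/s
depth = [0.10, 0.20, 0.12, 0.22, 0.11, 0.21]    # m
force = [6.0, 11.0, 7.2, 12.3, 6.9, 12.0]       # kN

r_speed = pearson(speed, force)
r_depth = pearson(depth, force)
```

A correlation with depth that is much stronger than the one with speed would support the suspicion that poorly controlled depth is washing out the force vs. speed trend.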
 
  • #20
FactChecker
Science Advisor
Gold Member
6,059
2,343
Since the theory reports that the trend of force with respect speed is quadratic, I could use that regression model, I am able to compute the residual and their standard deviation, but less able to effectively analyze their results. Moreover, I remembered that I have carried out more repetition with B tool than with A tool. Could this explain the larger variability for the B tool?
If the B tool was worn out after the early repetitions, that could certainly cause a lot of variation. Things like that are worth considering. You could put the data in chronological order to check if the spread changed over time. More repetitions on its own would not mathematically cause a greater variance.
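
One simple way to check whether the spread changed over time is sketched below, assuming the regression residuals are listed in the order the runs were made. The split into two halves, the function name and the values are illustrative assumptions only.

```python
from statistics import stdev

def spread_by_half(values_in_time_order):
    """Sample standard deviation of the first and second half of a time-ordered series."""
    mid = len(values_in_time_order) // 2
    early = values_in_time_order[:mid]
    late = values_in_time_order[mid:]
    return stdev(early), stdev(late)

# Invented residuals in kN, listed in chronological order of the runs
residuals_b = [0.2, -0.3, 0.1, -0.2, 0.3, -0.1, 1.5, -1.8, 2.0, -1.4, 1.7, -2.1]
early_std, late_std = spread_by_half(residuals_b)
```

The sample standard deviation divides by the number of values minus one, so it does not grow just because more repetitions were made. A clear jump between the two halves would instead point to something physical changing during the test campaign.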
 
  • #21
anorlunda
Staff Emeritus
Insights Author
9,156
6,154

I made 6 runs for each speed. Probably, they were not enough.
But those time plots only had 21 data points per curve. Is that the data from 6 runs, only 3 samples per run?
 
  • #22
269
2
The force vs. time plots you showed appeared to be pretty good. It has some scatter, but is otherwise pretty good. It clearly has some meaningful frequency content as well.
There is a cyclic behaviour, but it is hidden by large variability.

Do your speed vs. time data look similarly good?
Here we go for speed and depth both vs. time.
https://i.postimg.cc/D0DnZpTp/untitled5.jpg

The speed is quite unsteady, because the test was carried out with a constant working depth, so the draft control was disabled. However, the tractor has a single-effect hydraulic cylinder, so there might be some variability.

Based on the plots you showed with the depth, I have a strong suspicion that is what is causing your scatter. You could look at the correlation function between the depth and the force as well as between the speed and the force and try to prove which one is the dominant factor. You are basically treating this system like it has only one independent variable (speed), but it seems like you are ignoring a second independent variable (depth) that is poorly controlled, and the uncertainty from that variable is propagating into your dependent variable (force) and totally washing out the information you truly want, which is force vs. speed.
The phenomenon is clearly a bivariate one, but I admit I am not experienced enough to analyse bivariate data effectively.


Make a 3D plot of force as a function of speed and depth. I'd be willing to bet the data look much more organized.
I have made the plot, but in my opinion it was not clear, so I have made two separate cross-plots
https://i.postimg.cc/J0yC93kN/untitled4.jpg
 
  • #23
269
2
If the B tool was worn out after the early repetitions, that could certainly cause a lot of variation. Things like that are worth considering. You could put the data in chronological order to check if the spread changed over time. More repetitions on its own would not mathematically cause a greater variance.
Good guess! The B tool is the same as the A tool, but worn after a year of use. The data were acquired at the same time. Do you have any reference material about the variability caused by wear?
 
  • #24
boneh3ad
Science Advisor
Insights Author
Gold Member
3,204
888
There is a cyclic behaviour, but it is hidden by large variability.
In your time plots, it doesn't appear to be very hidden. It looks quite clear, and a Fourier analysis would quite easily tell you the important frequencies (though doesn't really help with your actual goal here; it's more of a curiosity).
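
For the curious, the Fourier analysis mentioned here could look roughly like the sketch below. It uses a naive discrete Fourier transform from the standard library so that it is self-contained. For real traces a fast routine such as numpy.fft.rfft would be the usual choice. The sampling rate, the function name and the synthetic signal are assumptions made for the example.

```python
import cmath
import math

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest non-DC component of a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]  # remove the steady draft level
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate / n

# Synthetic trace: 10 kN steady draft plus a 2 Hz ripple, sampled at 50 Hz for 2 s
sample_rate = 50.0
trace = [10.0 + 1.5 * math.sin(2 * math.pi * 2.0 * i / sample_rate)
         for i in range(100)]
peak = dominant_frequency(trace, sample_rate)
```

The frequency resolution is the sampling rate divided by the number of samples (0.5 Hz in this example), so longer steady sections resolve the cyclic component more finely.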


Here we go for speed and depth both vs. time.
https://i.postimg.cc/D0DnZpTp/untitled5.jpg
Clearly the uncertainty from your depth is the big factor here. That looks like roughly +/- 50% variability on the mean.

The phenomenon is clearly a bivariate one, but I admit I am not experienced enough to analyse bivariate data effectively.

I have made the plot, but in my opinion it was not clear, so I have made two separate cross-plots
https://i.postimg.cc/J0yC93kN/untitled4.jpg
I have no idea what those plots show. I was thinking of a 3D plot of the data points you started with, showing the average force vs. average speed and average depth.
 
  • #25
269
2
In your time plots, it doesn't appear to be very hidden. It looks quite clear, and a Fourier analysis would quite easily tell you the important frequencies (though doesn't really help with your actual goal here; it's more of a curiosity).
I will make a plot for your curiosity. However, the frequency depends on the clod size and is therefore correlated with the speed: the higher the speed, the higher the main frequency.

Clearly the uncertainty from your depth is the big factor here. That looks like roughly +/- 50% variability on the mean.
You're right about that too.

I have no idea what those plots show. I was thinking of a 3D plot of the data points you started with, showing the average force vs. average speed and average depth.
This is the 3D plot, but it doesn't look very clear to me. https://i.postimg.cc/hhRX3t50/untitled6.jpg


I have also computed the cross-correlation; here is the plot: https://i.postimg.cc/DysFcT3B/untitled7.jpg

From the plot I can see that there is no phase lag of either speed or depth with respect to force. The difference in magnitude can be explained by the fact that the depth is larger in magnitude than the speed, right?
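
On the magnitude question: an unnormalised cross-correlation scales with the amplitude and units of the two signals, so its height is hard to compare between depth and speed. A hedged sketch of a normalised version follows. It subtracts the means and divides by the signal energies so that the result lies between -1 and 1 regardless of units. The function names and the signals are invented for illustration.

```python
from math import sqrt

def normalised_xcorr(x, y, max_lag):
    """Cross-correlation of x and y, scaled to [-1, 1], for lags -max_lag..max_lag.

    A positive lag means y follows x by that many samples.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    scale = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += dx[i] * dy[j]
        result[lag] = total / scale
    return result

def best_lag(xcorr):
    """Lag at which the absolute normalised correlation is largest."""
    return max(xcorr, key=lambda lag: abs(xcorr[lag]))

# Invented signals: force repeats the depth bump two samples later, 100 times larger
depth = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
force = [0.0, 0.0, 0.0, 0.0, 100.0, 300.0, 100.0, 0.0, 0.0, 0.0]
xc = normalised_xcorr(depth, force, max_lag=3)
lag = best_lag(xc)
```

Rescaling either signal leaves the normalised values unchanged, so the peak heights for depth vs. force and speed vs. force can be compared directly, and the lag of the peak gives the phase shift in samples.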
 
