Statistical test for comparing two error signals

aydos · Dec 8, 2011

Problem: I have a sensor monitoring a process that is controlled by a feedback controller. This sensor fails from time to time and I need to replace it with a new one. I have always used the same type of sensor, say type A. Some sensor manufacturers are offering an alternative sensor technology, say type B, to measure the same process with the same theoretical signal characteristics. This needs to be tested, though. I cannot afford to replace sensor type A with sensor type B and see how the controller performs. What I can do is install sensor type B to monitor the same process but keep it "offline" (not used by the controller). This allows me to monitor both signals in parallel. By comparing the two signals, how can I determine if sensor type B will not produce negative impacts in my controller?

Current plan:
I am planning to trial sensor type B by installing three sensors monitoring the same process:
Sensor 1) Sensor type A, this is the reference sensor
Sensor 2) Sensor type A, this is candidate #1
Sensor 3) Sensor type B, this is candidate #2
I will then generate two error time series: the error between candidate #1 and the reference, and the error between candidate #2 and the reference. A successful trial should allow me to conclude that the difference between the two error signals is statistically insignificant.
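A minimal sketch of the error-series step, in Python with only the standard library. The readings are synthetic and hypothetical (a slow process trend plus independent white noise of equal size on every sensor), and error_series and correlation are illustrative helpers, not from any library. One thing it makes visible: because both error series subtract the same reference reading, they share the reference's noise and are positively correlated (about 0.5 when all three sensors are equally noisy), so they are not independent samples. Sample by sample, e2 - e1 equals candidate #2 minus candidate #1, so the reference cancels out of that particular comparison.

```python
import random


def error_series(candidate, reference):
    """Sample-by-sample error: candidate reading minus reference reading."""
    if len(candidate) != len(reference):
        raise ValueError("series must be time-aligned and of equal length")
    return [c - r for c, r in zip(candidate, reference)]


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical synthetic data: a slowly drifting process value plus
# independent white measurement noise (sd = 1) on each of the three sensors.
rng = random.Random(1)
n = 20000
process = [10.0 + 0.001 * t for t in range(n)]
reference = [p + rng.gauss(0, 1) for p in process]   # sensor 1, type A
candidate1 = [p + rng.gauss(0, 1) for p in process]  # sensor 2, type A
candidate2 = [p + rng.gauss(0, 1) for p in process]  # sensor 3, type B

e1 = error_series(candidate1, reference)
e2 = error_series(candidate2, reference)

# The shared reference noise makes the two error series positively correlated.
print(f"corr(e1, e2) = {correlation(e1, e2):.3f}")
```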

Statistical test:
My first thought was to use a Student's t-test to compare the error signals. However, as I understand it, the t-test only tests for differences in mean values, and I suspect I also need to know whether the error variances are the same.

Questions:
- Will the F-test provide a test that is sensitive to both mean and variance differences?
- Would anyone suggest an alternative approach?
- I am collecting data for a 24 hour period. During this time, the process operates under 5 different regimes. Should I break up the time series into 5 segments and run separate tests?
Will this split approach change the test criteria?

Other info:
- The error signals have a good approximation to a normal distribution
- The measurement noise is not time-correlated
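The "not time-correlated" assumption can be checked directly on each error series. A minimal sketch, assuming the errors are held in a plain Python list: it computes the sample autocorrelation at a few lags and reports any lag outside the approximate 95% band of +/- 2/sqrt(n) that applies to white noise. The function names and the synthetic series are hypothetical. Roughly one lag in twenty can land outside the band by chance even for genuinely white noise, so an isolated hit is not alarming.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom


def lags_outside_band(x, max_lag=10):
    """Lags whose autocorrelation falls outside +/- 2/sqrt(n), the
    approximate 95% band for a white-noise series of length n."""
    band = 2 / len(x) ** 0.5
    return [k for k in range(1, max_lag + 1) if abs(autocorrelation(x, k)) > band]


rng = random.Random(7)
white = [rng.gauss(0, 1) for _ in range(10000)]

# Hypothetical counter-example: an AR(1) series with strong memory (phi = 0.8).
ar1 = [0.0]
for _ in range(9999):
    ar1.append(0.8 * ar1[-1] + rng.gauss(0, 1))

print("white noise, lags outside band:", lags_outside_band(white))
print("AR(1),       lags outside band:", lags_outside_band(ar1))
```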

Stephen Tashi · Dec 12, 2011

Using statistics to do hypothesis testing is a subjective procedure. A person with experience in solving problems exactly like yours might be able to say, from experience, what methods work well. I can't. I can only offer a few comments.

aydos said:

By comparing the two signals, how can I determine if sensor type B will not produce negative impacts in my controller?

One consideration is whether your controller is an "integrating" controller that uses the average output of the sensor over a window of time to do its computation (or, if it's an analog controller, it might do the electronic equivalent of integration). If that is the case, then the time average of sensor B versus sensor A will be the crucial signal.
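To illustrate why the time average matters for an integrating controller, here is a minimal sketch with hypothetical numbers: both sensors read the same constant value with white noise (sd 1), and sensor B carries an assumed constant offset of 0.1. Sample by sample, the offset is buried in noise of about 1.4 (the sd of the difference of two independent unit-variance readings), but averaging shrinks the noise by a factor of sqrt(window), so over a 600-sample window the offset dominates. The window length and the offset are illustrative assumptions only.

```python
import random
from statistics import mean, stdev


def window_means(x, width):
    """Means over consecutive, non-overlapping windows of `width` samples."""
    return [mean(x[i:i + width]) for i in range(0, len(x) - width + 1, width)]


rng = random.Random(2)
n, width = 6000, 600   # hypothetical record length and averaging window
true_value = 5.0
bias_b = 0.1           # hypothetical constant offset of sensor B

sensor_a = [true_value + rng.gauss(0, 1) for _ in range(n)]
sensor_b = [true_value + bias_b + rng.gauss(0, 1) for _ in range(n)]

raw_diff = [b - a for a, b in zip(sensor_a, sensor_b)]
win_diff = [b - a for a, b in zip(window_means(sensor_a, width),
                                  window_means(sensor_b, width))]

# Sample by sample, the 0.1 offset is small next to the ~1.4 noise spread;
# after averaging over 600 samples the spread drops to roughly 0.06.
print(f"raw differences:      mean {mean(raw_diff):.3f}, sd {stdev(raw_diff):.3f}")
print(f"windowed differences: mean {mean(win_diff):.3f}, sd {stdev(win_diff):.3f}")
```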

Statistical test:
My first thought was to use a Student's t-test to compare the error signals. However, as I understand it, the t-test only tests for differences in mean values, and I suspect I also need to know whether the error variances are the same.

You can test for the equality of variances first and, if the data passes, do the t-test next. (The "significance level" of the two-step procedure is not the same as the "significance level" assigned to each step.)
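A minimal sketch of that two-step procedure in Python, standard library only: a two-sided F-test for equal variances, followed by a pooled (equal-variance) Student's t-test only if the first step does not reject. The p-values come from the regularized incomplete beta function, implemented here with the usual continued-fraction expansion (scipy.special.betainc computes the same function if SciPy is available). Both tests assume two independent samples of normally distributed values; two error series computed against the same reference sensor are not independent, so treat this as an illustration of the procedure rather than a finished analysis. The helper names, the synthetic data, and alpha = 0.05 are assumptions.

```python
import math
import random
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function
    (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def f_test(x, y):
    """Two-sided F-test for equal variances; returns (F, p)."""
    f = variance(x) / variance(y)
    d1, d2 = len(x) - 1, len(y) - 1
    upper = betainc(d2 / 2, d1 / 2, d2 / (d2 + d1 * f))   # P(F > f)
    return f, min(1.0, 2 * min(upper, 1 - upper))


def pooled_t_test(x, y):
    """Two-sided Student's t-test assuming equal variances; returns (t, p)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, betainc(df / 2, 0.5, df / (df + t * t))


def two_step(e1, e2, alpha=0.05):
    """Variance test first; compare means only if variances look equal."""
    _, p_var = f_test(e1, e2)
    if p_var < alpha:
        return "variances differ"
    _, p_mean = pooled_t_test(e1, e2)
    return "means differ" if p_mean < alpha else "no significant difference"


rng = random.Random(3)
# Hypothetical, independent samples: candidate #2's error is noisier.
err1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
err2 = [rng.gauss(0.0, 1.5) for _ in range(200)]
print(two_step(err1, err2))
```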

Questions:
- Will the F-test provide a test that is sensitive to both mean and variance differences?

The word "sensitive" is, as far as I know, a subjective term. In frequentist statistics, the sensitivity of a test is quantified by computing "power" functions for the statistic. For example, suppose you decide the significance level is 0.05. We can imagine a simulation that takes paired samples (a,b) from two distributions. One distribution has mean 0 and standard deviation 1. The other distribution has mean x and standard deviation 1+y, where x and y are held constant while the sampling is done. By repeating the sampling many times, we estimate the probability that an F-test would judge the two distributions different at the significance level of 0.05. By repeating this simulation for various values of (x,y), we create a power function for the test. Roughly speaking, this shows how well the test does at detecting deviations of various sizes from the null hypothesis. How you use such a curve to make a decision about what to do is subjective.
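The power simulation described above can be sketched in a few dozen lines of Python (standard library only). Assumptions: a sample size of n = 50 per sensor, a two-sided test at 0.05, and critical values for the variance ratio estimated by Monte Carlo under the null hypothesis (both samples N(0, 1)) rather than looked up in an F table. The helper names are hypothetical. Because the F statistic is a ratio of sample variances, changing the mean x leaves the rejection rate at about the nominal 0.05; only y moves it.

```python
import random


def sample_variance(v):
    """Unbiased sample variance (n - 1 denominator)."""
    m = sum(v) / len(v)
    return sum((t - m) ** 2 for t in v) / (len(v) - 1)


def null_critical_values(n, alpha=0.05, reps=4000, seed=0):
    """Monte Carlo estimate of the two-sided critical values of the
    variance ratio when both samples of size n are drawn from N(0, 1)."""
    rng = random.Random(seed)
    ratios = sorted(
        sample_variance([rng.gauss(0, 1) for _ in range(n)])
        / sample_variance([rng.gauss(0, 1) for _ in range(n)])
        for _ in range(reps)
    )
    return ratios[int(reps * alpha / 2)], ratios[int(reps * (1 - alpha / 2)) - 1]


def power(x, y, n, crit, reps=2000, seed=1):
    """Estimated probability that the test rejects when one sample is
    N(0, 1) and the other has mean x and standard deviation 1 + y."""
    lo, hi = crit
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(x, 1 + y) for _ in range(n)]
        ratio = sample_variance(b) / sample_variance(a)
        if ratio < lo or ratio > hi:
            rejections += 1
    return rejections / reps


n = 50                          # hypothetical sample size per sensor
crit = null_critical_values(n)
for x, y in [(0.0, 0.0), (1.0, 0.0), (0.0, 0.25), (0.0, 0.5), (0.0, 1.0)]:
    print(f"x = {x:4.2f}, y = {y:4.2f}: power ~ {power(x, y, n, crit):.3f}")
```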

An interesting twist to your problem is that if sensor B has a smaller standard deviation than sensor A, this might indicate that it's a better sensor. However, if your controller has an algorithm that attempts to compensate for sensor noise, it might be better tuned to working with sensor A than sensor B.

I Googled briefly for power curves for the F-test, but didn't find any simple results on that topic. Perhaps you can.

- Would anyone suggest an alternative approach?

I advocate using computer simulations, when possible. However, I don't know if the number of variables in your problem and the algorithm used by the controller are known and simple enough to simulate.
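If the controller can be approximated, even crudely, such a simulation can be very small. The sketch below is a toy closed loop, not your controller: a hypothetical first-order discrete plant y[k+1] = a*y[k] + (1 - a)*u[k] under PI control, where the controller sees the true output plus a sensor offset and white noise. The plant pole, gains, and noise levels are all made-up assumptions. It reports the RMS deviation of the true process output from the setpoint, which is what ultimately matters, rather than the sensor error itself. In this linear loop the output deviation scales in proportion to the sensor noise standard deviation, and a constant sensor offset passes straight through to the process, because integral action drives the measured value, not the true value, to the setpoint.

```python
import random


def simulate(noise_sd, bias=0.0, steps=20000, burn_in=1000, seed=0,
             kp=0.5, ki=0.05, a=0.9, setpoint=1.0):
    """Toy PI loop on a first-order plant y[k+1] = a*y[k] + (1 - a)*u[k].
    The controller sees y + bias + N(0, noise_sd**2) noise.
    Returns the RMS deviation of the TRUE output from the setpoint,
    ignoring the first `burn_in` steps (start-up transient)."""
    rng = random.Random(seed)
    y, integral, sq = 0.0, 0.0, 0.0
    for k in range(steps):
        measured = y + bias + rng.gauss(0, noise_sd)
        error = setpoint - measured
        integral += error
        u = kp * error + ki * integral
        y = a * y + (1 - a) * u
        if k >= burn_in:
            sq += (y - setpoint) ** 2
    return (sq / (steps - burn_in)) ** 0.5


# Hypothetical comparison: type A noise sd 0.2 vs. a noisier type B (sd 0.3),
# and a type B with an assumed constant offset of 0.05.
print(f"sensor A (sd 0.2):            RMS {simulate(0.2):.4f}")
print(f"sensor B (sd 0.3):            RMS {simulate(0.3):.4f}")
print(f"sensor B (sd 0.2, bias 0.05): RMS {simulate(0.2, bias=0.05):.4f}")
```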

- I am collecting data for a 24 hour period. During this time, the process operates under 5 different regimes. Should I break up the time series into 5 segments and run separate tests?

I'd say yes, split it up. I heard one lecturer in statistics say "Stratification never hurts. Always stratify when you can." (In statistics, "stratification" refers to analyzing different regimes separately and, if desired, combining the results to represent a mixed population.)

Will this split approach change the test criteria?

The test criteria are going to be up to you. There won't be a mathematical answer to this unless you can define precisely what you are trying to quantify, maximize, minimize, etc.
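A minimal sketch of stratifying by regime and then combining the results, assuming each error sample is tagged with a regime label (the labels, data, and weights below are hypothetical). The combined figure weights each regime's mean by the share of operation you want it to represent, which need not match the share of samples in one particular 24-hour record. On criteria: running 5 separate tests at 0.05 each raises the chance of at least one false alarm above 0.05; dividing alpha by the number of regimes (the Bonferroni correction) is one simple, conservative option, shown here as an assumption rather than a recommendation.

```python
from collections import defaultdict


def stratify(errors, regimes):
    """Group error samples by the regime label recorded alongside them."""
    groups = defaultdict(list)
    for e, r in zip(errors, regimes):
        groups[r].append(e)
    return dict(groups)


def summarize(groups):
    """Per-regime (sample count, mean, sample variance)."""
    summary = {}
    for regime, vals in groups.items():
        n = len(vals)
        m = sum(vals) / n
        var = sum((v - m) ** 2 for v in vals) / (n - 1)
        summary[regime] = (n, m, var)
    return summary


def combined_mean(summary, weights):
    """Stratified mean: regime means weighted by `weights` (summing to 1)."""
    return sum(weights[r] * summary[r][1] for r in summary)


# Hypothetical record with two regimes (the real trial has five).
errors = [0.0, 0.2, 0.1, 1.0, 1.2, 1.1]
regimes = ["low", "low", "low", "high", "high", "high"]
summary = summarize(stratify(errors, regimes))

weights = {"low": 0.8, "high": 0.2}        # assumed long-run share of each regime
alpha_per_regime = 0.05 / len(summary)     # Bonferroni: one conservative choice

print(summary)
print(f"stratified mean error: {combined_mean(summary, weights):.3f}")
print(f"per-regime significance level: {alpha_per_regime}")
```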

D H · Dec 12, 2011

aydos said:

Other info:
- The error signals have a good approximation to a normal distribution
- The measurement noise is not time-correlated

That pretty much goes out the door (and so do your statistical tests) when you are looking for failures. Failures are, by definition, regimes where the sensor does not behave according to spec.

Assuming no failures (and if you are seeing failures in a 24 hour window, you have some pretty lousy sensors), all that you will garner from testing is that your sensors behave better than spec. Spec behavior is what manufacturers guarantee, which means that a random sensor that you buy is almost certainly going to outdo spec. Manufacturers don't want to face lawsuits, which is what they would get if their spec described only one-sigma behavior.

Lacking spec values, what your testing can do is elicit non-failure behavior. You can use this (or spec behavior) to test for failures. Are the sensed values at all consistent with expectations? If they're not, you have a suspect sensor. What to do? That depends on the sensor. You need to have some kind of model of how the sensor fails. Does it go haywire, generating random values? Does it freeze? Does it fail off-scale high / off-scale low? Does it just send a bad value every once in a while, but then return to nominal?
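As a concrete starting point, several of the failure modes listed above can each be turned into a simple check on the raw readings. A minimal sketch, in which every threshold (the valid range, how many identical readings count as frozen, how far from the recent median counts as a bad value) is a hypothetical placeholder to be replaced by values from the sensor's spec or its observed non-failure behavior. The bad-value check uses the median and median absolute deviation (MAD) of a trailing window, which are less distorted by the very outliers being hunted than the mean and standard deviation. A sensor generating random values would need a different check, such as a comparison against a second sensor.

```python
from statistics import median


def check_sensor(values, lo, hi, stuck_run=5, spike_limit=5.0, window=20):
    """Return (index, reason) flags for three simple failure modes:
    'off-scale' - reading outside the valid range [lo, hi];
    'frozen'    - the same reading repeated stuck_run times in a row;
    'spike'     - reading more than spike_limit MADs from the median of
                  the preceding `window` readings."""
    flags = []
    run = 1
    for i, v in enumerate(values):
        if v < lo or v > hi:
            flags.append((i, "off-scale"))
        # Length of the current run of identical readings ending at i.
        run = run + 1 if i > 0 and v == values[i - 1] else 1
        if run == stuck_run:
            flags.append((i, "frozen"))
        if i >= window:
            past = values[i - window:i]
            med = median(past)
            mad = median(abs(p - med) for p in past)
            if mad > 0 and abs(v - med) > spike_limit * mad:
                flags.append((i, "spike"))
    return flags


# Hypothetical readings: small deterministic wiggle around 10.0, one bad
# value, one off-scale reading, then a sensor that freezes at 10.0.
readings = [10.0 + 0.1 * ((7 * i) % 5 - 2) for i in range(40)]
readings[30] = 15.0
readings[35] = 200.0
readings += [10.0] * 6

for index, reason in check_sensor(readings, lo=0.0, hi=100.0):
    print(index, reason)
```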
