Statistical test for comparing two error signals

Problem: I have a sensor monitoring a process that is controlled by a feedback controller. The sensor fails from time to time and I need to replace it with a new one. I have always used the same type of sensor, say type A. Some manufacturers are now offering an alternative sensor technology, say type B, that measures the same process with the same theoretical signal characteristics. This needs to be tested, though. I cannot afford to simply replace sensor type A with type B and see how the controller performs. What I can do is install a type B sensor on the same process but keep it "offline" (not used by the controller), which lets me record both signals in parallel. By comparing the two signals, how can I determine whether sensor type B will have a negative impact on my controller?

Current plan:
I am planning to trial sensor type B by installing three sensors monitoring the same process:
Sensor 1) Sensor type A, this is the reference sensor
Sensor 2) Sensor type A, this is candidate #1
Sensor 3) Sensor type B, this is candidate #2
I will generate two error time series: the error between candidate #1 and the reference, and the error between candidate #2 and the reference. A successful trial should let me conclude that the difference between the two error signals is not statistically significant.
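A minimal sketch of the plan above, assuming the three sensors are sampled at the same instants and stored as plain Python lists (the readings are made-up placeholders, not trial data). The sketch makes one point visible: both error series contain the reference sensor's noise with a minus sign, so they are correlated with each other rather than independent samples. Their per-sample difference is simply sensor 3 minus sensor 2.

```python
def error_series(reference, candidate):
    """Sample-by-sample error: candidate minus reference."""
    if len(reference) != len(candidate):
        raise ValueError("signals must be sampled at the same instants")
    return [c - r for r, c in zip(reference, candidate)]


# Placeholder readings (hypothetical), one value per sampling instant.
sensor1_ref = [10.0, 10.2, 9.9, 10.1]  # type A, reference
sensor2_a = [10.1, 10.1, 10.0, 10.0]   # type A, candidate #1
sensor3_b = [9.8, 10.3, 9.9, 10.2]     # type B, candidate #2

err_a = error_series(sensor1_ref, sensor2_a)  # candidate #1 - reference
err_b = error_series(sensor1_ref, sensor3_b)  # candidate #2 - reference

# The reference reading cancels (up to rounding) in the difference of the errors.
diff = [eb - ea for ea, eb in zip(err_a, err_b)]
print(err_a, err_b, diff)
```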

Statistical test:
My first thought was to use a Student's t-test to compare the error signals. However, as I understand it, the t-test only tests for a difference in means, and I suspect I also need to know whether the error variances are the same.

Questions:
- Will the F-test provide a test that is sensitive to both mean and variance differences?
- Would anyone suggest an alternative approach?
- I am collecting data for a 24-hour period. During this time, the process operates under 5 different regimes. Should I break up the time series into 5 segments and run separate tests?
Will this split approach change the test criteria?

Other info:
- The error signals are well approximated by a normal distribution
- The measurement noise is not time-correlated
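Both the t-test and the F-test assume independent samples, so the second point matters. A quick sanity check, sketched here under the assumption that each error series is a Python list sampled at a fixed rate, is the lag-1 sample autocorrelation: for uncorrelated noise it should stay roughly within +/- 2/sqrt(n) of zero. This is a rough screen, not a formal test, and simulated white noise stands in for real data.

```python
import random


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation (dividing by the total sum of squares)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    return num / denom


# Simulated white noise stands in for a real error series here.
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(5000)]
r1 = lag1_autocorrelation(white)
band = 2 / len(white) ** 0.5
print(f"r1 = {r1:.3f}, rough white-noise band = +/-{band:.3f}")
```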

Stephen Tashi
Using statistics to do hypothesis testing is a subjective procedure. A person with experience in solving problems exactly like yours might be able to say, from experience, what methods work well. I can't. I can only offer a few comments.

By comparing the two signals, how can I determine if sensor type B will not produce negative impacts in my controller?

One consideration is whether your controller is an "integrating" controller that uses the average output of the sensor over a window of time to do its computation (or, if it is an analog controller, the electronic equivalent of integration). If so, the time average of sensor B's output compared with sensor A's will be the crucial signal.
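If the controller does integrate, one way to sketch the relevant comparison is to average the difference between the two sensors over a trailing window and examine that averaged signal rather than individual samples. The window length is a hypothetical parameter; in practice it would be chosen to match the controller's integration time. The toy readings show two sensors that disagree at every sample but whose windowed average difference is zero.

```python
def moving_average(x, window):
    """Trailing moving average; returns len(x) - window + 1 values."""
    if not 1 <= window <= len(x):
        raise ValueError("window must be between 1 and len(x)")
    running = sum(x[:window])
    out = [running / window]
    for i in range(window, len(x)):
        # Slide the window: add the newest sample, drop the oldest.
        running += x[i] - x[i - window]
        out.append(running / window)
    return out


# Hypothetical readings from the two sensors at the same instants.
sensor_a = [1.0, 1.0, 1.0, 1.0]
sensor_b = [1.5, 0.5, 1.5, 0.5]

diff = [b - a for a, b in zip(sensor_a, sensor_b)]
print(diff)                            # [0.5, -0.5, 0.5, -0.5]
print(moving_average(diff, window=2))  # [0.0, 0.0, 0.0]
```

For very long records, recomputing the window sum from scratch now and then keeps rounding error in the running sum from accumulating.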

Statistical test:
My first thought was to use a Student's t-test to compare the error signals. But I understand the t-test only tests for differences in mean values. But I suspect, I also need to know if the error variances are the same.

You can test for equality of variances first and, if the data pass, do the t-test next. (The "significance level" of the two-step procedure is not the same as the "significance level" assigned to each step.)
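To see why the overall significance level differs, here is a Monte Carlo sketch of the two-step procedure in which both error series come from the same normal distribution, so every rejection is a false alarm. To stay within the standard library it uses large-sample normal approximations (an assumption) instead of exact F and t tables: ln(s1^2/s2^2) is treated as normal with variance 2/(n1-1) + 2/(n2-1), and the mean difference is divided by sqrt(s1^2/n1 + s2^2/n2), which equals the pooled t statistic when n1 = n2. For normal data with equal sample sizes the variance ratio and the pooled t statistic are independent, so with 0.05 per step the overall false-alarm rate should land near 1 - (1 - 0.05)^2, about 0.0975, roughly double the per-step level.

```python
import math
import random
from statistics import NormalDist


def mean_and_var(x):
    """Sample mean and unbiased sample variance."""
    n = len(x)
    m = sum(x) / n
    return m, sum((v - m) ** 2 for v in x) / (n - 1)


def two_step_rejects(a, b, alpha=0.05):
    """Step 1: variance test. Step 2, only if step 1 passes: mean test.
    Returns True if either step rejects. Large-sample normal
    approximations are used in place of exact F and t distributions."""
    n1, n2 = len(a), len(b)
    m1, v1 = mean_and_var(a)
    m2, v2 = mean_and_var(b)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_var = math.log(v1 / v2) / math.sqrt(2 / (n1 - 1) + 2 / (n2 - 1))
    if abs(z_var) > crit:
        return True  # variances judged different
    z_mean = (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)
    return abs(z_mean) > crit  # means judged different


def overall_false_alarm_rate(n=100, reps=4000, alpha=0.05, seed=1):
    """Fraction of identical-distribution trials in which the procedure rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_step_rejects(a, b, alpha):
            rejections += 1
    return rejections / reps


alpha = 0.05
print("level of each step:          ", alpha)
print("independent-steps prediction:", 1 - (1 - alpha) ** 2)
print("simulated overall level:     ", overall_false_alarm_rate(alpha=alpha))
```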

Questions:
- Will the F-test provide a test that is sensitive to both mean and variance differences?

The word "sensitive" is, as far as I know, a subjective term. In frequentist statistics, the sensitivity of a test is quantified by its power function. For example, suppose you set the significance level at 0.05. Imagine a simulation that takes paired samples (a,b) from two distributions. One distribution has mean 0 and standard deviation 1. The other has mean x and standard deviation 1+y, where x and y are held constant while the sampling is done. From the samples, we estimate the probability that an F-test would judge the two distributions different at the 0.05 significance level. Repeating this simulation for various values of (x,y) produces a power function for the test. Roughly speaking, it shows how well the test detects deviations of various sizes from the null hypothesis. How you use such a curve to make a decision is subjective.
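A minimal Monte Carlo sketch of that power calculation, assuming n = 30 observations per sample and a two-sided 0.05 level (both arbitrary choices). To avoid depending on F tables, the critical values are themselves estimated by simulating the variance ratio when both samples come from N(0, 1). The sketch also bears on the original question: the F statistic is a ratio of sample variances, each computed around its own sample mean, so shifting the mean by x leaves it unchanged. The F-test for equal variances has power of only about the significance level against a pure mean difference; it is not sensitive to means.

```python
import random


def sample_var(x):
    """Unbiased sample variance, computed around the sample's own mean."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / (len(x) - 1)


def f_ratio(rng, n, x=0.0, y=0.0):
    """Variance ratio for one pair of samples: N(x, (1+y)^2) over N(0, 1)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(x, 1.0 + y) for _ in range(n)]
    return sample_var(b) / sample_var(a)


def null_critical_values(n, alpha=0.05, reps=4000, seed=0):
    """Monte Carlo estimate of the two-sided critical values under H0."""
    rng = random.Random(seed)
    ratios = sorted(f_ratio(rng, n) for _ in range(reps))
    lower = ratios[round(alpha / 2 * reps)]
    upper = ratios[round((1 - alpha / 2) * reps) - 1]
    return lower, upper


def power(x, y, crit, n=30, reps=2000, seed=1):
    """Estimated probability that the F-test rejects at the given (x, y)."""
    lower, upper = crit
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        r = f_ratio(rng, n, x, y)
        if r < lower or r > upper:
            rejections += 1
    return rejections / reps


crit = null_critical_values(n=30)
for x, y in [(0.0, 0.0), (1.0, 0.0), (0.0, 0.5), (0.0, 1.0)]:
    print(f"x = {x}, y = {y}: estimated power {power(x, y, crit):.3f}")
```

Because the same seed is reused, (x = 1, y = 0) gives essentially the same estimate as (0, 0): the samples differ only by a constant shift, which the variance ratio ignores.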

An interesting twist to your problem is that if sensor B has a smaller standard deviation than sensor A, this might indicate that it's a better sensor. However, if your controller has an algorithm that attempts to compensate for sensor noise, it might be better tuned to sensor A than to sensor B.

I Googled briefly for power curves for the F-test, but didn't find any simple results on that topic. Perhaps you can.

- Would anyone suggest an alternative approach?

I advocate using computer simulations, when possible. However, I don't know if the number of variables in your problem and the algorithm used by the controller are known and simple enough to simulate.

- I am collecting data for a 24 hour period. During this time, the process operates under 5 different regimes. Should I break-up the time series into 5 segments and run separate tests?

I'd say yes, split it up. I heard one lecturer in statistics say "Stratification never hurts. Always stratify when you can." (In statistics, "stratification" refers to analyzing different regimes separately and, if desired, combining the results to represent a mixed population.)

Will this split approach change the test criteria?

The test criteria are going to be up to you. There won't be a mathematical answer unless you can define precisely what you are trying to quantify, maximize, minimize, etc.
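One concrete way splitting interacts with the criteria is multiple testing: if each of five per-regime tests is run at 0.05, the chance of at least one false alarm across all five is larger than 0.05 (about 1 - 0.95^5, roughly 0.23, if the tests were independent). A common, conservative convention is the Bonferroni adjustment, which runs each of k tests at 0.05/k. The sketch below assumes each sample carries a regime label. Because both error series share the reference sensor and are recorded at the same instants, it tests the per-sample difference err_b - err_a for zero mean using a large-sample normal approximation. The labels, the paired-difference choice, and the approximation are illustrative assumptions.

```python
import math
from collections import defaultdict
from statistics import NormalDist


def split_by_regime(regimes, values):
    """Group values by the regime label recorded at the same instant."""
    groups = defaultdict(list)
    for label, v in zip(regimes, values):
        groups[label].append(v)
    return dict(groups)


def mean_zero_pvalue(d):
    """Two-sided p-value for mean(d) == 0 (normal approximation, large n)."""
    n = len(d)
    m = sum(d) / n
    var = sum((v - m) ** 2 for v in d) / (n - 1)
    z = m / math.sqrt(var / n)
    return 2 * NormalDist().cdf(-abs(z))


def per_regime_tests(regimes, err_a, err_b, alpha=0.05):
    """Test each regime separately at the Bonferroni level alpha / k."""
    diff = [b - a for a, b in zip(err_a, err_b)]
    groups = split_by_regime(regimes, diff)
    alpha_each = alpha / len(groups)
    results = {}
    for label, d in groups.items():
        p = mean_zero_pvalue(d)
        results[label] = (p, p < alpha_each)  # (p-value, flagged?)
    return results


# Hypothetical usage: two regimes, with an offset only in "regime 2".
regimes = ["regime 1"] * 100 + ["regime 2"] * 100
err_a = [0.0] * 200
err_b = [0.01, -0.01] * 50 + [0.51, 0.49] * 50
for label, (p, flagged) in per_regime_tests(regimes, err_a, err_b).items():
    print(label, f"p = {p:.3g}", "flagged" if flagged else "ok")
```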

D H
Staff Emeritus