Statistics: linear equation comparison question

In summary: The original poster has a factory ("nominal") calibration line y1 = a1*x + b1 for a torque sensor and a line y2 = a2*x + b2 obtained by linear regression on test data, and wants a numerical gauge of how well the nominal line fits the data. The main suggestion in the thread is to compute the prediction errors of the nominal line at each data point, square them and add them up, and to compare the factory line with a band of about two standard deviations of the errors around the regression line. The squared correlation coefficient describes how linear the relationship between the two variables is, so by itself it does not say how well a fixed line such as the factory equation matches the data.
  • #1
n00bcake22
Hello Everyone,

I was wondering if there is a statistical method to compare two linear equations. I have one "nominal" equation y1=a1*x+b1 and one equation found using linear regression on a data set, y2=a2*x+b2.

I know I can just "look" and compare the slope and intercept values but it would be nice to have a numerical gauge.

I was thinking I can determine the R^2 value between the data points and y2, but is there any way I can find the R^2 value between the data points and the "nominal" equation?

Please excuse my *ahem, non-existent* stats...

Thanks in advance!
 
  • #2
It's completely unclear to me why there would be any problem determining the squares of the errors between the nominal equation and the data points. Perhaps if you explain what the data is, I can understand the difficulty.
 
  • #3
Hello Stephen,

Thanks for the reply and this is the situation: I have a torque sensor that comes with factory calibration values for the slope and intercept, "nominal" y1. In the past, however, I have run experiments to gather torque vs voltage data and (through linear regression) obtained my own slope and intercept values, a2 and b2 => y2.

While these values are usually quite close, there have been issues lately. So, what I would like to do is to run the experiment, get the data, and then see how well the default factory values "correlate" to the data. And, if it is below a certain level, then I can use regression on the data to determine new, custom calibration values.

I hope this makes sense... if not just let me know and I will do my best to clarify.

Thanks again!
 
  • #4
You asked how you could find the "R^2" value between the data points and the nominal line. I was interpreting R^2 to be the sum of the squares of the errors in prediction.

(To compute that, you have data points [tex] (x_i, y_i) [/tex] and the prediction equation [tex] y = a_1 x + b_1 [/tex]. So to compute the [tex]i[/tex]th error you compute [tex] E_i = a_1 x_i + b_1 - y_i [/tex]. You square these and add them up.)

Did you have a different interpretation of "R^2" in mind?

I don't know if "correlating" the factory equation to the data is a meaningful idea since we usually correlate two random variables and the prediction of the factory equation is deterministic. If we consider the factory line to be a regression line, then it implies some correlation between the x and y data. You can compute the correlation of the data you observe and compare it to the correlation implied by the factory line.
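The sum of squared errors described above can be sketched in a few lines of Python. This is only an illustration: the data values and the nominal slope and intercept are made up, and the function names are hypothetical.

```python
import math

def prediction_errors(xs, ys, slope, intercept):
    """Return E_i = slope*x_i + intercept - y_i for every data point."""
    return [slope * x + intercept - y for x, y in zip(xs, ys)]

def sum_squared_error(xs, ys, slope, intercept):
    """Square the prediction errors and add them up."""
    return sum(e * e for e in prediction_errors(xs, ys, slope, intercept))

def rms_error(xs, ys, slope, intercept):
    """Root-mean-square error, in the same units as y."""
    return math.sqrt(sum_squared_error(xs, ys, slope, intercept) / len(xs))

# Made-up illustrative data, not measurements from this thread.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
a1, b1 = 2.0, 0.0  # hypothetical "nominal" slope and intercept

print(sum_squared_error(xs, ys, a1, b1))
print(rms_error(xs, ys, a1, b1))
```

The RMS version is often easier to judge than the raw sum because it is in the units of the measured quantity and does not grow with the number of points.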
 
  • #5
Ha ha, well my statistics is terribly rusty (and pretty poor to begin with) so I'll try to just explain my intent in general terms and you can offer your thoughts from there.

All I am really looking for is a way to see "how well" the factory calibration equation fits/matches/predicts/correlates to the experiment's data points and then obtain some type of numerical "gauge." Then, if this gauge value is below a certain level, I will use regression on the experiment data to obtain a custom calibration equation. I know that the regression equation will always be better than the factory equation (worst case it will be the same) but I'm sure their calibration process is far more rigid than mine so I am trying to avoid using my own unless a significant discrepancy exists between the two.

From your last post it sounds like I can, in fact, compute the R^2 value between the experiment data and the factory equation by using the factory equation in place of the regression line. Is this correct? What would you recommend?

UPDATE: by R^2 I mean the correlation coefficient squared. I believe that is what Excel gives you when you add a trendline? Is it possible to compute this using the data points and factory equation?
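One possible reading of "R^2 against the factory equation" is the coefficient of determination, 1 - SS_res/SS_tot, evaluated with the factory line in place of the fitted line. For a least-squares line with an intercept this number equals the squared correlation coefficient; for any other fixed line it is smaller and can even be negative. The correlation coefficient of the data itself does not involve any line at all. A minimal sketch, with made-up data and hypothetical function names:

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys, slope, intercept):
    """1 - SS_res/SS_tot for an arbitrary line, fitted or not."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up illustrative data, not measurements from this thread.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

a2, b2 = least_squares_line(xs, ys)      # regression line
a1, b1 = 2.0, 0.0                        # hypothetical factory line

print(r_squared(xs, ys, a2, b2))         # never lower than any other line
print(r_squared(xs, ys, a1, b1))         # slightly lower here
print(r_squared(xs, ys, 0.0, 0.0))       # a badly wrong line goes negative
```

Because the regression line maximizes this quantity by construction, the useful comparison is how far the factory value falls below the regression value, not whether it falls below.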
 
  • #6
To make a good recommendation, the real world problem has to be understood. Let's see if I've got it correctly:

X would be a voltage produced by the sensor and the Y would be the implied torque. Your ordinary use of this sensor would be to measure voltage and compute the torque, but you have the capability of setting up a test situation where you have some other way to measure torque.

You think the errors in your test set up are larger than those used to develop the factory equation, but you also consider the possibility that the factory equation might be wrong for your particular sensor. You want statistics to aid you in making a decision between continuing to use the factory equation or using an equation developed from your own data.

The first thing to say is that ordinary statistics has various customary procedures, but they are just that: customary. There is no mathematical proof that they are best in any sense. (You'll be able to find all sorts of people who make confident recommendations about how to apply these procedures to your problem, but that still won't make those procedures a mathematical solution.) To get an answer that is "best" requires a definition of what "best" means. The usual way to define it is to specify some numerical measure of the penalty for being wrong. In real life problems there is often no overt penalty, but we can begin by getting a general idea.

First, what would a "big" error be in a torque measurement? For your applications, can we state what a "big" or unacceptable error would be in ft-lbs or Newton-meters? Or can we state it as a percentage error?

Second, what are the downsides to making an inaccurate measurement?

I can visualize cases like this:

1) Perhaps this sensor is something you use to pursue a hobby and if it reads off by 50%, you misadjust your go-kart engine or something like that. You would feel some degree of personal dissatisfaction, but there wouldn't be a significant financial cost.

2) Perhaps you use this sensor in procedures to adjust a car used in professional racing. If the sensor is off by 50% your team could lose a prize worth a million dollars.
 
  • #7
Yes. That is the test setup and my situation exactly.

The test is quite accurate but likely far inferior to the manufacturer's. Because of this, I used linear regression to calibrate the sensor to my "noisier" setup. Once the test data began falling outside the acceptable limits, however, my custom calibration made it difficult to isolate the error's source (e.g. shaft runout, loose fastener/coupling, electrical noise, bad sensor, or any other system shift that invalidated my custom calibration). At this point I started thinking it may be preferable to keep the factory values as long as it "acceptably" modeled the test data. The "acceptably" is the statistical gauge I'm looking for, whatever it may be. That way when I run the test it will be a sort of "double validation"- the torque sensor measurements are within the acceptable range (of the supplied torques) and the test is supplying the expected torques (based off of the sensor readings). If it is "not acceptable," then either the sensor/factory-calibration is faulty or the test is not supplying the correct input torques.

To answer your questions, an unacceptable error would be anything beyond +/-10% of the actual torque which ranges from ~1 to 10 in-lbs (with more leeway on the lower end of the range).
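A +/-10% tolerance like this can be turned directly into a number: convert each voltage to torque with the calibration under test, compare with the reference torque, and count how many points stay inside the tolerance. The sketch below assumes a reference torque is available for every point; the data and the factory slope and intercept are made up, and the function names are hypothetical.

```python
def percent_errors(voltages, reference_torques, slope, intercept):
    """Percent error of the calibrated torque relative to the reference torque."""
    return [100.0 * (slope * v + intercept - t) / t
            for v, t in zip(voltages, reference_torques)]

def fraction_within(voltages, reference_torques, slope, intercept,
                    tol_percent=10.0):
    """Fraction of points whose percent error is inside +/- tol_percent."""
    errs = percent_errors(voltages, reference_torques, slope, intercept)
    inside = sum(1 for e in errs if abs(e) <= tol_percent)
    return inside / len(errs)

# Made-up illustrative data (volts, in-lbs), not measurements from this thread.
voltages = [0.5, 1.0, 2.5, 5.0]
reference_torques = [1.0, 2.1, 5.0, 9.0]
factory_slope, factory_intercept = 2.0, 0.0  # hypothetical factory values

errs = percent_errors(voltages, reference_torques,
                      factory_slope, factory_intercept)
print(max(abs(e) for e in errs))
print(fraction_within(voltages, reference_torques,
                      factory_slope, factory_intercept))
```

The worst-case percent error and the fraction inside the tolerance are both in terms of the stated requirement, which makes a threshold easier to choose than for R^2.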

Secondly, I would say the "case" falls somewhere in between the two you outlined. While not system or process critical, the sensor (after verification, i.e. my test) will be installed on a device that performs other testing and calibration where accurate torque measurements are important.

Thanks again Stephen for your help and time.
 
  • #8
We need to know (or assume) something quantitative about the errors that your test setup makes. I don't know the best way to get such information, but some ideas would be:

1) If the setup makes a continuous measurement and the voltage reading drifts, record how much it drifts in some way.

2) If you can repeat the same test (same torque, same sensor), record the values you get for several tests.

That sort of data would indicate the variability of your equipment about its average reading for a given sensor and torque. It wouldn't confirm that the average was correct. For example, there could be something that biases the test always to give a reading that is too high. You might be willing to assume that, on average, your test readings are correct. Or you might come up with some special situation (perhaps using a different sensor) where you know the right answer and see what your test reads.
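The difference between variability and bias can be made concrete with repeated readings of one test. In this sketch the readings and the "known" value are made up, and the function names are hypothetical; the bias can only be computed if a trusted value exists.

```python
import statistics

def repeatability(readings):
    """Mean and sample standard deviation of repeated readings."""
    return statistics.mean(readings), statistics.stdev(readings)

def bias(readings, known_value):
    """Average reading minus a trusted value (needs an independent reference)."""
    return statistics.mean(readings) - known_value

# Made-up repeated readings of the same torque, in in-lbs.
readings = [2.35, 2.36, 2.34, 2.35, 2.35]
known_torque = 2.30  # hypothetical trusted value

mean, spread = repeatability(readings)
print(mean, spread)
print(bias(readings, known_torque))
```

In this made-up case the spread is small while the offset from the trusted value is several times larger, which is the situation repeat tests alone would not reveal.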

----

Some further questions. My guess about your test set up is that most of the uncertainty is in the torque measurement. I assume the voltage reading is precise. Is it taken in similar manner that it would be read when the sensor was installed and in actual use?

Are we dealing with just one sensor? Or must the decision be made for each sensor in a whole batch of sensors? What do the specs of the sensor say about its accuracy? Something like plus or minus 1%?
 
  • #9
Hey Stephen, sorry it took me a while to respond. Things have been hectic.

Anyway, the experiment I run is your described case "2)" and the input torque is VERY repeatable (STDEV < 0.005 inch-pounds, generally). This was measured using a "good" torque sensor.

In response to your questions:

Yes, the uncertainty is in the torque measurement. I believe it is from the sensor's output-voltage variance/offset and not the variance from "reading" the voltage. This is where I was wondering if my "custom" calibration could have negatively affected my readings (i.e. my custom values are significantly different than the default factory values).

Yes, the current test "reads" the sensor the same way it will be used in "actual use."

Finally, there is just one torque sensor being used.

I have attached the sensor's documentation (or at least the one I could find online) so hopefully it provides some assistance (see model 1701).

Thanks.
 

Attachments

  • Model_1700_Datasheet.pdf
    303.8 KB · Views: 233
  • #10
I didn't see any regression equation in the data sheet. Is that info in the data sheet for the torque readout display, the "7559"? Do you set the offset voltage in the hardware for the torque display?
 
  • #11
Oh sorry, the system diagram in the data sheet is just an example the supplier uses. I use a computer to "read" in the voltage and perform whatever calculations are required.

I believe each torque sensor shipped has its own calibration equation (which I am guessing they found using regression on their more rigid experimentation data) so it wouldn't be on the data sheet.

I then use software to incorporate the slope and intercept (voltage offset) values of the calibration (either my own or the factory's) to calculate the torque.
 
  • #12
Which is the independent variable in the factory equation, torque or voltage? (I want to make sure that your data uses the same X as the factory. The regression line for Y as a function of X is different than the regression line for X as a function of Y, even if both are plotted with X on the horizontal axis - as we discuss in https://www.physicsforums.com/showthread.php?t=483688.)
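The point about the two regression lines can be checked numerically. The sketch below fits torque on voltage, then voltage on torque, and solves the second line for torque so both slopes are in the same units. The data is made up and the function name is hypothetical.

```python
def ols(xs, ys):
    """Ordinary least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up illustrative data, not measurements from this thread.
voltage = [1.0, 2.0, 3.0, 4.0, 5.0]
torque = [2.2, 3.8, 6.3, 7.7, 10.1]

slope_tv, _ = ols(voltage, torque)   # torque as a function of voltage
slope_vt, _ = ols(torque, voltage)   # voltage as a function of torque

print(slope_tv)          # torque per volt from the first fit
print(1.0 / slope_vt)    # torque per volt implied by the second fit
print(slope_tv * slope_vt)  # equals the squared correlation coefficient
```

The two slopes agree only when the data is perfectly linear. With noisy data the product of the two fitted slopes is the squared correlation, so the gap between the lines grows as the correlation drops.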

I'm trying to visualize exactly what is measured in your tests. You apply a torque and have some way of knowing what it is. Do you read the voltage from the torque sensor? Or do you only read the display from a computer which is itself reading the voltage and converting it to a torque reading according to a formula that you entered? (The latter case is what the data sheet shows.)
 
  • #13
Very interesting point!

I believe the torque would be the independent variable in the factory's calibration testing (they want to know the output voltage of their sensor provided a known torque). Maybe they attach an arm of known length and center of gravity to provide the "known" torque? I am not absolutely sure of this, however. On the other hand, I suppose they could apply torque until they read a certain voltage and record the torque measured by some other "true" sensor. I am not sure which method they use...

When we receive the sensor, its calibration equation has the voltage as the independent variable.

Here are some more specifics on my calibration test:

For my initial calibration test (i.e. a new sensor is installed on the stand), I use a servo motor to provide ~constant torque (which is measured using another torque transducer) and record the new sensor's average output voltage. My regression is then performed on the data where the independent variable is the average sensor voltage and the dependent variable is the observed torque reading from the other transducer. Now that I think about it, completely trusting that other transducer is beginning to worry me.

After this "calibration," my standard test records a maximum breakaway torque which is VERY repeatable (according to the "calibrated" sensor) but the exact torque is not absolutely known (other than what the "calibrated" sensor now reports).

Hope this all makes sense.
 
  • #14
Some thoughts:

One simple thing to try is to do your regression with the voltage as the independent variable and see if the equation better matches the factory equation.

A visual procedure for judging how well your equation agrees with the factory equation would be to plot two lines on either side of your equation that show an estimate of the typical spread of the data. This isn't the simple statistical test that you want, but it would be a good start in that direction.

Let's say your equation is T = A*V + B where T is torque and V is voltage.
Thinking in terms of a spreadsheet, for the column of (voltage, torque) measurements make another column that shows the error in prediction. So if a data point is (V1,T1) the corresponding error would be A*V1 + B - T1. Compute the mean and standard deviation of that column of numbers. I think spreadsheets can do that for you.

Let's say the standard deviation you get is S.

A rough indication of the "spread" that you should see in the data if your line is correct is between the lines T = A*V + B + 2*S and T = A*V + B - 2*S. (You could use 3*S instead of 2*S if you want to allow for an extreme spread.)

If the factory line runs outside the area between the two lines, then this is visual indication that there may be significant disagreement.
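The visual band check can also be done numerically. Since the factory line and the regression line are both straight, the gap between them is largest at one end of the voltage range, so it is enough to test the two endpoints. This is a sketch with made-up data; the fitted values (1.97, 0.11) are the least-squares fit to that made-up data, and the function names are hypothetical.

```python
import statistics

def residual_spread(voltages, torques, slope, intercept):
    """Sample standard deviation S of the errors A*V + B - T."""
    errors = [slope * v + intercept - t for v, t in zip(voltages, torques)]
    return statistics.stdev(errors)

def factory_line_inside_band(voltages, torques, fit, factory, k=2.0):
    """True if the factory line stays within +/- k*S of the fitted line
    over the range of measured voltages."""
    a, b = fit
    a_f, b_f = factory
    s = residual_spread(voltages, torques, a, b)
    # The difference of two straight lines is a straight line, so its
    # largest magnitude on an interval occurs at an endpoint.
    ends = (min(voltages), max(voltages))
    return all(abs((a_f * v + b_f) - (a * v + b)) <= k * s for v in ends)

# Made-up illustrative data, not measurements from this thread.
voltages = [1.0, 2.0, 3.0, 4.0, 5.0]
torques = [2.2, 3.8, 6.3, 7.7, 10.1]
fit = (1.97, 0.11)  # least-squares fit to the made-up data above

print(factory_line_inside_band(voltages, torques, fit, (2.0, 0.0)))
print(factory_line_inside_band(voltages, torques, fit, (2.5, 0.0)))
```

Passing k=3.0 gives the wider band mentioned above. The result is only a rough flag, since S is itself estimated from the same data.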

The way that I think of "breakaway torque" is loosening a bolt. I only use torque wrenches when I tighten bolts so I have no impression about how repeatable the torque to loosen a bolt would be.

I'm curious if the torque sensor has a sampling rate. The way I visualize a breakaway torque measurement, the torque increases until the bolt (or whatever) suddenly turns. So if there is a sampling rate, it would have to be fast enough so it took a reading very near the peak.

Have you talked (or can you talk) to the technical support people for the sensor?
 
  • #15
If you used two torque sensors at the same time, wouldn't the torque be distributed between the two sensors?

It seems like this setup is sort of like standing with one leg on one scale and another leg on another scale, and expecting them to always be equal because your weight doesn't change.

I could just not be grasping this properly though.
 
  • #16
@ Stephen: Yes, I did something similar to your error prediction idea. I added boundary lines of +/- 10% to the factory equation and looked for trend/outliers in my recorded data points. Again, not really the statistical "gauge" I'm looking for but it roughly indicates if there may be an issue.

Yes, your thought of breakaway torque is correct. The torque builds until "breakaway" and then steeply drops down. The sampling rate I use is 1ms and I log all output voltage data before, and some points after, the maximum breakaway torque.

Yes, I have talked to the technical support staff. More specifically, my issue with the sensor was that there were fluctuations in output voltages as a function of shaft position. That is, if the shaft is set at "zero" degrees and a torque measurement is taken it reads 2.350. But if you rotate the shaft X degrees and re-run the test, a different reading is recorded, 2.3xx. I have stepped through various shaft positions and recorded the maximum breakaway torques and it traces out a sinusoid (Torque= A*sin(B*Degrees + C)); this should, ideally, be constant. I have all but eliminated my system/components and strongly believe the error lies in the sensor which has been sent back. They were unable to reproduce my results but are not willing to explain their test procedure. Also, the torque values I just listed were made up. I can get you the actual data if you really want to see it.
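For readings taken at evenly spaced shaft positions over one full revolution, the size of a once-per-revolution sinusoid can be estimated without a nonlinear fit. The sketch below assumes exactly one cycle per revolution and adds a constant offset term to the sinusoid, both of which are assumptions; the synthetic readings and the function name are made up for illustration.

```python
import math

def fit_once_per_rev(angles_deg, readings):
    """Estimate offset, amplitude and phase (degrees) of
    reading = offset + amplitude * sin(angle + phase),
    assuming evenly spaced angles covering one full revolution."""
    n = len(readings)
    offset = sum(readings) / n
    a = 2.0 / n * sum(y * math.sin(math.radians(t))
                      for t, y in zip(angles_deg, readings))
    b = 2.0 / n * sum(y * math.cos(math.radians(t))
                      for t, y in zip(angles_deg, readings))
    # a*sin(t) + b*cos(t) == R*sin(t + phase) with R = hypot(a, b)
    return offset, math.hypot(a, b), math.degrees(math.atan2(b, a))

# Synthetic readings every 30 degrees, not measurements from this thread.
angles = [30.0 * k for k in range(12)]
readings = [2.35 + 0.04 * math.sin(math.radians(t + 20.0)) for t in angles]

offset, amplitude, phase = fit_once_per_rev(angles, readings)
print(offset, amplitude, phase)
```

Twice the amplitude gives the peak-to-peak swing, which can be compared from sensor to sensor. If the angles are not evenly spaced, a general least-squares fit on the sine and cosine terms would be needed instead.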

This is all beside the point; my original question was merely a by-product of this whole process. I thought that if I could use the sensor's factory calibration values, run a test, and compute the "statistical gauge" using the sensor's calibration values and test data, then I could at least "isolate" the likely culprit (my system or the transducer) if the statistical gauge result was unacceptable. For example, if I switched out the sensor and the statistical gauge vastly improved, it is likely that the original sensor is faulty. Conversely, if there is little improvement then it is likely that there is an issue in my setup. My thought is that once I perform my "custom" calibration, the system and sensor relationship to the test data is no longer "independent." That is, my regression calibration "lumps" the sensor and system errors together. If that makes sense, or is even correct?...

@ Perfection: That would be the case if I could somehow put the sensors in parallel but they are in series so they should read the same.

To use your example:

Standing on two scales (one under each foot) is parallel so the weight is split between them.

Standing on two scales (one under another, i.e. you standing on a scale, on top of the other scale) is in series and should be the same.
 
  • #17
n00bcake22 said:
More specifically, my issue with the sensor was that there were fluctuations in output voltages as a function of shaft position.

I have stepped through various shaft positions and recorded the maximum breakaway torques and it traces out a sinusoid (Torque= A*sin(B*Degrees + C)); this should, ideally, be constant.

I wonder if the sensor was only designed for dynamic torque readings. It would be interesting to look at data where you recorded the shaft position along with the torque and voltage. Maybe if you measure at a particular angle, you would agree with the factory's regression. Is the variation of voltage with angle big enough to make that plausible?

I can get you the actual data if you really want to see it.

I like looking at actual data, but are things in the problem in the process of being updated? I don't want to look at obsolete results. I'm a very busy man. Retired, you know. Retirement takes up all my time. (It really does. The retired guy's saying about "How did I ever have time to work?" rings true.)

I thought that if I could use the sensor's factory calibration values, run a test, and compute the "statistical gauge" using the sensor's calibration values and test data, then I could at least "isolate" the likely culprit (my system or the transducer) if the statistical gauge result was unacceptable. For example, if I switched out the sensor and the statistical gauge vastly improved, it is likely that the original sensor is faulty. Conversely, if there is little improvement then it is likely that there is an issue in my setup.

I think your idea is correct. Many people would just "eyeball" the results. Is it laborious to switch out various components in your system?

My thought is that once I perform my "custom" calibration, the system and sensor relationship to the test data is no longer "independent." That is, my regression calibration "lumps" the sensor and system errors together. If that makes sense, or is even correct?...

As I interpret that, "custom calibration" means that when you change some component in your test set up, you make adjustments based on known input torques and voltages and then you take the "final" measurements. Yes, that process would obscure the cause-and-effect relationship that changing a component has upon the "final" measurements.
 
  • #18
Stephen Tashi said:
I wonder if the sensor was only designed for dynamic torque readings.

It is a through-shaft sensor so I would guess it is intended for both static and dynamic readings.

It would be interesting to look at data where you recorded the shaft position along with the torque and voltage.

Looking back, my sinusoid equation should be Voltage_Sensor = A*sin(...). I have not measured BOTH the sensor voltage and torque (based on another sensor) as a function of shaft position, just the sensor voltage (which fluctuates). I currently do not have the capability to read in another torque-sensor's voltage to compare them side-by-side; sounds like a great functional addition, provided the other torque sensor is "true."

Maybe if you measure at a particular angle, you would agree with the factory's regression. Is the variation of voltage with angle big enough to make that plausible?

Yes. This is entirely plausible.

So far, all of the sensors I have tested exhibit a sinusoidal voltage output as a function of shaft position. Most of the time, however, the amplitude is sufficiently small so that the test repeatability and reproducibility doesn't seem to be an issue. The last sensor's amplitude was MUCH worse (~3.7x) than the others. The test repeatability and reproducibility became an issue when using this sensor. The peak-to-peak torque range (linearly calculated from the output voltage) of this sensor was 0.082 inch-pounds when using the same breakaway torque value; the shaft position was the only variable. As mentioned many times before, the breakaway torque values are very repeatable ( STDEV = ~0.003 inch-pounds using a "good" sensor). While this P-P range is quite small, it becomes an issue when working with the lower-end breakaway torque values of about 0.8 inch-pounds.
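To put those numbers in proportion, the peak-to-peak range can be expressed as a percentage of the torque being measured. The figures are the ones quoted above; the function name is hypothetical.

```python
def relative_swing_percent(peak_to_peak, torque):
    """Peak-to-peak variation as a percentage of the measured torque."""
    return 100.0 * peak_to_peak / torque

low_end = relative_swing_percent(0.082, 0.8)    # lower-end breakaway torque
high_end = relative_swing_percent(0.082, 10.0)  # top of the stated range

print(low_end)   # roughly 10 percent of the reading
print(high_end)  # under 1 percent of the reading
```

At the low end the swing alone spans about a tenth of the reading, i.e. roughly +/-5% about the mean, which uses up a large share of a +/-10% tolerance.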

I think your idea is correct. Many people would just "eyeball" the results. Is it laborious to switch out various components in your system?

Not terribly but it is still a hassle. I wouldn't mind switching out the torque sensor if the "statistical gauge" comparing the factory values and test data was unacceptable. Then I could replace it and compare the two gauge values to one another. I just like to have "probable cause" for swapping components instead of just randomly switching them out to see if it happened to "fix" the problem (I've had to do this in the past, pretty inefficient).

As I interpret that, "custom calibration" means that when you change some component in your test set up, you make adjustments based known input torques and voltages and then you take the "final" measurements. Yes, that process would obscure the cause-and-effect relationship that changing a component has upon the "final" measurements.

My "custom" calibration is used to replace the factory calibration values shipped with the sensor using linear regression. I install the new sensor and use a servo motor to provide ~constant torque (which is measured using another torque transducer) and record the new sensor's average output voltage and the torque value reported by the other transducer. Regression is then performed on the data where the independent variable is the average sensor voltage and the dependent variable is the observed torque reading from the other transducer.

This is where I am afraid I am "merging" my system's and my new sensor's errors together.

Because of this, I think it would be beneficial to keep the factory calibration settings and just "check" to make sure the sensor sufficiently matches the data. For example, I could repeat the test outlined above except that I could use the average output voltage and factory calibration settings to compute a torque value and somehow compare it to the other transducer's readout torque. Comparing these torque values (one computed, one "read") is what I am trying to achieve but, like you mentioned, I would prefer not to "eyeball" it. That is why I am looking to statistics to find some method to say "the new sensor's factory calibration line matches the 'read' transducer's torque values 'this well'" where "this well" is the numeric gauge.
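One customary way to get a "this well" number is to fit the test data, estimate the standard errors of the fitted slope and intercept, and express the distance to the factory values in units of those standard errors. This is a sketch of a conventional procedure, not a proof that it is the right gauge: it assumes roughly constant, normal-looking scatter about the line, the two checks are not independent of each other, and the data, factory values and function names are made up.

```python
import math

def fit_with_standard_errors(xs, ys):
    """Least-squares slope and intercept with their standard errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    s2 = ss_res / (n - 2)  # residual variance, two parameters estimated
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept

def standardized_differences(xs, ys, factory_slope, factory_intercept):
    """(fitted - factory) / standard error, for slope and intercept."""
    slope, intercept, se_s, se_i = fit_with_standard_errors(xs, ys)
    return ((slope - factory_slope) / se_s,
            (intercept - factory_intercept) / se_i)

# Made-up illustrative data (volts, in-lbs), not measurements from this thread.
voltage = [1.0, 2.0, 3.0, 4.0, 5.0]
torque = [2.2, 3.8, 6.3, 7.7, 10.1]

z_slope, z_intercept = standardized_differences(voltage, torque, 2.0, 0.0)
print(z_slope, z_intercept)
```

Magnitudes well above about 2 are a conventional hint that the factory values disagree with the data by more than the scatter explains. With only a handful of points the standard errors are themselves rough, so a threshold should be chosen with that in mind.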

I'm a very busy man. Retired, you know. Retirement takes up all my time. (It really does. The retired guy's saying about "How did I ever have time to work?" rings true.)

Ha ha, and here I am talking your head off. Thanks for all your help, I truly appreciate it. Damn you statistics!
 

FAQ: Statistics: linear equation comparison question

1. What is a linear equation?

A linear equation is an equation that represents a straight line on a graph. It has the general form of y = mx + b, where m is the slope of the line and b is the y-intercept.

2. What is the purpose of comparing linear equations?

The purpose of comparing linear equations is to determine whether they have the same slope and y-intercept, and therefore represent the same line. This can help in identifying patterns and relationships between different sets of data.

3. How do you compare two linear equations?

To compare two linear equations, you can compare their slopes and y-intercepts. If the slopes and y-intercepts are the same, then the equations are equivalent and represent the same line. You can also graph the equations and see if the lines overlap.

4. What is the significance of comparing linear equations?

Comparing linear equations allows us to understand the relationships between different variables and how they change over time. It also helps us identify trends and make predictions based on the data.

5. Can linear equations with different slopes and y-intercepts be compared?

Yes, linear equations with different slopes and y-intercepts can still be compared. However, they will not represent the same line and may have different relationships between the variables. It is important to consider the context and purpose of the comparison when interpreting the results.
