
Regression of linear combination better than just regression

  1. Mar 16, 2012 #1
    I have a problem that is giving me a headache. I have measured two angles that I believe to be related to one another. The data set consists of the angle from a datum to each of two features on a bone, measured on 14 bones:

    Angles to feature 1 (F1):

    Angles to feature 2 (F2):

    When I plot F1 vs F2 and do a linear correlation I get an r^2 = 0.47. Related, but not very strongly.

    The thing is, I'm doing this because I have a bunch of partial bone specimens and cannot define the datum, so in general I'm going to have the angle between feature 1 and feature 2, and am hoping to be able to get the position of the datum from this angle. So if I plot (F1-F2) vs F1 I get a much better correlation (r^2 = 0.74) and (F1-F2) vs F2 gets even better (r^2 = 0.9)!

    What I don't understand is, how can a linear combination of the two be better than either one alone? I have added no information and the equations are not linearly independent. What am I looking at in the plots with the difference?

    I have attached plots of F1 vs F2, Diff vs F1 and Diff vs F2 with the regressions plotted.

    Thanks for your help.

    Attached Files:

  3. Mar 16, 2012 #2
    It doesn't surprise me that you get a better correlation when measuring Corr(X, |X - Y|) or Corr(Y, |X - Y|) than for Corr(X,Y). What is the basis for the angle measure: the long axis of the bones? How different are the bones? How do you gauge how the cutter would hold the bones, etc.?

    When you measure Corr(X, |X-Y|) or Corr(Y,|X-Y|) you are biasing the measure toward a higher correlation because the difference in the angles is dependent on the angles themselves and not some external reference. With more measurements you would have a regression to the mean difference.
    Last edited: Mar 16, 2012
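The bias described above can be written out explicitly. This is only a sketch, and it assumes the signed difference X - Y (as in the F1 - F2 plots of the first post) rather than |X - Y|. The symbols σ_X and σ_Y (standard deviations) and ρ = Corr(X, Y) are introduced here for illustration and do not appear in the thread:

```latex
\[
\begin{aligned}
\operatorname{Cov}(X,\, X - Y) &= \sigma_X^2 - \rho\,\sigma_X \sigma_Y \\
\operatorname{Var}(X - Y) &= \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y \\
\operatorname{Corr}(X,\, X - Y) &= \frac{\sigma_X - \rho\,\sigma_Y}{\sqrt{\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y}} \\
\operatorname{Corr}(Y,\, X - Y) &= \frac{\rho\,\sigma_X - \sigma_Y}{\sqrt{\sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y}}
\end{aligned}
\]
```

With ρ = 0 and σ_X = σ_Y, both correlations have magnitude 1/√2 ≈ 0.71 (r^2 = 0.5), so a sizable r^2 shows up even when X and Y are unrelated. When σ_Y is much larger than σ_X, Corr(Y, X - Y) approaches -1 whatever the value of ρ.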
  4. Mar 16, 2012 #3
    Hi SW
    Thanks for your reply. To be more specific, these measurements are taken off a set of femurs. The F1 angle is the femoral version (see red line in http://www.kevinneeld.com/wp-content/uploads/2011/06/Femoral-Version-Assessment.png), and F2 is the angle from the sagittal plane (the plane perpendicular to the frontal plane in that link) to the linea aspera at 50% shaft position (http://en.wikipedia.org/wiki/Linea_aspera).

    The separation angle is the angle from the femoral neck to the linea aspera (minus the 90deg constant between the sagittal and frontal planes). In general, I have only the top half of the femur, so I have the femoral neck and the linea aspera and can measure the angle between them. I would like to be able to determine where the frontal plane is from this measurement.

    One thing I noticed is that if I do a 3D plot of x = version, y = linea aspera and z = (version - linea aspera) they lie on a plane that is oriented at 45°. Surprisingly (to me) this is true of any set of random numbers. For example, in Octave (or Matlab)

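    X = rand(50,1);        % 50 uniform random values
    Y = rand(50,1);        % 50 more, generated independently of X
    Z = X-Y;               % the difference, fully determined by X and Y
    plot3(X, Y, Z, 'o');   % 3D scatter; rotate the view to see the plane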

    you will see that these all lie on a plane. So it seems that my good correlations come purely from viewing the data from the "right direction", which is hard to get my head around. It makes me think that even though my question was "Can I find the frontal plane from the separation between the linea aspera and the femoral neck?", I can't answer it with any more confidence than I can answer the question "Can I find the linea aspera angle given the version?".

    What do you think? Is it true that the lowest correlation between the two is the limiting one? Or does this simple subtraction of the two variables increase my knowledge somehow?

    Thanks again.
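For anyone wanting to check the random-number observation numerically, here is a minimal sketch in Python (standard library only) rather than Octave. The helper `pearson_r`, the seed, and the sample size of 5000 are all assumptions made for this illustration; the larger sample just reduces sampling noise compared with the 50 points used above.

```python
import random


def pearson_r(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(0)  # fixed, arbitrary seed so the run is repeatable
n = 5000                # assumption: larger than 50 to reduce sampling noise

X = [rng.random() for _ in range(n)]  # independent uniform values
Y = [rng.random() for _ in range(n)]
Z = [x - y for x, y in zip(X, Y)]     # difference, as in the Octave snippet

# Every point satisfies z = x - y, so the points lie on a plane by construction.
max_plane_error = max(abs(z - (x - y)) for x, y, z in zip(X, Y, Z))

r2_xy = pearson_r(X, Y) ** 2  # X and Y are independent, so this is near 0
r2_zx = pearson_r(Z, X) ** 2  # near 0.5 even though X and Y are unrelated
r2_zy = pearson_r(Z, Y) ** 2  # near 0.5 as well

print(f"max deviation from the plane z = x - y: {max_plane_error}")
print(f"r^2(X, Y) = {r2_xy:.3f}")
print(f"r^2(Z, X) = {r2_zx:.3f}")
print(f"r^2(Z, Y) = {r2_zy:.3f}")
```

The plane is a consequence of how Z is defined, not of anything in the data, and the r^2 values against Z are well above zero even though X and Y carry no information about each other here.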
  5. Mar 16, 2012 #4
    I'm not qualified to comment on the technical aspects of your problem. I just wanted to verify my idea of how you are measuring angles. In general, if you had random line segments on two flat pieces of paper with the same orientation, you would not expect any significant correlation between the two. In your problem, I would expect some kind of pattern to be obvious on inspection, and you want to measure it in terms of the Pearson correlation coefficient r or [itex] r^2[/itex]. You are not talking about a linear combination, which applies to a vector space. I assume these lines are not directed line segments (vectors) in a vector space. Independent vectors are perpendicular to each other, which would be a very specific pattern. All you want to know is whether the angles of line segments from two sources X and Y to some standardized base line are correlated, and to what degree. You write this as Corr(X,Y), where X and Y are random variables which are not independent in this case. When you write this as Corr(X,|X-Y|) or Corr(Y,|X-Y|) you are describing something different. It may have some use, but it's not the well understood Pearson correlation coefficient r. In other words, you might have trouble publishing it.
    Last edited: Mar 16, 2012
  6. Mar 19, 2012 #5
    Hi SW,
    Thanks for your reply. It put me in the right direction (i.e., I'm out of my element here), and I'm going to consult a statistician and see what they think.

  7. Mar 20, 2012 #6

    Stephen Tashi

    Science Advisor

    You haven't demonstrated that one method or the other is a better predictor, only that the correlation differs. The correlation has to do with the slope of the least squares regression line between two variables, not with how accurately that line predicts one variable from the value of the other.
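One way to compare predictors directly is to look at the residual error of each fitted line, in degrees, instead of r^2. The sketch below does this in Python (standard library only) on synthetic angles. The data are hypothetical and are NOT the measurements from this thread, and `fit_line`, the seed, and the distribution parameters are assumptions chosen for illustration.

```python
import random


def fit_line(x, y):
    """Least-squares fit y ~ a + b*x; returns (a, b, r_squared, rmse)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy
    rmse = (ss_res / n) ** 0.5
    return a, b, r_squared, rmse


rng = random.Random(1)  # fixed, arbitrary seed so the run is repeatable
n = 200                 # hypothetical sample size, not the 14 bones above

# Hypothetical angles in degrees; F2 is made more variable than F1.
F1 = [rng.gauss(15, 4) for _ in range(n)]
F2 = [0.9 * f1 + rng.gauss(0, 8) for f1 in F1]
D = [f1 - f2 for f1, f2 in zip(F1, F2)]  # the measurable separation angle

# Predict each angle from the difference D.
_, _, r2_f1, rmse_f1 = fit_line(D, F1)
_, _, r2_f2, rmse_f2 = fit_line(D, F2)

print(f"F1 from D: r^2 = {r2_f1:.3f}, RMSE = {rmse_f1:.3f} deg")
print(f"F2 from D: r^2 = {r2_f2:.3f}, RMSE = {rmse_f2:.3f} deg")
```

Because F2 = F1 - D, the two fitted lines leave exactly the same residuals, so the RMSE values agree to rounding error even though the r^2 values differ. The r^2 for F2 comes out higher only because F2 has the larger spread in this synthetic data, which is one concrete sense in which a different correlation need not mean a better predictor.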