# Regression of linear combination better than just regression

In summary: when comparing two angles measured from a common datum, regressing one angle against the *difference* of the two can produce a higher r^2 than regressing the angles against each other directly. This apparent improvement can be an artifact of viewing the same data from a different direction: the difference shares variance with each of the original variables, so the higher correlation does not necessarily add information. It is important to consider this bias before using the difference to infer the relationship between the two angles.
Hi,
I have a problem that is giving me a headache. I have measured two angles that I believe to be related to one another. The data set consists of the angles from a datum to two features on a bone, measured on 14 bones:

Angles to feature 1 (F1):
15.225
14.2318
9.4301
12.2947
14.8846
7.6533
9.0948
11.9725
4.2773
14.1819
8.841
17.1037
20.2373
13.4599

Angles to feature 2 (F2):
3.1227
9.4799
7.9047
13.4962
8.5454
24.2871
11.443
12.6693
21.5271
4.0733
5.0085
4.0101
5.4445
16.424

When I plot F1 vs F2 and do a linear correlation I get an r^2 = 0.47. Related, but not very strongly.

The thing is, I'm doing this because I have a bunch of partial bone specimens and cannot define the datum, so in general I'm going to have the angle between feature 1 and feature 2, and I am hoping to be able to get the position of the datum from this angle. So if I plot (F1-F2) vs F1 I get a much better correlation (r^2 = 0.74), and (F1-F2) vs F2 gets even better (r^2 = 0.9)!

What I don't understand is, how can a linear combination of the two be better than either one alone? I have added no information and the equations are not linearly independent. What am I looking at in the plots with the difference?

I have attached plots of F1 vs F2, Diff vs F1 and Diff vs F2 with the regressions plotted.
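The reported r^2 values can be checked directly from the data above. Here is a minimal Python sketch (the thread's own snippets are Octave, but the computation is identical) using NumPy:

```python
import numpy as np

# Angles (degrees) from the data set above: 14 femurs.
f1 = np.array([15.225, 14.2318, 9.4301, 12.2947, 14.8846, 7.6533, 9.0948,
               11.9725, 4.2773, 14.1819, 8.841, 17.1037, 20.2373, 13.4599])
f2 = np.array([3.1227, 9.4799, 7.9047, 13.4962, 8.5454, 24.2871, 11.443,
               12.6693, 21.5271, 4.0733, 5.0085, 4.0101, 5.4445, 16.424])
diff = f1 - f2

def r_squared(x, y):
    """Squared Pearson correlation coefficient between x and y."""
    return np.corrcoef(x, y)[0, 1] ** 2

r2_f1_f2   = r_squared(f1, f2)     # reported in the thread as ~0.47
r2_diff_f1 = r_squared(diff, f1)   # reported as ~0.74
r2_diff_f2 = r_squared(diff, f2)   # reported as ~0.9
```

Note that the difference-based correlations come out higher than the direct one, which is exactly the puzzle posed in the thread.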

#### Attachments

• F1VsF2.jpg
• DiffVsF1.jpg
• DiffVsF2.jpg
Hi,

> What I don't understand is, how can a linear combination of the two be better than either one alone? I have added no information and the equations are not linearly independent. What am I looking at in the plots with the difference?
>
> I have attached plots of F1 vs F2, Diff vs F1 and Diff vs F2 with the regressions plotted.

It doesn't surprise me that you get a better correlation for Corr(X, |X - Y|) than for Corr(X, Y). What is the basis for the angle measure: the long axis of the bones? How different are the bones? How do you gauge how the cutter would hold the bones, and so on?

When you measure Corr(X, |X-Y|) or Corr(Y,|X-Y|) you are biasing the measure toward a higher correlation because the difference in the angles is dependent on the angles themselves and not some external reference. With more measurements you would have a regression to the mean difference.
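This bias can be made quantitative. Even for completely independent X and Y with equal variance, Cov(X, X - Y) = Var(X), so Corr(X, X - Y) = 1/√2 ≈ 0.707 and r^2 = 0.5 with no real relationship at all. A quick simulation (a sketch, not from the thread) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X and Y are independent by construction: any correlation between
# X and X - Y is created purely by the shared X term.
x = rng.standard_normal(n)
y = rng.standard_normal(n)

r = np.corrcoef(x, x - y)[0, 1]
# Theory: Corr(X, X - Y) = Var(X) / (sd(X) * sd(X - Y)) = 1 / sqrt(2)
print(r)       # close to 0.707
print(r ** 2)  # close to 0.5
```

So an r^2 around 0.5 for a difference-vs-component plot is the baseline you would get for free, and any observed value should be judged against that, not against zero.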

Hi SW
Thanks for your reply. To be more specific, these measurements are taken off a set of femurs. The F1 angle is the femoral version (see red line in http://www.kevinneeld.com/wp-content/uploads/2011/06/Femoral-Version-Assessment.png) and F2 is the angle from the plane perpendicular to the frontal plane in that link (called the sagittal plane) to the linea aspera at the 50% shaft position (http://en.wikipedia.org/wiki/Linea_aspera).

The separation angle is the angle from the femoral neck to the linea aspera (minus the 90deg constant between the sagittal and frontal planes). In general, I have only the top half of the femur, so I have the femoral neck and the linea aspera and can measure the angle between them. I would like to be able to determine where the frontal plane is from this measurement.

One thing I noticed is that if I do a 3D plot of x = version, y = linea aspera and z = (version - linea aspera) they lie on a plane that is oriented at 45°. Surprisingly (to me) this is true of any set of random numbers. For example, in Octave (or Matlab)

```matlab
% 50 random (X, Y) pairs; Z is their difference,
% so every point satisfies X - Y - Z = 0.
X = rand(50,1);
Y = rand(50,1);
Z = X - Y;
plot3(X, Y, Z, "bo");
```

you will see that these all lie on a plane. So it seems that my good correlations are purely from just viewing the data from the "right direction" which is hard to get my head around and makes me think that even though my question was "Can I find the frontal plane from the separation between the linea aspera and the femoral neck" I can't answer this with any more confidence than I can answer the question "Can I find the linea aspera angle given the version".
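The plane is exact, not approximate: every point (x, y, z) with z = x - y satisfies x - y - z = 0, so the cloud lies in the plane with normal vector (1, -1, -1), whatever x and y are. A quick check (a Python sketch mirroring the Octave snippet above):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(50)
y = rng.random(50)
z = x - y

# Every point satisfies the plane equation x - y - z = 0; in floating
# point the cancellation (x - y) - z is exact, so the residual is 0.
residual = np.abs(x - y - z).max()
print(residual)  # prints 0.0
```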

What do you think? Is it true that the lowest correlation between the two is the limiting one? Or does this simple subtracting of the two variables increase my knowledge somehow?

Thanks again.

> Hi SW
> ... I see that these all lie on a plane. So it seems that my good correlations are purely from just viewing the data from the "right direction" which is hard to get my head around and makes me think that even though my question was "Can I find the frontal plane from the separation between the linea aspera and the femoral neck" I can't answer this with any more confidence than I can answer the question "Can I find the linea aspera angle given the version".
>
> What do you think? Is it true that the lowest correlation between the two is the limiting one? Or does this simple subtracting of the two variables increase my knowledge somehow?
>
> Thanks again.

I'm not qualified to comment on the technical aspects of your problem. I just wanted to verify my idea of how you are measuring angles. In general, if you had random line segments on two flat pieces of paper with the same orientation, you would not expect any significant correlation between the two. In your problem, I would expect some kind of pattern to be obvious on inspection, and you want to measure it in terms of the Pearson correlation coefficient r or $r^2$.

You are not talking about a linear combination, which applies to a vector space. I assume these lines are not directed line segments (vectors) in a vector space; independent vectors are perpendicular to each other, which would be a very specific pattern. All you want to know is whether the angles of line segments from two sources X and Y to some standardized base line are correlated, and to what degree. You write this as Corr(X, Y), where X and Y are random variables which are not independent in this case.

When you write this as Corr(X, |X-Y|) or Corr(Y, |X-Y|) you are describing something different. It may have some use, but it's not the well understood Pearson correlation coefficient r. In other words, you might have trouble publishing it.

Hi SW,
Thanks for your reply. It put me in the right direction (ie, I'm out of my element here), and I'm going to consult a statistician and see what they think.

Cheers,
Seth

> What I don't understand is, how can a linear combination of the two be better than either one alone?

You haven't demonstrated that one method or the other is a better predictor of the datum, only that the correlations differ. An r^2 value measures how much of the variance of one variable is accounted for by the regression line on the other; when the two variables share a common term, as F1 and (F1-F2) do, a high r^2 does not by itself tell you how accurately that line predicts the quantity you actually want.

## 1. What is the concept of regression of linear combination and how is it different from just regression?

Regression of linear combination refers to a statistical technique where a linear combination of predictor variables is used to predict an outcome variable. This is different from just regression, where a single predictor variable is used to predict the outcome. In regression of linear combination, the combination of predictor variables can improve the accuracy of the prediction compared to using just one predictor variable.
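As an illustration of the distinction (a hypothetical example, not the femur data), ordinary least squares with two predictors never fits the training data worse than either predictor alone, because the single-predictor model is a special case of the two-predictor one:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Synthetic outcome depending on two predictors plus noise.
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = 2.0 * x1 - 1.5 * x2 + rng.standard_normal(n)

def ols_r2(X, y):
    """R^2 of the least-squares fit of y on the columns of X (plus intercept)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_single = ols_r2(x1, y)
r2_both   = ols_r2(np.column_stack([x1, x2]), y)
# On the training data, r2_both >= r2_single always holds.
```

Note this guarantee only holds in-sample; out-of-sample performance can still get worse when extra predictors are added.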

## 2. How does regression of linear combination improve upon traditional regression techniques?

Regression of linear combination allows for the inclusion of multiple predictor variables, which can capture more complex relationships between the predictors and the outcome. This can lead to a more accurate prediction compared to just using one predictor variable in traditional regression techniques.

## 3. What are some common applications of regression of linear combination?

Regression of linear combination is commonly used in fields such as finance, economics, and social sciences to predict outcomes based on multiple predictor variables. It is also frequently used in machine learning and data science for predictive modeling.

## 4. What are the assumptions of regression of linear combination?

The assumptions of regression of linear combination are similar to those of traditional regression, including linearity, normality, and homoscedasticity. Additionally, the predictors should not be too highly correlated with each other, as this can lead to multicollinearity issues.
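A standard diagnostic for the multicollinearity assumption is the variance inflation factor, VIF_j = 1/(1 - R^2_j), where R^2_j comes from regressing predictor j on the remaining predictors; values much above roughly 5-10 are commonly treated as a warning sign (these thresholds are rules of thumb, not from this thread). A minimal NumPy sketch:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X, shape (n_samples, n_features)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        # Regress column j on the other columns (with intercept).
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ beta
        r2 = 1 - resid.var() / X[:, j].var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(3)
a = rng.standard_normal(500)
b = a + 0.1 * rng.standard_normal(500)   # nearly collinear with a
c = rng.standard_normal(500)             # independent of both
vifs = vif(np.column_stack([a, b, c]))
# vifs[0] and vifs[1] come out large; vifs[2] stays near 1.
```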

## 5. How can the performance of regression of linear combination be evaluated?

The performance of regression of linear combination can be evaluated using metrics such as mean squared error, R-squared, and adjusted R-squared. These metrics can help determine the accuracy of the model and whether the inclusion of multiple predictors has improved the prediction compared to just using one predictor variable.
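These metrics are straightforward to compute from the residuals; the following sketch uses the standard formulas, with n observations and p predictors:

```python
import numpy as np

def regression_metrics(y, y_hat, p):
    """Return (MSE, R^2, adjusted R^2) for predictions y_hat of y
    from a model with p predictors."""
    n = len(y)
    resid = y - y_hat
    mse = np.mean(resid ** 2)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R^2 penalizes adding predictors that do not help.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return mse, r2, adj_r2
```

For a perfect fit the function returns MSE = 0 and R^2 = adjusted R^2 = 1; adding an uninformative predictor raises p and therefore lowers the adjusted value relative to plain R^2.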
