- #1

- 304

- 29

- #2

Science Advisor

Homework Helper

Gold Member

- 7,916

- 3,560

See https://en.wikipedia.org/wiki/Stepwise_regression

There are critics of these methods, but that is true of all statistical methods. All statistical methods should be used wisely.

If you are using some non-linear model, I think that you could still remove the correlated part of one of your variables and see if the remainder is statistically reasonable to add to the model after the first one is included.
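To make "remove the correlated part" concrete, here is a minimal sketch with made-up numbers. It regresses a second variable `x2` on the first variable `x1` and keeps only the remainder, which is uncorrelated with `x1` by construction. The names `fit_line`, `x1`, `x2` and the data values are hypothetical, and the sketch does not show the later step of testing whether the remainder is worth adding to the model.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y ~ a*x + b."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b

# Hypothetical, strongly correlated variables.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

# Part of x2 that is explained by x1, and what is left over.
a, b = fit_line(x1, x2)
remainder = [x2i - (a * x1i + b) for x1i, x2i in zip(x1, x2)]

# The remainder has (numerically) zero overlap with the centered x1.
x1_mean = sum(x1) / len(x1)
overlap = sum(r * (x1i - x1_mean) for r, x1i in zip(remainder, x1))
print(overlap)
```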

- #3

- 304

- 29

Quoting post #2: "You do not say if you are using linear regression or some other technique. Forward stepwise linear regression would make a model with the highest correlated variable first. Then it would remove the correlated part from the other variables and see if it is statistically reasonable to introduce the remainder into the model. There are techniques called forward selection, backward elimination, and bidirectional elimination."

I am not sure what you mean by linear regression. Isn't that meant only if the dependence is linear? I am using least squares fitting.

- #4

Science Advisor

Homework Helper

Gold Member

- 7,916

- 3,560

Linear regression uses least-squares fitting and is not as restrictive as you might initially think.

Suppose you are looking for the relationship between ##X## and ##Y##, with ##Y## a function of ##X##.

The regression finds the least-squares linear model, but you can apply it to non-linear relationships. You can try linear regression on a model ##Y = aX+b##, but if the relationship looks more like ##Y = aX^2+b##, you can apply linear regression on that. Just square all the ##X## data.
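A minimal sketch of that idea, using synthetic noise-free data generated from ##Y = 3X^2 + 2## (the numbers and the helper name `fit_line` are made up for illustration). The fit itself is an ordinary straight-line fit; only the input is transformed.

```python
def fit_line(z, y):
    """Least-squares slope and intercept for y ~ a*z + b."""
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    a = szy / szz
    b = y_mean - a * z_mean
    return a, b

# Hypothetical data following Y = 3*X^2 + 2 exactly.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0 * xi ** 2 + 2.0 for xi in x]

# "Just square all the X data", then fit a straight line in Z = X^2.
z = [xi ** 2 for xi in x]
a, b = fit_line(z, y)
print(a, b)
```

The model is non-linear in ##X## but linear in the coefficients ##a## and ##b##, which is all that linear regression requires.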

- #5

- 304

- 29

But my relationship is a lot more complicated than that. For example, I have something of the form:

$$Bx(x+1)+D(x(x+1))^2+(p+q)x+q10^{-7}\sqrt{x}$$

I do know the functional form of my equation, but I don't understand how I can fit a line to this.

- #6

Mentor

- 34,074

- 11,851

Quoting the original poster: "Can someone advise me on what is the best way to proceed?"

What you are running into is called multicollinearity. Or maybe, since it is just two correlated variables, just collinearity.

The easiest thing to do is to just eliminate one of the collinear variables. You can use the AIC or the BIC to choose which model is better if you don’t have a good theoretical reason for choosing one. Or you can use a more rigorous model-building approach like stepwise regression.

You can keep both parameters as long as you are not trying to make inferences about the parameter values. Keeping both will still give good fits to the data, but the parameter values themselves are fundamentally unstable.
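As a rough sketch of how such a comparison could look for least-squares fits: assuming independent Gaussian errors, the AIC and BIC can be written in terms of the residual sum of squares, up to a constant that cancels when comparing models on the same data. The residual sums of squares, the number of points and the parameter counts below are hypothetical placeholders, not values from this thread.

```python
import math

def aic(rss, n, k):
    """AIC for a least-squares fit (Gaussian errors, additive constant dropped)."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """BIC for a least-squares fit (Gaussian errors, additive constant dropped)."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical fit results: n data points, residual sums of squares,
# and number of fitted parameters for each candidate model.
n = 50
rss_full, k_full = 1.20, 4        # both collinear terms kept
rss_reduced, k_reduced = 1.22, 3  # one collinear term dropped

# Negative differences favor the reduced model (lower is better).
delta_aic = aic(rss_reduced, n, k_reduced) - aic(rss_full, n, k_full)
delta_bic = bic(rss_reduced, n, k_reduced) - bic(rss_full, n, k_full)
print(delta_aic, delta_bic)
```

With these made-up numbers both criteria prefer the smaller model, and the BIC penalizes the extra parameter more strongly than the AIC.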

- #7

Science Advisor

Homework Helper

Gold Member

- 7,916

- 3,560

I see your point that ##pX_3+qX_4## is problematic, nearly redundant. I wonder what a stepwise linear regression would do with it.

- #8

- 304

- 29

So by eliminating one parameter, do you mean setting it to zero? Based on the physics model upon which this equation is built, I do need both parameters. Basically, if I set q=0, p can take over the initial q value in the p+q term, but the sqrt term will vanish, and hence the model will be wrong. If I set p to zero, q would need to become 2 orders of magnitude bigger to take over the p+q part, but then the sqrt part will be too big. I am not sure how I can get rid of one of the parameters without using a wrong model.

- #9

Science Advisor

Homework Helper

Gold Member

- 7,916

- 3,560

Then I think you should try a standard linear regression that will force both terms into the model of post #7 and see what you get. At least it would fit your theory. It would be the least-squares model.

As @Dale says, it would be a very ill-conditioned problem.
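To illustrate why it is ill-conditioned, here is a small sketch that builds one regression column per parameter for the expression in post #5 and compares the columns that multiply p and q. The x values (1 to 20) and the helper name `columns` are assumptions for illustration only.

```python
import math

def columns(x):
    """One column per parameter for
    B*x(x+1) + D*(x(x+1))^2 + (p+q)*x + q*1e-7*sqrt(x)."""
    col_B = [xi * (xi + 1) for xi in x]
    col_D = [(xi * (xi + 1)) ** 2 for xi in x]
    col_p = [xi for xi in x]                          # p multiplies x
    col_q = [xi + 1e-7 * math.sqrt(xi) for xi in x]   # q multiplies x + 1e-7*sqrt(x)
    return col_B, col_D, col_p, col_q

x = [float(j) for j in range(1, 21)]  # hypothetical x values
col_B, col_D, col_p, col_q = columns(x)

# Largest relative difference between the p column and the q column.
max_rel_diff = max(abs(cq - cp) / cp for cp, cq in zip(col_p, col_q))
print(max_rel_diff)
```

The two columns agree to roughly one part in ten million, so the data can barely tell p and q apart even though their sum is well constrained.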

- #10

- 304

- 29

I see what you mean by linear regression in this case, thanks! But the way I did the fit was basically like that, i.e. I forced both terms into the fit. The fit looks great, and the values of p and q are around the values I would expect from theory. My only concern is with the uncertainties on p and q. I saw in other molecular physics papers people fixing one of the parameters when it was very correlated with another, but I am not sure how to quote the errors in that case. I guess it depends on the field (and hence the readers), but I was wondering how you would quote the values and uncertainties in this situation.

- #11

Science Advisor

Homework Helper

Gold Member

- 7,916

- 3,560

I'm sorry that I don't feel qualified to answer that question. Perhaps others with knowledge of the molecular physics papers that you refer to can give you better advice. You might want to provide links to those papers and ask specific questions about them. In that case, there might be a better section of this forum to ask the question.

- #12

- 304

- 29

Oh, sorry for the confusion. I meant: assuming you were to publish this in your own field (not molecular spectroscopy), how would you present your results?

- #13

Science Advisor

Homework Helper

Gold Member

- 7,916

- 3,560

Sorry. This is a very extreme case where the difference between the two terms is seven orders of magnitude lower. I have no experience with that, other than numerical issues on the computer.

- #14

- 304

- 29

That's totally ok, thanks a lot for the insights! Just for reference (and for others reading), the resolution of the experiment is good enough that the sqrt term does make a difference when performing the fit.

- #15

Science Advisor

Homework Helper

Gold Member

- 7,916

- 3,560

Stepwise regression and analysis of variance (ANOVA) methods would calculate the coefficient of partial determination to see if the additional term (with appropriately adjusted coefficients) is statistically justified. There is a probability associated with that ratio, but I do not know if that is appropriate for your application.
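For reference, one common way to write these quantities is given below. The notation is assumed here rather than taken from this thread: ##SSE_{\text{red}}## is the residual sum of squares without the additional term, ##SSE_{\text{full}}## is the residual sum of squares with it, ##n## is the number of data points, and ##k## is the number of fitted parameters in the full model.

$$R^2_{\text{partial}} = \frac{SSE_{\text{red}} - SSE_{\text{full}}}{SSE_{\text{red}}}$$

$$F = \frac{SSE_{\text{red}} - SSE_{\text{full}}}{SSE_{\text{full}}/(n-k)}$$

Under the usual assumptions (independent Gaussian errors, and the null hypothesis that the coefficient of the additional term is zero), ##F## follows an F distribution with ##(1, n-k)## degrees of freedom, which is where the associated probability comes from.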

- #16

Mentor

- 34,074

- 11,851

Quoting post #8: "So by eliminating one parameter, do you mean setting it to zero?"

I actually mean a model without that parameter at all. Sometimes a model without a given parameter is equivalent to a model with the parameter set to zero, sometimes set to one, sometimes some other value. It depends on the model.

Quoting post #8: "Based on the physics model upon which this equation is built, I do need both parameters."

OK, that is fine then. But you cannot make inferences about the values of the two parameters. You need to restrict your use of the model to making inferences about predictions. The predictions will still be valid even though the parameter estimates will not.

- #17

Science Advisor

Gold Member

- 886

- 479

Quoting the original poster: "If I let both p and q vary, the uncertainty on p is big, but it feels like that doesn't reflect the truth, as that error is mainly influenced by q, as they appear as p+q. If I fixed q=0.001, the errors on q and p would be different by a factor of 10 and I am not sure if that makes sense mathematically, as they do appear as p+q."

If I understand you correctly, I believe your analysis is spot on and there isn't anything you can do about it. The model doesn't have any more information than this. It only has a high degree of certainty on p+q, not p. I believe no trickery will get around that fundamental issue.

If you want to test it, try comparing the covariance matrices you get when you fit to the variables (p,q) and (p+q,q). In the former case, you should see large off-diagonal terms, and in the latter case I believe the off-diagonal correlation term will be small. If you want to present your results with no ambiguity, I would present your whole covariance matrix for (p,q). Alternatively, you could just present the errors on p+q and q (assuming the correlation was small) and put a note in the supplementary materials of your paper (if it has one). I think both of those would be very honest and upfront presentations of your result.
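Here is a rough sketch of that comparison, simplified to just the two troublesome terms, ##(p+q)x + q10^{-7}\sqrt{x}##, with the B and D terms left out. The x values, the unit measurement uncertainty `sigma` and the variable names are all assumptions for illustration. The covariance is computed in the (p+q, q) parameterization, where the arithmetic is stable, and then propagated to (p, q) using p = (p+q) - q.

```python
import math

C = 1e-7                               # small factor from the term q*1e-7*sqrt(x)
x = [float(j) for j in range(1, 21)]   # hypothetical x values
sigma = 1.0                            # hypothetical measurement uncertainty

# Parameterization (s, q) with s = p + q: model terms are s*x + q*C*sqrt(x).
u = x
v = [C * math.sqrt(xi) for xi in x]
uu = sum(a * a for a in u)
uv = sum(a * b for a, b in zip(u, v))
vv = sum(b * b for b in v)
det = uu * vv - uv * uv

# Covariance of the least-squares estimates: sigma^2 * inverse of the 2x2 normal matrix.
var_s = sigma ** 2 * vv / det
var_q = sigma ** 2 * uu / det
cov_sq = -sigma ** 2 * uv / det

# Propagate to (p, q) using p = s - q.
var_p = var_s + var_q - 2.0 * cov_sq
cov_pq = cov_sq - var_q
corr_pq = cov_pq / math.sqrt(var_p * var_q)

print(math.sqrt(var_s), math.sqrt(var_p), corr_pq)
```

In this toy setup the uncertainty on p+q comes out many orders of magnitude smaller than the uncertainty on p, and p and q are almost perfectly anti-correlated. How small the remaining correlation between p+q and q turns out to be will depend on the actual x range and on the other terms in the model.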

I hope that was helpful!
