- #1

- 385

- 24

You are using an out of date browser. It may not display this or other websites correctly.

You should upgrade or use an alternative browser.

You should upgrade or use an alternative browser.

- I
- Thread starter kelly0303
- Start date

- #1

- 385

- 24

- #2

FactChecker

Science Advisor

Gold Member

- 6,531

- 2,608

- #3

Twigg

Science Advisor

Gold Member

- 587

- 291

I have a feeling your issue can be resolved by doing propagation of error on the closed formula for linear regression. This is how both linear and nonlinear regression algorithms calculate uncertainty on the parameters.

- #4

- 385

- 24

I don't have any measured data. I just want to simulate some mock data and fit to it. I can have any number of ##(x,y)## pairs. It depends on the problem I am trying to solve, but definitely more than 2.

- #5

- 385

- 24

In practice ##y_i## is a frequency measurement. So basically I have a counting experiment (with Poisson error on each point) and then I fit these points with a Voigt/Lorentz profile and extract the error from the fit.

I have a feeling your issue can be resolved by doing propagation of error on the closed formula for linear regression. This is how both linear and nonlinear regression algorithms calculate uncertainty on the parameters.

- #6

Twigg

Science Advisor

Gold Member

- 587

- 291

GotchaIn practice ##y_i## is a frequency measurement. So basically I have a counting experiment (with Poisson error on each point) and then I fit these points with a Voigt/Lorentz profile and extract the error from the fit.

Is this the procedure you have in mind?

- choose some known values of the x-axis: ##x_i## for ##1\leq i \leq N##
- generate artificial y-axis (frequency) measurements ##y_i## that are each normally distributed with standard deviation ##\sigma_{y_i}## and with mean ##ax_i + b## where a and b are values that you choose in the simulation
- do linear regression to find estimates for the uncertainty on the parameters ##\hat{a}## and ##\hat{b}## (the hat on ##\hat{a}## means estimate on ##a## as opposed to the value to you coded into the simulation)

Take a skim through this wikipedia page. You'll find expressions in the "Model-based Properties" sections for ##s_{\hat{\alpha}}## and ##s_{\hat{\beta}}## which correspond to the standard deviations on ##\hat{a}## and ##\hat{b}## here.

If you use a nonlinear model in the future, you can use a numerical routine like Matlab's nlinfit which does Levenburg-Marquardt fitting and returns uncertainty based on local linear regression. I don't know a good equivalent to it in scipy off the top of my head :/

- #7

Twigg

Science Advisor

Gold Member

- 587

- 291

- #8

- 385

- 24

Thank you! Yes, that's exactly what I had in mind. For the linear fit, are you saying that doing a fit or doing error propagations are equivalent? I assume that doing a fit is faster given that I can just use scipy (and 3 lines of code), no?Gotcha

Is this the procedure you have in mind?

If so, you don't need to simulate data. The Monte Carlo approach you describe can be replaced with error propagation.

- choose some known values of the x-axis: ##x_i## for ##1\leq i \leq N##
- generate artificial y-axis (frequency) measurements ##y_i## that are each normally distributed with standard deviation ##\sigma_{y_i}## and with mean ##ax_i + b## where a and b are values that you choose in the simulation
- do linear regression to find estimates for the uncertainty on the parameters ##\hat{a}## and ##\hat{b}## (the hat on ##\hat{a}## means estimate on ##a## as opposed to the value to you coded into the simulation)

Take a skim through this wikipedia page. You'll find expressions in the "Model-based Properties" sections for ##s_{\hat{\alpha}}## and ##s_{\hat{\beta}}## which correspond to the standard deviations on ##\hat{a}## and ##\hat{b}## here.

If you use a nonlinear model in the future, you can use a numerical routine like Matlab's nlinfit which does Levenburg-Marquardt fitting and returns uncertainty based on local linear regression. I don't know a good equivalent to it in scipy off the top of my head :/

Why can't I use scipy curve fit for a non-linear function? Their program returns the correlation matrix for the parameters. Isn't that what I am looking for?

- #9

Twigg

Science Advisor

Gold Member

- 587

- 291

If you did the Monte Carlo approach, you'd need to do a handful of runs of the simulation so you get a nice sampling over many different possibilities for the ##y_i##'s. Instead, you can solve the problem analytically (to first order, at least) by propagating the errors ##\sigma_{y_i}## through the explicit formulas for linear regression. It's just an efficiency thing really. In fact, the Monte Carlo approach doesn't make the first-order approximation that error propagation does. It's really up to you, just thought I'd make the suggestionFor the linear fit, are you saying that doing a fit or doing error propagations are equivalent?

Scipy curve fit looks perfect. I'm just a beginner at scipyWhy can't I use scipy curve fit for a non-linear function? Their program returns the correlation matrix for the parameters. Isn't that what I am looking for?

- #10

- 385

- 24

Thank you! So for the case of non-linear fit (or where I can't solve the problem analytically). How should I proceed exactly? For example for a given assumed error on the measurement, say ##1kHz## (assume same error on all data points), I would do the steps above and obtain some estimates and errors for the parameters. Then I would repeat this procedure (same error on the measurements) for a big number of time, say 1000, and get 1000 estimates and errors on the parameters. Then I would use basics statistics to calculate the mean and error from all these 1000 values. Then I would change the error on the measurement (say 100 Hz) and redo the 1000 samples again and so on. Is this what you meant?If you did the Monte Carlo approach, you'd need to do a handful of runs of the simulation so you get a nice sampling over many different possibilities for the ##y_i##'s. Instead, you can solve the problem analytically (to first order, at least) by propagating the errors ##\sigma_{y_i}## through the explicit formulas for linear regression. It's just an efficiency thing really. In fact, the Monte Carlo approach doesn't make the first-order approximation that error propagation does. It's really up to you, just thought I'd make the suggestion

Scipy curve fit looks perfect. I'm just a beginner at scipy

- #11

FactChecker

Science Advisor

Gold Member

- 6,531

- 2,608

- #12

Twigg

Science Advisor

Gold Member

- 587

- 291

@kelly0303 That approach sounds good. I believe you could get a great estimate by just doing one iteration of the steps above, where the y-values are exactly equal to the model function ##y_i = f(x_i | a,b,c,...)## but with inverse-variance weighting scheme given by the known error (1000kHz in your first example) squared (1e12 Hz^2). I can't remember, but I think this will either give you a fit with a smallish covariance matrix (small by a factor of <3 I'd bet) or it'll make the Levenburg-Marquardt iteration diverge. I forget which but worth a try!

- #13

- 385

- 24

But if I use the exact values for ##y_i##, without adding some extra noise, won't the error on the parameters be smaller, even if I add the inverse weighting (in my idea I would add both the noise and the inverse weighting)? Is that factor of 3 because of that?

@kelly0303 That approach sounds good. I believe you could get a great estimate by just doing one iteration of the steps above, where the y-values are exactly equal to the model function ##y_i = f(x_i | a,b,c,...)## but with inverse-variance weighting scheme given by the known error (1000kHz in your first example) squared (1e12 Hz^2). I can't remember, but I think this will either give you a fit with a smallish covariance matrix (small by a factor of <3 I'd bet) or it'll make the Levenburg-Marquardt iteration diverge. I forget which but worth a try!

Also, what do you mean by just one iteration? You mean not do that 1000 times for each error value on ##y_i##? Would that work? For some reason I thought that in Monte Carlo approach (not sure if this counts as MC tho), I need lots of samples and then average over them. Would just one sample for each error give me the same result as many samples and averaging?

- #14

Twigg

Science Advisor

Gold Member

- 587

- 291

One sample wouldn't give you the full story, as you say.Would just one sample for each error give me the same result as many samples and averaging?

Yes, it will be smaller. That's why I threw out that factor of less than 3ish (just a total guess on my part).But if I use the exact values for yi, without adding some extra noise, won't the error on the parameters be smaller, even if I add the inverse weighting (in my idea I would add both the noise and the inverse weighting)? Is that factor of 3 because of that?

The idea was if you were in a hurry to get a number, just a rough guess, you could do a single iteration like I was saying. It should be same order as the result you would get with your procedure, just a little small. You have the right idea

I probably overuse the term! Something long those lines though.not sure if this counts as MC

Share:

- Replies
- 2

- Views
- 18K