chipotleaway
Background:
My lab group took measurements of gas levels every 15 seconds over a 15-minute period. We used a gas sensor and computer interface, so it pretty much did all the work. When we calculated the errors for the data using the uncertainties given in the devices' user manuals, we got unreasonably large errors, since the uncertainties on the devices are rather large (20%). So it was suggested that we use linear regression (?) to calculate the trendline for the data and the corresponding errors.
The formulas I'm using to find the trendline and errors are from the paper linked below*, specifically equations (3) and (7).
*http://seismo.berkeley.edu/~kirchner/eps_120/Toolkits/Toolkit_10.pdf
These give the respective intercept a and slope b. Then I used equation (15) to find the correlation coefficient:
[tex]r=b\frac{S_x}{S_y}[/tex]
Here b is the value found from the big formula in (7), and [itex]S_x[/itex] and [itex]S_y[/itex] are the standard deviations of the independent (x) and dependent (y) data sets, respectively.
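For concreteness, here is a minimal sketch of that calculation in Python with NumPy. It assumes equations (3) and (7) are the usual least-squares intercept and slope. The function name `fit_line` and the synthetic data are made up for illustration and are not our actual measurements.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = a + b*x, plus r computed as b * S_x / S_y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    b = np.sum(dx * dy) / np.sum(dx ** 2)   # slope
    a = y.mean() - b * x.mean()             # intercept
    # Use the same normalisation (n - 1 here) for both standard deviations.
    s_x = x.std(ddof=1)
    s_y = y.std(ddof=1)
    r = b * s_x / s_y
    return a, b, r

# Synthetic stand-in for one run: a reading every 15 s for 15 minutes.
rng = np.random.default_rng(0)
t = np.arange(0, 901, 15, dtype=float)               # time in seconds
level = 400.0 + 0.2 * t + rng.normal(0.0, 5.0, t.size)

a, b, r = fit_line(t, level)
print(f"intercept={a:.3f}, slope={b:.5f}, r={r:.5f}")
```

Note that x is the time axis and y is the gas reading here. Swapping the two, or mixing n and n - 1 normalisations between the slope and the standard deviations, changes the value of r.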
The correlation coefficient was needed to calculate the standard error using formula (17), but I ran into some trouble because formula (15) gave a correlation coefficient with a magnitude greater than 1 for some of the data sets, and when you plug this into (17), you get the square root of a negative number.
So I did a quick search and found an alternative formula from http://www.ditutor.com/regression/correlation_coefficient.html
[tex]r=\frac{cov(x,y)}{S_xS_y}[/tex]
which gave correlation coefficients with a magnitude less than 1 for all the data sets.
But the thing is, for some of the data sets both formulas work, and the latter gives much smaller errors than the former - although both give very small errors (you can't see them on the graph). For now, I'm using the one which gives the bigger error, but which should I use, and why doesn't the correlation coefficient formula in the paper always work?
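One sanity check is to compute r both ways on the same data. The least-squares slope can be written as cov(x,y)/S_x^2, so b*S_x/S_y and cov(x,y)/(S_x*S_y) should be algebraically identical and bounded by 1 in magnitude, as long as the same normalisation is used throughout. A rough sketch with synthetic data (made-up numbers, not our measurements):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(0, 901, 15, dtype=float)                # time in seconds
y = 400.0 + 0.2 * x + rng.normal(0.0, 5.0, x.size)    # synthetic readings

# Same n - 1 normalisation for the covariance and both standard deviations.
s_x = x.std(ddof=1)
s_y = y.std(ddof=1)
cov_xy = np.cov(x, y, ddof=1)[0, 1]

b = cov_xy / s_x ** 2               # least-squares slope

r_from_slope = b * s_x / s_y        # form used in the paper's equation (15)
r_from_cov = cov_xy / (s_x * s_y)   # covariance form

print(r_from_slope, r_from_cov)
```

If the two values disagree on real data, that would suggest a slip somewhere in the calculation (for example S_x and S_y swapped, or n used in one place and n - 1 in another) rather than a real difference between the formulas.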
Also, the trendline calculated using the formula from the paper is quite poor - there is a very clear linear trend in the data points, but the line I calculated using the formula is a pretty bad fit. Can I just apply the errors I calculated to my data points and not use the trendline?
I'm not sure I can do that because the paper says that the standard error is in the slope of the regression line, not for the actual data points.
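To make that distinction concrete, the sketch below computes two different quantities: the standard error of the slope, and the scatter of the individual points about the fitted line. It assumes formula (17) is equivalent to the common textbook expression s_b = (S_y/S_x)*sqrt((1 - r^2)/(n - 2)), which I haven't verified against the paper. The function name and the data are hypothetical.

```python
import numpy as np

def slope_error_and_scatter(x, y):
    """Return slope b, its standard error s_b, and residual scatter s_e."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx = x - x.mean()
    dy = y - y.mean()
    b = np.sum(dx * dy) / np.sum(dx ** 2)
    a = y.mean() - b * x.mean()
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    s_x = x.std(ddof=1)
    s_y = y.std(ddof=1)
    # Uncertainty in the fitted slope (assumed form of formula (17)).
    s_b = (s_y / s_x) * np.sqrt((1.0 - r ** 2) / (n - 2))
    # Typical vertical distance of a single point from the line.
    residuals = y - (a + b * x)
    s_e = np.sqrt(np.sum(residuals ** 2) / (n - 2))
    return b, s_b, s_e

rng = np.random.default_rng(2)
t = np.arange(0, 901, 15, dtype=float)                 # time in seconds
level = 400.0 + 0.2 * t + rng.normal(0.0, 5.0, t.size) # synthetic readings

b, s_b, s_e = slope_error_and_scatter(t, level)
print(f"slope={b:.5f} +/- {s_b:.5f}, scatter about line={s_e:.3f}")
```

The two numbers have different units (s_b is in reading units per second, s_e is in reading units), which is one way to see that the slope's standard error is not an error bar for an individual data point.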
By the way, in case there is a difference between a 'trendline' and a 'regression line' - I mean the same thing by both in this post, as I'm not aware of any difference. My apologies to anyone who gets offended by this! Thanks