Annoying stats question, think I'm answering it right?

Ramjam · Mar 17, 2015

< Mentor Note -- thread moved to HH from the technical math forums, so no HH Template is shown >

Hi everyone,

Got a stats question here from my revision material, but I am not sure whether I've answered the whole question:
http://i57.photobucket.com/albums/g209/w00kie123/IMAG1057_1.jpg
So far I've used linear regression to find the line of best fit, y = mx + b, and from this I know what the gradient and intercept are. However, is this all the question is asking, or have I missed something? In my opinion it is worded badly, and I am doubting whether I have answered the whole question.

Cheers in advance

RUber · Mar 17, 2015

Did you calculate the resistance? Looks like it should be about 8.

Ramjam · Mar 17, 2015

RUber said:

Did you calculate the resistance? Looks like it should be about 8.

I haven't. When you say 8, how did you get that?

RUber · Mar 17, 2015

Ohm's law.

Ramjam · Mar 17, 2015

RUber said:

Ohm's law.

It can't be that easy though, surely. I know Ohm's law, but surely I would have to work it out using the line of best fit? Otherwise it would seem pointless for me to calculate it.

RUber · Mar 17, 2015

Based on Ohm's law, you should be able to find resistance as approximately 1/gradient; I'll bet your intercept is close to zero. My guess is that you will still find resistance to be close to 8.

Ramjam · Mar 17, 2015

RUber said:

Based on Ohm's law, you should be able to find resistance as approximately 1/gradient; I'll bet your intercept is close to zero. My guess is that you will still find resistance to be close to 8.

My intercept was 0.574 and my gradient 0.114, which comes out as 8.9-ish for resistance. The difference could be due to a rounding error; I am going to check my working and see if I can get the resistance closer to 8. Thanks for your help on this matter.
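
For reference, the arithmetic behind that figure is just the reciprocal of the fitted gradient. A minimal sketch, assuming the fit is current = intercept + gradient × voltage and using the rounded values quoted above (the variable names are illustrative). With the gradient rounded to 0.114 this gives roughly 8.8, so the exact figure depends on how many digits of the gradient are kept.

```python
# Fitted line, assumed to be I = intercept + gradient * V
# (rounded values as quoted in the post)
gradient = 0.114
intercept = 0.574

# Ohm's law, I = V / R, makes the slope of I against V equal to 1 / R
resistance = 1 / gradient

print(round(resistance, 2))
```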

RUber · Mar 17, 2015

Those numbers are similar to what I got too. If you force the intercept to zero, which makes sense in this case since it is both physically realistic and within the CI of the intercept, you get closer to 8.

statdad · Mar 17, 2015

You don't need the resistance - that is worthless for this question. All you need is the regression information - the slope and the intercept. The comment "Any errors in measurement are associated with measurement of the current" is intended to make the problem fit the usual assumptions in linear regression, where the x-values are constant and the y-values are random.

You should not force the intercept to be zero in a regression problem unless there are specific instructions to do so: without the intercept, quantities like correlation and R-sq (which would likely be studied along with regression) are meaningless.

Ray Vickson · Mar 17, 2015

statdad said:

You don't need the resistance - that is worthless for this question. All you need is the regression information - the slope and the intercept. The comment "Any errors in measurement are associated with measurement of the current" is intended to make the problem fit the usual assumptions in linear regression, where the x-values are constant and the y-values are random.

You should not force the intercept to be zero in a regression problem unless there are specific instructions to do so: without the intercept, quantities like correlation and R-sq (which would likely be studied along with regression) are meaningless.

There might be an issue here: Ohm's Law says ##I = V/R##. Whether or not we can interpret ##b## (in ##I = a + bV##) as being ##1/R## seems a bit "iffy" to me. Certainly, the two fits ##I = bV## and ##I = a + bV## do give slightly different values of ##b## and hence two different estimates of resistance (which the OP was asked to find so is not worthless, at least from the point of view of assignment marks, if not of science).

statdad · Mar 17, 2015

"(which the OP was asked to find)"
I did miss that part. Apologies.

I do stand by the "don't make the intercept zero" comment, for the reason stated. Looking at this as a regression problem in statistics, there is nothing in the statement indicating the intercept should be forced to zero. (I buy your physics explanation, but wouldn't apply the idea to this as the problem is written.)

RUber · Mar 17, 2015

Good point, @statdad. After taking a second look, it might be best to calculate the resistance for each trial and average them to find your best guess for resistance. The question is clearly not implying that the intercept should be zero, though one would expect it to be close.
@Ramjam, it seems unclear what the expectation is for how you should calculate the resistance. If this is a homework problem, I would ask for clarification from the instructor.
1/gradient seems like a reasonable way to approximate it; averaging the trials is another. Both methods get an answer between 8 and 9, but if you are seeking points on HW, you will likely be expected to do it the teacher's way.
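
The two approaches can be compared side by side. A rough sketch in Python, using made-up (voltage, current) readings purely for illustration, since the actual data are not reproduced in the thread; it assumes current was regressed on voltage, and all names are hypothetical.

```python
# Hypothetical readings for illustration only (NOT the data from the question)
voltages = [40, 40, 80, 80, 120, 120]
currents = [5.1, 4.8, 10.3, 9.7, 14.0, 15.1]

# Method A: resistance for each trial from Ohm's law (R = V / I), then average
per_trial = [v / i for v, i in zip(voltages, currents)]
r_average = sum(per_trial) / len(per_trial)

# Method B: least-squares slope of I on V (with intercept), then R = 1 / slope
n = len(voltages)
v_bar = sum(voltages) / n
i_bar = sum(currents) / n
sxy = sum((v - v_bar) * (i - i_bar) for v, i in zip(voltages, currents))
sxx = sum((v - v_bar) ** 2 for v in voltages)
slope = sxy / sxx
r_from_slope = 1 / slope

print(r_average, r_from_slope)
```

On these invented numbers the two estimates are close but not identical, which is the point being made above: the method chosen changes the answer slightly.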

Ray Vickson · Mar 17, 2015

statdad said:

"(which the OP was asked to find)"
I did miss that part. Apologies.

I do stand by the "don't make the intercept zero" comment, for the reason stated. Looking at this as a regression problem in statistics, there is nothing in the statement indicating the intercept should be forced to zero. (I buy your physics explanation, but wouldn't apply the idea to this as the problem is written.)

I did say I thought it was "iffy", and that the two ways give different answers.

I agree that if we are positing an underlying model as ##y = \alpha + \beta x + \epsilon## then the fit ##y = a + bx## makes sense, and leads to unbiased results for iid mean-0 ##\epsilon##. However, if a law of physics tells us that ##y = \beta x + \epsilon## is really the true equation, are we then justified in including an intercept? I honestly don't know. I suppose this issue must have been discussed for years in the Stats literature, but I do not know of relevant references.

statdad · Mar 18, 2015

Typically, unless there is a specific instruction to drop the intercept, it isn't done. It's also not indicated if the collected predictor values are not close to 0, as forcing the intercept to zero then can inflate the slope - not because the relationship indicates the slope should be large, but simply due to the fact that the line needs to be "tilted" enough to hit the origin.
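
A small sketch of that "tilting" effect, using synthetic points that lie exactly on a line with a nonzero intercept (the intercept 5 and slope 0.1 are invented for illustration, not taken from the question). The helper names are hypothetical.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope when the line is forced through (0, 0)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def slope_with_intercept(xs, ys):
    """Least-squares slope when an intercept is also fitted."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data: predictor values far from 0, true line y = 5 + 0.1 x
xs = [40, 80, 120]
ys = [5 + 0.1 * x for x in xs]

b_free = slope_with_intercept(xs, ys)    # recovers the true slope
b_forced = slope_through_origin(xs, ys)  # steeper, so the line can reach the origin

print(b_free, b_forced)
```

Here the forced slope is larger only because the line has to pass through the origin, not because the data suggest a steeper relationship.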

Ray Vickson · Mar 18, 2015

statdad said:

Typically, unless there is a specific instruction to drop the intercept, it isn't done. It's also not indicated if the collected predictor values are not close to 0, as forcing the intercept to zero then can inflate the slope - not because the relationship indicates the slope should be large, but simply due to the fact that the line needs to be "tilted" enough to hit the origin.

I looked at the question "analytically". Suppose we have a true model of the form ##y = bx##, which gets perturbed to ##y_i = b x_i + e_i, i=1, \ldots, n## with iid mean-0 random errors ##e_i## having variance ##\sigma^2##. Now consider the two "fits": (1) zero-intercept fit ##\hat{y}_i = B_1 x_i, i=1, \ldots, n##; and (2) general linear fit ##\hat{y}_i = A_2 + B_2 x_i, i=1, \ldots, n##. Here, ##\hat{y}## denotes the fitted value corresponding to measured value ##y##.

The least-squares estimates of ##b## in the two fits are:
[tex]\begin{aligned} B_1 &= b + E_1, \; E_1 = \frac{\sum_i x_i e_i}{\sum_i x_i^2} \\ B_2 &= b + E_2, \; E_2 = \frac{n \sum_i x_i e_i - \sum_i x_i \: \sum_j e_j}{ n \sum_i x_i^2 - (\sum_i x_i)^2} \end{aligned}[/tex]
Both ##B_1, B_2## are unbiased, but their variances are different. If ##S_1 = \sum_i x_i## and ##S_2 = \sum_i x_i^2## we have
[tex]\begin{aligned} \text{Var} B_1 &= \frac{\sigma^2}{S_2}\\ \text{Var} B_2 &= \frac{n\, \sigma^2}{n S_2 - S_1^2} = \frac{\sigma^2}{S_2 - S_1^2/n} \end{aligned}[/tex]
Thus, we have ##\text{Var} B_2 > \text{Var} B_1## whenever ##S_1 \neq 0## (the two variances coincide only when ##S_1 = 0##).

For the specific case of ##x_i = 40,40,80,80,120,120## we have standard deviations of ##B_1, B_2## as
[tex]\text{St-dev} \,B_1 = \sigma \times 0.49875 \times 10^{-2}, \; \text{St-dev} \, B_2 = \sigma \times 0.14237 \times 10^{-1}[/tex]
Thus, the 0-intercept slope estimate is likely more accurate than the nonzero-intercept estimate in this case. Just to be clear: the 0-intercept fit is always (not just on average, but always) a worse fit to the data but on average gives a better fit to the slope. Of course, just by chance the 0-intercept estimate of the slope could also be worse in some individual cases, but on average it is better.
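
The variance formulas above are easy to evaluate directly. A minimal sketch, assuming the predictor values exactly as listed in this post; the variable names are illustrative. For these listed values the factors come out at about 0.0047 and 0.0125, a little different from the figures quoted above, possibly because those were computed from the actual measurements rather than these values. Either way the ordering is the same: the zero-intercept slope has the smaller standard deviation.

```python
import math

xs = [40, 40, 80, 80, 120, 120]  # predictor values as listed in the post
n = len(xs)
s1 = sum(xs)                     # S_1 = sum of x_i
s2 = sum(x * x for x in xs)      # S_2 = sum of x_i^2

# Standard deviation of each slope estimate, per unit of sigma
sd_b1 = math.sqrt(1 / s2)                  # zero-intercept fit: sigma / sqrt(S_2)
sd_b2 = math.sqrt(1 / (s2 - s1 ** 2 / n))  # fit with intercept: sigma / sqrt(S_2 - S_1^2/n)

print(sd_b1, sd_b2)
```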

Note: this case is very different from something like an economic series fit, where the "true" intercept is somewhat arbitrary (and may depend on how one defines the scale and location parameters of y). In the current case, having a true intercept of zero is essentially a law of nature and so does not have any arbitrariness in it.

statdad · Mar 18, 2015

The bit about the variance comes from the fact that regression with the intercept compares estimation of means by regression to estimation by the sample mean - which minimizes the sum of squares - as you show above, but in different form
[tex] \sum \left(x_i - \bar x\right)^2 = \min_{a} \sum \left(x_i - a\right)^2 \le \sum \left(x_i - 0\right)^2 = \sum x_i^2[/tex]
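
A quick numerical check of that inequality, as a sketch only, using the predictor values listed earlier in the thread (the function name is hypothetical):

```python
xs = [40, 40, 80, 80, 120, 120]  # predictor values listed earlier in the thread
x_bar = sum(xs) / len(xs)


def sum_sq_about(a):
    """Sum of squared deviations of the x values about the point a."""
    return sum((x - a) ** 2 for x in xs)


centered = sum_sq_about(x_bar)  # denominator for the slope with an intercept
raw = sum_sq_about(0)           # denominator for the slope through the origin

print(centered, raw)
```

The centered sum is the smaller of the two, which is why the slope variance is larger when the intercept is fitted.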

I do get the physics, even with as far in the past as my meagre physics exposure goes. (I want to be pressed neither on how long nor on any physics that is more sophisticated.) I still stay with my comment.

Not as support, but illustration, I offer the attached screen grab, from http://cw.routledge.com/textbooks/eresources/9780080965628/Ch_65_Linear_regression.pdf

(I must say I despise that author's use of X = a + bY in the second regression example.)

No offense intended in any of my comments. Sometimes that doesn't come across in these discussions.

Ray Vickson · Mar 18, 2015

statdad said:

The bit about the variance comes from the fact that regression with the intercept compares estimation of means by regression to estimation by the sample mean - which minimizes the sum of squares - as you show above, but in different form
[tex] \sum \left(x_i - \bar x\right)^2 = \min_{a} \sum \left(x_i - a\right)^2 \le \sum \left(x_i - 0\right)^2 = \sum x_i^2[/tex]

I do get the physics, even with as far in the past as my meagre physics exposure goes. (I want to be pressed neither on how long nor on any physics that is more sophisticated.) I still stay with my comment.

Not as support, but illustration, I offer the attached screen grab, from http://cw.routledge.com/textbooks/eresources/9780080965628/Ch_65_Linear_regression.pdf

(I must say I despise that author's use of X = a + bY in the second regression example.)

********************************************
I know that is the standard textbook treatment of such problems, and I have often done many such examples and analyses in class when I taught that material. However, I honestly believe it does not really get at the issue I am emphasising here, which relates to a linear model when you absolutely KNOW the true intercept = 0. I tried to hint at that in my final paragraph, but maybe it did not come across explicitly enough. I have also done some editing of the second-last paragraph of my previous post to elucidate more accurately the differences between the two fits.

Anyway, here is a summary of how I see things here. (1) We have a true model of the form ##y = b x## when there are no random errors, and get "measurements" of ##y## that are perturbed by addition of random errors. We want to estimate the value of ##b## by doing something to the measured data. (2) There are two methods available to us. (i) Method 1 uses a fitted formula of the form ##\hat{y}_i = B_1 x_i## for ##i = 1,2, \ldots,n## and determines ##B_1## via minimization of total squared error. (ii) Method 2 uses standard linear regression of the form ##\hat{y}_i = A_2 + B_2 x_i## and determines ##A_2, B_2## through a standard least-squares method. Of course, the second method will always give a more accurate line fit to the ##(x,y)## data, but I am concerned with the accuracy in the estimate of slope rather than the accuracy of ##y## vs ##x##.

So, using two slightly different methods we obtain different estimates of the true slope ##b##. The gist of my previous post is that Method 1 gives a more accurate estimate of slope than Method 2 (on average). That is just a mathematical fact, and does not really have anything to do with physics. The physics informs the mathematics inasmuch as it specifies the form of the true, underlying model, but after that physics is no longer involved.
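
The claim can also be checked by simulation. A rough sketch, assuming Gaussian errors and invented values for the true slope and error size (neither is given in the thread); the function names are hypothetical. It draws many noisy data sets from ##y = bx##, applies both methods, and compares the spread of the resulting slope estimates.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

xs = [40, 40, 80, 80, 120, 120]  # predictor values listed earlier in the thread
true_b = 0.125                   # hypothetical true slope
sigma = 0.5                      # hypothetical error standard deviation
trials = 5000


def fit_through_origin(xs, ys):
    """Method 1: least-squares slope of y = B1 * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def fit_with_intercept(xs, ys):
    """Method 2: least-squares slope of y = A2 + B2 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


b1_estimates, b2_estimates = [], []
for _ in range(trials):
    # True model y = b x, perturbed by mean-0 random errors
    ys = [true_b * x + random.gauss(0, sigma) for x in xs]
    b1_estimates.append(fit_through_origin(xs, ys))
    b2_estimates.append(fit_with_intercept(xs, ys))

mean_b1 = statistics.mean(b1_estimates)
mean_b2 = statistics.mean(b2_estimates)
sd_b1 = statistics.stdev(b1_estimates)
sd_b2 = statistics.stdev(b2_estimates)

print("Method 1: mean", mean_b1, "sd", sd_b1)
print("Method 2: mean", mean_b2, "sd", sd_b2)
```

Both sets of estimates should center on the true slope, with Method 1 showing the smaller spread.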

***************************************************
No offense intended in any of my comments. Sometimes that doesn't come across in these discussions.

No problem: I like a spirited discussion sometimes. Respectful disagreement can be valuable in clarifying issues.
