Margin of error in a t-distribution

In summary, the conversation discusses the concept of margin of error and the different statistical methods used to calculate it. The individual seeking assistance is trying to estimate the coefficient of friction and is considering both frequentist and Bayesian statistics. The conversation also touches on the potential issues with using a single t-value for different sample sizes and suggests using a combination of z and t-distributions to determine the necessary sample size.
  • #1
ND3G
This is not a homework problem.

I am working on an experiment and I need to know how many samples (n) I need to achieve a margin of error (e) below 2%.

Looking through a statistics textbook they provide a calculation for e using z-distributions, but not t-distributions.

Replacing variables, I concluded that [itex]e = (t_{\alpha/2} S)^2/\sqrt{n}[/itex], where [itex]t_{\alpha/2}[/itex] is the upper bound and S is the sample standard deviation.
Is this correct? Also, if so, is the value of e given as a percentage?

Lastly, from some preliminary tests (2 tests), the closer the initial test results are to each other, the smaller the error value (obviously). But I am concerned that a sample size of two is simply too small to definitively conclude that I am safely within my desired margin of error.

I need to conduct tests under a variety of conditions, and the final number of tests performed may run into the hundreds, if not thousands, so it is vital that I do not perform more tests for any particular set of conditions than absolutely necessary. Any advice in this regard would be greatly appreciated.

Thanks in advance.
 
  • #2
What is your definition of "margin of error"?

Do you want to say something like "the estimated value, based on the sample, is 28.93 and there is a 95% chance that the true value is within plus or minus 1.30 of this value"? Then give up, if you are using ordinary ("frequentist") statistics. It won't tell you that.

You can get something called a "confidence" interval from frequentist statistics. It doesn't tell you the probability that the true value (of whatever you are estimating) is in a specific numerical interval.

You didn't say what you are estimating and what estimator you are using. Once those are specified, you can estimate the standard deviation of the estimator. Some people call an interval that is plus or minus two (or three or four) standard deviations around the estimated value the "margin of error". Is that what you mean?
 
  • #3
Stephen Tashi said:
What is your definition of "margin of error"?

Do you want to say something like "the estimated value, based on the sample, is 28.93 and there is a 95% chance that the true value is within plus or minus 1.30 of this value"?

That is exactly what I am trying to say. If "frequentist" statistics are not the right route, what theories/methods should I be looking at?

I am trying to find the coefficient of friction, and I plan to use the sample mean as the estimator.
 
  • #4
ND3G said:
That is exactly what I am trying to say. If "frequentist" statistics are not the right route, what theories/methods should I be looking at?

If you want that kind of statement, you should look at Bayesian statistics and "prediction intervals". The mathematical facts of life are that unless you are willing to hypothesize a distribution for the thing you are estimating prior to analyzing the data, it is impossible to quantify a probability distribution for that thing after you have the data. (This is analogous to the fact that you can't find the sides and angles of a triangle when you are given only one side and one angle. It isn't a matter of philosophy. It's just the nature of what constitutes sufficient information to solve the problem.)

However, if you are thinking about publishing a report, consider that there are some areas of engineering and science where frequentist statistics is traditional. Frequentist statistics emphasizes "confidence intervals" for estimators. A confidence interval approach can make a statement like "When we base our estimate on 50 samples, there is a 95% probability that the true value will be within plus or minus 1.3 of our estimate." This statement is similar to what you want, but it cannot be applied to a particular estimate, such as 28.93. Laymen often incorrectly apply it to read like the statement you want.

Bayesian and frequentist statistical methods often use substantially the same formulae. There is a distinct difference in the problems they are solving with these formulae.
 
  • #5
Thank you for your help. It is greatly appreciated.
 
  • #6
"When we base our estimate on 50 samples, there is a 95% probability that the true value will be within plus or minus 1.3 of our estimate."

Except it is not stated that way, as this makes it seem the true value is the quantity that is random.
"When we repeat this process a large number of times, and create a confidence interval each time, 95% of those intervals will fall around the true value" is the appropriate interpretation. Notice that this does not attach any information to a specific instance of an interval.
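The repeated-sampling interpretation can be checked with a small simulation. This is only a sketch under stated assumptions: the measurements are taken to be normally distributed, the true mean (0.30) and standard deviation (0.02) are made-up values standing in for a coefficient of friction, and 2.262 is the tabulated two-sided 95% critical value of the t-distribution with 9 degrees of freedom (n = 10).

```python
import random
import statistics

def coverage(true_mean=0.30, sigma=0.02, n=10, t_crit=2.262,
             trials=20000, seed=1):
    """Fraction of simulated 95% t-intervals that contain the true mean.

    true_mean and sigma are hypothetical; t_crit is the table value
    for 9 degrees of freedom, so it is only valid for n = 10.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.mean(sample)
        half_width = t_crit * statistics.stdev(sample) / n ** 0.5
        # The interval is random; the true mean is fixed.
        if abs(mean - true_mean) <= half_width:
            hits += 1
    return hits / trials

print(f"fraction of intervals covering the true mean: {coverage():.3f}")
```

The fraction should come out close to 0.95, yet nothing in the simulation assigns a probability to any single interval: each one either contains the true mean or it does not.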
 
  • #7
A final comment: "Replacing variables I concluded that [itex]e = (t_{\alpha/2} S)^2/\sqrt{n}[/itex] where [itex]t_{\alpha/2}[/itex] is the upper bound, S is the sample standard deviation.
Is this correct? Also, if so, is the value of e given as a percentage?"

won't work. In order to know which t-value to use in this formula, you need to select a number of degrees of freedom; as soon as you do that, you've selected a sample size. The original formula uses z because (say, for 95% confidence) a single z value suffices for every sample size. The downside: you must assume normality (as you do even for a t-interval).
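Written out, the circularity looks like this. The sketch assumes the usual form of the margin of error, a critical value times S/√n (without the extra square in the quoted formula), and uses the subscript n-1 for the degrees of freedom:

```latex
% z-based: the critical value does not depend on n, so n can be isolated
e = z_{\alpha/2}\,\frac{S}{\sqrt{n}}
\quad\Longrightarrow\quad
n = \left(\frac{z_{\alpha/2}\,S}{e}\right)^{2}

% t-based: the critical value depends on n through its n-1 degrees of freedom
e = t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}
\quad\Longrightarrow\quad
n \ge \left(\frac{t_{\alpha/2,\,n-1}\,S}{e}\right)^{2}
```

In the second line n appears on both sides, so it has to be found by trial or iteration rather than read off a closed-form expression. Note also that e carries the same units as S; it is a percentage only if it is divided by the estimated mean.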
 
  • #8
I did some research, and one study suggested taking a couple of samples, finding the minimum number of samples for a z-distribution, then using that value of n to determine the degrees of freedom for the t-distribution. Then I solve for the new n. With each new sample everything is recalculated: S, n for the z-distribution, and n for the t-distribution.
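A procedure along these lines might be sketched as follows. It is not the method from the study mentioned above, only a variant under stated assumptions: it starts from the z-based n and steps n upward until the t-based requirement n ≥ (t·S/e)² is met, because repeatedly re-solving for n can bounce between two neighboring values. The helper names (`t_cdf`, `t_crit`, `sample_size`) and the pilot numbers are hypothetical, and the t critical value is found numerically with the standard library only.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=200):
    """Student-t CDF by Simpson's rule, after substituting t = sqrt(df)*tan(theta)."""
    theta = math.atan(abs(x) / math.sqrt(df))
    h = theta / steps
    total = 1.0 + math.cos(theta) ** (df - 1)      # endpoints of cos^(df-1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    half = k * total * h / 3                       # P(0 < T < |x|)
    return 0.5 + half if x >= 0 else 0.5 - half

def t_crit(conf, df):
    """Two-sided critical value by bisection (assumes the answer is below 1000)."""
    p = 1 - (1 - conf) / 2
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_size(s, e, conf=0.95):
    """Smallest n with n >= (t_{n-1} * s / e)^2, starting from the z-based n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = max(2, math.ceil((z * s / e) ** 2))        # z-based lower bound
    while n < (t_crit(conf, n - 1) * s / e) ** 2:  # t is larger than z, so step up
        n += 1
    return n

# Hypothetical pilot values: S = 0.02, target e = 0.006 (2% of a mean near 0.30).
print("required n:", sample_size(s=0.02, e=0.006))
```

As each new measurement arrives, S can be recomputed and `sample_size` called again; the target e is in the same units as S, so a 2% target has to be converted using the current sample mean.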
 
  • #9
ND3G said:
This is not a homework problem.

I am working on an experiment and I need to know how many samples (n) I need to achieve a margin of error (e) below 2%.

Looking through a statistics textbook they provide a calculation for e using z-distributions, but not t-distributions.

Replacing variables, I concluded that [itex]e = (t_{\alpha/2} S)^2/\sqrt{n}[/itex], where [itex]t_{\alpha/2}[/itex] is the upper bound and S is the sample standard deviation.
Is this correct? Also, if so, is the value of e given as a percentage?

Lastly, from some preliminary tests (2 tests), the closer the initial test results are to each other, the smaller the error value (obviously). But I am concerned that a sample size of two is simply too small to definitively conclude that I am safely within my desired margin of error.

I need to conduct tests under a variety of conditions, and the final number of tests performed may run into the hundreds, if not thousands, so it is vital that I do not perform more tests for any particular set of conditions than absolutely necessary. Any advice in this regard would be greatly appreciated.

Thanks in advance.

I can't tell whether you have exactly two data points, or two runs with n data points. If n is 30 or more then you have nothing to worry about. If n is two then you have a lot to worry about.

The problem is that with n=2 the sample standard deviation is likely to be quite an inaccurate estimate of the standard deviation of the population. So you have an additional uncertainty.

I don't know whether in your many runs you can assume that the population standard deviation is close to the same in each run. If you can then you can use a pooled sample standard deviation and your problems are over. If you can't and your n for each possible population standard deviation is low then you need to use the t distribution instead of z.
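For reference, a pooled sample standard deviation weights each run's variance by its degrees of freedom. The sketch below assumes the runs really do share a common population standard deviation; the function name and the friction readings are hypothetical.

```python
import math
import statistics

def pooled_sd(groups):
    """Pooled standard deviation: sqrt( sum((n_i - 1) * s_i^2) / sum(n_i - 1) )."""
    weighted = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    dof = sum(len(g) - 1 for g in groups)
    return math.sqrt(weighted / dof)

# Hypothetical friction readings from three test conditions, two or three runs each.
runs = [[0.31, 0.29], [0.42, 0.45, 0.43], [0.18, 0.20]]
print(f"pooled s = {pooled_sd(runs):.4f} with {sum(len(r) - 1 for r in runs)} degrees of freedom")
```

The pooled estimate has far more degrees of freedom than any single two-point run, which is why it eases the small-n problem.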
 
  • #10
ND3G said:
I did some research, and one study suggested taking a couple of samples, finding the minimum number of samples for a z-distribution, then using that value of n to determine the degrees of freedom for the t-distribution. Then I solve for the new n. With each new sample everything is recalculated: S, n for the z-distribution, and n for the t-distribution.

There are various statistical techniques that have the word "sequential" in their name such as "sequential sampling" and "sequential estimation". What you read sounds like an application of such a procedure.

We really should get straight what your goal is. If you are writing a formal report about the coefficient of friction and the audience expects certain statistical techniques and would be made uncomfortable by others, the obvious course of action is to use the techniques they expect. If you are doing a student project on the coefficient of friction and getting distracted by a fascination with statistics, then you have to decide how much time you can devote to learning statistical techniques before the report is due.

To me, a more interesting application of probability to the coefficient of friction would be in stochastic modeling of the coefficient of friction, but that might be too big a digression from your task, whatever that task is.
 
  • #11
This is a school project. The statistical analysis is simply an attempt to show that some thought went into the sample size and it was not chosen randomly.
 

1. What is a t-distribution and how does it relate to margin of error?

The t-distribution is the sampling distribution of the standardized sample mean when the data are drawn from a normal population and the population standard deviation is unknown and must be estimated from the sample. It is often used in place of the normal distribution when the sample size is small (a common rule of thumb is fewer than 30 observations). The margin of error is the half-width of a confidence interval around an estimate, and with a small sample and an unknown population standard deviation it is calculated from a critical value of the t-distribution rather than the normal distribution.

2. How is margin of error calculated in a t-distribution?

The margin of error in a t-distribution is calculated using the formula ME = t* · (s / √n), where ME is the margin of error, t* is the critical value from the t-distribution table, s is the sample standard deviation, and n is the sample size. The critical value t* depends on the desired confidence level and on the degrees of freedom, which equal the sample size minus one.
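As a worked illustration of the formula, the following sketch computes the margin of error for five made-up friction measurements. The data are hypothetical, and 2.776 is the tabulated two-sided 95% critical value for 4 degrees of freedom.

```python
import math
import statistics

def margin_of_error(sample, t_star):
    """ME = t* * s / sqrt(n); t_star must match df = len(sample) - 1."""
    return t_star * statistics.stdev(sample) / math.sqrt(len(sample))

mu = [0.31, 0.29, 0.30, 0.32, 0.28]       # hypothetical measurements
me = margin_of_error(mu, t_star=2.776)    # table value for df = 4, 95%
mean = statistics.mean(mu)

print(f"mean = {mean:.3f}, ME = {me:.4f} (same units as the data)")
print(f"relative ME = {100 * me / mean:.1f}% of the mean")
```

The margin of error comes out in the units of the measurements; expressing it as a percentage requires dividing by the sample mean, as the last line does.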

3. What factors affect the margin of error in a t-distribution?

The margin of error in a t-distribution is affected by the sample size, the confidence level, and the variability of the data. As the sample size increases, the margin of error decreases, meaning the estimate is more precise. As the confidence level increases, the margin of error also increases, meaning the estimate is less precise but more confident. If the data has a high degree of variability, the margin of error will also be higher.

4. What is the significance of the confidence level in a t-distribution?

The confidence level describes the long-run behavior of the interval procedure, not the probability that the true population mean lies in one particular calculated interval. For example, a confidence level of 95% means that if the study were repeated many times and an interval computed each time, about 95% of those intervals would contain the true population mean. A higher confidence level means the procedure captures the true mean more often, but it also results in a wider margin of error.

5. How is the t-distribution used in hypothesis testing?

The t-distribution is used in hypothesis testing to determine how likely it would be to obtain a sample mean at least as extreme as the one observed if the null hypothesis were true. The t-statistic is calculated by dividing the difference between the sample mean and the hypothesized population mean by the standard error of the mean. The t-statistic is then compared to a critical value from the t-distribution table to determine the statistical significance of the results. If the absolute value of the t-statistic exceeds the critical value (for a two-sided test), the null hypothesis is rejected in favor of the alternative hypothesis.
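A one-sample t-statistic can be computed in a few lines. In this sketch the measurements and the hypothesized mean of 0.28 are made up, and 2.776 is the tabulated two-sided 5% critical value for 4 degrees of freedom.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """(sample mean - hypothesized mean) / (s / sqrt(n))."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se

mu = [0.31, 0.29, 0.30, 0.32, 0.28]   # hypothetical measurements
t = t_statistic(mu, mu0=0.28)         # hypothetical null value
critical = 2.776                      # table value, df = 4, two-sided 5%

print(f"t = {t:.3f}, reject H0 at the 5% level: {abs(t) > critical}")
```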
