Confidence interval for estimated mean of (discrete) uniform distribution

  • #1
Malle
Say that there is a random variable X ~ U(a,b), where U is the discrete uniform distribution on the integers in the interval [a,b]. Sample n such variables with the same (unknown) parameters a and b. From those samples it's possible to estimate the mean by taking the sample mean (sum the value of each sample and divide by n), but how would I calculate a confidence interval for that estimate of the mean?

(Note: It's also possible to estimate the mean using the sample mid-range, but since the data gathering is done manually, any error that puts a value outside the interval [a,b] would permanently bias the mid-range estimate, no matter how many samples are taken.)
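That fragility of the mid-range can be illustrated with a quick simulation. This is only a sketch: the true parameters (200, 290) and the erroneous value 920 are made up for illustration, not taken from the thread.

```python
import random

random.seed(0)

# Hypothetical true parameters for illustration; true mean = 245.
a, b = 200, 290
true_mean = (a + b) / 2

n_samples = 10_000
values = [random.randint(a, b) for _ in range(n_samples)]
values[0] = 920   # a single manual data-entry error outside [a, b]

sample_mean = sum(values) / n_samples
mid_range = (min(values) + max(values)) / 2

print(f"true mean {true_mean}, sample mean {sample_mean:.1f}, mid-range {mid_range:.1f}")
```

One bad value barely moves the sample mean but shifts the mid-range by hundreds of units, and no amount of additional clean data repairs it, since the maximum never goes back down.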

For the data I'm using this for, the sample sizes are small, typically n ≤ 10.

Example data:
220
238
241
200
204
271
289
273
243

Any solutions or guidance to how to approach this (and resources for relevant material) are highly appreciated.
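For concreteness, here is a minimal sketch computing the two point estimates mentioned above on the example data. It produces the estimates only, not the interval being asked about.

```python
# Example data from the post (n = 9)
data = [220, 238, 241, 200, 204, 271, 289, 273, 243]
n = len(data)

sample_mean = sum(data) / n               # estimate of the distribution mean (a + b) / 2
mid_range = (min(data) + max(data)) / 2   # alternative estimate from the order statistics

print(f"sample mean = {sample_mean:.2f}, mid-range = {mid_range:.1f}")
```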
 
  • #2
No offense intended, but can we make sure that you are using the terminology "confidence interval" in the usual technical sense of that phrase. Some people say "confidence interval" when they really mean other things, such as a "prediction interval" or perhaps a Bayesian version of a confidence interval. What do you mean by a "confidence interval"?
 
  • #3
Stephen Tashi said:
No offense intended, but can we make sure that you are using the terminology "confidence interval" in the usual technical sense of that phrase. Some people say "confidence interval" when they really mean other things, such as a "prediction interval" or perhaps a Bayesian version of a confidence interval. What do you mean by a "confidence interval"?
None taken.

I want to be able to give an interval within which I can with a given certainty (e.g. 95%) say that the mean of the distribution (not the sample) lies.
 
  • #4
Malle said:
None taken.

I want to be able to give an interval within which I can with a given certainty (e.g. 95%) say that the mean of the distribution (not the sample) lies.

Assuming that I interpret "certainty" to mean "probability", do you want to be able to state a specific numerical interval based on an observed value taken from your data? - something like "There is a 95% probability that the mean of the distribution is between 1.8 and 3.8"?

If you want that, you are asking for a "credible interval" - at least according to the current Wikipedia article on "confidence interval". This is a natural thing to want, but you have to make enough assumptions to employ Bayesian statistics in order to get it.

In many fields of study, published papers use "confidence interval" by tradition. So if you are writing some kind of report, there is that consideration. We can discuss either approach, but "confidence intervals" do not have the same interpretation as "credible intervals".
 
  • #5
Stephen Tashi said:
Assuming that I interpret "certainty" to mean "probability", do you want to be able to state a specific numerical interval based on an observed value taken from your data? - something like "There is a 95% probability that the mean of the distribution is between 1.8 and 3.8"?

If you want that, you are asking for a "credible interval" - at least according to the current Wikipedia article on "confidence interval". This is a natural thing to want, but you have to make enough assumptions to employ Bayesian statistics in order to get it.

In many fields of study, published papers use "confidence interval" by tradition. So if you are writing some kind of report, there is that consideration. We can discuss either approach, but "confidence intervals" do not have the same interpretation as "credible intervals".
This is strictly for my own uses, so tradition is not an important aspect.

I read briefly in the articles on credible intervals and confidence intervals and just want to paste in the following (if for nothing else, so that I can access it easier in this thread):
A confidence interval with a particular confidence level is intended to give the assurance that, if the statistical model is correct, then taken over all the data that might have been obtained, the procedure for constructing the interval would deliver a confidence interval that included the true value of the parameter the proportion of the time set by the confidence level.
A confidence interval does not predict that the true value of the parameter has a particular probability of being in the confidence interval given the data actually obtained. (An interval intended to have such a property, called a credible interval, can be estimated using Bayesian methods; but such methods bring with them their own distinct strengths and weaknesses).


With that said, what I was asking for in the first post is, just as you said, the credible interval. However, upon reading this I realize that my question is partially based on the misconception that a confidence interval of an estimate of a parameter would contain the parameter with a probability given by the confidence level. I'm not sure if I misremember the one statistics class I've taken or if it was taught erroneously (or if it was just for some subset of possible problems where the two are the same).


If possible, I would very much like to be able to give a credible interval; presuming I understand the idea correctly, it is what feels most natural to me to describe the certainty of an estimation if there is only a single experiment done. However, the Wiki article says that
credible intervals incorporate problem-specific contextual information from the prior distribution whereas confidence intervals are based only on the data
and as far as I can tell, I have no certain prior distribution to work with. How would one arrive at a prior distribution? Could you select any prior distribution and work with it, with the only caveat that the posterior distribution would vary depending on how you chose your prior?


If it is not possible, or if it proves to be too complex for me to learn right now, then a confidence interval would work.


But, just to see if I understand the distinction between the two, if I used a confidence interval instead of a credible interval:
1. I could not say that with a confidence level of certainty the true value of the estimated parameter is within the confidence interval (e.g. that there's a 90% probability that the true value is within the confidence interval)
2. I could say that if there are a large number of different experiments run, and for each experiment a confidence interval is calculated, then a confidence level fraction of the confidence intervals would contain the true value. (e.g. for a 90% confidence level, 90% of the calculated confidence intervals would contain the true value).

Is that right?
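Point 2 can be checked directly by simulation. A sketch under arbitrary assumptions (true parameters 200 and 290, a t-based interval with a hard-coded critical value): since the t interval is only a CLT approximation for a uniform parent, the empirical coverage will be near, not exactly at, the nominal 90%.

```python
import random
import statistics

random.seed(1)

a, b = 200, 290            # hypothetical true parameters; true mean = 245
true_mean = (a + b) / 2
n, trials = 9, 20_000
t = 1.860                  # t critical value for a 90% interval, 8 degrees of freedom

covered = 0
for _ in range(trials):
    sample = [random.randint(a, b) for _ in range(n)]
    xbar = statistics.mean(sample)
    half = t * statistics.stdev(sample) / n ** 0.5
    covered += (xbar - half) <= true_mean <= (xbar + half)

coverage = covered / trials
print(f"empirical coverage: {coverage:.3f}")
```

Each experiment yields a different numerical interval; it is the long-run fraction of those intervals containing the true mean that matches the confidence level.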
 
  • #6
Malle said:
1. I could not say that with a confidence level of certainty the true value of the estimated parameter is within the confidence interval (e.g. that there's a 90% probability that the true value is within the confidence interval)
2. I could say that if there are a large number of different experiments run, and for each experiment a confidence interval is calculated, then a confidence level fraction of the confidence intervals would contain the true value. (e.g. for a 90% confidence level, 90% of the calculated confidence intervals would contain the true value).

Is that right?

That is essentially correct. There are actually two versions of "confidence interval". The most respectable version (in my opinion) is an interval which has a length but not a definite center or definite endpoints. For example, one might say "the 90% confidence interval for the mean in this sampling plan is plus or minus 6.37". This relates to your number 2 situation. When people observe a specific sample mean, such as 5.3, some will also call the numerical interval (5.3 - 6.37, 5.3 + 6.37) a "90% confidence interval". (My old college statistics text says that this is done "by abuse of language".) This relates to your situation number 1 and, as you say, it is not correct to claim that there is a 90% probability that the population mean is in a given interval with specific numerical endpoints.

To say something about the probability of the population mean being somewhere, one must admit the idea that this probability can exist. (Confidence intervals for the mean assume the population mean is in a "fixed but unknown" location, and this contradicts the idea that there is anything probabilistic about it.) To compute a "credible interval" one must assume a distribution for the population mean that applies before we observe the sample (a so-called "prior distribution"). There are several approaches to choosing prior distributions.

The most comfortable situation is when you are dealing with a problem similar to ones you have handled before. For example, if you were trying to find the mean concentration of gold in an ore sample, you might have assayed hundreds of other samples, and you might use a histogram of those concentrations as the prior distribution.

If you are dealing with a one-of-a-kind situation, you must be willing to imagine some probability model where that situation is one realization of many possibilities. For example, if you were measuring the mass of Jupiter, you could think of the process that gave it its mass as being probabilistic. Such a probability model may suggest a prior distribution. If not, then one may take the approach of "maximum entropy". The general idea is to assume a prior distribution with maximum uncertainty, in the technical sense of having maximum entropy. (A famous advocate of this approach was Edwin Jaynes, and his book "Probability Theory: The Logic of Science" is available on the web.)

There are cases where no prior distribution exists. For example, if you think "the mean is equally likely to be any real number", you can't define a uniform distribution on the real numbers. However, one can sometimes do philosophically suspicious things like defining a uniform distribution on [-n,n], finding a credible interval that is a function of n, and then taking the limiting value of the interval as n approaches infinity.

Amusingly, in many problems, the Bayesian credible intervals turn out to be numerically the same interval as a confidence interval.
 
  • #7
There are probably texts or papers that work your problem in a Bayesian manner. Nevertheless it might be interesting to try solving it in a naive straightforward way without looking up such materials. I don't know if I can do this correctly, but I'll start.

Let's assume the population values are integers. Let [itex] A = [/itex] the minimum possible value and [itex] D = [/itex] the number of possible values. The distribution of the population given [itex] A [/itex] and [itex] D [/itex] is uniform, so it assigns a probability of [itex] \frac{1}{D} [/itex] to each integer in the interval [itex] [A, A+D-1] [/itex].

For a prior distribution of population parameters, assume [itex] A [/itex] is equally likely to be any integer between (or including) the given integers [itex] A_L [/itex] and [itex] A_U[/itex]. Assume [itex] D [/itex] is equally likely to be any integer between (and including) the given integers [itex] D_L [/itex] and [itex] D_U [/itex].

So for integers [itex] (a,d) [/itex] in the permitted range, [itex] P(A=a,D=d) = \frac{1}{A_U - A_L + 1} \frac{1}{D_U - D_L + 1} [/itex]
Denote this constant by [itex] P(A=a,D=d) = \lambda [/itex].

Let [itex] X [/itex] be the random variable representing a vector of [itex] n [/itex] independent samples [itex] X_1, X_2, \ldots, X_n [/itex] from the population.
Let [itex] X_{min} = \min\{X_1, X_2, \ldots, X_n\} [/itex]
Let [itex] X_{max} = \max\{X_1, X_2, \ldots, X_n\} [/itex]


[itex] P(X=x| A=a,D=d) = \frac{1}{d^n} [/itex] if [itex] x_{min} \ge a [/itex] and [itex] x_{max} \le a+d-1 [/itex]
[itex] P(X=x|A=a,D=d) = 0 [/itex] otherwise

[itex] P(X=x,A=a,D=d) = \frac{1}{d^n} \lambda [/itex] if [itex] A_L \le a \le x_{min} [/itex] and [itex] \max(D_L,\, x_{max} - a + 1) \le d \le D_U [/itex] (the lower limit on [itex] d [/itex] depends on [itex] a [/itex], since we need [itex] a + d - 1 \ge x_{max} [/itex])
[itex] P(X=x,A=a,D=d) = 0 [/itex] otherwise.


[itex] P(X=x) = \sum_{a=A_L}^{x_{min}} \; \sum_{d=\max(D_L,\, x_{max} - a + 1)}^{D_U} \frac{1}{d^n} \lambda [/itex]

The above summation is easy to evaluate numerically: for each [itex] a [/itex] the inner sum is a short partial sum of [itex] \sum 1/d^n [/itex], and there are at most [itex] x_{min} - A_L + 1 [/itex] values of [itex] a [/itex] to consider.

[itex] P(A=a,D=d| X=x) = \frac{ P(X=x,A=a,D=d)}{P(X=x)} [/itex]

I think the above shows that we can numerically calculate the conditional distribution of the parameters [itex] A,D [/itex] given the sample.

It remains to calculate the conditional distribution of the mean [itex] \mu [/itex] given [itex] X [/itex], since the mean is a function of those parameters. We can try that if you're interested. Or maybe it's simpler to look in a book!
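The steps above can be sketched numerically. This is only an illustration under assumptions not in the thread: the prior ranges (A_L, A_U, D_L, D_U below) are arbitrary, and the last step reads a central 95% credible interval for the mean [itex] \mu = A + (D-1)/2 [/itex] off the posterior CDF.

```python
data = [220, 238, 241, 200, 204, 271, 289, 273, 243]
n = len(data)
x_min, x_max = min(data), max(data)

# Hypothetical flat-prior ranges for A and D (assumptions for illustration).
A_L, A_U = 150, 250
D_L, D_U = 1, 200

# Unnormalized posterior weight for each (a, d): 1/d^n when the sample fits
# inside [a, a + d - 1].  The flat-prior constant lambda cancels on normalization.
post = {}
total = 0.0
for a in range(A_L, A_U + 1):
    for d in range(D_L, D_U + 1):
        if a <= x_min and x_max <= a + d - 1:
            w = d ** -float(n)
            post[(a, d)] = w
            total += w

# Posterior distribution of the mean mu = a + (d - 1) / 2.
mu_post = {}
for (a, d), w in post.items():
    mu = a + (d - 1) / 2
    mu_post[mu] = mu_post.get(mu, 0.0) + w / total

# Central 95% credible interval from the posterior CDF of mu.
acc = 0.0
lo = hi = None
for mu in sorted(mu_post):
    acc += mu_post[mu]
    if lo is None and acc >= 0.025:
        lo = mu
    if hi is None and acc >= 0.975:
        hi = mu
print(f"95% credible interval for the mean: ({lo}, {hi})")
```

The posterior mass concentrates near the smallest d compatible with the data (here x_max - x_min + 1 = 90), so the interval sits around the sample mid-range of 244.5.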
 
  • #8
Malle:

This discussion may be of interest. Constructing a confidence interval for the mean of a uniform distribution rests on the Central Limit Theorem. Whatever theoretical objections may be raised, confidence intervals are still widely used in scientific articles published in peer-reviewed journals.

http://www.cc.gatech.edu/~lebanon/notes/confInt.pdf
 
  • #9
I've read both your posts and thank you for the assistance so far. I'm currently digesting it and trying to give it some more thought on my own, but I don't have very much time to spare at the moment. I will try to come back in a few days with comments and more questions.
 
  • #10
New reply since I cannot edit my last post:

It's become painfully obvious to me that I do not currently have the time to pursue this ("a few days"... pfft >_>)

Hopefully I will be able to pick this up again at a later point in time. Until then, thank you Stephen Tashi and SW VandeCarr for contributing.
 

What is a confidence interval?

A confidence interval is a range of values that is likely to contain the true value of a population parameter with a certain level of confidence.

What is a discrete uniform distribution?

A discrete uniform distribution is a probability distribution where all possible outcomes have an equal chance of occurring. It is often used to model situations where there is a fixed number of equally likely outcomes.

How is a confidence interval calculated for the estimated mean of a discrete uniform distribution?

To calculate an approximate confidence interval for the estimated mean of a discrete uniform distribution, the sample mean and sample standard deviation are used to determine the standard error of the mean. The standard error is then multiplied by the appropriate critical value from the t-distribution, based on the desired confidence level and degrees of freedom, and that margin is taken on either side of the sample mean. For small samples this relies on the Central Limit Theorem, so it is an approximation rather than an exact interval.
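A minimal sketch of that recipe, applied to the example data from the thread. The t critical value for 8 degrees of freedom is hard-coded to stay within the standard library; for n = 9 the result is only a CLT-based approximation.

```python
import statistics

# Example data from the thread (n = 9)
data = [220, 238, 241, 200, 204, 271, 289, 273, 243]

n = len(data)
xbar = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample standard deviation (n - 1 in the denominator)

# t critical value for 95% confidence and n - 1 = 8 degrees of freedom,
# hard-coded: t_{0.975, 8} is approximately 2.306.
t = 2.306
half_width = t * s / n ** 0.5

print(f"mean = {xbar:.2f}, approximate 95% CI = "
      f"({xbar - half_width:.2f}, {xbar + half_width:.2f})")
```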

What does the confidence level represent for a confidence interval?

The confidence level represents the percentage of times that the true population parameter would be expected to fall within the calculated confidence interval if the same sampling procedure were repeated a large number of times.

Why is the confidence interval important in statistics?

The confidence interval provides a range of values for the estimated population parameter that takes into account the uncertainty of sampling. It allows for a more accurate understanding of the true value of the population parameter and helps to make inferences about the population as a whole.
