How are error margins determined without using an integer for population?

In summary, the conversation discusses the relationship between population size and sample size in error-margin estimation. The original poster asks how an error margin can be determined when the population size is unknown or effectively infinite. The replies explain that what is needed is an estimate of (a probability model for) the population and a random sample from it, rather than the population's size, and the discussion turns to probability distributions and the proper interpretation of hypothesis testing.
  • #1
mbahnshee
My first post on these forums, was referred here by a friend of mine. Thanks in advance!

Mind that this question is coming from a BA in psych's worth of understanding: If I understand correctly, estimations of error margins are based on the relationship between a population and a sample of it. What has always confused me is that for a numerical relationship to be defined between these two variables, an integer must have been used for the population. If, for example, the population were considered to be infinite, then no sample of it could be said to bear any resemblance to the infinite whole, and thus no error estimations of the likelihood that it did could be made. So what I'm wondering, if I've understood the applied statistics correctly thus far, is how the likelihood that a sample is representative of the population from which it is drawn is ultimately determined without knowing the population size.
 
  • #2
If, for example, the population were considered to be infinite, then no sample of it could be said to bear any resemblance to the infinite whole, and thus no error estimations of the likelihood that it did could be made.

This is just not true. Think about polling to forecast election results, where the sample is much smaller than the population. The main point is to get a random sample.
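To make that concrete, here is a minimal simulation (Python with NumPy, using a made-up "true" support level of 52%) showing that the typical polling error is governed by the sample size n alone; the size of the electorate never enters the calculation:

import numpy as np

rng = np.random.default_rng(0)
true_p = 0.52    # hypothetical true proportion of supporters in the population
n = 1000         # poll sample size

# Repeat the poll many times and see how far the sample proportion typically strays.
estimates = rng.binomial(n, true_p, size=10_000) / n
print("observed spread of poll results:", round(estimates.std(), 4))
print("theoretical sqrt(p(1-p)/n):     ", round(np.sqrt(true_p * (1 - true_p) / n), 4))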
 
  • #3
In election polling they know the population size though. I'm talking about determining error estimations when the population size is not known.
 
  • #4
If you have no idea what the population is, you cannot define an error margin. If you have an estimate for the population, you can make an estimate of the error margin.

If you really have an infinite population (e.g. each member of your population is a point on a line segment), you will need to define a specific probability distribution (you can't have an "each point is equally likely" distribution on such a population). In that case, the "error margin" depends on the probability distribution.
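As an illustration of that last point, a small sketch (Python, with the uniform distribution on [0, 1] standing in for the infinite "population"): the margin of error for the mean comes out of the assumed distribution and the sample itself, never out of a population count:

import numpy as np

rng = np.random.default_rng(1)
sample = rng.uniform(0.0, 1.0, size=200)   # the "population" is every point in [0, 1]

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(sample.size)   # standard error estimated from the sample
print(f"sample mean = {mean:.3f}, 95% margin of error ~ {1.96 * se:.3f}")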
 
  • #5
Wouldn't it be impossible to define a probability distribution for a truly infinite population? Or are you saying that you would just have to infer one to be able to continue?
 
  • #6
mbahnshee said:
Wouldn't it be impossible to define a probability distribution for a truly infinite population? Or are you saying that you would just have to infer one to be able to continue?

You should clarify whether you are using the word "population" to mean a set of discrete objects like persons or cars etc. or whether you are using it in the sense of mathematical statistics, where it does not necessarily have that meaning. The set of real numbers in the interval [0,1] is an infinite population from the mathematical point of view.

Since you don't profess to be a statistician, you should also clarify what you mean by an "error margin".
 
  • #7
Stephen Tashi said:
You should clarify whether you are using the word "population" to mean a set of discrete objects like persons or cars etc. or whether you are using it in the sense of mathematical statistics, where it does not necessarily have that meaning. The set of real numbers in the interval [0,1] is an infinite population from the mathematical point of view.

Since you don't profess to be a statistician, you should also clarify what you mean by an "error margin".

I think you've shed some light on what might be the discrepancy here. What I learned (more than I still remember) was from statistics in psychology and perhaps I was trying to apply the systems there to how I imagined they would be applied in a purely mathematical system.

I think what bothered me most is that in order to, say, compare two samples and state with 95% certainty that they were from the same (or different) population, we would use a t-test, which required the population standard deviation. Not knowing the actual population's SD, we computed it by estimating from a sample of that population. Now, I think it's taken into account that there is a degree of error in estimating that population's SD from a sample's, but the fact that we weren't ever given the N of the population we were estimating for is what irked me. It seemed to me that if we didn't know N, and it could be infinitely large (again, perhaps not for psych's purposes), then there would be no way of estimating the SD of a population from a sample and thus no way of plugging in an accurate population SD or of using it to compare samples with any certainty.
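(For concreteness, a rough Python sketch of this situation, with made-up scores: the two-sample t-test never asks for a population N or a population SD; it estimates the SD from the samples, and the t distribution's wider tails absorb the extra uncertainty of that estimate.)

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group_a = rng.normal(loc=100, scale=15, size=30)   # hypothetical scores, group A
group_b = rng.normal(loc=105, scale=15, size=30)   # hypothetical scores, group B

# Welch's t-test: uses the sample standard deviations, never a population size.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")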

Am I explaining myself clearly? Sorry if I'm not, I've been away from the terms for a while and am digging up something that has been buried in my mind for a while.
 
  • #8
mbahnshee said:
I think what bothered me most is that in order to, say, compare two samples and state with 95% certainty that they were from the same (or different) population, we would use a t-test, which required the population standard deviation.
No, you didn't say with 95% certainty that the populations were the same or different. The kind of "Hypothesis testing" you did does not quantify the probability that a certain hypothesis is true or false. It is based on quantifying the probability of the data given that we assume a certain hypothesis is true - or true with probability 1, if you wish.

If assuming the hypothesis is true implies that the probability of the data is below some arbitrary threshold, such as 5%, you "reject" the hypothesis. But this doesn't mean that there is a 95% chance the hypothesis is false or a 5% chance it is true.
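A quick way to see that distinction is by simulation (Python, made-up data): when the null hypothesis is true by construction, roughly 5% of experiments still "reject" at the 5% level. The 5% describes the behaviour of the data under the hypothesis, not the probability that the hypothesis itself is false:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
rejections = 0
trials = 10_000
for _ in range(trials):
    # Both groups are drawn from the SAME distribution, so the null hypothesis is true.
    a = rng.normal(0.0, 1.0, size=25)
    b = rng.normal(0.0, 1.0, size=25)
    _, p = stats.ttest_ind(a, b)
    rejections += (p < 0.05)
print("rejection rate under a true null:", rejections / trials)   # close to 0.05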

Not knowing the actual population's SD, we computed it by estimating from a sample of that population. Now, I think it's taken into account that there is a degree of error in estimating that population's SD from a sample's, but the fact that we weren't ever given the N of the population we were estimating for is what irked me.


The N of the population could have a numerical effect when it was small enough to contradict assumptions that the population has some continuous distribution, such as the normal distribution. A small population could also have a pronounced effect if you were sampling "without replacement".

It seemed to me that if we didn't know N, and it could be infinitely large (again, perhaps not for psych's purposes), then there would be no way of estimating the SD of a population from a sample and thus no way of plugging in an accurate population SD or of using it to compare samples with any certainty.
I think the larger N is, the better most statistical tests work. They assume you are taking independent samples from a random variable defined by a probability distribution. The distribution assumed often encompasses more possible values than a finite population could contain.
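For reference, the textbook finite population correction (a standard formula, not something derived in this thread) shows exactly how a finite N enters when you sample without replacement, and why its effect vanishes as N grows relative to n:

\mathrm{SE}(\bar{x}) \;=\; \frac{\sigma}{\sqrt{n}}\,\sqrt{\frac{N-n}{N-1}} \;\longrightarrow\; \frac{\sigma}{\sqrt{n}} \quad \text{as } N \to \infty

For N much larger than n the correction factor is essentially 1, which is why the usual formulas simply leave the population size out.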


Am I explaining myself clearly?

Yes, you're doing a reasonable job of it - better than many other posters we see!
 
  • #9
Stephen Tashi said:
No, you didn't say with 95% certainty that the populations were the same or different. The kind of "Hypothesis testing" you did does not quantify the probability that a certain hypothesis is true or false. It is based on quantifying the probability of the data given that we assume a certain hypothesis is true - or true with probability 1, if you wish.

If assuming the hypothesis is true implies that the probability of the data is below some arbitrary threshold, such as 5%, you "reject" the hypothesis. But this doesn't mean that there is a 95% chance the hypothesis is false or a 5% chance it is true.

Absolutely right, I've glossed over such terms as "rejecting the null hypothesis" but their exact meaning certainly offers clarity in this circumstance.

Stephen Tashi said:
The N of the population could have a numerical effect when it was small enough to contradict assumptions that the population has some continuous distribution, such as the normal distribution.

So assumptions about the distribution curves of a population are just assumptions? Is the strength of this assumption an area where you might guess a softer science like Psychology might differ from the use of statistics in a purely mathematical paradigm?
 
  • #10
mbahnshee said:
Absolutely right, I've glossed over such terms as "rejecting the null hypothesis" but their exact meaning certainly offers clarity in this circumstance.

So assumptions about the distribution curves of a population are just assumptions? Is the strength of this assumption an area where you might guess a softer science like Psychology might differ from the use of statistics in a purely mathematical paradigm?

There is actually a probabilistic description of doing hypothesis testing and getting either a Type I or Type II error. These are written as conditional probability statements and clarify what it means to either accept the right hypothesis or commit a Type I/II error.

The easiest way to think about this is that when you do hypothesis testing with distributions like the normal, it is possible, even for a large sample where the variance of the estimator is rather small, that the true value is still outside the confidence interval that you obtain.

One way to think of this intuitively is that the sample you have is not a really representative sample. You might have to double, quadruple, or even go up an order of magnitude in sample size before you get a sample that is really representative of the population: many areas of statistics focus on that very problem, and while there is a mathematical element to it, the biggest part actually involves understanding the process that you are examining and the surrounding context of that process.
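A small simulation of that intuition (Python, made-up normal data): even with a respectable sample size, about 1 interval in 20 at the 95% level misses the true mean purely through the luck of the draw:

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
true_mean, trials, misses = 10.0, 10_000, 0
for _ in range(trials):
    sample = rng.normal(true_mean, 2.0, size=50)
    se = sample.std(ddof=1) / np.sqrt(sample.size)
    half_width = stats.t.ppf(0.975, df=sample.size - 1) * se   # 95% t-interval half-width
    if abs(sample.mean() - true_mean) > half_width:
        misses += 1
print("fraction of 95% intervals missing the true mean:", misses / trials)   # ~ 0.05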

We can never be absolutely sure about any hypothesis, but we can take steps to increase confidence that we have something useful, and it's these things that are used together to build solid arguments and support a conclusion.

In terms of assumptions, there are different kinds of distributions.

You can have distributions that are based on actual processes. Some examples of these include binomial distributions for things like coin flipping, or Poisson distributions (which arise as a limiting case of the binomial) for anything that involves rates (like cars going by in a given time period).

Then on top of these you have what are called sampling distributions. These distributions are used to do probability and statistics with samples. These are the things you will often use when you do a hypothesis test, like estimating a range for the mean given a sample (and making assumptions), or testing whether two variances are statistically significantly different or the same.

The sampling distributions make assumptions about the data, but they don't really encode specific information about a process in the way that the binomial distribution and others like it do.
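To make the distinction concrete, a short sketch (Python, with invented rates and counts): the binomial and Poisson model the process that generates the data, while the t distribution at the end is a sampling distribution used only to turn a sample into an interval:

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

flips = rng.binomial(n=10, p=0.5, size=1000)   # process model: heads in 10 coin flips per trial
cars = rng.poisson(lam=3.2, size=1000)         # process model: cars passing per minute

# Sampling distribution: a t-based 95% interval for the mean number of cars per minute.
se = cars.std(ddof=1) / np.sqrt(cars.size)
t_crit = stats.t.ppf(0.975, df=cars.size - 1)
print(f"mean heads per 10 flips: {flips.mean():.2f}")
print(f"mean cars per minute:    {cars.mean():.2f} +/- {t_crit * se:.2f}")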
 
  • #11
mbahnshee said:
Absolutely right, I've glossed over such terms as "rejecting the null hypothesis"

I think glossing over its meaning is the only way to be impressed by the method!


So assumptions about the distribution curves of a population are just assumptions?
What would the alternative be - a mathematical proof that a population must have a particular distribution curve? If you make other assumptions, you could formulate such a proof, but there will always be assumptions, because mathematics relies on assumptions. It deals with "If A then B", not with arguments that A must be the case in the real world.

Is the strength of this assumption an area where you might guess a softer science like Psychology might differ from the use of statistics in a purely mathematical paradigm?

I don't see any difference in paradigm in how the type of statistics you see in introductory courses is applied across the various sciences. How to apply statistics is subjective and heavily influenced by tradition. Certain fields and certain scientific journals have developed certain traditions about what kind of statistics is persuasive to people in those fields. However, I don't see a simple way (such as consideration of actual population sizes) to explain why these cultural differences have arisen.

In my opinion, the traditional sort of statistics encourages a pre-computer approach to thinking about data analysis, since it was developed in a pre-computer age. When an expert in a field begins thinking about applying that sort of statistics, he typically selects from methods that require simplified assumptions about the properties of the data. This approach deprives him of any ability to incorporate his expert knowledge about fine details. If Nature deems that a certain phenomenon is going to be described by some remarkably simple physical laws, then omitting details can be surprisingly successful. If she hasn't, then the expert needs to put the details in. Nowadays that's possible by using computer simulations.
 
  • #12
Thanks so much guys!
 

1. How is the sample size determined when calculating error margins without using an integer for population?

The sample size is determined by using a formula that takes into account the desired level of confidence, the desired margin of error, and the standard deviation of the population (or an estimate of it). This formula is known as the minimum sample size formula; the size of the population itself does not appear in it.
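In symbols (the standard large-sample form, with z_{α/2} the critical value for the chosen confidence level, σ the known or estimated population standard deviation, and E the desired margin of error):

n \;\ge\; \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}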

2. What is the margin of error and how is it calculated without using an integer for population?

The margin of error is the maximum amount by which the sample statistic is expected to differ from the true population parameter at the chosen confidence level. It is calculated by multiplying the critical value (determined by the desired level of confidence) by the standard deviation of the population, divided by the square root of the sample size.
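As a formula (again the standard large-sample form, with z_{α/2} the critical value, σ the population standard deviation or its estimate, and n the sample size):

E \;=\; z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}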

3. How does the confidence level affect the determination of error margins without using an integer for population?

The confidence level is directly related to the margin of error. A higher confidence level corresponds to a larger critical value, which widens the margin of error for a given sample size; to keep the margin of error fixed at a higher confidence level, a larger sample is required. This is because demanding more confidence means demanding a smaller chance that the interval misses the true population parameter. None of this depends on knowing the population size.
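For example, the usual two-sided critical values are z ≈ 1.645 at 90% confidence, 1.96 at 95%, and 2.576 at 99%; moving from 95% to 99% confidence therefore widens the margin of error by a factor of about 2.576/1.96 ≈ 1.31, or requires roughly 1.31² ≈ 1.7 times the sample size to keep the same margin.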

4. Can error margins be determined accurately without using an integer for population?

Yes, error margins can be determined accurately without using an integer for population by using statistical methods and formulas such as the minimum sample size formula and margin of error formula. These methods take into account the desired level of confidence, margin of error, and standard deviation to calculate the appropriate sample size.

5. What are some limitations of determining error margins without using an integer for population?

Some limitations of determining error margins without using an integer for population include the assumption that the sample is representative of the entire population, and the potential for bias or error in the data collection process. Additionally, it may be more difficult to accurately determine error margins for smaller populations or populations with unknown standard deviations.
