Variance (error bars) with a binomial proportion

In summary: The thread asks how to attach a standard error to the proportion of assay results above 2 g/ml when some chemicals were tested more than once, and whether the choice of what gets tested can bias that proportion.
  • #1
labrookie
I have a list of chemicals, their assay test results, and a binary (0/1) column indicating whether the assay result was high enough to be considered a threat (anything > 2 g/ml). Some chemicals were tested more than once; others were tested only once. I know it is a poor data set, but I am trying to get as much use out of it as I can. I would like to take into account the variance within the same chemical.

Portion of data:

Chemical   Assay reading (g/ml)   >2?
A          0.04                   0
B          1.20                   0
C          4.60                   1
D          1.10                   0
D          2.30                   1
E          0.03                   0
F          0.27                   0
G          0.92                   0
G          3.00                   1
G          2.34                   1
H          1.36                   0
I          0.80                   0
J          0.45                   0
K          1.75                   0
L          2.45                   1
L          2.60                   1
M          5.60                   1
N          1.11                   0
O          3.14                   1
P          0.50                   0
Q          1.15                   0
Q          2.01                   1
R          1.50                   0
S          0.09                   0
T          0.12                   0


I am simply trying to calculate the proportion of rows where the binary column is 1. That part is easy, but I am also trying to include a standard error or some other measure of the variance. How can I take into account the variance within a chemical that was tested more than once?
 
  • #2
Your terminology isn't clear. Do you have a list of "chemicals" or a list of "samples of substances"? If you have a list of "chemicals", what is being "assayed"? Wouldn't they all be 100% that chemical?

For example, is "D" something like "cyanide", and are the "assay" results for "D" two tests run on two different samples of water?
 
  • #3
Yes, they are all different samples. The assay is for an amount of gas given off by each of them. "D" happens to be the same chemical that was sampled.

I am trying to show the proportion of the samples that are greater than 2 g/ml. Showing only the binomial variation may underestimate the total variation, because there is also variation when the same chemical is tested more than once. How can I tie this variation into the variation of my proportion?
 
  • #4
It still isn't clear what real-world quantity you are trying to estimate. For example, suppose I am trying to answer the nebulous question: "What is the probability that a randomly selected sample of water from my town contains dangerous levels of a chemical?". Even if I am careful to define "randomly selected" in some reasonable manner, so that all sources of water are represented in proportion to the amount of water drawn from them, there is still the problem of which chemicals are selected for the tests. I could bias the results by testing for one chemical more than another. For example, suppose the water in my town tends to be polluted by lead and I do most of my testing for radon.
 
  • #5


As a scientist, it is important to acknowledge the limitations of the data you are working with. In this case, it seems that the data is not ideal due to the variability in the number of times each chemical was tested and the small number of replicates for some chemicals. However, it is still possible to extract some useful information from this data set.

To quantify the uncertainty in the proportion, a natural starting point is the binomial standard error, sqrt(p(1-p)/n). Note that this formula treats every row as an independent trial, so by itself it does not capture the extra variation among repeat tests of the same chemical. One way to allow for that is to treat each chemical, rather than each row, as the sampling unit when computing the standard error.
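As a rough illustration, the sketch below computes the plain binomial standard error for the 25 rows posted in #1 and, next to it, a cluster-adjusted standard error that treats each chemical as one unit. The cluster version uses a ratio estimator, which is an assumption made here for illustration and not a method proposed in the thread; all variable names are likewise hypothetical.

```python
import math
from collections import defaultdict

# Rows from post #1: (chemical, flag), flag = 1 if the reading exceeded 2 g/ml.
rows = [
    ("A", 0), ("B", 0), ("C", 1), ("D", 0), ("D", 1), ("E", 0), ("F", 0),
    ("G", 0), ("G", 1), ("G", 1), ("H", 0), ("I", 0), ("J", 0), ("K", 0),
    ("L", 1), ("L", 1), ("M", 1), ("N", 0), ("O", 1), ("P", 0), ("Q", 0),
    ("Q", 1), ("R", 0), ("S", 0), ("T", 0),
]

n = len(rows)
p_hat = sum(flag for _, flag in rows) / n

# Plain binomial standard error: treats all rows as independent trials.
naive_se = math.sqrt(p_hat * (1 - p_hat) / n)

# Cluster version (assumption: chemicals, not rows, are the independent units).
hits = defaultdict(int)   # number of flagged readings per chemical
sizes = defaultdict(int)  # number of readings per chemical
for chem, flag in rows:
    hits[chem] += flag
    sizes[chem] += 1

k = len(sizes)  # number of distinct chemicals
# Ratio-estimator variance: spread of per-chemical residuals around p_hat.
resid_sq = sum((hits[c] - p_hat * sizes[c]) ** 2 for c in sizes)
cluster_se = math.sqrt(k / (k - 1) * resid_sq) / n

print(f"p_hat = {p_hat:.3f}, naive SE = {naive_se:.4f}, cluster SE = {cluster_se:.4f}")
```

For these rows the proportion is 9/25 = 0.36 and the plain standard error is 0.096. The cluster-adjusted value comes out only slightly larger, which is expected when just four of the twenty chemicals have repeat tests.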

Additionally, you could also consider using confidence intervals, which would provide a range of values within which the true proportion is likely to fall. This would give a better understanding of the uncertainty associated with the estimated proportion.
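One possible way to get such an interval is the Wilson score interval, sketched below for the 9 flagged rows out of 25 in post #1. The choice of the Wilson method and the function name `wilson_interval` are assumptions made for illustration; the interval also assumes independent rows, so it does not adjust for repeat tests of the same chemical.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 9 of the 25 rows in post #1 exceed 2 g/ml.
low, high = wilson_interval(9, 25)
print(f"95% Wilson interval: ({low:.3f}, {high:.3f})")
```

The Wilson form is used here rather than p plus or minus 1.96 standard errors because it tends to behave better for small n and never extends below 0 or above 1.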

Overall, it is important to acknowledge the limitations of the data and to use appropriate statistical methods to account for the variability in the data. This will ensure that any conclusions drawn from the data are reliable and accurate.
 

1. What is variance in relation to binomial proportions?

Variance is a statistical measure that describes the spread or variation of data points around the mean or average. In the context of binomial proportions, variance refers to the variability or uncertainty in the estimated proportion of successes in a binomial experiment.

2. How is variance calculated for binomial proportions?

For a sample proportion p = x/n, the variance is Var(p) = pq/n, where p is the estimated proportion of successes, q = 1 - p is the estimated proportion of failures, and n is the number of independent trials. It is the expected squared deviation of the sample proportion from the true proportion, and its square root is the standard error of the proportion.
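The formula can be checked by simulation. The sketch below is only an illustration: the values p = 0.36 and n = 25 match the proportion and row count in the thread's data, while the number of repetitions and the seed are arbitrary assumptions.

```python
import random
from statistics import pvariance

p, n, reps = 0.36, 25, 20000  # reps and the seed below are arbitrary choices
rng = random.Random(0)

# Simulate many binomial experiments and record each sample proportion.
p_hats = []
for _ in range(reps):
    successes = sum(rng.random() < p for _ in range(n))
    p_hats.append(successes / n)

empirical = pvariance(p_hats)    # spread of the simulated proportions
theoretical = p * (1 - p) / n    # pq/n
print(f"empirical = {empirical:.5f}, theoretical = {theoretical:.5f}")
```

The two numbers should agree closely, which is what pq/n claims: it describes how much the estimated proportion would vary across repeated independent experiments of the same size.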

3. Why is variance important for interpreting binomial proportions?

Variance is important because it reflects the reliability of the estimated proportion of successes in a binomial experiment. A low variance indicates a more precise estimate, while a high variance suggests a greater degree of uncertainty or variability in the data. This information can be used to assess the significance of the results and make informed decisions based on the data.

4. How are error bars related to variance in binomial proportions?

Error bars are graphical representations of variability in data. In the context of binomial proportions, error bars are typically drawn at plus or minus one standard error (the square root of the variance, sqrt(pq/n)) or as a confidence interval, and they visually depict the uncertainty in the estimated proportion of successes. Wider error bars indicate a larger variance and more uncertainty in the estimate, while narrower error bars suggest a more precise estimate.

5. Can variance be used to compare binomial proportions between different groups?

Yes, but indirectly. The variances of the two estimated proportions combine to give the standard error of their difference, and that standard error is what a test such as the two-proportion z-test uses to judge whether the difference is larger than chance alone would explain. Comparing the two variances with each other does not answer that question, so a formal test or a confidence interval for the difference is needed.
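A minimal sketch of one such test, the pooled two-proportion z-test, is given below. The first group uses the 9 out of 25 from the thread; the second group (15 out of 25) is invented purely for illustration and does not come from the data. The function name is also hypothetical, and the test assumes independent observations within and between groups.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sided z-test for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error of the difference under the null of equal proportions.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Group 1: 9/25 from the thread. Group 2: 15/25, a made-up comparison group.
z, p_value = two_proportion_z_test(9, 25, 15, 25)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.3f}")
```

With samples this small, an exact method such as Fisher's exact test may be a safer choice than the normal approximation.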
