Variance (error bars) with a binomial proportion

Discussion Overview

The discussion revolves around calculating the variance and standard error for a binomial proportion derived from assay test results of various chemicals. Participants explore how to account for variance within samples of the same chemical that have been tested multiple times, while also addressing the challenges of interpreting the data given its limitations.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant seeks to calculate the proportion of samples exceeding a threshold (>2g/ml) while incorporating standard error or variance, particularly for chemicals tested multiple times.
  • Another participant questions the clarity of terminology, asking whether the data represents different chemicals or samples of the same chemical, and what exactly is being assayed.
  • A later reply clarifies that the samples are different and that the assay measures gas emissions from each chemical, emphasizing the need to consider variation across multiple tests of the same chemical.
  • Further, a participant raises concerns about the real-world implications of the data, suggesting that biases in sample selection could affect the interpretation of the results.

Areas of Agreement / Disagreement

Participants express differing views on the clarity of the data and the implications of sample selection bias. There is no consensus on how to properly account for variance in the calculations or the real-world significance of the results.

Contextual Notes

Participants note limitations regarding the clarity of the data and the potential biases in sample selection, which may affect the interpretation of the results and the calculations of variance.

labrookie
I have a list of chemicals, their assay test results, and a binary (0/1) column indicating whether the assay result was high enough to be considered a threat (anything >2 g/ml). Some chemicals were tested more than once, but others were not. I understand that it is a poor set of data, but I am trying to be as useful as I can with it. I would like to take into account the variance within the same chemical.

Portion of data:

Chemical   Assay reading (g/ml)   Above 2 g/ml? (1 = yes, 0 = no)
A          0.04                   0
B          1.2                    0
C          4.6                    1
D          1.1                    0
D          2.3                    1
E          0.03                   0
F          0.27                   0
G          0.92                   0
G          3.00                   1
G          2.34                   1
H          1.36                   0
I          0.80                   0
J          0.45                   0
K          1.75                   0
L          2.45                   1
L          2.60                   1
M          5.6                    1
N          1.11                   0
O          3.14                   1
P          0.50                   0
Q          1.15                   0
Q          2.01                   1
R          1.50                   0
S          0.09                   0
T          0.12                   0


I am simply trying to calculate the proportion of rows where the binomial column is 1. That part is easy, but I am also trying to include a standard error or some other measure of the variance. How can I take into account the variance within a chemical tested more than once?
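For reference, the "easy part" described above can be sketched in Python. This uses only the 25 rows posted (a portion of the full data), and the naive standard error sqrt(p(1 - p)/n) assumes every row is an independent test, which is exactly the assumption the repeated tests call into question. The variable names are illustrative.

```python
import math

# (chemical, assay reading in g/ml), copied from the portion of data posted above.
rows = [
    ("A", 0.04), ("B", 1.2), ("C", 4.6), ("D", 1.1), ("D", 2.3),
    ("E", 0.03), ("F", 0.27), ("G", 0.92), ("G", 3.00), ("G", 2.34),
    ("H", 1.36), ("I", 0.80), ("J", 0.45), ("K", 1.75), ("L", 2.45),
    ("L", 2.60), ("M", 5.6), ("N", 1.11), ("O", 3.14), ("P", 0.50),
    ("Q", 1.15), ("Q", 2.01), ("R", 1.50), ("S", 0.09), ("T", 0.12),
]
THRESHOLD = 2.0  # g/ml, the cutoff stated in the post

# Rebuild the 0/1 column from the readings.
flags = [1 if reading > THRESHOLD else 0 for _, reading in rows]

n = len(flags)
p_hat = sum(flags) / n

# Naive binomial standard error: treats all n rows as independent.
se_naive = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"n = {n}, proportion = {p_hat:.3f}, naive SE = {se_naive:.3f}")
```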
 
Your terminology isn't clear. Do you have a list of "chemicals" or a list of "samples of substances"? If you have a list of "chemicals", what is being "assayed"? Wouldn't they all be 100% that chemical?

For example, is "D" something like "cyanide", and are the "assay" results for "D" two tests run on two different samples of water?
 
Yes, they are all different samples. The assay is for an amount of gas given off by each of them. The two "D" rows happen to be the same chemical, sampled twice.

I am trying to show the proportion of the samples that are >2 g/ml. Just showing the binomial variation may underestimate the total variation, because there is also variation when the same chemical is tested more than once. How can I tie this variation into my proportion variation?
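One possible way to fold the repeated tests into the error bar, not proposed by anyone in this thread, is to treat each chemical as a cluster and use the ratio-estimator (linearization) standard error from survey sampling. With y_i exceedances out of m_i tests for chemical i and k chemicals, the proportion is p = sum(y_i) / sum(m_i) and the variance is approximately k/(k - 1) * sum((y_i - p*m_i)^2) / (sum(m_i))^2. The sketch below assumes that different chemicals are independent of each other while tests of the same chemical may be correlated; it only uses the 0/1 outcome, not the size of the readings.

```python
import math
from collections import defaultdict

# (chemical, assay reading in g/ml), copied from the portion of data posted above.
rows = [
    ("A", 0.04), ("B", 1.2), ("C", 4.6), ("D", 1.1), ("D", 2.3),
    ("E", 0.03), ("F", 0.27), ("G", 0.92), ("G", 3.00), ("G", 2.34),
    ("H", 1.36), ("I", 0.80), ("J", 0.45), ("K", 1.75), ("L", 2.45),
    ("L", 2.60), ("M", 5.6), ("N", 1.11), ("O", 3.14), ("P", 0.50),
    ("Q", 1.15), ("Q", 2.01), ("R", 1.50), ("S", 0.09), ("T", 0.12),
]
THRESHOLD = 2.0  # g/ml

# Per-chemical counts: number of tests (m_i) and number above threshold (y_i).
tests = defaultdict(int)
hits = defaultdict(int)
for chem, reading in rows:
    tests[chem] += 1
    hits[chem] += int(reading > THRESHOLD)

k = len(tests)             # number of distinct chemicals (clusters)
n = sum(tests.values())    # total number of tests
p_hat = sum(hits.values()) / n

# Cluster-level residuals y_i - p_hat * m_i; they sum to zero by construction.
ss = sum((hits[c] - p_hat * tests[c]) ** 2 for c in tests)

# Linearization SE for a ratio estimator with clusters as sampling units.
se_cluster = math.sqrt(k / (k - 1) * ss) / n

# Naive binomial SE for comparison (treats all n tests as independent).
se_naive = math.sqrt(p_hat * (1 - p_hat) / n)

print(f"chemicals = {k}, tests = {n}, proportion = {p_hat:.3f}")
print(f"naive SE = {se_naive:.4f}, cluster SE = {se_cluster:.4f}")
```

With only four chemicals tested more than once in the posted rows, the two standard errors come out close to each other, and the cluster-based figure is itself a rough estimate. It says nothing about how the chemicals or samples were chosen in the first place.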
 
It still isn't clear what real-world quantity you are trying to estimate. For example, suppose I am trying to answer the nebulous question: "What is the probability that a randomly selected sample of water from my town contains dangerous levels of a chemical?" Even if I am careful to define "randomly selected" in some reasonable manner, so that all sources of water are represented in proportion to the amount of water drawn from them, there is still the problem of which chemicals are selected for the tests. I could bias the results by testing for one chemical more than another. For example, suppose the water in my town tends to be polluted by lead and I do most of my testing for radon.
 
