Question on Probability - been way too long since college

  • #1
ckirmser
Summary: Probability of an event based on a data table

Good morning, all -

I'm working on a question involving obesity based on alcohol and tobacco consumption. The question is based on a table with five variables:

• (age) An age group (10-25, 26-50, 51-75, 76+)
• (alc) An alcohol consumption group in g/day (0-40, 41-80, 81-120, 121+)
• (tob) A tobacco consumption group in g/day (0-10, 11-20, 21-30, 31+)
• (num_case) A number of obese cases (X)
• (num_cont) A number of controls (Y)

The question is, "What is the probability that a subject in the highest alcohol consumption group is obese?"

I figured the answer would be to first select only those rows in the table where alc = "121+". Then, from those, sum the num_case entries and divide that by the sum of the num_case entries plus the sum of the num_cont entries. In pseudocode:

WHERE alc = "121+"
SUM(num_case) / (SUM(num_case) + SUM(num_cont))
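
The pseudocode above can be sketched in plain Python as follows. This is only an illustration: the thread never shows the actual table, so the rows and counts below are made up, and the helper name `prob_obese` is hypothetical. It also assumes, as the post does, that `num_cont` counts the non-obese subjects.

```python
# Hypothetical rows; the real table's counts are not given in the thread.
rows = [
    {"age": "26-50", "alc": "121+", "tob": "0-10", "num_case": 2, "num_cont": 6},
    {"age": "51-75", "alc": "121+", "tob": "11-20", "num_case": 3, "num_cont": 4},
    {"age": "51-75", "alc": "0-40", "tob": "0-10", "num_case": 1, "num_cont": 20},
]


def prob_obese(rows, alc_group):
    """Share of obese cases among all subjects in one alcohol group."""
    # WHERE alc = alc_group
    selected = [r for r in rows if r["alc"] == alc_group]
    # SUM(num_case) and SUM(num_cont) over the selected rows only
    cases = sum(r["num_case"] for r in selected)
    controls = sum(r["num_cont"] for r in selected)
    total = cases + controls
    if total == 0:
        raise ValueError(f"no subjects in alcohol group {alc_group!r}")
    return cases / total


p_high = prob_obese(rows, "121+")
print(p_high)
```

With these made-up counts the filtered sums are 5 cases and 10 controls, so the result is 5 / 15.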

Apparently, this is not the answer, but I can't think of what else it might be.

So, I was hoping someone here might be able to clear this mental roadblock for me.

Thanx in advance!
 
  • #2
What happens with your calculation if there is only one subject in the highest alcohol group and they are obese?
 
  • #3
I made the presumption - since there was no guidance otherwise and the table data bears this out - that the num_cont value is never 0; there is always at least one control.

So, presumably, your scenario couldn't happen. If there is only one subject in a row, that subject must be a control value because num_cont != 0, not an obese value. But, if it can happen, then my calculation would yield 1 / (1 + 0) = 1; a probability of 1.

Obviously, my calculation is wrong. And, my presumption is probably wrong, even though the table data supports it. Maybe there can be a situation where num_case > 0 and num_cont = 0, but it never happens in the table. I'm sure that there is some formula necessary to determine this, but I've been searching for two days and have yet to stumble upon the proper wording for the search to yield it.

Thanx for your reply, PeroK!
 
  • #4
Forgive my ignorance, but what's a "control"?
 
  • #5
A control is something involved in a test that does not have whatever is being tested applied to it.

For example, if one is testing a new drug, controls are those subjects who are not given the drug; they are there to see what happens if nothing is done, against whom the test subjects - the ones receiving the drug - are compared.

I'm not sure how that applies to this table, but because there is the variable num_cont, I figured that represented the control subjects. That, and because it was always non-zero.

But, maybe I'm wrong. Maybe I'm over-thinking this question and the associated data. But, I tried an answer using num_cont as the population. In that case, I used:

WHERE alc = "121+"
SUM(num_case) / SUM(num_cont)

But, that gave me the wrong answer, too. And, because the num_cont value is never zero, it would still result in some real number.
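
For what it's worth, cases divided by controls is the odds rather than the probability, which is one reason this version cannot match the first calculation. A small Python sketch of the difference, using made-up counts and assuming num_cont counts the non-obese subjects (the function name is hypothetical):

```python
def odds_to_probability(odds):
    """Convert odds (cases per control) into a probability."""
    return odds / (1 + odds)


# Hypothetical filtered sums for alc = "121+"; not taken from the real table.
cases, controls = 5, 10

odds = cases / controls                   # SUM(num_case) / SUM(num_cont)
probability = cases / (cases + controls)  # SUM(num_case) / (SUM(num_case) + SUM(num_cont))

print(odds, probability, odds_to_probability(odds))
```

The odds can exceed 1 whenever cases outnumber controls, while the probability always stays between 0 and 1.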

So, I'm still lost. I'm sure the answer is in a Statistics book somewhere, but I sold mine back to the school after my classes were over and that was two decades ago - I've slept since then...
 
  • #6
ckirmser said:
A control is something involved in a test that does not have whatever is being tested applied to it.

What does that mean in this context?
 
  • #7
PS my limited understanding of statistical analysis is:

You have a group of people with certain factors: e.g. heavy drinkers. And you have a "control" group who do not have that factor.

You can then calculate the probability that a heavy drinker is obese, say, and the probability that a non-heavy-drinker is obese and compare the two.

What I understand you are doing is counting the control group (of non-drinkers) in with the heavy drinkers?
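
The comparison described in this post could look something like the sketch below. It is only an illustration: the rows are made up, `group_probability` is a hypothetical helper, and it assumes that every row carries its own obese (num_case) and non-obese (num_cont) counts, so that each alcohol group has both.

```python
# Hypothetical rows; the real table is not shown in the thread.
rows = [
    {"alc": "121+", "num_case": 2, "num_cont": 6},
    {"alc": "121+", "num_case": 3, "num_cont": 4},
    {"alc": "0-40", "num_case": 1, "num_cont": 20},
    {"alc": "41-80", "num_case": 2, "num_cont": 10},
]


def group_probability(rows, in_group):
    """Obese share among the rows for which in_group(row) is true.

    Assumes at least one subject matches, otherwise this divides by zero.
    """
    cases = sum(r["num_case"] for r in rows if in_group(r))
    controls = sum(r["num_cont"] for r in rows if in_group(r))
    return cases / (cases + controls)


p_heavy = group_probability(rows, lambda r: r["alc"] == "121+")
p_other = group_probability(rows, lambda r: r["alc"] != "121+")
print(p_heavy, p_other)
```

Here the heavy drinkers come out at 5 / 15 and everyone else at 3 / 33, and it is the gap between the two that would suggest an association.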
 
  • #8
Well, since the question is asking what is the probability of someone in the highest alcohol group being obese, I figure that is the number of obese in the group divided by the population (the obese and not obese) of that group. Like if I had 5 red straws and 10 not red straws, the probability of being a red straw is 5/15.

Are you thinking that the answer is to take those in the designated alcohol group and find their probability against the population of the entire table?

Hmm. Well, I can give it a shot...
 
  • #9
Are you saying that "control" in this context means "not obese"?
 
  • #10
That was the guess I took, based on the names of the variables. But, honestly, I don't know for certain what it means and I have no way to ask whoever made the table.

That's maybe why I'm having such a problem; a lack of information on what the table represents. I had figured someone familiar with statistics would recognize what I'm asking at a glance and just rattle off the formula to get the answer. I'm sure it's something simple, I'm just not seeing it.

Given further thought, that must be the case; num_case is a count of those who are obese and meet the other conditions of age, alcohol and tobacco consumption, and num_cont is a count of those who are not obese. So, to get a probability, I have to divide by the sum of both variables, because that's the population.

I'm missing something else somewhere, but don't know what.
 
  • #11
OK, I was right. After beating myself over the head for two days, I discovered that my code was not summing the data properly.

Rather than summing just the filtered data, it was summing the num_case and num_cont for the entire table. But, I think that's an issue with my session of Visual Studio Code, because the syntax is correct. I had to break the thing down into individual steps and manually run it to find the problem.
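
The symptom described here, sums taken over the whole table instead of the filtered rows, can be reproduced with a small Python sketch. The data is made up and the actual code from the post is not shown in the thread, so this only illustrates the kind of step-by-step check that exposes the problem.

```python
# Hypothetical rows; the real table is not shown in the thread.
rows = [
    {"alc": "121+", "num_case": 2, "num_cont": 6},
    {"alc": "121+", "num_case": 3, "num_cont": 4},
    {"alc": "0-40", "num_case": 1, "num_cont": 20},
]


def probability(rows):
    """Cases divided by cases plus controls for whatever rows are passed in."""
    cases = sum(r["num_case"] for r in rows)
    controls = sum(r["num_cont"] for r in rows)
    return cases / (cases + controls)


# Step 1: filter, and check that the filter really dropped some rows.
filtered = [r for r in rows if r["alc"] == "121+"]
print(len(rows), len(filtered))

# Step 2: compare the intended result with the whole-table result.
p_filtered = probability(filtered)  # intended: only alc = "121+"
p_whole = probability(rows)         # symptom: filter silently ignored
print(p_filtered, p_whole)
```

If the two printed probabilities are identical on real data, the filter is probably not being applied before the sums.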

I'm going to try this out in RStudio and see if that fixes it.

I really appreciate your time, PeroK.
 
  • #12
PeroK said:
What does that mean in this context?
Usually it is someone who is not given the treatment but whose outcome of interest is observed. There is (always, I think) a form of "blinding", in that the subject, the experimenter, or both do not know ahead of time which subjects are controls. Maybe the clearest example is a medical treatment being tested for some effect, say weight loss. Some subjects will not be given the treatment, and it will be observed whether they lose weight. This helps isolate the placebo effect, the psychological component of the effect.
 
  • #13
ckirmser, are you familiar with conditional probabilities? I think you can frame this question in those terms.
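
As a rough sketch of that framing: assuming num_cont counts the non-obese subjects, and writing N for the total number of subjects in the table (a symbol introduced here, not in the thread), the conditional probability reduces to the filtered sums from post #1. The sums below run over the rows with alc = "121+".

```latex
\[
P(\text{obese} \mid \text{alc} = 121+)
  = \frac{P(\text{obese} \cap \text{alc} = 121+)}{P(\text{alc} = 121+)}
  = \frac{\sum \text{num\_case} \,/\, N}{\sum (\text{num\_case} + \text{num\_cont}) \,/\, N}
  = \frac{\sum \text{num\_case}}{\sum \text{num\_case} + \sum \text{num\_cont}}
\]
```

The N cancels, which is why the rest of the table does not enter the calculation.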
 

