Question on Probability - been way too long since college

Homework Help Overview

The discussion revolves around a probability question related to obesity, specifically examining the relationship between alcohol and tobacco consumption based on a provided data table. The table includes variables such as age groups, alcohol consumption levels, tobacco consumption levels, the number of obese cases, and the number of controls.

Discussion Character

  • Exploratory, Assumption checking, Problem interpretation

Approaches and Questions Raised

  • Participants explore the calculation of probability for obesity within the highest alcohol consumption group, questioning the correct method to sum cases and controls. There is discussion about the implications of having zero controls and how that affects the probability calculation.

Discussion Status

Participants are actively engaging with the problem, raising questions about the definitions of controls and the assumptions made regarding the data. Some have suggested alternative interpretations of the data and calculations, while others express uncertainty about the correct approach.

Contextual Notes

There is a presumption that the number of controls is never zero, which is being questioned. Participants are also grappling with the lack of clarity regarding the definitions of the variables in the context of the problem.

ckirmser
Summary: Probability of an event based on a data table

Good morning, all -

I'm working on a question involving obesity based on alcohol and tobacco consumption. The question is based on a table with five variables:

• (age) An age group (10-25, 26-50, 51-75, 76+)
• (alc) An alcohol consumption group in g/day (0-40, 41-80, 80-120, 121+)
• (tob) A tobacco consumption group in g/day (0-10, 11-20, 21-30, 31+)
• (num_case) A number of obese cases (X)
• (num_cont) A number of controls (Y)

The question is, "What is the probability that a subject in the highest alcohol consumption group is obese?"

I figured the answer would be to first select only those rows in the table where alc = "121+". Then, from those, sum the num_case entries and divide that by the sum of the num_case entries plus the sum of the num_cont entries. In pseudocode:

WHERE alc = "121+"
SUM(num_case) / (SUM(num_case) + SUM(num_cont))

Apparently, this is not the answer, but I can't think of what else it might be.

So, I was hoping someone here might be able to clear this mental roadblock for me.

Thanx in advance!
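
The filter-then-sum calculation described above can be sketched in plain Python. The rows below are made-up placeholder counts, not values from the actual table, and `obesity_probability` is a hypothetical helper name. The sketch also assumes that `num_cont` counts the non-obese subjects in each row, which is an interpretation rather than something the question states.

```python
# Hypothetical rows: (age, alc, tob, num_case, num_cont).
# The counts are invented for illustration only.
ROWS = [
    ("26-50", "0-40", "0-10", 1, 40),
    ("26-50", "121+", "0-10", 2, 3),
    ("51-75", "121+", "11-20", 4, 1),
    ("76+", "41-80", "0-10", 3, 12),
]


def obesity_probability(rows, alc_group):
    """Estimate P(obese | alc == alc_group) as cases / (cases + controls)."""
    # Keep only the rows in the requested alcohol group.
    selected = [r for r in rows if r[1] == alc_group]
    cases = sum(r[3] for r in selected)
    controls = sum(r[4] for r in selected)
    total = cases + controls
    if total == 0:
        raise ValueError(f"no subjects in alcohol group {alc_group!r}")
    return cases / total


print(obesity_probability(ROWS, "121+"))  # 6 / (6 + 4) = 0.6
```

Both sums are taken over the filtered rows only; the explicit check for an empty group avoids a division by zero if no row matches.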
 
What happens with your calculation if there is only one subject in the highest alcohol group and they are obese?
 
I made the presumption - since there was no guidance otherwise and the table data bears this out - that the num_cont value is never 0; there is always at least one control.

So, presumably, your scenario couldn't happen. If there is only one subject in a row, that subject must be a control, not an obese case, because num_cont != 0. But, if it can happen, then my calculation would yield 1 / (1 + 0) = 1; a probability of 1.

Obviously, my calculation is wrong. And, my presumption is probably wrong, even though the table data supports it. Maybe there can be a situation where num_case > 0 and num_cont = 0, but it never happens in the table. I'm sure that there is some formula necessary to determine this, but I've been searching for two days and have yet to stumble upon the proper wording for the search to yield it.

Thanx for your reply, PeroK!
 
Forgive my ignorance, but what's a "control"?
 
A control is something involved in a test that does not have whatever is being tested applied to it.

For example, if one is testing a new drug, controls are those subjects who are not given the drug; they are there to see what happens if nothing is done, against whom the test subjects - the ones receiving the drug - are compared.

I'm not sure how that applies to this table, but because there is the variable num_cont, I figured that represented the control subjects. That, and because it was always non-zero.

But, maybe I'm wrong. Maybe I'm over-thinking this question and the associated data. But, I tried an answer using num_cont as the population. In that case, I used:

WHERE alc = "121+"
SUM(num_case) / SUM(num_cont)

But, that gave me the wrong answer, too. And, because the num_cont value is never zero, it would still result in some real number.

So, I'm still lost. I'm sure the answer is in a Statistics book somewhere, but I sold mine back to the school after my classes were over and that was two decades ago - I've slept since then...
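
A note on the second attempt: if `num_cont` counts the non-obese subjects, then dividing cases by controls gives the odds of obesity in the group rather than the probability. The two are related by p = odds / (1 + odds). A small sketch with invented counts (the function names are hypothetical):

```python
def odds(cases, controls):
    """Odds of being a case: cases per control."""
    return cases / controls


def probability(cases, controls):
    """Probability of being a case: cases per subject."""
    return cases / (cases + controls)


cases, controls = 6, 4  # invented counts, not taken from the table
o = odds(cases, controls)         # 1.5
p = probability(cases, controls)  # 0.6
# Converting the odds back to a probability recovers p.
assert abs(o / (1 + o) - p) < 1e-12
print(o, p)
```

Odds can exceed 1, which is one quick way to tell that a cases-over-controls ratio is not a probability.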
 
ckirmser said:
A control is something involved in a test that does not have whatever is being tested applied to it.

What does that mean in this context?
 
PS my limited understanding of statistical analysis is:

You have a group of people with certain factors: e.g. heavy drinkers. And you have a "control" group who do not have that factor.

You can then calculate the probability that a heavy drinker is obese, say, and the probability that a non-heavy-drinker is obese, and compare the two.

What I understand you are doing is counting the control group (of non-drinkers) in with the heavy drinkers?
 
Well, since the question is asking what is the probability of someone in the highest alcohol group being obese, I figure that is the number of obese in the group divided by the population (the obese and not obese) of that group. Like if I had 5 red straws and 10 not red straws, the probability of being a red straw is 5/15.

Are you thinking that the answer is to take those in the designated alcohol group and find their probability against the population of the entire table?

Hmm. Well, I can give it a shot...
 
Are you saying that "control" in this context means "not obese"?
 
  • #10
That was the guess I took, based on the names of the variables. But, honestly, I don't know for certain what it means and I have no way to ask whoever made the table.

That's maybe why I'm having such a problem; a lack of information on what the table represents. I had figured someone familiar with statistics would recognize what I'm asking at a glance and just rattle off the formula to get the answer. I'm sure it's something simple, I'm just not seeing it.

Given further thought, that must be the case; num_case is a count of those who are obese and meet the other conditions of age, alcohol and tobacco consumption, and num_cont is a count of those who are not obese. So, to get a probability, I have to divide by the sum of both variables, because that's the population.

I'm missing something else somewhere, but don't know what.
 
  • #11
OK, I was right. After beating myself over the head for two days, I discovered that my code was not summing the data properly.

Rather than summing just the filtered data, it was summing the num_case and num_cont for the entire table. But, I think that's an issue with my session of Visual Studio Code, because the syntax is correct. I had to break the thing down into individual steps and manually run it to find the problem.

I'm going to try this out in RStudio and see if that fixes it.

I really appreciate your time, PeroK.
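
The bug described in this post, summing the whole table instead of the filtered rows, is easy to reproduce. The following is a minimal sketch with invented counts and hypothetical variable names; it is not the poster's actual code, which does not appear in the thread.

```python
# Hypothetical rows: (alc, num_case, num_cont); counts are invented.
table = [
    ("0-40", 1, 40),
    ("41-80", 3, 12),
    ("121+", 2, 3),
    ("121+", 4, 1),
]

filtered = [row for row in table if row[0] == "121+"]

# Buggy: the filter result is ignored and the whole table is summed.
wrong = sum(r[1] for r in table) / sum(r[1] + r[2] for r in table)

# Intended: sum only the filtered rows.
right = sum(r[1] for r in filtered) / sum(r[1] + r[2] for r in filtered)

print(wrong)  # 10 / 66
print(right)  # 6 / 10 = 0.6
```

Printing the filtered row count before summing is a cheap check that the filter was actually applied.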
 
  • #12
PeroK said:
What does that mean in this context?
Usually it is someone who is not given the treatment but whose outcome of interest is observed. There is (always, I think) a form of "blinding", in that either the subject, the experimenter, or both do not know ahead of time which subject is a control. Maybe the clearest example is someone given a medical treatment to test whether it has some form of effect, say weight loss. Then some subjects will not be given the treatment, and it will be observed whether they lost weight. This helps determine the psychological (placebo) component of the effect.
 
  • #13
ckirmser, are you familiar with conditional probabilities? I think you can frame this question in those terms.
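
In conditional-probability terms, and assuming `num_cont` counts the non-obese subjects in each row, the quantity asked for can be written as below. The final step estimates the probabilities from the table counts. If the table comes from a case-control design, where the numbers of cases and controls are chosen by the study, the ratio describes the sample and need not match the wider population.

```latex
P(\text{obese} \mid \text{alc} = \text{121+})
  = \frac{P(\text{obese} \cap \text{alc} = \text{121+})}{P(\text{alc} = \text{121+})}
  \approx \frac{\sum_{\text{alc} = \text{121+}} \text{num\_case}}
               {\sum_{\text{alc} = \text{121+}} \left(\text{num\_case} + \text{num\_cont}\right)}
```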
 
