Hypothesis Testing with Bayes

FaraDazed · Nov 4, 2018

My example question is below.

"You're at a local computer fair looking for some nice sleeved cables for your new system. A stall operator shows you his cables, claiming they are the branded version. But the stall operator is not the usual person running the stall and does not actually know whether they're the branded ones or not.

You measure the cable precisely with callipers and find the cable to have a diameter of 13mm. You also know from research that the non-branded cables come in two forms, thin and thick, which have mean diameters of 11mm and 31mm respectively, both with a standard deviation of 2mm. You also know that the branded ones have a mean diameter of 23mm with a standard deviation of 7.5mm. Both the branded and non-branded sizes are normally distributed. In this part of the country you know 30% of these types of cables are the branded version and 70% are non-branded. The thin and thick varieties of non-branded cables are just as popular as one another with the public.

What are the odds that the cables are the branded version? You will need to use the Bayesian approach to hypothesis testing."

The question is wordy and hard to wrap your head around, but I completely understand what it is asking. I just have no idea how to go about it or where to start. I have spent hours scratching my head on this one!

We have been taught two methods of Bayesian hypothesis testing: estimation (one prior) and model comparison (two priors).

My first thought was to use model comparison, as that results in the odds of one model over the other, where the first model is the hypothesis that they are branded and the second model is the hypothesis that they are non-branded. I don't know if that is the correct approach or, if it is, what to do. The fact that there are two varieties of non-branded ones confuses the hell out of me too.

Any help appreciated.

Dale · Nov 4, 2018

FaraDazed said:

I don't know if that is the correct approach or, if it is, what to do. The fact that there are two varieties of non-branded ones confuses the hell out of me too.

Just treat it as three hypotheses. A: the cable is branded, B: the cable is thick unbranded, and C: the cable is thin unbranded. You already have a prior probability for each hypothesis, and you can calculate P(data|hypothesis) for each, so you can calculate the posterior also.
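
As a sketch of what that calculation looks like, write ##H_A##, ##H_B##, ##H_C## for the three hypotheses and ##d## for the measured diameter (these symbols are introduced here for illustration and are not part of the problem statement). Bayes' theorem then gives

[tex]
P(H_i|d) = \frac{P(d|H_i)\,P(H_i)}{\sum_{j \in \{A,B,C\}} P(d|H_j)\,P(H_j)}, \qquad i \in \{A, B, C\}
[/tex]

The denominator is the same for all three hypotheses, so it cancels whenever you take a ratio of posteriors.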

FaraDazed · Nov 4, 2018

Thanks for the quick reply. Ah, I didn't think of splitting it into three. The prior probability for each hypothesis is where I am/were getting confused: I know the sizes are normally distributed, i.e. ##N(\mu=23\text{mm},\ \sigma=7.5\text{mm})## for the branded, and I also know that 30% in general are branded. I'm not sure where to use which piece of information, or whether I need both to construct the prior.

Dale · Nov 4, 2018

FaraDazed said:

The prior probability for each hypothesis is where I am/were getting confused

Did you maybe miss this subtle statement? “The thin and thick variety of non-branded cables are just as popular as one another with the public.” Together with the 70% number, this allows you to construct a prior for the thick and the thin hypotheses.

FaraDazed · Nov 4, 2018

Ok yeah, so 30% branded, 35% thick non-branded, 35% thin non-branded, I get that, but where do the normal distributions of the sizes for the branded, thick and thin non-branded come into play? In terms of the "data" is the only data the data we got when we measured the cable ourselves and got 13mm?

I'm sorry I'm so confused; most of the research I do on the topic is also incomprehensible to me. Once I have one problem under my belt I can usually reproduce it for similar problems, but doing it for the first time is hard!

Dale · Nov 4, 2018

FaraDazed said:

where do the normal distributions of the sizes for the branded, thick and thin non-branded come into play?

Those are used to calculate the likelihoods, ##P( data|hypothesis)##, for each hypothesis.

FaraDazed said:

In terms of the "data" is the only data the data we got when we measured the cable ourselves and got 13mm?

Yes
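
To make the likelihood step concrete, here is a minimal Python sketch that evaluates each normal density at the single measurement of 13mm. The helper name `normal_pdf` and the dictionary layout are illustrative choices, and the sketch assumes the thick variety is the one with the 31mm mean and the thin variety the one with the 11mm mean.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd, evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

measured_mm = 13.0  # the only data: the calliper measurement

# (mean, sd) in mm from the problem statement.
# Assumption: thin = 11mm mean, thick = 31mm mean.
models = {
    "branded": (23.0, 7.5),
    "thin": (11.0, 2.0),
    "thick": (31.0, 2.0),
}

# P(data | hypothesis) for each hypothesis
likelihoods = {
    name: normal_pdf(measured_mm, mean, sd)
    for name, (mean, sd) in models.items()
}

for name, value in likelihoods.items():
    print(f"P(data | {name}) = {value:.3g}")
```

Note that the priors (30%, 35%, 35%) do not appear here at all; the normal distributions only enter through the likelihoods.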

FaraDazed · Nov 4, 2018

Right! Ok I think I got it now. Thank you so much for your help, I was getting confused thinking the normal distributions were needed for the prior too.

Dale · Nov 4, 2018

You are welcome! After a couple of these problems I am sure you will get it. It’s a new way of thinking, but it does make sense.

FaraDazed · Nov 5, 2018

Thanks again. I have done the problem now and have found that the odds that the cables are branded are roughly 1/3, i.e. they are almost three times more likely to be unbranded.

The posterior for the thick non-branded was very small, as expected given the data, so it makes practically no difference to the result. But I did want to double check that it is mathematically correct to say that the odds the cables are branded are equal to

[tex]
\frac{P(\textrm{branded hypothesis}|\textrm{data})}{P(\textrm{thin hypothesis }|\textrm{data}) + P(\textrm{thick hypothesis }|\textrm{data})}
[/tex]

Since the non-branded category is made up of two components, is the above correct?

Also, I wanted to check something else. Given my research on the topic, when I first started the problem I was expecting to get a distribution for the posterior rather than a number. Is this just because my prior for each case was just a number and not a distribution?

FaraDazed · Nov 8, 2018

Hi, sorry to bug you again. I just needed to know if my understanding of the result is correct, i.e. if the equation in my post above is mathematically correct for the odds of branded to non-branded.

I know if it were just a case of the odds of branded to thin non-branded then it is just

[tex]
\frac{P(\textrm{branded hypothesis}|\textrm{data})}{P(\textrm{thin hypothesis }|\textrm{data})}
[/tex]

But since the non-branded comes in both thin and thick varieties, is the equation in my previous post correct?

Sorry to bug you!

Dale · Nov 8, 2018

FaraDazed said:

But since the non-branded comes in both thin and thick varieties, is the equation in my previous post correct?

Sorry to bug you!

No problem, sorry I missed the above post. Yes, that equation is correct. It looks like you have a correct understanding.
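
For anyone wanting to check the arithmetic, the odds formula confirmed above can be sketched in Python as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the parameters are read directly from the problem statement (including a 7.5mm standard deviation for the branded cable), and the thick variety is assumed to be the one with the 31mm mean.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd, evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probabilities(x, hypotheses):
    """hypotheses maps name -> (prior, mean, sd); returns name -> posterior."""
    # prior * likelihood for each hypothesis
    unnormalised = {
        name: prior * normal_pdf(x, mean, sd)
        for name, (prior, mean, sd) in hypotheses.items()
    }
    # The evidence P(data) is the sum over all hypotheses
    evidence = sum(unnormalised.values())
    return {name: weight / evidence for name, weight in unnormalised.items()}

# (prior, mean in mm, sd in mm). Assumption: thin = 11mm, thick = 31mm.
hypotheses = {
    "branded": (0.30, 23.0, 7.5),
    "thin": (0.35, 11.0, 2.0),
    "thick": (0.35, 31.0, 2.0),
}

posterior = posterior_probabilities(13.0, hypotheses)

# Odds of branded against non-branded: both non-branded posteriors are summed
odds_branded = posterior["branded"] / (posterior["thin"] + posterior["thick"])

print(f"P(branded | data) = {posterior['branded']:.3f}")
print(f"odds branded : non-branded = {odds_branded:.3f}")
```

With the parameters exactly as written in the problem statement, this sketch gives odds of roughly 0.15, i.e. a posterior probability of about 0.13 for branded. The number is sensitive to the standard deviations and priors used, so a different reading of the problem will give a different figure.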

FaraDazed · Nov 8, 2018

Ok thank you!
