# Survey with dynamic question selection

1. Mar 23, 2012

### GiTS

I am designing a survey to gather data on consumer preferences for notebooks (school club charity project). The survey will contain only 10 questions about what notebook designs they prefer. Thus, I only want them to see notebooks they are likely to want.

I am not sure how to set up the equation. I want to find the probability that a person will like a notebook based on their own responses and the responses of others, sort of like how Amazon has “people who bought this product also liked”.

So if I have 40 different designs of notebooks, I want to give each design a chance to appear on the survey, but weighted by the probability the person will like it.

The notebooks the very first respondent sees are completely random. But the next respondent sees what they are likely to like, based on the first person.
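One way to sketch the "every design gets a chance, but weighted" idea in Python is below. It is only an illustration under assumptions that are not in the posts above: the names `like_probability`, `pick_designs` and `stats` are made up, the like-probability is estimated with simple add-one smoothing (so a design with no data gets 0.5), and the weights use only overall like rates, not yet the current respondent's own answers. With no data at all, every design has the same weight, so the first respondent's set is effectively random.

```python
import random

def like_probability(likes, shown):
    """Smoothed estimate of P(like); gives 0.5 for a design with no data yet."""
    return (likes + 1) / (shown + 2)

def pick_designs(stats, k=10, rng=random):
    """Pick k distinct designs, favouring those with a higher estimated P(like).

    stats maps a design id to (likes, times_shown).
    Weighted sampling without replacement: each design draws a random key
    u ** (1 / weight) and the k largest keys win, so low-weight designs
    can still appear, just less often.
    """
    keyed = []
    for design, (likes, shown) in stats.items():
        weight = like_probability(likes, shown)  # always > 0
        u = 1.0 - rng.random()                   # uniform in (0, 1]
        keyed.append((u ** (1.0 / weight), design))
    keyed.sort(reverse=True)
    return [design for _, design in keyed[:k]]

# 40 hypothetical designs, nobody surveyed yet
stats = {f"design_{i}": (0, 0) for i in range(40)}
first_survey = pick_designs(stats, k=10, rng=random.Random(0))
```

After each completed survey, the `(likes, times_shown)` pair for every design shown would be updated before the next call to `pick_designs`.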

I am trying to simplify the problem to make it easier for me. Alice and Bob take surveys. Alice goes first.

Q1: Do you like Tacos? Y/N
Q2: Do you like Pizza? Y/N
Q3: Do you like Spaghetti? Y/N

What’s the probability Bob will give a specific answer for:
Q1, Q2, Q3

I figure if I can find the basic equation I can turn it into 40 different questions and hundreds of respondents.

Any thoughts?

2. Mar 24, 2012

### GiTS

OK, after some memory-jogging research I think I need to use correlation/regression. Now I just have to remember/relearn the equation for it and how to solve it.

I can probably find Σ with some form of counting loop like
$n = number of respondents
for $n ($x1 +...

3. Mar 27, 2012

### GiTS

Ok, so I've simplified the problem down and changed some things to make it easier to think about.

Three possible questions:
Do you like Tacos? y/n
Do you like Burritos? y/n
Do you like Soup? y/n

After 100 completed surveys the results are
a People who liked tacos, burritos and soup: 30%
b People who liked tacos, burritos only: 35%
c People who liked tacos, soup only: 4%
d People who liked burritos and soup only: 3%
e People who liked tacos only: 7%
f People who liked burritos only: 6%
g People who liked soup only: 15%

The 101st respondent is asked "do you like tacos?"
The probability they will like tacos is 7% + 4% + 30% + 35% = 76%.
They respond "yes"
Then they are asked : Do you like burritos?
The probabilities must be adjusted because now there are fewer possibilities: d, f and g must be removed.
So
a People who liked tacos, burritos and soup: 30%
b People who liked tacos, burritos only: 35%
c People who liked tacos, soup only: 4%
e People who liked tacos only: 7%

It only adds up to 76%, so we rescale the percentages above to total 100% by dividing each by 0.76. So of the people that liked tacos...
a People who liked tacos, burritos and soup: 39.5%
b People who liked tacos, burritos only: 46.1%
c People who liked tacos, soup only: 5.3%
e People who liked tacos only: 9.2%

So of the respondents who liked tacos but not burritos (c and e only, 4% + 7% = 11% of everyone)...
c People who liked tacos, soup only: 36.4%
e People who liked tacos only: 63.6%

Thus, there is a higher likelihood that the respondent will not like soup.
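The narrowing-down above can be done by counting rather than rescaling by hand. The following is a minimal Python sketch, assuming the seven groups a to g are stored as counts out of 100 respondents; the names `counts`, `QUESTIONS` and `prob_yes` are just for illustration.

```python
# Counts out of 100 respondents, keyed by (tacos, burritos, soup) answers
counts = {
    (True, True, True): 30,    # a
    (True, True, False): 35,   # b
    (True, False, True): 4,    # c
    (False, True, True): 3,    # d
    (True, False, False): 7,   # e
    (False, True, False): 6,   # f
    (False, False, True): 15,  # g
}
QUESTIONS = ("tacos", "burritos", "soup")

def prob_yes(counts, question, given=None):
    """P(yes to question | answers given so far), by counting matching respondents."""
    given = given or {}
    idx = QUESTIONS.index(question)

    def matches(pattern):
        # Keep only past respondents whose answers agree with this respondent's
        return all(pattern[QUESTIONS.index(q)] == ans for q, ans in given.items())

    total = sum(n for p, n in counts.items() if matches(p))
    if total == 0:
        return None  # nobody answered this way before, so no estimate
    yes = sum(n for p, n in counts.items() if matches(p) and p[idx])
    return yes / total

p_tacos = prob_yes(counts, "tacos")                                    # 76/100
p_burritos = prob_yes(counts, "burritos", {"tacos": True})             # 65/76
p_soup = prob_yes(counts, "soup", {"tacos": True, "burritos": False})  # 4/11
```

One thing to watch with 40 questions: the number of past respondents matching every earlier answer shrinks quickly, so `total` will often be tiny or zero, which is part of why a regression-style model looks attractive.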

Now I just have to think about how to apply this better. For some reason, maybe my business stats class, I think I need least squares and a regression analysis...

4. Apr 11, 2012

### GiTS

After consulting a statistics professor, I will be using ANOVA or regression analysis. I may not use PHP as it is a pain to host.