Struggling with Bayesian Truth Serum formula

  • Context: Graduate
  • Thread starter: Illmaticus
  • Tags: Bayesian Formula
SUMMARY

The discussion centers on the calculation of scores using the Bayesian Truth Serum (BTS) formula, which incentivizes truthful reporting in subjective data scenarios. The participants are required to provide both their personal answer and a prediction of the sample distribution. The formula comprises an information score and a prediction score, with specific calculations provided for a survey on favorite colors, red and blue. The user struggles with discrepancies in their calculated information scores compared to the expected values, highlighting the need for accurate arithmetic in applying the BTS formula.

PREREQUISITES
  • Understanding of Bayesian Truth Serum methodology
  • Familiarity with logarithmic functions in statistical calculations
  • Basic knowledge of survey design and data collection
  • Ability to interpret and manipulate sample distributions
NEXT STEPS
  • Review the original paper on Bayesian Truth Serum by Dražen Prelec
  • Practice calculating information scores using different datasets
  • Explore advanced statistical methods for eliciting truthful responses
  • Learn about the implications of Bayesian Truth Serum in survey research
USEFUL FOR

Data scientists, statisticians, survey researchers, and anyone interested in improving the accuracy of subjective data collection methods.

Illmaticus
Hi there,

Right now I am struggling to calculate scores with the Bayesian Truth Serum formula, and I hope this is the right place to find someone who can help me along.
For everyone who doesn't know it, I will summarize it briefly: the Bayesian Truth Serum is a scoring method that provides truth-telling incentives and elicits “truthful subjective data in situations where the objective truth is unknowable”. The basic idea is that truthful answers maximize one's individual score.

When using this method, people are asked for dual reports:
  • First, they are asked to endorse their personal answer to an m-option multiple-choice question, e.g. "Which color do you like most: red or blue?"
  • Second, they are asked to predict how the other individuals in the sample will answer, i.e. to give a prediction of the sample distribution of all endorsements: Red __% ; Blue __%

  • The respondents are indexed by $r \in \{1, 2, \dots\}$.
  • $x^{r}_k \in \{0, 1\}$ indicates whether respondent $r$ has endorsed answer $k \in \{1, \dots, m\}$, so $x^{r}_k = 0$ for all answers except the one endorsed by $r$.
  • $y^{r} = (y^{r}_1, \dots, y^{r}_m)$ is respondent $r$'s prediction of the sample distribution, with $y^{r}_k > 0$ and $\sum\limits_{k} y^{r}_{k} = 1$. So in this example it could be (0,6; 0,4) for someone who thinks that most people like red.

The formula consists of an information score + prediction score.

  • The information score is given as $\sum\limits_{k} x^{r}_k \cdot \log \frac{\bar{x}_k}{\bar{y}_k}$, where $\bar{x}_k$ is the fraction of respondents endorsing answer $k$ and $\bar{y}_k$ is the geometric average of the endorsement predictions $y^{r}_k$ over all respondents.
  • The prediction score is given as $a \cdot \sum\limits_{k} \bar{x}_{k} \cdot \log \frac{y^{r}_k}{\bar{x}_k}$.
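The two scores can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the function name `bts_scores`, the choice of the natural logarithm, the default weight `a = 1.0`, and the toy data are all hypothetical and not taken from the post.

```python
import math

def bts_scores(answers, predictions, a=1.0):
    """Return (x_bar, y_bar, per-respondent (information, prediction) scores).

    answers[r]     -- index k of the option endorsed by respondent r
    predictions[r] -- list of m predicted fractions, each > 0, summing to 1
    a              -- weight of the prediction score (assumed 1.0 here)
    """
    n = len(answers)
    m = len(predictions[0])
    # x_bar[k]: fraction of respondents endorsing option k
    x_bar = [sum(1 for ans in answers if ans == k) / n for k in range(m)]
    # y_bar[k]: geometric average of the predictions for option k,
    # computed as exp(mean(log y))
    y_bar = [math.exp(sum(math.log(p[k]) for p in predictions) / n)
             for k in range(m)]
    scores = []
    for ans, pred in zip(answers, predictions):
        # Only the endorsed option has x = 1, so the sum collapses to one term.
        info = math.log(x_bar[ans] / y_bar[ans])
        # Options nobody endorsed contribute 0 (convention 0 * log 0 = 0).
        pred_score = a * sum(x_bar[k] * math.log(pred[k] / x_bar[k])
                             for k in range(m) if x_bar[k] > 0)
        scores.append((info, pred_score))
    return x_bar, y_bar, scores

# Hypothetical toy data: three respondents, two options (0 and 1).
x_bar, y_bar, scores = bts_scores([0, 0, 1], [[0.5, 0.5]] * 3)
for r, (info, pred) in enumerate(scores, start=1):
    print(f"respondent {r}: information = {info:.4f}, prediction = {pred:.4f}")
```

Because each respondent endorses exactly one answer, the information score reduces to a single logarithm per respondent. The prediction score is never positive, and it is zero only when the prediction equals the actual distribution of answers.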

Back to the question about the favorite color, red or blue: I have created a table for a fictional survey in order to practice calculating those BTS scores.

HTML:
<table border="1" style="background-color:#FFFFCC;border-collapse:collapse;border:1px solid #FFCC00;color:#000000;width:100%" cellpadding="3" cellspacing="3">
    <tr>
        <td>Individual answer</td>
        <td></td>
        <td></td>
        <td></td>
        <td>Predicted fraction</td>
        <td></td>
    </tr>
    <tr>
        <td>r</td>
        <td>red</td>
        <td>blue</td>
        <td></td>
        <td>red</td>
        <td>blue</td>
    </tr>
    <tr>
        <td>1</td>
        <td>1</td>
        <td>0</td>
        <td></td>
        <td>0,8</td>
        <td>0,2</td>
    </tr>
    <tr>
        <td>2</td>
        <td>1</td>
        <td>0</td>
        <td></td>
        <td>0,8</td>
        <td>0,2</td>
    </tr>
    <tr>
        <td>3</td>
        <td>1</td>
        <td>0</td>
        <td></td>
        <td>0,8</td>
        <td>0,2</td>
    </tr>
    <tr>
        <td>4</td>
        <td>1</td>
        <td>0</td>
        <td></td>
        <td>0,8</td>
        <td>0,2</td>
    </tr>
    <tr>
        <td>5</td>
        <td>1</td>
        <td>0</td>
        <td></td>
        <td>0,8</td>
        <td>0,2</td>
    </tr>
    <tr>
        <td>6</td>
        <td>1</td>
        <td>0</td>
        <td></td>
        <td>0,8</td>
        <td>0,2</td>
    </tr>
    <tr>
        <td>7</td>
        <td>1</td>
        <td>0</td>
        <td></td>
        <td>0,8</td>
        <td>0,2</td>
    </tr>
    <tr>
        <td>8</td>
        <td>0</td>
        <td>1</td>
        <td></td>
        <td>0,6</td>
        <td>0,4</td>
    </tr>
    <tr>
        <td>9</td>
        <td>0</td>
        <td>1</td>
        <td></td>
        <td>0,6</td>
        <td>0,4</td>
    </tr>
    <tr>
        <td>10</td>
        <td>0</td>
        <td>1</td>
        <td></td>
        <td>0,6</td>
        <td>0,4</td>
    </tr>
    <tr>
        <td>average</td>
        <td>0,7</td>
        <td>0,3</td>
        <td>geometric average</td>
        <td>0,73385</td>
        <td>0,24623</td>
    </tr>
</table>
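The bottom row of the table can be reproduced with Python's standard `statistics` module. This is a small sketch, assuming the data are as listed (seven respondents picking red and predicting 0.8/0.2, three picking blue and predicting 0.6/0.4); the variable names are illustrative, and decimal points replace the decimal commas used in the post.

```python
from statistics import fmean, geometric_mean

# Data as listed in the table: 7 respondents pick red and predict
# (0.8, 0.2); 3 respondents pick blue and predict (0.6, 0.4).
endorsed_red = [1] * 7 + [0] * 3
pred_red = [0.8] * 7 + [0.6] * 3
pred_blue = [0.2] * 7 + [0.4] * 3

# Arithmetic average of the endorsements = fraction endorsing each answer.
x_bar_red = fmean(endorsed_red)
x_bar_blue = 1 - x_bar_red

# Geometric average of the predicted fractions for each answer.
y_bar_red = geometric_mean(pred_red)
y_bar_blue = geometric_mean(pred_blue)

print(f"average:           red = {x_bar_red:.1f}, blue = {x_bar_blue:.1f}")
print(f"geometric average: red = {y_bar_red:.5f}, blue = {y_bar_blue:.5f}")
```

Rounded to five decimals, the geometric averages come out as 0.73385 and 0.24623, which agrees with the table, so the averages themselves appear to be computed correctly.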
I was also given the information scores for both answers: the information score for red should be -0,04722677, and the information score for blue should be 0,24622888.

Now that's where I am struggling. When I calculate it myself, I get information score (red) = -0,0205103 (i.e. for people stating that their favorite color is red) and information score (blue) = 0,0857823, neither of which matches the given values.

I calculated it using the formula $1 \cdot \log\frac{0,7}{0,73385} + 0 \cdot \log\frac{0,3}{0,24623}$ (shown here for red).

What am I doing wrong?
 
Only the html source shows up in my browser. I gather there are 10 respondents. Seven of them pick "red" and all seven estimate the population distribution to be 0.8 for red and 0.2 for blue. Three of the respondents pick "blue" and all three estimate the population distribution to be 0.6 for red and 0.4 for blue.

The original paper about "Bayesian Truth Serum" appears to be http://www.google.com/url?sa=t&rct=...hnoy5yfj1AcSDhg&bvm=bv.93756505,d.aWw&cad=rja

Now we just need some kind person to check your arithmetic.
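One way to check the arithmetic is to evaluate the information score with both the natural and the base-10 logarithm. The sketch below assumes the reference values were produced with the natural logarithm; the thread does not state which base was intended. The variable names are illustrative.

```python
import math

# Fractions endorsing each answer (from the table).
x_bar = {"red": 0.7, "blue": 0.3}
# Geometric averages of the predictions: 7 of 10 respondents predict
# (0.8, 0.2) and 3 of 10 predict (0.6, 0.4).
y_bar = {
    "red": 0.8 ** 0.7 * 0.6 ** 0.3,
    "blue": 0.2 ** 0.7 * 0.4 ** 0.3,
}

results = {}
for color in ("red", "blue"):
    ratio = x_bar[color] / y_bar[color]
    # Store (natural log, base-10 log) of the same ratio for comparison.
    results[color] = (math.log(ratio), math.log10(ratio))
    print(f"{color}: ln = {results[color][0]:.6f}, "
          f"log10 = {results[color][1]:.6f}")
```

With the natural logarithm, red comes out at about -0.047227, which agrees with the given value of -0,04722677. The base-10 results, about -0.020510 for red and 0.085782 for blue, agree with the values calculated in the post, which suggests the discrepancy may come from the choice of logarithm base rather than from the formula. For blue, the natural logarithm gives about 0.197521. The quoted reference value 0,24622888 coincides with the geometric average for blue in the table, so it may have been copied from the wrong cell, although the thread does not confirm this.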
 
