# Effectiveness of a Boolean test

1. Sep 20, 2013

### geo101

I’m working with a Boolean parameter that has been proposed to identify good data, and I want to test whether it is effective on a data set where we know which results are good (accurate) and which are bad (inaccurate).

The theory (sensu amplo!) behind the Boolean parameter, called “SC”, is that a value of “1” should indicate accurate data and a value of “0” should indicate inaccurate data. I have a data set of 13,174 results, where we independently know that 5400 are accurate and 7774 are inaccurate.

Of the accurate results, 4016 (~74%) have an SC value of 1. Of the inaccurate results, 4373 (~56%) have an SC value of 1. What I would like to do is apply a statistical test to assess the significance of the difference between these two proportions and determine if, at some significance level, SC is effective at identifying accurate results.

Any thoughts and advice would be most welcome.
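
One common way to compare two proportions like these is a pooled two-proportion z-test. The sketch below is only an illustration under that assumption (independent results, normal approximation to the binomial); the function name `two_proportion_z_test` is made up for this example, and the counts are the ones quoted in the post.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (normal approximation).

    x1, x2: number of results with SC = 1 in each group
    n1, n2: group sizes
    Returns the z statistic and the two-sided p-value.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis that both groups share one rate
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts from the post: 4016 of 5400 accurate and 4373 of 7774 inaccurate results have SC = 1
z, p_value = two_proportion_z_test(4016, 5400, 4373, 7774)
print(f"z = {z:.2f}, two-sided p = {p_value:.3g}")
```

A small p-value here only says that the two proportions differ. It does not by itself say how useful SC is as a filter, since over half of the inaccurate results still have SC = 1.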

2. Sep 20, 2013

### Office_Shredder

Staff Emeritus
I would be willing to eat a sock if they weren't different statistically - that should count as some sort of significance test right there.

More seriously, you want to do this t-test:
http://en.wikipedia.org/wiki/Welch's_t_test

3. Sep 23, 2013

### geo101

How would the t-test be adapted to this situation? Given that the SC parameter is essentially a yes/no result, would a test based on a binomial distribution not be more appropriate?

While that certainly would be significant, I would have to see evidence of the sock devouring :tongue:
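
As a rough illustration of how Welch's t-test could be applied to a yes/no parameter: if each result is coded as 0 or 1, the group mean is the proportion with SC = 1, and the sample variance follows directly from the counts. The sketch below assumes that coding; the helper name `welch_t_from_counts` is hypothetical. With groups this large the t statistic is effectively a normal deviate, so a binomial-based test should lead to the same conclusion.

```python
import math

def welch_t_from_counts(x1, n1, x2, n2):
    """Welch's t statistic for 0/1 data, given only success counts.

    For 0/1 data the sample mean is p = x / n and the unbiased
    sample variance is n * p * (1 - p) / (n - 1).
    Returns the t statistic and the Welch-Satterthwaite degrees of freedom.
    """
    p1, p2 = x1 / n1, x2 / n2
    var1 = n1 * p1 * (1 - p1) / (n1 - 1)
    var2 = n2 * p2 * (1 - p2) / (n2 - 1)
    a, b = var1 / n1, var2 / n2  # squared standard errors of the two means
    t = (p1 - p2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

# Counts from the first post in this thread
t_stat, dof = welch_t_from_counts(4016, 5400, 4373, 7774)
print(f"t = {t_stat:.2f}, df = {dof:.0f}")
```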

4. Oct 3, 2013

### geo101

$p_r=1- \sum\limits_{i=0}^{N_{s}}{ N_{t} \choose i } P^{i}(1-P)^{N_{t}-i}$
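
The post does not define the symbols in this expression. Reading it as one minus a binomial cumulative sum, a minimal sketch could look like the following, assuming $N_t$ is the number of trials, $N_s$ is the count up to which the sum runs, and $P$ is the per-trial probability. These interpretations, and the function name `binomial_upper_tail`, are assumptions made for illustration. Terms are computed in log space so that large $N_t$ does not overflow.

```python
import math

def binomial_upper_tail(n_s, n_t, p):
    """Evaluate 1 - sum_{i=0}^{n_s} C(n_t, i) * p**i * (1 - p)**(n_t - i).

    This is the probability of more than n_s successes in n_t trials
    when each trial succeeds with probability p (requires 0 < p < 1).
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    log_p, log_q = math.log(p), math.log(1 - p)
    cumulative = 0.0
    for i in range(n_s + 1):
        # log of C(n_t, i) * p^i * (1 - p)^(n_t - i), via log-gamma
        log_term = (math.lgamma(n_t + 1) - math.lgamma(i + 1)
                    - math.lgamma(n_t - i + 1)
                    + i * log_p + (n_t - i) * log_q)
        cumulative += math.exp(log_term)
    return 1 - cumulative

# Small hypothetical example: more than 7 successes in 10 fair trials
example = binomial_upper_tail(7, 10, 0.5)
print(example)
```

Because the result is formed as one minus a sum, very small tail probabilities lose precision in floating point. Summing the terms from $N_s + 1$ to $N_t$ directly gives the same quantity and avoids that.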