Effectiveness of a Boolean test

geo101 · Sep 20, 2013

I’m working with a Boolean parameter that has been proposed to identify good data, and I want to test whether it is effective on a data set where we know which results are good (accurate) and which are bad (inaccurate).

The theory (sensu amplo!) behind the Boolean parameter, called “SC”, is that a value of “1” should indicate accurate data and a value of “0” should indicate inaccurate data. I have a data set of 13,174 results, where we independently know that 5400 are accurate and 7774 are inaccurate.

Of the accurate results, 4016 (~74%) have an SC value of 1. Of the inaccurate results, 4373 (~56%) have an SC value of 1. What I would like to do is apply a statistical test to assess the significance of the difference between these two proportions and determine if, at some significance level, SC is effective at identifying accurate results.

Any thoughts and advice would be most welcome.

Office_Shredder · Sep 20, 2013

I would be willing to eat a sock if they weren't different statistically - that should count as some sort of significance test right there.

More seriously, you want to do Welch's t-test:
http://en.wikipedia.org/wiki/Welch's_t_test
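
As a rough sketch of how that would go with 0/1 data (Python, standard library only; the function name welch_t_binary is made up for illustration, and each result is assumed to be coded as 1 when SC = 1 and 0 otherwise):

```python
import math

def welch_t_binary(successes_a, n_a, successes_b, n_b):
    """Welch's t statistic and approximate degrees of freedom for two 0/1 samples."""
    mean_a = successes_a / n_a
    mean_b = successes_b / n_b
    # Unbiased sample variance of 0/1 data: n * p * (1 - p) / (n - 1)
    var_a = n_a * mean_a * (1 - mean_a) / (n_a - 1)
    var_b = n_b * mean_b * (1 - mean_b) / (n_b - 1)
    # Squared standard error of each sample mean
    se2_a = var_a / n_a
    se2_b = var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Counts from the first post: SC = 1 among accurate vs. inaccurate results
t_stat, dof = welch_t_binary(4016, 5400, 4373, 7774)
print(f"t = {t_stat:.2f}, df = {dof:.0f}")
```

With samples this large the t distribution is practically normal, so a t statistic far above 2 or 3 already tells you the two proportions differ.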

geo101 · Sep 23, 2013

Thanks for the reply.

How would the t-test be adapted to this situation? Given that the SC parameter is essentially a yes/no result, would a test based on a binomial distribution not be more appropriate?
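
For what it's worth, one binomial-based option I have in mind is the pooled two-proportion z-test. A minimal sketch, assuming Python with only the standard library and the normal approximation to the binomial (the function name two_proportion_z is just a placeholder of mine):

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z statistic and two-sided p-value (normal approximation)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both groups share one rate
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# SC = 1 counts among the accurate and the inaccurate results
z_stat, p_val = two_proportion_z(4016, 5400, 4373, 7774)
print(f"z = {z_stat:.2f}, two-sided p = {p_val:.3g}")
```

The null hypothesis here is that SC = 1 occurs at the same rate in accurate and inaccurate results, which is why the standard error uses the pooled proportion.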

I would be willing to eat a sock if they weren't different statistically - that should count as some sort of significance test right there.

While that certainly would be significant, I would have to see evidence of the sock devouring.

geo101 · Oct 3, 2013

OK, what about this...

I have 13,174 results, of which 5400 are accurate. The probability of randomly picking an accurate result is P = 0.4099.
Using the parameter SC = 1, I select a subset of 8389 results (i.e., N_t = 8389).
Of these N_t results 4016 are accurate (i.e., N_s = 4016).
Using the binomial CDF, I can calculate the probability that my realized success rate occurred by chance, using:

p_r=1- \sum\limits_{i=0}^{N_{s}}{ N_{t} \choose i } P^{i}(1-P)^{N_{t}-i}

When I crunch the numbers, I get p_r ≈ 0. So I can say that, at better than the 5% significance level, selecting results with SC = 1 will increase the likelihood of selecting accurate results.

Is this correct??
