Effectiveness of a Boolean test

  • Context: Undergrad 
  • Thread starter: geo101
  • Tags: Test

Discussion Overview

The discussion revolves around the effectiveness of a Boolean parameter, referred to as "SC," in identifying accurate and inaccurate data within a specific dataset. Participants explore statistical methods to assess the significance of the differences in proportions of accurate and inaccurate results based on the SC values.

Discussion Character

  • Exploratory
  • Technical explanation
  • Mathematical reasoning

Main Points Raised

  • One participant presents a dataset of 13,174 results, with known accurate and inaccurate classifications, and proposes to test the effectiveness of the SC parameter.
  • Another participant suggests using Welch's t-test to assess the statistical difference between the proportions of accurate and inaccurate results.
  • A subsequent reply questions the appropriateness of the t-test, proposing that a binomial distribution might be more suitable given the binary nature of the SC parameter.
  • A later post discusses calculating the probability of selecting accurate results using the binomial cumulative distribution function (CDF) and claims to find a significant result at better than the 5% significance level.

Areas of Agreement / Disagreement

Participants express differing views on the appropriate statistical test to use, with some advocating for a t-test while others argue for a binomial approach. The discussion remains unresolved regarding the best method for analysis.

Contextual Notes

There are limitations regarding the assumptions made about the data distribution and the choice of statistical tests, which have not been fully explored or agreed upon by participants.

geo101
I’m working with a Boolean parameter that has been proposed to identify good data, and I want to test whether it is effective on a data set where we know which results are good (accurate) and which are bad (inaccurate).

The theory (sensu amplo!) behind the Boolean parameter, called “SC”, is that a value of “1” should indicate accurate data and a value of “0” should indicate inaccurate data. I have a data set of 13,174 results, where we independently know that 5400 are accurate and 7774 are inaccurate.

Of the accurate results, 4016 (~74%) have an SC value of 1. Of the inaccurate results, 4373 (~56%) have an SC value of 1. What I would like to do is apply a statistical test to assess the significance of the difference between these two proportions and determine if, at some significance level, SC is effective at identifying accurate results.

Any thoughts and advice would be most welcome :smile:
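
One standard way to compare two proportions like these is a pooled two-sample z-test. The sketch below is an illustration using the counts from this post, not a method anyone in the thread proposed; the function name `two_proportion_z_test` is made up for the example, and it assumes the results are independent of one another.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test of H0: p1 == p2 (two-sided).

    x1, x2 are the counts with SC = 1; n1, n2 are the group sizes.
    Assumes independent observations (an assumption, not stated in the post).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)            # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return p1, p2, z, p_value

# Counts from the post: 4016 of 5400 accurate and 4373 of 7774 inaccurate have SC = 1.
p_acc, p_inacc, z, p_value = two_proportion_z_test(4016, 5400, 4373, 7774)
print(f"SC=1 among accurate:   {p_acc:.3f}")
print(f"SC=1 among inaccurate: {p_inacc:.3f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.3g}")
```

With these counts z comes out at roughly 21, so the difference between 74% and 56% is far beyond what sampling noise would explain. That says the two proportions differ; it does not by itself say SC is a good classifier, since 56% of the inaccurate results still pass.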
 
I would be willing to eat a sock if they weren't different statistically - that should count as some sort of significance test right there.

More seriously, you want Welch's t-test:
http://en.wikipedia.org/wiki/Welch's_t_test
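
For what it's worth, Welch's t-test can be applied to yes/no data by coding each result as 1 (SC = 1) or 0 (SC = 0), so that each group mean is simply the proportion with SC = 1. The sketch below works from the summary counts alone; the function name `welch_t_binary` is hypothetical, and the final p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only because the degrees of freedom are in the thousands.

```python
import math

def welch_t_binary(x1, n1, x2, n2):
    """Welch's t statistic and Welch-Satterthwaite df for two groups of 0/1 values.

    x1, x2 are the counts of ones; n1, n2 are the group sizes.
    """
    m1, m2 = x1 / n1, x2 / n2
    # Unbiased sample variance of 0/1 data: n/(n-1) * m * (1 - m)
    v1 = m1 * (1 - m1) * n1 / (n1 - 1)
    v2 = m2 * (1 - m2) * n2 / (n2 - 1)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

t_stat, df = welch_t_binary(4016, 5400, 4373, 7774)
# Normal approximation to the two-sided p-value (assumes large df).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"t = {t_stat:.2f}, df = {df:.0f}, approx. two-sided p = {p_approx:.3g}")
```

With samples this large the t-test and a test built directly on the binomial distribution lead to the same conclusion, so the choice between them matters little for this data set.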
 
Thanks for the reply.

How would the t-test be adapted to this situation? Given that the SC parameter is essentially a yes/no result, would a test based on a binomial distribution not be more appropriate?


I would be willing to eat a sock if they weren't different statistically - that should count as some sort of significance test right there.
While that certainly would be significant, I would have to see evidence of the sock devouring :-p
 
OK, what about this...

I have 13,174 results, of which 5400 are accurate. The probability of randomly picking an accurate result is P = 0.4099.
Using the parameter SC = 1, I select a subset of 8389 results (i.e., Nt = 8389).
Of these Nt results 4016 are accurate (i.e., Ns = 4016).
Using the binomial CDF, I can calculate the probability that my realized success rate occurred by chance, using:

p_r=1- \sum\limits_{i=0}^{N_{s}}{ N_{t} \choose i } P^{i}(1-P)^{N_{t}-i}

When I crunch the numbers, I get p_r ≈ 0. So I can say that, at better than the 5% significance level, selecting results with SC = 1 will increase the likelihood of selecting accurate results.

Is this correct??
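
A possible way to check this calculation numerically is sketched below. Computing 1 minus the CDF in floating point rounds to exactly 0 here, so the sketch sums the upper tail directly in log space instead. Two caveats: the helper names are invented for the example, and it computes P(X >= N_s), the usual one-sided p-value, whereas the formula above (summing up to N_s) gives P(X > N_s). The two differ by a single term, which is negligible for these numbers.

```python
import math

def log_binom_pmf(k, n, p):
    """Natural log of the binomial probability P(X = k) for X ~ Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_binom_upper_tail(k, n, p):
    """Natural log of P(X >= k), summed with the log-sum-exp trick to avoid underflow."""
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(logs)
    return peak + math.log(sum(math.exp(v - peak) for v in logs))

P = 5400 / 13174        # chance of picking an accurate result at random
n_t, n_s = 8389, 4016   # results with SC = 1, and how many of those are accurate

log_tail = log_binom_upper_tail(n_s, n_t, P)
expected = n_t * P
sd = math.sqrt(n_t * P * (1 - P))
print(f"expected accurate: {expected:.1f}, observed: {n_s}")
print(f"observed is {(n_s - expected) / sd:.1f} standard deviations above expectation")
print(f"log10 P(X >= {n_s}) = {log_tail / math.log(10):.1f}")
```

The observed count sits roughly 13 standard deviations above what random selection would give, so the tail probability is tiny but not literally zero. Note that this treats P as a fixed, known probability, which is a simplification of the original two-proportion question.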
 
