How Reliable Is Your Anti-Spam Software's Error Detection?

  • Context: Undergrad 
  • Thread starter: moonman239
  • Tags: Errors, Type

Discussion Overview

The discussion revolves around the reliability of anti-spam software's error detection, specifically focusing on the statistical evaluation of false positives and false negatives in a sample of messages. Participants are exploring how to determine the necessary sample size to achieve a specific confidence level in hypothesis testing.

Discussion Character

  • Technical explanation, Debate/contested, Mathematical reasoning

Main Points Raised

  • One participant presents a scenario involving 40 false positives and 40 false negatives out of 100 messages and questions how many additional messages are needed to confidently reject the null hypothesis.
  • Another participant provides a link to a Wikipedia page on sample sizes for hypothesis tests but is met with skepticism regarding its applicability to the problem at hand.
  • There is a discussion about the relevance of sample means to the original question, with some participants expressing confusion over the null hypothesis being tested.
  • A participant clarifies their null hypothesis as being that a spam message will be correctly marked as spam and addresses the status of the remaining 20 messages in the sample.
  • Another participant points out the total of 80 messages accounted for and inquires about the confusion matrix related to the test results.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the appropriate null hypothesis or the relevance of sample means to the discussion. There are multiple competing views regarding the interpretation of the data and the statistical methods to be applied.

Contextual Notes

Participants express uncertainty regarding the definitions and implications of the null hypothesis, as well as the handling of the remaining messages in the sample. The discussion remains focused on clarifying these aspects without resolving the underlying statistical questions.

moonman239
Let's say I'm testing anti-spam software. The number of false positives (i.e., friendly messages misidentified as spam, for those who don't know the term) is 40. The number of false negatives (spam messages misidentified as friendly) is also 40. I'm testing 100 messages. How many more messages would I need to test in order to be 99.99% confident that the null hypothesis can/cannot be rejected?
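
The thread never pins down a specific test, but for a sense of scale, here is a minimal sketch assuming the goal is a normal-approximation confidence interval for the filter's overall error rate (80 errors in 100 messages, as stated above). The margin of error is a hypothetical choice that the thread itself never fixes.

```python
from math import ceil
from statistics import NormalDist

# Hypothetical framing: treat the overall error rate as a binomial
# proportion (80 errors observed in 100 messages) and ask how many
# messages give a two-sided normal-approximation CI at 99.99%
# confidence with a chosen margin of error.
confidence = 0.9999
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~3.89

p_hat = 0.80   # observed error rate from the thread: 80 of 100
margin = 0.05  # hypothetical: the thread never specifies a margin

n_total = ceil(z**2 * p_hat * (1 - p_hat) / margin**2)
n_more = max(0, n_total - 100)  # beyond the 100 already tested
print(n_total, n_more)
```

Without a stated effect size or margin of error, "how many more messages" has no unique answer; the 5% margin above is purely illustrative.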
 
Thanks for the link, but I'm not seeing any equation whatsoever that will help. As far as I can see, all the listed equations have to do with means.
 
Oh, so what's your null hypothesis? I thought you were testing whether the average message is not spam.
 
I just don't see how a sample mean would be relevant. Your understanding of the question is correct.
 
Be very specific, please: What is your null hypothesis?

Also, you have told us about 80 messages, what about the other 20?
 
D H said:
Be very specific, please: What is your null hypothesis?

Also, you have told us about 80 messages, what about the other 20?

Let's say my null hypothesis is that a spam message will be correctly marked as spam. As for the 20 messages, let's say that those are false negatives.
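
As stated, this null is degenerate: if "a spam message will be correctly marked as spam" means every spam message is caught, a single false negative rejects it outright. A testable version fixes a detection rate p0 below 1 and runs an exact binomial test. The spam count and p0 in the sketch below are hypothetical; the thread never supplies either.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical counts: suppose 60 of the 100 messages were truly spam
# and 20 of them were caught (40 false negatives, as in the thread).
n_spam, caught = 60, 20

# Testable null: the filter catches spam with probability p0 = 0.5.
# One-sided p-value: chance of catching <= 20 of 60 if p0 were true.
p0 = 0.5
p_value = binom_cdf(caught, n_spam, p0)
print(p_value)  # roughly 0.007: reject at 1%, but not at 99.99% confidence
```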
 
You already said you had 40 false positives and 40 false negatives out of 100 tests. That makes for a total of 80 out of 100. Those remaining 20 are either true positives or true negatives.

What does the confusion matrix for your test results look like?
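
For reference, here is a minimal sketch of what that confusion matrix would have to look like given the thread's counts: FP = FN = 40 out of 100 leaves 20 correct classifications, and their split between true positives and true negatives is never stated, so the split below is hypothetical.

```python
# The thread fixes FP = FN = 40 out of 100 messages, so 20 correct
# classifications remain; their TP/TN split is never given.
def print_confusion(tp, fn, fp, tn):
    print(f"{'':>12}{'pred spam':>12}{'pred ham':>12}")
    print(f"{'actual spam':>12}{tp:>12}{fn:>12}")
    print(f"{'actual ham':>12}{fp:>12}{tn:>12}")

# One hypothetical split of the remaining 20 correct calls:
print_confusion(tp=10, fn=40, fp=40, tn=10)
```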
 
