How Reliable Is Your Anti-Spam Software's Error Detection?

  • Context: Undergrad 
  • Thread starter: moonman239
  • Tags: Errors, Type

Discussion Overview

The discussion revolves around the reliability of anti-spam software's error detection, specifically focusing on the statistical evaluation of false positives and false negatives in a sample of messages. Participants are exploring how to determine the necessary sample size to achieve a specific confidence level in hypothesis testing.

Discussion Character

  • Technical explanation, Debate/contested, Mathematical reasoning

Main Points Raised

  • One participant presents a scenario involving 40 false positives and 40 false negatives out of 100 messages and questions how many additional messages are needed to confidently reject the null hypothesis.
  • Another participant provides a link to a Wikipedia page on sample sizes for hypothesis tests but is met with skepticism regarding its applicability to the problem at hand.
  • There is a discussion about the relevance of sample means to the original question, with some participants expressing confusion over the null hypothesis being tested.
  • A participant clarifies their null hypothesis as being that a spam message will be correctly marked as spam and addresses the status of the remaining 20 messages in the sample.
  • Another participant points out the total of 80 messages accounted for and inquires about the confusion matrix related to the test results.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the appropriate null hypothesis or the relevance of sample means to the discussion. There are multiple competing views regarding the interpretation of the data and the statistical methods to be applied.

Contextual Notes

Participants express uncertainty regarding the definitions and implications of the null hypothesis, as well as the handling of the remaining messages in the sample. The discussion remains focused on clarifying these aspects without resolving the underlying statistical questions.

moonman239
Let's say I'm testing anti-spam software. The number of false positives (i.e., friendly messages misidentified as spam, for those who don't know the term) is 40. The number of false negatives (spam messages misidentified as friendly) is also 40. I'm testing 100 messages. How many more messages would I need to test in order to be 99.99% confident that the null hypothesis can/cannot be rejected?
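
The thread never pins down a specific test, but for a sense of scale, here is a minimal sketch assuming the goal is a normal-approximation confidence interval for the filter's overall error rate (80 errors in 100 messages, as stated above). The margin of error is a hypothetical choice that the thread itself never fixes.

```python
from math import ceil
from statistics import NormalDist

# Hypothetical framing: treat the overall error rate as a binomial
# proportion (80 errors observed in 100 messages) and ask how many
# messages give a two-sided normal-approximation CI at 99.99%
# confidence with a chosen margin of error.
confidence = 0.9999
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ~3.89

p_hat = 0.80   # observed error rate from the thread: 80 of 100
margin = 0.05  # hypothetical: the thread never specifies a margin

n_total = ceil(z**2 * p_hat * (1 - p_hat) / margin**2)
n_more = max(0, n_total - 100)  # beyond the 100 already tested
print(n_total, n_more)
```

Without a stated effect size or margin of error, "how many more messages" has no unique answer; the 5% margin above is purely illustrative.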
 
Thanks for the link, but I'm not seeing any equation whatsoever that will help. As far as I can see, all the listed equations have to do with means.
 
Oh, so what's your null hypothesis? I thought you were testing whether the average message is not spam.
 
I just don't see how a sample mean would be relevant. Your understanding of the question is correct.
 
Be very specific, please: What is your null hypothesis?

Also, you have told us about 80 messages, what about the other 20?
 
D H said:
Be very specific, please: What is your null hypothesis?

Also, you have told us about 80 messages, what about the other 20?

Let's say my null hypothesis is that a spam message will be correctly marked as spam. As for the 20 messages, let's say that those are false negatives.
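
As stated, this null is degenerate: if "a spam message will be correctly marked as spam" means every spam message is caught, a single false negative rejects it outright. A testable version fixes a detection rate p0 below 1 and runs an exact binomial test. The spam count and p0 in the sketch below are hypothetical; the thread never supplies either.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Hypothetical counts: suppose 60 of the 100 messages were truly spam
# and 20 of them were caught (40 false negatives, as in the thread).
n_spam, caught = 60, 20

# Testable null: the filter catches spam with probability p0 = 0.5.
# One-sided p-value: chance of catching <= 20 of 60 if p0 were true.
p0 = 0.5
p_value = binom_cdf(caught, n_spam, p0)
print(p_value)  # roughly 0.007: reject at 1%, but not at 99.99% confidence
```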
 
You already said you had 40 false positives and 40 false negatives out of 100 tests. That makes for a total of 80 out of 100. Those remaining 20 are either true positives or true negatives.

What does the confusion matrix for your test results look like?
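
For reference, here is a minimal sketch of what that confusion matrix would have to look like given the thread's counts: FP = FN = 40 out of 100 leaves 20 correct classifications, and their split between true positives and true negatives is never stated, so the split below is hypothetical.

```python
# The thread fixes FP = FN = 40 out of 100 messages, so 20 correct
# classifications remain; their TP/TN split is never given.
def print_confusion(tp, fn, fp, tn):
    print(f"{'':>12}{'pred spam':>12}{'pred ham':>12}")
    print(f"{'actual spam':>12}{tp:>12}{fn:>12}")
    print(f"{'actual ham':>12}{fp:>12}{tn:>12}")

# One hypothetical split of the remaining 20 correct calls:
print_confusion(tp=10, fn=40, fp=40, tn=10)
```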
 
