AI Detection - Phase 1: sample collection

  • Thread starter: fresh_42
  • Tags: Statistics
  • #1
fresh_42
TL;DR Summary
Two AI detectors frequently give opposite results on the same input. I need your help to test them and collect data for statistical tests.
I know two programs that claim to be able to detect whether a text has been written by a machine or by a human.

A (ZeroGPT): https://www.zerogpt.com/
B (OpenAI): https://openai-openai-detector.hf.space/

Character Count: https://www.lettercount.com/

If you have time and examples, please test them and report the results. OpenAI also lists the number of tokens it checked, so please add that number too, and of course state whether your sample was written by a human or by an AI.
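If you prefer to count locally rather than paste text into the character counter, a few lines of Python will do. This is a minimal sketch under stated assumptions: `report_counts` is a hypothetical helper, characters include spaces and punctuation, and words are whitespace-separated runs. lettercount.com does not necessarily count the same way, so small differences are possible. B's token count comes from the detector's own tokenizer and cannot be reproduced by counting words, so it is best copied from B's output.

```python
def report_counts(text: str) -> dict:
    """Count characters and words for the CC field of a test report.

    Assumptions: characters include spaces and punctuation; words are
    whitespace-separated runs. Online counters may use other rules.
    """
    return {
        "characters": len(text),
        "characters_no_spaces": sum(1 for ch in text if not ch.isspace()),
        "words": len(text.split()),
    }


if __name__ == "__main__":
    sample = "As the first raindrops descend from the heavens."
    print(report_counts(sample))
    # → {'characters': 48, 'characters_no_spaces': 41, 'words': 8}
```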

Thanks, folks. A little experiment in statistics. And I guess that's only the start of such experiments around the world.

I tested passages from two of my Insight articles, and both A and B rated them 100% human, so there were no discrepancies here. The same cannot be said of all text passages published at PF.
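The post does not say which statistical test is intended. One possible choice, offered here only as an assumption, is an exact McNemar test on paired verdicts: each percentage is turned into a yes/no "AI" verdict using a threshold (50% here, an arbitrary choice), and only the samples on which A and B disagree are counted. Under the null hypothesis that neither detector says "AI" more often than the other, those disagreements split like fair coin flips. The function names are hypothetical.

```python
from math import comb


def verdict(ai_percent: float, threshold: float = 50.0) -> bool:
    """True means 'AI': scores strictly above the (assumed) threshold count as AI."""
    return ai_percent > threshold


def mcnemar_exact(pairs: list[tuple[float, float]], threshold: float = 50.0) -> float:
    """Two-sided exact McNemar p-value for paired (A, B) AI percentages.

    b = samples where only A says 'AI', c = samples where only B says 'AI'.
    Under H0 the number b is Binomial(b + c, 1/2).
    """
    b = sum(1 for a_pct, b_pct in pairs
            if verdict(a_pct, threshold) and not verdict(b_pct, threshold))
    c = sum(1 for a_pct, b_pct in pairs
            if verdict(b_pct, threshold) and not verdict(a_pct, threshold))
    n = b + c
    if n == 0:
        return 1.0  # no disagreements: no evidence either way
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for two sides, capped at 1
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # (A, B) pairs as reported in posts #2 to #4 of this thread
    pairs = [(24.53, 0.11), (0, 0.98), (100, 0.04),
             (6.64, 0.02), (0, 0.06), (0, 3.47)]
    print(mcnemar_exact(pairs))  # → 1.0
```

With these six pairs, only one sample (100% vs 0.04%) is a disagreement at the 50% threshold, so the p-value is 1.0 and nothing can be concluded yet, which is exactly why more samples are needed. McNemar's test asks whether one detector flags "AI" more often than the other; it does not say which detector is correct, which requires the true human/AI share of each sample.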
 
  • #2
Here is what an optimal test report would look like:

CC: number of characters (share of the source text written by a human / by AI)
A: AI percentage by ZeroGPT
B: AI percentage by the OpenAI detector (GPT-2)

CC: 657 (100% human / 0% AI)
A: 24.53% AI
B: 0.11% AI

CC: 1638 (100% human / 0% AI)
A: 0% AI
B: 0.98% AI
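Once enough reports in this layout have been collected, they are easy to process by machine. A minimal sketch, assuming reports are pasted as plain text with a blank line between them; the `Report` class, the regular expressions, and `mean_abs_error` are hypothetical helpers, not part of either detector. It measures each detector's mean absolute error, in percentage points, against the stated AI share of the source text.

```python
import re
from dataclasses import dataclass

NUM = r"(\d+(?:\.\d+)?)"  # an integer or decimal percentage

# Hypothetical patterns matching the CC / A / B report layout above.
CC_RE = re.compile(rf"^CC:\s*(\d+)\s*\({NUM}% human\s*/\s*{NUM}% AI\)", re.MULTILINE)
A_RE = re.compile(rf"^A:\s*{NUM}% AI", re.MULTILINE)
B_RE = re.compile(rf"^B:\s*{NUM}% AI", re.MULTILINE)


@dataclass
class Report:
    characters: int
    human_percent: float  # ground truth: share of human-written text
    ai_percent: float     # ground truth: share of AI-written text
    a_ai: float           # A: AI percentage by ZeroGPT
    b_ai: float           # B: AI percentage by the OpenAI detector


def parse_reports(text: str) -> list[Report]:
    """Parse blank-line-separated reports; skip blocks not matching the template."""
    reports = []
    for block in text.strip().split("\n\n"):
        cc, a, b = CC_RE.search(block), A_RE.search(block), B_RE.search(block)
        if cc and a and b:
            reports.append(Report(
                characters=int(cc.group(1)),
                human_percent=float(cc.group(2)),
                ai_percent=float(cc.group(3)),
                a_ai=float(a.group(1)),
                b_ai=float(b.group(1)),
            ))
    return reports


def mean_abs_error(reports: list[Report], field: str) -> float:
    """Mean |detector score - true AI share| in percentage points."""
    return sum(abs(getattr(r, field) - r.ai_percent) for r in reports) / len(reports)


if __name__ == "__main__":
    example = """CC: 657 (100% human / 0% AI)
A: 24.53% AI
B: 0.11% AI

CC: 1638 (100% human / 0% AI)
A: 0% AI
B: 0.98% AI"""
    reports = parse_reports(example)
    print(mean_abs_error(reports, "a_ai"), mean_abs_error(reports, "b_ai"))
```

Blocks that do not follow the template (for example the header lines describing CC, A, and B) are skipped rather than guessed at.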
 
  • #3
From the intro of an Insight I wrote:

CC: 621 (100% human / 0% AI)
A: 100% AI (104 words)
B: 0.04% AI (113 tokens)

From another Insight of mine:

CC: 3087 (100% human / 0% AI)
A: 6.64% AI (532 words)
B: 0.02% AI (510 of 624 tokens)
 
  • #4
CC: 279 (4% human / 96% AI)
A: 0% AI (47 words)
B: 0.06% AI (60 tokens)

Original text from ChatGPT:
As the first raindrops descend from the heavens, tapping lightly on windows and leaves alike, a symphony of nature begins. The earth awakens to the rhythmic dance of droplets, filling the air with the soothing scent of petrichor and the gentle melody of raindrops kissing the ground.

My version, in which I introduced some mistakes:
As the first raindrops descend from the heavens tapping lightly on windows and leaves alike a symfony of nature begins. The earth awakens to the rhythmic dance of droplets, and fills the air with the soothing scent of petrichor, the gentle melody of raindrops kissing the ground.

Test result on the original text:

CC: 279 (100% AI, extract from the answer to "How would I start an essay about rain?")
A: 0% AI (47 words)
B: 3.47% AI (60 tokens)
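In this example, B's score moved from 3.47% AI on the original to 0.06% AI on the edited version, while A stayed at 0%. To check whether such small edits shift the detectors systematically, the edits could be generated automatically and each original/edited pair run through both detectors. A minimal sketch, assuming only two kinds of mistakes similar to the ones made by hand here (dropped commas and a "ph" to "f" misspelling); `add_mistakes` and its parameters are hypothetical.

```python
import random


def add_mistakes(text: str, comma_drop_rate: float = 0.5, seed: int = 0) -> str:
    """Return a copy of text with some human-like mistakes injected.

    Hypothetical perturbations: each comma is dropped with probability
    comma_drop_rate, and every lowercase 'ph' becomes 'f'
    (e.g. 'symphony' -> 'symfony').
    """
    rng = random.Random(seed)  # seeded so the same input gives the same output
    out = []
    for ch in text:
        if ch == "," and rng.random() < comma_drop_rate:
            continue  # drop this comma
        out.append(ch)
    return "".join(out).replace("ph", "f")


if __name__ == "__main__":
    original = ("As the first raindrops descend from the heavens, tapping lightly "
                "on windows and leaves alike, a symphony of nature begins.")
    print(add_mistakes(original, comma_drop_rate=1.0))
```

Keeping the seed fixed makes each edited sample reproducible, so anyone can regenerate the same text and re-check the detectors' scores.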
 

What is "AI Detection - Phase 1: Sample Collection"?

AI Detection - Phase 1: Sample Collection refers to the initial stage in a process designed to identify and collect data samples that are used for training and testing artificial intelligence systems. This phase is crucial as the quality and variety of the collected data can significantly impact the performance and accuracy of the AI models developed subsequently.

What types of data are collected in Phase 1?

In Phase 1, a variety of data types can be collected depending on the specific requirements of the AI system being developed. This can include images, videos, audio recordings, textual data, or sensor data. The key is to gather a diverse set of data that represents different scenarios and conditions under which the AI system will operate.

How is the data collected in this phase?

Data collection methods can vary widely but typically involve both automated and manual processes. Automated data collection might use web scraping, sensors, or data generation algorithms. Manual collection could involve tasks performed by humans such as labeling images or entering data manually. Ensuring the data is representative and unbiased is a critical consideration during collection.

Why is the sample collection phase important for AI development?

The sample collection phase is critical because it provides the data on which all AI training, testing, and validation are based. Poor or biased data collection can lead to AI models that perform inadequately or unfairly, which could have serious implications, especially in sensitive applications like healthcare or autonomous driving.

What are the challenges faced during the sample collection phase?

Challenges during this phase can include ensuring the diversity and representativeness of the data, protecting the privacy and security of data sources, and handling large volumes of data efficiently. Additionally, ethical concerns, such as consent and fairness in data collection practices, are increasingly important to address.
