Undergrad AI Detection - Phase 2: finding hypotheses

  • Thread starter: fresh_42
SUMMARY

This discussion focuses on the discrepancies observed between two AI detection tools, ZeroGPT and OpenAI's detector, when classifying human-written versus AI-generated text. The participants analyze a case where a 104-word human-written excerpt was classified as 100% AI by ZeroGPT but only 0.04% AI by OpenAI; when the full text was submitted, ZeroGPT dropped to 12.77% AI and OpenAI to 0.02% AI. This highlights the influence of text length and specific content on detection results. The conversation also considers what these findings mean for users who want to know which detector to trust, and which characteristics of the input text affect the outcome.

PREREQUISITES
  • Understanding of AI text generation and detection mechanisms
  • Familiarity with ZeroGPT and OpenAI detection tools
  • Knowledge of text characteristics affecting AI detection (e.g., length, complexity)
  • Basic principles of Newton's laws and power dynamics in physics
NEXT STEPS
  • Research the algorithms used by ZeroGPT and OpenAI for AI detection
  • Study the impact of text length on AI detection accuracy
  • Explore the role of linguistic features in distinguishing human and AI-generated text
  • Investigate best practices for preparing text for AI detection tools
USEFUL FOR

This discussion is beneficial for AI researchers, content creators, educators, and anyone involved in the development or use of AI detection tools, particularly those seeking to understand the nuances of text classification accuracy.

fresh_42
Let us gather possible hypotheses here which we can test our data against. E.g.

jack action said:
From the intro of an Insight I wrote:

CC: 621 (100% human / 0% AI)
A: 100% AI (104 words)
B: 0.04% AI (113 tokens)

If this wasn't a typo, then it is the first indication of what I'm after: a very significant gap between A and B. How could that have happened?
@jack action can you quote the text?
 
fresh_42 said:
@jack action can you quote the text?

If a moving vehicle has an energy source that has a variable power output, the energy source must be set to its maximum power – during the entire velocity range – to ensure that the vehicle will get its maximum possible acceleration throughout that velocity range. At any given velocity: The force applied to the vehicle dictates the acceleration it gets; The power applied to the vehicle dictates the force it gets; Therefore, the maximum possible acceleration of the vehicle depends solely on the maximum power available for the vehicle. When it comes to accelerating a moving vehicle, only power tells the whole story.

To be fair - I did not notice earlier - zerogpt suggests "Please input more text for a more accurate result". Still, the whole text is considered most likely generated by AI.

I did not put the rest of the text as there were a lot of equations and variable names and I thought it would skew the result. Anyway, I tried the whole text just now:
Statement If a moving vehicle has an energy source that has a variable power output, the energy source must be set to its maximum power – during the entire velocity range – to ensure that the vehicle will get its maximum possible acceleration throughout that velocity range. At any given velocity: The force applied to the vehicle dictates the acceleration it gets; The power applied to the vehicle dictates the force it gets; Therefore, the maximum possible acceleration of the vehicle depends solely on the maximum power available for the vehicle. When it comes to accelerating a moving vehicle, only power tells the whole story. Explanations Force requirement The first basic requirement is given by Newton’s second law: The force F required is equal to the mass m of the vehicle times the desired acceleration a of the vehicle. In simple terms: F = ma. Power requirement But since there is a force in motion, work is done, so there is a second requirement: The power P required is equal to the force F applied to the vehicle times the velocity v of the vehicle. In simple terms: P = Fv. Putting it all together If the two equations are combined together, we get P = mav. This means that as long as there are a mass m and velocity v (i.e. not equal to zero), the power P required is proportional to the desired acceleration a. At this point, we can ignore Newton’s second law because it is indirectly implied in this new equation, i.e if the power requirement is fulfilled, the force requirement is also necessarily fulfilled. We have been talking about “desired acceleration” and “required power” until now but, in the real world, we are often given a power rating from an energy source and we take whatever acceleration we can get from it. In this case, the equation can be rewritten as a = P/(mv). With this new equation, assuming power and mass are constants, we can see that the acceleration is a function of velocity. Particularly, as the velocity increases, the acceleration will decrease. 
Since the mass m is a constraint given by the initial problem, it cannot be modified. The velocity v is also a constraint given by the initial problem, that is, it must be within the desired velocity range. So if one wants to increase the acceleration throughout the velocity range, one has no other choice but to increase the power available to the vehicle. If the power P is doubled, the acceleration a throughout the velocity range will also be doubled (remembering that the acceleration will still decrease as the velocity increases). Power is power Because of the law of conservation of energy, the power available to the vehicle is equal to the power given by the energy source powering the vehicle (not considering losses). The energy source can make its power with: a rotational system (P = torque times angular velocity); fluid power (P = pressure times volumetric flow rate); electricity (P = potential difference times current); combustion (P = fuel mass flow rate times fuel heating value); or any other way one can think of, it does not matter. Although, in any case, note that there may be some inefficiencies that will lead to some losses due to transformations between the energy source and the point of application on the vehicle. Obviously, only the power available at the point of application on the vehicle is relevant. A common mistake When considering the special case where a vehicle is powered by wheels of radius r, some people like to state they can link the acceleration directly to the wheel torque T, by using the relation F = T/r instead of the power equation we used. Combining this equation with Newton’s second law, they get T/r = ma and claim that it is a more direct way because the wheel radius r is constant (unlike the velocity v). But where does that radius comes from? Are we allowed to choose any value? The equation F = T/r is subjected to the law of conservation of energy which extends to power, namely, Pin = Pout. 
With a rotating object, Pin = Tω (where ω is the object angular velocity) and Pout = Fv. This means that T/F = v/ω. So if T/F = r, then v/ω = r as well. The radius r implies a transformation where power is kept constant, and that cannot be ignored. Replacing r with the velocity ratio in the misleading equation will give Tω/v = ma. Thus we get back to our original equation: P = mav. The introduction of the wheel radius does not simplify the process, it just hides the important notion of conservation of energy. Even with this special case [1], there are no ways around it, one way or another, power will have to be considered because velocity must be considered when accelerating a moving vehicle.
And now it says that it is most likely human-written with 12.77% AI. The same portion of the text is the suspected AI-generated text.

For its part, openai dropped to 0.02% AI, based on the first 510 tokens among the total 980.
 
That matches my experiences and is evidence for my hypothesis assuming that "....; The ..." is a technical mistake. Or is it allowed in English to continue with a capital letter after a semicolon? It would be a mistake in German, but I don't know such subtleties in English.

So, the length of a sample might make a difference and ZeroGPT needs lengthier samples than OpenAI.
 
fresh_42 said:
So, the length of a sample might make a difference and ZeroGPT needs lengthier samples than OpenAI.
Even with the extra text, ZeroGPT assumes the initial text is AI-generated.
fresh_42 said:
Or is it allowed in English to continue with a capital letter after a semicolon?
No, it's not. But I don't like it and I often write a capital letter. I try to correct myself but it is still an ongoing inner battle for me every time. :smile: I'm even worse with a colon (where a capital letter also shouldn't be required in most cases, but may be allowed sometimes).
 
fresh_42 said:
How could that have happened?
If the point of the thread is to answer this question, perhaps the most likely explanation is that the Insight, now that it is on the web, is part of the training set for one or more of these programs.

If the point is that there is a weak correlation between A and B, that can be explained if they are looking at different things. If trying to identify whether a vehicle is a fire engine, one might ask "is it big?" and the other "is it red?". Both will agree a small black sports car is not a fire engine.

One could always produce a better estimator by using the output of A and B as inputs to either a classical statistical combination or something more complex and ML-like.
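
As a rough illustration of the "classical statistical combination" idea, here is a minimal sketch that averages the two detectors' scores in log-odds space. Everything in it is an assumption made for illustration: the function names, the equal default weights, and the zero bias are hypothetical, and in practice the weights would have to be fitted on texts whose true origin is known (for example with a logistic regression). Neither tool is known to expose its score this way.

```python
import math

def logit(p, eps=1e-6):
    """Convert a probability to log-odds, clipping to avoid infinities at 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def sigmoid(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def combine(p_a, p_b, w_a=0.5, w_b=0.5, bias=0.0):
    """Combine two 'probability of AI' scores as a weighted sum of log-odds.

    w_a, w_b and bias are placeholders (assumed, not fitted); they would
    normally be estimated from labeled examples.
    """
    return sigmoid(w_a * logit(p_a) + w_b * logit(p_b) + bias)

# Scores quoted in this thread for the short excerpt:
# A (ZeroGPT) = 100% AI, B (OpenAI) = 0.04% AI.
combined_short = combine(1.00, 0.0004)
```

With equal weights, two detectors that agree simply return their common score, and two that disagree symmetrically (say 0.2 and 0.8) cancel out to about 0.5. Whether equal weights are sensible is exactly what the data collected in this thread would have to show.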
 
Vanadium 50 said:
If the point of the thread is to answer ...
  1. Which one of the two is more trustworthy in general and for which kind of texts?
  2. Which circumstances trigger false answers?
    By now, it looks as if a couple of errors trigger the verdict "human" on both, independent of the real source.
  3. We have had examples where one said ~100% human and the other one ~100% fake.
    What are the reasons for such discrepancies?
    And I do not mean the underlying model which we cannot know much about.
    I mean, which text has to go to which engine in order to receive a trustworthy result?
The goal is not to analyze the detection machines in terms of their construction plans or training levels. The goal - as I see it - is to decide when to ask one bot and when to ask the other one, or ideally: always ask bot xy. I am a user and I want to know which one I should use, possibly depending on the characteristics of the input text (length, errors, commas, formulas, links, etc.).
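
One way to approach the "which bot for which text" question is to tabulate, per text characteristic, how often each detector gets the known origin right. The sketch below does this for text length only. The bucket edges, the 0.5 decision threshold, and the record layout are all assumptions chosen for illustration. The two sample records use the scores quoted in this thread for jack action's human-written text; the word count of the full text (900) is only a rough guess, since the thread gives its length as 980 tokens.

```python
from collections import defaultdict

def length_bucket(n_words, edges=(150, 500)):
    """Assign a text to a length class; the edges are arbitrary assumptions."""
    if n_words < edges[0]:
        return "short"
    if n_words < edges[1]:
        return "medium"
    return "long"

def accuracy_by_bucket(samples, threshold=0.5):
    """Fraction of correct verdicts per length bucket for detectors A and B.

    A detector counts as saying 'AI' when its score is >= threshold.
    """
    counts = defaultdict(lambda: {"A": 0, "B": 0, "n": 0})
    for s in samples:
        bucket = counts[length_bucket(s["words"])]
        bucket["n"] += 1
        for det in ("A", "B"):
            if (s[det] >= threshold) == s["is_ai"]:
                bucket[det] += 1
    return {
        name: {"A": c["A"] / c["n"], "B": c["B"] / c["n"], "n": c["n"]}
        for name, c in counts.items()
    }

samples = [
    # Short excerpt: 104 words, human-written, A = 100% AI, B = 0.04% AI.
    {"words": 104, "is_ai": False, "A": 1.00, "B": 0.0004},
    # Full text: word count is an assumed estimate; A = 12.77% AI, B = 0.02% AI.
    {"words": 900, "is_ai": False, "A": 0.1277, "B": 0.0002},
]

table = accuracy_by_bucket(samples)
```

Two records prove nothing on their own; the point is only that the same table could be extended with other characteristics (errors, formulas, links) once more tested texts are collected.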
 
