AI vs. foreign language

SUMMARY

The discussion centers on distinguishing AI-generated text from text written by non-native speakers of a language. One reply argues that large language models (LLMs) produce text fluent enough to be clearly distinguishable from that of English as a Second Language (ESL) writers. Grammarly's AI detector and the Hemingway App are mentioned as tools for scoring text. The thread also covers watermarking of generated text and notes that predictable word choice, not novel word choice, is a key indicator of AI-generated content.

PREREQUISITES
  • Understanding of large language models (LLMs)
  • Familiarity with writing-assessment tools such as Grammarly's AI detector and the Hemingway App
  • Knowledge of ESL writing characteristics
  • Basic concepts of text analysis and scoring metrics
NEXT STEPS
  • Research the functionality of AI detection tools like Grammarly's AI detector
  • Explore the features of Hemingway App for writing assessment
  • Learn about the implications of predictability in AI-generated text
  • Investigate the use of metadata in AI-generated content detection
USEFUL FOR

Writers, educators, linguists, and anyone involved in content creation or analysis who seeks to understand the nuances between AI-generated text and that produced by non-native speakers.

snorkack
I sometimes see text accused of being "AI" written, with the accuser citing grammar or vocabulary use untypical of "natural language" - but the examples are really only untypical of "native vernacular".

A lot of people write language which is NOT their native vernacular. Learning a foreign language, especially based on grammar works and dictionaries, and from teachers who themselves are not native speakers (and neither are authors of the language textbooks) does not necessarily produce the same grammar or vocabulary choices as a native speaker would make. An effort to avoid ambiguities or make points will also affect both vocabulary and grammar.

How do you recognize style which only AI could write and which no foreign-language writer could plausibly write?
 
snorkack said:
How do you recognize style which only AI could write and which no foreign-language writer could plausibly write?
You can't, all you can do is assess probabilities. And the best way to assess the probability is ... to use an AI.

See https://www.grammarly.com/ai-detector.
 
Some AI generators embed a kind of watermark in the generated text. However, I don't think they make public exactly what they do.

There are three types:
- explicit with an authorship notation
- invisible, via word selection or hidden characters, i.e. zero-width characters in the text that become visible when using a Unicode detector
- using metadata in a generated file
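
As a rough illustration of the hidden-character case in the list above, here is a minimal Python sketch of what a "Unicode detector" might do. The set of characters checked and the function name `find_hidden_characters` are my own assumptions for the example; no generator has published which characters, if any, it inserts.

```python
import unicodedata

# Characters commonly described as "zero-width". This set is an assumption
# made for illustration, not a list any vendor has published.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def find_hidden_characters(text):
    """Return (index, code point, Unicode name) for each hidden character."""
    hits = []
    for index, char in enumerate(text):
        # Unicode category "Cf" (format) covers most invisible format characters.
        if char in ZERO_WIDTH or unicodedata.category(char) == "Cf":
            name = unicodedata.name(char, "UNKNOWN")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


if __name__ == "__main__":
    sample = "hello\u200bworld"
    for hit in find_hidden_characters(sample):
        print(hit)
```

Finding such a character only shows that something invisible is in the text. It can also come from copy and paste or from scripts that legitimately use joiners, so it is a hint rather than proof.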

The word selection one is the one they don't talk about much. Detecting it would require estimating probabilities. I imagine an ESL writer would more likely score a false positive here.
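
To make the "probabilities" part concrete, here is a hedged Python sketch of how a word-selection watermark could be tested for in principle. It assumes a hypothetical scheme in which a secret key marks about half of all word pairs as "green" and the generator prefers green pairs. The key `demo-key`, the hashing rule and all function names are invented for illustration and do not describe any vendor's actual method.

```python
import hashlib
import math


def is_green(previous_word, word, key="demo-key"):
    """Hypothetical rule: a keyed hash of the word pair decides membership."""
    data = f"{key}|{previous_word}|{word}".encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return digest[0] % 2 == 0  # roughly half of all pairs count as green


def z_score(green_count, total, gamma=0.5):
    """Standard deviations by which green_count exceeds the chance expectation."""
    expected = gamma * total
    spread = math.sqrt(total * gamma * (1.0 - gamma))
    return (green_count - expected) / spread


def watermark_z(text, key="demo-key"):
    """Score a text: a large positive value suggests the assumed watermark."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    green = sum(is_green(a, b, key) for a, b in pairs)
    return z_score(green, len(pairs))


if __name__ == "__main__":
    print(z_score(75, 100))  # 75 green pairs out of 100 where 50 are expected
```

Under this assumed scheme, unwatermarked text lands near zero on average whoever wrote it, while text from a generator that favours green pairs drifts upward as it gets longer.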
 
jedishrfu said:
I imagine an ESL writer would more likely score a false positive here.
Actually the opposite - large language models (LLMs), which is what we are referring to here as AIs, are so good at generating text in a particular language that their output is clearly distinguishable from ESL-authored text.
 
But the idea is to check how many words were chosen that are out of the ordinary and how novel the choices were. ESL writers would likely not choose words a native speaker would choose.

Similarly for an LLM: when I as a native speaker write a sentence and let Grammarly improve it, the word choices are richer. It allows me to decide which version I like better. Previously this was known as a grade level score, but now it could well be an AI content score. The Hemingway editor had a feature to score your writing as being at a certain grade level.

https://hemingwayapp.com/
 
jedishrfu said:
But the idea is to check how many words were chosen that are out of the ordinary and how novel the choices were.
Why do you think that that is how AI-generated text is detected?

In fact it is just the opposite - one key indicator of AI-generated text is how predictable the choice of words is, not how novel: the more predictable, the more likely to be AI (there are also other indicators).
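
A toy Python sketch of the predictability idea: score a text by its average surprisal (negative log probability) per word under a language model, where a lower score means more predictable word choices. Real detectors would use an LLM's own token probabilities. The tiny bigram model, the add-one smoothing and the function names here are assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter


def train_bigram(words):
    """Count how often each word follows each context word."""
    contexts = Counter(words[:-1])
    bigrams = Counter(zip(words, words[1:]))
    vocab_size = len(set(words)) + 1  # one extra slot for unseen words
    return contexts, bigrams, vocab_size


def mean_surprisal(words, contexts, bigrams, vocab_size):
    """Average -log2 P(word | previous word), with add-one smoothing."""
    total = 0.0
    count = 0
    for prev, word in zip(words, words[1:]):
        prob = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        total += -math.log2(prob)
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat sat on the rug".split()
    model = train_bigram(corpus)
    print(mean_surprisal("the cat sat on the mat".split(), *model))
    print(mean_surprisal("mat on cat the sat rug".split(), *model))
```

The first sentence follows the patterns in the training text and scores lower than the scrambled one. On this measure alone a detector would call the first more "AI-like", which is why it is only one indicator among several.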
 
pbuk said:
Why do you think that that is how AI-generated text is detected?

In fact it is just the opposite - one key indicator of AI-generated text is how predictable the choice of words is, not how novel: the more predictable, the more likely to be AI (there are also other indicators).
This is how suspected cheaters in chess are measured. It is very suspicious if their moves perfectly match the moves of a chess engine like Stockfish. There will probably be similar things developed for other AI areas.
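
The chess comparison can be sketched in a few lines of Python. This assumes the engine's preferred move for each position has already been obtained somehow (running Stockfish is not shown) and simply measures agreement. The function name `match_rate` and the sample moves are made up for illustration.

```python
def match_rate(played_moves, engine_moves):
    """Fraction of positions where the played move equals the engine's choice."""
    if len(played_moves) != len(engine_moves):
        raise ValueError("move lists must have the same length")
    if not played_moves:
        return 0.0
    matches = sum(p == e for p, e in zip(played_moves, engine_moves))
    return matches / len(played_moves)


if __name__ == "__main__":
    played = ["e4", "Nf3", "Bb5", "O-O"]
    engine = ["e4", "Nf3", "Bc4", "O-O"]
    print(match_rate(played, engine))  # 3 of 4 moves agree
```

As with text, a high match rate is a probability signal and not proof, since strong players often find the engine's move on their own.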
 
