AI vs. foreign language

Discussion Overview

The discussion revolves around the recognition of text generated by AI versus that produced by non-native speakers of a language. Participants explore the characteristics of language use, the implications of AI-generated text, and the challenges in distinguishing between human and AI writing styles.

Discussion Character

  • Debate/contested
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • Some participants suggest that accusations of AI-generated text often stem from misunderstandings of native vernacular versus foreign language use.
  • One participant claims that recognizing AI-generated text is fundamentally about assessing probabilities, potentially using AI tools for this purpose.
  • Another participant mentions that some AI generators embed watermarks in the text, which can be explicit, invisible, or contained in metadata.
  • There is a claim that large language models are adept at producing text that can be distinguished from that of ESL writers, contradicting an earlier assertion about false positives.
  • One participant argues that the detection of AI-generated text may rely on the predictability of word choices rather than their novelty, suggesting that predictable patterns indicate AI authorship.
  • A comparison is drawn between AI text detection and methods used to identify cheating in chess, where moves matching those of a chess engine raise suspicion.

Areas of Agreement / Disagreement

Participants express differing views on the characteristics that distinguish AI-generated text from that written by non-native speakers. There is no consensus on the most effective indicators for detection, and multiple competing perspectives remain unresolved.

Contextual Notes

Participants highlight the complexity of language use and the influence of various factors, such as the background of the writer and the nature of AI text generation, which complicate the detection process.

snorkack
I sometimes see accusations that something is "AI"-written, citing grammar or vocabulary untypical of "natural language" - but the examples actually apply more specifically to "native vernacular".

A lot of people write in a language which is NOT their native vernacular. Learning a foreign language, especially from grammar books and dictionaries, and from teachers who are themselves not native speakers (and neither are the authors of the language textbooks), does not necessarily produce the same grammar or vocabulary choices as a native speaker would make. An effort to avoid ambiguities or to make points will also affect both vocabulary and grammar.

How do you recognize a style which only an AI could write and which no foreign-language writer could plausibly produce?
 
snorkack said:
How do you recognize a style which only an AI could write and which no foreign-language writer could plausibly produce?
You can't, all you can do is assess probabilities. And the best way to assess the probability is ... to use an AI.

See https://www.grammarly.com/ai-detector.
 
Some AI generators embed a kind of watermark in the generated text. However, I don't think they make it public knowledge exactly what they do.

There are three types:
- explicit, with an authorship notation
- invisible, via word selection or hidden characters, i.e. zero-width characters in the text that become visible when using a Unicode detector
- metadata in a generated file

The word-selection one is the one they don't talk about much. This is where probability estimates would be needed. I imagine an ESL writer would be more likely to score a false positive here.
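The hidden-character kind of watermark is the easiest to illustrate. A minimal sketch, assuming the simplest possible scheme of zero-width Unicode codepoints - real vendors' schemes are not public, and the character set below is just the common zero-width codepoints, not any actual watermark alphabet:

```python
import unicodedata

# Zero-width Unicode characters sometimes used as invisible markers.
# This only detects their presence; real watermark schemes are proprietary.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (BOM)
}

def find_zero_width(text: str) -> list[tuple[int, str]]:
    """Return (index, codepoint name) for each zero-width character found."""
    return [(i, unicodedata.name(ch, "UNKNOWN"))
            for i, ch in enumerate(text) if ch in ZERO_WIDTH]

clean = "ordinary text"
marked = "ordinary\u200b text\u200d"
print(find_zero_width(clean))   # []
print(find_zero_width(marked))  # reports the two hidden characters
```

Running a suspect document through a check like this (or any "Unicode detector") would reveal this class of watermark immediately, which is presumably why word-selection watermarks are preferred.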
 
jedishrfu said:
I imagine an ESL writer would be more likely to score a false positive here.
Actually the opposite: large language models (LLMs), which is what we are referring to here as AIs, are so good at generating text in a particular language that it is clearly distinguishable from ESL-authored text.
 
But the idea is to check how many words were chosen that are out of the ordinary and how novel the choices were. ESL writers would likely not choose words a native speaker would choose.

Similarly for an LLM: when I, as a native speaker, write a sentence and let Grammarly improve it, the word choices are richer. It allows me to decide which version I like better. Before, this was known as a grade-level score, but now it could well be an AI content score. The Hemingway editor had a feature to score your writing as being at a certain grade level.

https://hemingwayapp.com/
 
jedishrfu said:
But the idea is to check how many words were chosen that are out of the ordinary and how novel the choices were.
Why do you think that that is how AI-generated text is detected?

In fact it is just the opposite: one key indicator of AI-generated text is how predictable the choice of words is, not how novel - the more predictable, the more likely it is to be AI (there are also other indicators).
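The predictability idea can be sketched with a toy unigram model - a stand-in for the per-token probabilities a real detector would get from an LLM. The corpus and scoring here are purely illustrative:

```python
import math
from collections import Counter

def avg_log_prob(text: str, corpus: str) -> float:
    """Average per-word log-probability of `text` under a unigram model
    built from `corpus`. Real detectors use an LLM's per-token
    probabilities instead of word frequencies; this is only a sketch."""
    counts = Counter(corpus.lower().split())
    total = sum(counts.values())
    words = text.lower().split()
    # Laplace smoothing so unseen words don't give log(0)
    return sum(math.log((counts[w] + 1) / (total + len(counts) + 1))
               for w in words) / len(words)

corpus = "the cat sat on the mat the dog sat on the rug"
predictable = "the cat sat on the mat"
novel = "the platypus pondered quantum entanglement"

# Higher (less negative) average log-probability = more predictable text,
# which under this heuristic looks more machine-generated.
print(avg_log_prob(predictable, corpus) > avg_log_prob(novel, corpus))  # True
```

The exponential of the negated average log-probability is the model's perplexity on the text, which is the quantity detection tools typically report: low perplexity suggests machine authorship, high perplexity suggests a human (or an ESL writer making unusual word choices).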
 
pbuk said:
Why do you think that that is how AI-generated text is detected?

In fact it is just the opposite - one key indicator of AI-generated text is how predictable the choice of words is, not how novel: the more predictable, the more likely to be AI (there are also other indicators).
This is similar to how suspected cheaters in chess are detected: it is very suspicious if a player's moves consistently match the moves of a chess engine like Stockfish. Similar methods will probably be developed for other AI areas.
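The chess analogy reduces to a simple statistic: the fraction of a player's moves that match the engine's top choice. A hypothetical sketch - real anti-cheat systems also weigh position difficulty, move strength, and evidence across many games:

```python
def engine_match_rate(player_moves: list[str], engine_moves: list[str]) -> float:
    """Fraction of a player's moves matching the engine's top choice for
    the same positions. A very high rate sustained over many games is
    treated as suspicious. Illustrative only, not a real anti-cheat system."""
    matches = sum(p == e for p, e in zip(player_moves, engine_moves))
    return matches / len(player_moves)

# Hypothetical moves: the player's choices vs. the engine's first choice
player = ["e4", "Nf3", "Bb5", "O-O", "Re1"]
engine = ["e4", "Nf3", "Bb5", "Ba4", "O-O"]
print(engine_match_rate(player, engine))  # 0.6
```

The text-detection analogue replaces "matches the engine's move" with "matches the LLM's most probable next token", which is exactly the predictability signal described above.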
 
