AI vs. foreign language

AI Thread Summary
Accusations that a piece of text is AI-generated often stem from grammar and vocabulary that do not align with native vernacular. Non-native speakers frequently produce language that diverges from native grammar and vocabulary because of learning methods that emphasize formal structures rather than natural usage. Recognizing text that could only be AI-generated, as opposed to text produced by a non-native speaker, is difficult: it is primarily a matter of assessing probabilities, and AI tools such as Grammarly's detector can assist by analyzing word choice and patterns.

AI-generated text tends to exhibit predictable word choices, which distinguishes it from the more varied vocabulary typically used by human writers. This predictability serves as a key indicator of AI authorship, contrary to the initial expectation in the thread that unusually novel word choices would be the giveaway. Similar detection methods are being developed in other fields, including chess, where moves that align too closely with engine recommendations raise suspicion of cheating.
snorkack
I sometimes see accusations that something is "AI"-written, citing grammar or vocabulary untypical of "natural language" - but the examples actually apply more specifically to "native vernacular".

A lot of people write in a language which is NOT their native vernacular. Learning a foreign language, especially from grammar books and dictionaries, and from teachers who are not themselves native speakers (and neither are the authors of the language textbooks), does not necessarily produce the same grammar or vocabulary choices a native speaker would make. An effort to avoid ambiguities or to make a point will also affect both vocabulary and grammar.

How do you recognize a style which only an AI could write and which no foreign-language speaker could plausibly write?
 
snorkack said:
How do you recognize a style which only an AI could write and which no foreign-language speaker could plausibly write?
You can't; all you can do is assess probabilities. And the best way to assess the probability is ... to use an AI.

See https://www.grammarly.com/ai-detector.
 
Some AI generators apply a kind of watermark embedded in the generated text. However, I don't think they make public exactly what they do.

There are three types:
- explicit with an authorship notation
- invisible, via word selection or hidden characters, i.e. zero-width characters in the text that are visible only when using a Unicode detector (see the sketch at the end of this post)
- using metadata in a generated file

The word selection one is the one they don't talk about much. This is where probabilities would be needed to estimate authorship. I imagine an ESL writer would be more likely to score a false positive here.
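As a rough illustration of the hidden-character variety, here is a minimal Python sketch that scans a string for common zero-width code points. Which characters (if any) a particular generator actually embeds is not public, so this list is purely illustrative.

```python
# Sketch: flag zero-width / invisible Unicode characters that could serve
# as a hidden watermark. The characters listed here are common zero-width
# code points, not any vendor's actual scheme.
INVISIBLE_CHARS = {
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE (BOM)",
}

def find_invisible(text):
    """Return (index, name) pairs for every invisible character found."""
    return [(i, INVISIBLE_CHARS[ch]) for i, ch in enumerate(text)
            if ch in INVISIBLE_CHARS]

sample = "This looks normal\u200b but carries a hidden character."
for idx, name in find_invisible(sample):
    print(f"position {idx}: {name}")
```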
 
jedishrfu said:
I imagine an ESL writer would be more likely to score a false positive here.
Actually the opposite - large language models (LLMs), which is what we are referring to here as AIs, are so good at generating fluent text in a particular language that this clearly distinguishes it from ESL-authored text.
 
But the idea is to check how many words were chosen that are out of the ordinary and how novel the choices were. ESL writers would likely not choose words a native speaker would choose.

Similarly for an LLM: when I, as a native speaker, write a sentence and let Grammarly improve it, the word choices are richer, and it lets me decide which version I like better. Previously this kind of measure was known as a grade-level score, but now it could well be an AI content score. The Hemingway editor had a feature to score your writing as being at a certain grade level.

https://hemingwayapp.com/
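As a rough sketch of what a grade-level score measures, here is the Flesch-Kincaid grade formula in Python. The syllable counter is a crude heuristic, and this is not how Hemingway or Grammarly actually compute their scores.

```python
import re

def count_syllables(word):
    """Very rough syllable heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / max(1, len(sentences))
            + 11.8 * syllables / max(1, len(words)) - 15.59)

print(round(flesch_kincaid_grade("The cat sat on the mat. It was warm."), 1))
```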
 
jedishrfu said:
But the idea is to check how many words were chosen that are out of the ordinary and how novel the choices were.
Why do you think that that is how AI-generated text is detected?

In fact it is just the opposite - one key indicator of AI-generated text is how predictable the choice of words is, not how novel: the more predictable, the more likely to be AI (there are also other indicators).
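A minimal sketch of that idea, assuming the Hugging Face transformers library and GPT-2 as a stand-in scoring model (real detectors use different models and additional signals, so treat this as illustrative only): the lower the perplexity, the more predictable the passage was to the model.

```python
# Sketch: score how "predictable" a passage is to a language model.
# Per the argument above, lower perplexity (more predictable) is the
# suspicious direction. GPT-2 is only a stand-in here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean token cross-entropy.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return float(torch.exp(loss))

print(perplexity("The quick brown fox jumps over the lazy dog."))
```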
 
pbuk said:
Why do you think that that is how AI-generated text is detected?

In fact it is just the opposite - one key indicator of AI-generated text is how predictable the choice of words is, not how novel: the more predictable, the more likely to be AI (there are also other indicators).
This is how suspected cheaters in chess are assessed: it is very suspicious if their moves match the moves of a chess engine like Stockfish too closely. Similar techniques will probably be developed for other AI areas.
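For the chess analogy, here is a crude Python sketch using the python-chess library with a local Stockfish binary (the engine path and search depth are assumptions). It only counts how often one side's moves match the engine's first choice; real cheat-detection models are considerably more sophisticated.

```python
# Sketch: fraction of one side's moves that match the engine's top choice.
# A persistently high match rate is one red flag, not proof of cheating.
import chess
import chess.engine
import chess.pgn

def engine_match_rate(game, color, engine_path="stockfish", depth=12):
    """Return the share of `color`'s moves that equal the engine's best move."""
    engine = chess.engine.SimpleEngine.popen_uci(engine_path)
    try:
        board = game.board()
        matches = total = 0
        for move in game.mainline_moves():
            if board.turn == color:
                best = engine.play(board, chess.engine.Limit(depth=depth)).move
                matches += (move == best)
                total += 1
            board.push(move)
        return matches / total if total else 0.0
    finally:
        engine.quit()
```

A game would typically be loaded first with chess.pgn.read_game() and the result passed in, e.g. engine_match_rate(game, chess.WHITE).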
 