Is it possible to quantify human language?

  • Context: Undergrad 
  • Thread starter: EighthGrader
  • Tags: Human Language

Discussion Overview

The discussion revolves around the possibility of quantifying human language, particularly in the context of analyzing writing styles to determine authorship. It explores the use of computational methods and historical approaches to stylometry, as well as the implications of such analyses.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification

Main Points Raised

  • One participant wonders if it is possible to quantify human language to compare writing styles and determine authorship based on unique word choices and sentence structures.
  • Another participant notes that analysis of writing styles has been conducted for a long time, with historical examples predating computer use, such as the analysis of the five books of Moses.
  • A third participant introduces the term "stylometry" and mentions a freeware program called Signature that can be used for authorship attribution, noting that it comes packaged with the Federalist Papers for practice.
  • A later reply encourages reading about computational linguistics to gain a broader understanding of the topic before exploring specific tools or techniques.

Areas of Agreement / Disagreement

Participants express varying levels of familiarity with the topic. They offer historical context and point to existing tools, but reach no explicit consensus on how effective or reliable stylometric quantification of language is.

Contextual Notes

The discussion does not resolve the effectiveness of stylometry or the specific methodologies involved, leaving open questions about the assumptions and definitions related to quantifying language.

Who May Find This Useful

Individuals interested in linguistics, authorship attribution, computational linguistics, or the intersection of technology and literary analysis may find this discussion relevant.

EighthGrader
While I was listening to my English lecture, my teacher told us that authors usually have similar word choices and sentence structures across their writings.

Because of that, I began to wonder whether it is possible (if it hasn't been done already) to quantify human language (if that's the right term for it). The reason I ask is that it might be possible to compare two or more pieces of writing quantitatively with a computer and determine whether they come from the same author, based on their (unique) word choices and sentence structures.
 
This kind of analysis has been going on for a long time. Computers speed up the process, but the concept long predates them. One important example is the 19th-century analysis that broke the five books of Moses down into four major sources.
 
What you're talking about is called "stylometry." I don't vouch for it because I haven't used it, but if you have the inclination, you can goof around with a freeware program called Signature that employs these techniques to determine authorship. It comes packaged with a copy of the Federalist Papers to help you learn how to use it. This paper gives a general overview of the history and status of authorship attribution.
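As a rough illustration of what "quantifying" style can mean, here is a minimal Python sketch of one common stylometric idea: represent each text by the relative frequencies of a handful of function words, then pick the candidate author whose profile is closest by cosine similarity. The word list, the function names, and the use of cosine similarity are all assumptions made for illustration. This is not how Signature works internally, and serious attribution studies use far larger word lists, much longer samples, and more careful measures (Burrows's Delta, for example).

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of English function words.
# Function words are popular in stylometry because authors use them
# largely unconsciously, independent of topic.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "by", "upon", "while", "whilst", "on", "with"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text, vocab=FUNCTION_WORDS):
    """Relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in vocab]


def average_sentence_length(text):
    """Crude proxy for sentence structure: words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return len(tokenize(text)) / len(sentences)


def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_author(unknown, candidates):
    """Return the candidate whose sample has the most similar function-word profile."""
    target = profile(unknown)
    return max(candidates,
               key=lambda name: cosine_similarity(target, profile(candidates[name])))


if __name__ == "__main__":
    samples = {
        "Author A": "upon the hill upon the road upon the sea",
        "Author B": "while in the town and while in the city",
    }
    print(closest_author("upon the river upon the shore", samples))  # prints "Author A"
```

With toy samples this short the result means very little. The point is only that "style" becomes a vector of numbers that can be compared, which is the basic move behind the stylometric tools mentioned above.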
 