Tough Textual Criticism math statistics question

  • Context: Undergrad 
  • Thread starter: Eturnal
  • Tags: Probability

Discussion Overview

The discussion revolves around a mathematical problem related to textual criticism of ancient Biblical texts, specifically focusing on the statistical relationships between different codices and their agreement with the majority text (MT). Participants explore conditional probabilities and the implications of similarity measures in this context.

Discussion Character

  • Technical explanation
  • Mathematical reasoning
  • Debate/contested

Main Points Raised

  • One participant notes that Codex 01 and Codex 03 agree with the MT about 87% of the time and with each other about 92% of the time, questioning the implications of these statistics when they disagree.
  • Another participant suggests that the problem could be framed using conditional probabilities and emphasizes the need for a well-defined model to analyze the situation.
  • Some participants express uncertainty about the assumptions being made and the nature of the calculations, indicating that the similarity measure may not function as a probability.
  • There is a discussion about the need to account for the number of options available for each text, which could affect the underlying probabilities of agreement and disagreement.
  • One participant raises the idea of modeling the texts as binary strings or with multiple options per statement, which could influence the statistical analysis.
  • Several participants challenge the notion that the percentages provided can yield robust conclusions without a proper model for data generation.

Areas of Agreement / Disagreement

Participants do not reach a consensus on the correct approach to the problem. There are multiple competing views regarding the interpretation of the statistics and the assumptions necessary for the calculations.

Contextual Notes

Participants highlight the importance of defining the problem clearly, noting that uncertainties and probabilities must be well understood to draw meaningful conclusions. The discussion reflects a mix of exploratory reasoning and technical challenges in applying statistical methods to the humanities.

Eturnal
TL;DR
If text A is 87% similar to text Z

And text B is 87% similar to text Z

Text A & B are 92% similar

Each text is 10000 words (approx)



What are the odds that when Text A and Text B disagree one of them will agree with Text Z?
Okay, let me rephrase this math question and frame it. It is math dealing with ancient Biblical texts and textual criticism.

Codex 01 (350 AD) agrees with the MT (majority text) about 87% of the time.

Codex 03 (350 AD) agrees with the MT about 87% of the time.

01 and 03 agree with each other about 92% of the time.

When 01 and 03 disagree, there is an 87% chance that one of them agrees with the MT.

Wouldn't you expect that number to be lower if these disagreements were random?

Please help math geniuses. Thank you!
 
  • Haha
Likes   Reactions: Agent Smith
If this is a homework-type problem then there is a specific format for that. You have to show some work and then we can give hints and guidance. Can you express each problem statement in terms of conditional probabilities or other formal mathematical expressions?
 
Eturnal said:
TL;DR Summary: If text A is 87% similar to text Z

And text B is 87% similar to text Z

Text A & B are 92% similar

Each text is 10000 words (approx)
What are the odds that when Text A and Text B disagree one of them will agree with Text Z?

Okay, let me rephrase this math question and frame it. It is math dealing with ancient Biblical texts and textual criticism.

Codex 01 (350 AD) agrees with the MT (majority text) about 87% of the time.

Codex 03 (350 AD) agrees with the MT about 87% of the time.

01 and 03 agree with each other about 92% of the time.

When 01 and 03 disagree, there is an 87% chance that one of them agrees with the MT.

Wouldn't you expect that number to be lower if these disagreements were random?

Please help math geniuses. Thank you!
I can't make complete sense of what you are assuming and what you want to calculate based on those assumptions. There is also, perhaps, a missing factor: how many options there are for a text, which determines the underlying probability that two random texts will agree on something. Finally, you are more in the realm of statistical inference (hypothesis testing) here than in simple probabilities. More detail on this follows:

It seems you could model the situation by considering each text to be a list of things - possibly statements in this case. Perhaps each text makes one hundred statements about some things. If these are true/false statements, then each text is modelled by a binary string one hundred characters long. But, if these statements have more options than just true/false or A/B - let's say each statement has five options - then each is a string of 1/2/3/4/5 or A/B/C/D/E one hundred characters long. The first thing you need is an appropriate model like this. This is technically called a sample space.
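As a rough illustration of why the number of options per slot matters, here is a minimal sketch of that string model. The helper names `random_text` and `agreement`, the uniform choice of symbols, and the length of 100 are all assumptions made for the example, not anything taken from the actual codices.

```python
import random

def random_text(length, options, rng):
    """Draw one text: `length` slots, each filled uniformly at random from `options`."""
    return "".join(rng.choice(options) for _ in range(length))

def agreement(a, b):
    """Fraction of positions at which two equal-length texts carry the same symbol."""
    if len(a) != len(b):
        raise ValueError("texts must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Two unrelated texts with two options per slot, then with five options per slot.
binary = [random_text(100, "AB", rng) for _ in range(2)]
five_way = [random_text(100, "ABCDE", rng) for _ in range(2)]

print(agreement(*binary))    # about 1/2 on average for unrelated binary texts
print(agreement(*five_way))  # about 1/5 on average for unrelated five-option texts
```

Under this toy model the baseline agreement between unrelated texts is about 1/k for k equally likely options, so any percentage has to be read against that baseline.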

Once you have an appropriate sample space for your problem, you can start doing some hypothesis testing. That means you need to frame (precisely) a hypothesis to test!
 
  • Like
Likes   Reactions: Eturnal and Dale
Eturnal said:
TL;DR Summary: If text A is 87% similar to text Z

And text B is 87% similar to text Z

Text A & B are 92% similar

Each text is 10000 words (approx)
What are the odds that when Text A and Text B disagree one of them will agree with Text Z?

Wouldn't you expect that number to be lower if these disagreements were random?
I don’t think that we can answer this. The similarity measure doesn’t seem like a probability. So we can’t really have any expectations on what the similarity measure should be in unmeasured situations.
 
  • Like
Likes   Reactions: Agent Smith, FactChecker and Eturnal
Thanks for your reply. Having a hard time getting started on this problem and my math is rusty. The answer is not 87% of the
 
So there are about 7,000 words in each text in the tested section.

But we should be able to do the percentages, so that shouldn't matter, right? We have 8% of each text agreeing to disagree. Then we take a random splatter of 26% between the two texts (13% each text). But oh yeah, we would have to account for all the letter/word possibilities in those slots as well, wouldn't we?
 
Eturnal said:
So there are about 7,000 words in each text in the tested section.

But we should be able to do the percentages, so that shouldn't matter, right? We have 8% of each text agreeing to disagree. Then we take a random splatter of 26% between the two texts (13% each text). But oh yeah, we would have to account for all the letter/word possibilities in those slots as well, wouldn't we?
This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!
 
  • Like
Likes   Reactions: Dale
PeroK said:
This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!
Tell me if I'm heading in the right direction here.

Take two texts of 100 characters and highlight 8% representing the disagreement between 01 and 03.

Now randomly select 13% of each text representing the disagreements between 01/03 & the MT.

What are the odds those 13 places overlap the 8%?

Now, should we account for each character having 24 different options (letters in the Greek alphabet)? Or should we just pretend it is an A/B in each character slot for simplicity's sake?
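Taken literally, the overlap question in this post is a hypergeometric calculation: 100 slots, 8 of them marked, 13 drawn at random. The sketch below assumes the 13 places are drawn uniformly and without regard to the 8, which is exactly the "random" assumption under dispute; the function name `overlap_pmf` is made up for the example.

```python
from math import comb

def overlap_pmf(n_slots, n_marked, n_drawn, k):
    """P(exactly k of the n_drawn randomly chosen slots fall among the n_marked slots)."""
    if k < 0 or k > min(n_marked, n_drawn):
        return 0.0
    ways = comb(n_marked, k) * comb(n_slots - n_marked, n_drawn - k)
    return ways / comb(n_slots, n_drawn)

# Numbers from the post: 100 slots, 8% marked as 01/03 disagreements,
# 13% drawn as disagreements between one codex and the MT.
n_slots, n_marked, n_drawn = 100, 8, 13

expected_overlap = n_drawn * n_marked / n_slots      # mean of the hypergeometric
p_no_overlap = overlap_pmf(n_slots, n_marked, n_drawn, 0)

print(expected_overlap)             # 1.04 slots on average
print(round(1 - p_no_overlap, 3))   # chance of at least one overlapping slot
```

So under pure chance only about one of the 13 places would be expected to land in the 8 marked ones. Whether pure chance is a sensible model for scribal copying is a separate question.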
 
PeroK said:
This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!
Someone posited that there is an 87% chance that the disagreements between 01 and 03 land on MT just because each agrees with MT 87%. I don't feel like their math is correct. Thank you for your help!!!
 
  • #10
Eturnal said:
Take two texts of 100 characters and highlight 8% representing the disagreement between 01 and 03.

Now randomly select 13% of each text representing the disagreements between 01/03 & the MT.

What are the odds those 13 places overlap the 8%?

Now, should we account for each character having 24 different options (letters in the Greek alphabet)? Or should we just pretend it is an A/B in each character slot for simplicity's sake?
You can only produce probabilities from a large sample of data or by understanding where the source material came from and hence you have some underlying assumptions.

It's a common misconception that probabilities can be conjured from one sample of data without underlying assumptions. I think this is what you are trying to do here: to assume that, somehow, these percentages themselves will reveal something mathematically robust.

You can determine from the texts how correlated they are (and, indeed, that's what your percentages are trying to show). But, there is no magic wand that will tell you how likely that correlation was. The probability of a given correlation is not inherent in the data. It can only be calculated when you have a model for how the data was generated. The same correlations might be almost inevitable in one case and highly unlikely in another - even in cases where the raw data is the same.
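To make the dependence on the generating model concrete, here is a small simulation sketch comparing two invented models. Everything in it is an assumption chosen for illustration: the helper names, the 5 options per slot, the 1000 slots, the error rates (0.13, 0.09, 0.04), and the 90% threshold. In both models each copy ends up agreeing with Z roughly 87% of the time, yet the chance of the two copies agreeing with each other at least 90% of the time is very different.

```python
import random

def copy_with_errors(source, error_rate, n_options, rng):
    """Copy `source` slot by slot; with probability `error_rate` swap in a different symbol."""
    out = []
    for s in source:
        if rng.random() < error_rate:
            out.append(rng.choice([o for o in range(n_options) if o != s]))
        else:
            out.append(s)
    return out

def agreement(a, b):
    """Fraction of positions at which two equal-length texts match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def simulate(model, trials=200, length=1000, n_options=5, seed=1):
    """Fraction of trials in which texts A and B agree with each other at least 90%."""
    rng = random.Random(seed)
    z = [0] * length
    hits = 0
    for _ in range(trials):
        if model == "independent":
            # A and B are copied from Z separately, each with 13% errors.
            a = copy_with_errors(z, 0.13, n_options, rng)
            b = copy_with_errors(z, 0.13, n_options, rng)
        else:
            # A and B share an ancestor that already differs from Z (assumed rates).
            ancestor = copy_with_errors(z, 0.09, n_options, rng)
            a = copy_with_errors(ancestor, 0.04, n_options, rng)
            b = copy_with_errors(ancestor, 0.04, n_options, rng)
        hits += agreement(a, b) >= 0.90
    return hits / trials

print(simulate("independent"))
print(simulate("shared"))
```

The same raw percentages are close to impossible under the first model and routine under the second, which is the sense in which the probability is not inherent in the data.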
 
  • #11
PeroK said:
You can only produce probabilities from a large sample of data or by understanding where the source material came from and hence you have some underlying assumptions.

It's a common misconception that probabilities can be conjured from one sample of data without underlying assumptions. I think this is what you are trying to do here: to assume that, somehow, these percentages themselves will reveal something mathematically robust.

You can determine from the texts how correlated they are (and, indeed, that's what your percentages are trying to show). But, there is no magic wand that will tell you how likely that correlation was. The probability of a given correlation is not inherent in the data. It can only be calculated when you have a model for how the data was generated. The same correlations might be almost inevitable in one case and highly unlikely in another - even in cases where the raw data is the same.
Humor me. I'm sure someone can draw up a useful piece of math on this although yes everything will have some assumptions plugged in.
 
  • #12
Eturnal said:
Humor me. I'm sure someone can draw up a useful piece of math on this although yes everything will have some assumptions plugged in.
Being an inveterate frequentist, I'll leave that to the Bayesians!
 
  • Haha
Likes   Reactions: Dale
  • #13
PeroK said:
Being an inveterate frequentist, I'll leave that to the Bayesians!
It's challenging I know
 
  • #14
Eturnal said:
Someone posited that there is an 87% chance that the disagreements between 01 and 03 land on MT just because each agrees with MT 87%. I don't feel like their math is correct. Thank you for your help!!!
I don’t think that these percentages for agreement are probabilities. Probabilities are between zero and one, so it is common to write them as percentages. But that doesn’t imply that everything that is written as a percentage is a probability.

In particular, a probability is always a measure on some space of events. For example, if you are rolling a single die then the space of events could be “a 1 is rolled”, “a 2 is rolled”, …, “a 6 is rolled”.
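For readers who want to see what "a measure on a space of events" looks like in practice, here is a minimal sketch for the die. The fairness of the die and the helper name `probability` are assumptions of the example.

```python
from fractions import Fraction

# Sample space for one roll of a six-sided die; equal weights assume a fair die.
outcomes = range(1, 7)
measure = {face: Fraction(1, 6) for face in outcomes}

def probability(event):
    """Probability of an event, i.e. of a set of outcomes from the sample space."""
    return sum((measure[o] for o in event), Fraction(0))

print(probability(outcomes))   # the whole space has probability 1
print(probability({2, 4, 6}))  # "an even number is rolled" -> 1/2
```

The point of the exercise is that every probability statement is attached to an explicit set of outcomes, and that explicit set is what is missing for "text A is 87% similar to text Z".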

Here, I cannot see that there is a space of events. So I don’t think that “text A is 87% similar to text Z” is a probability. If it is a probability then what exactly is the event space and what sample of events is described by the statement?

Eturnal said:
Humor me. I'm sure someone can draw up a useful piece of math on this although yes everything will have some assumptions plugged in.
I don’t think that will be possible without some additional information about the similarity measure. It doesn’t seem like a probability to me. So I don’t think the math of probability will apply.
 
  • #15
Dale said:
I don’t think that will be possible without some additional information about the similarity measure. It doesn’t seem like a probability to me. So I don’t think the math of probability will apply.
And if a Bayesian can't do it, then nobody can!
 
  • Like
Likes   Reactions: Dale
  • #16
PeroK said:
Being an inveterate frequentist, I'll leave that to the Bayesians!
I am a Bayesian, so I am happy to assign probabilities without a lot of data. But I still need an event space, just like the frequentists.
 
  • Like
Likes   Reactions: Agent Smith
  • #17
Dale said:
I don’t think that these percentages for agreement are probabilities. Probabilities are between zero and one, so it is common to write them as percentages. But that doesn’t imply that everything that is written as a percentage is a probability.

In particular, a probability is always a measure on some space of events. For example, if you are rolling a single dice then the space of events could be “a 1 is rolled”, “a 2 is rolled”, …, “a 6 is rolled”.

Here, I cannot see that there is a space of events. So I don’t think that “text A is 87% similar to text Z” is a probability. If it is a probability then what exactly is the event space and what sample of events is described by the statement?

I don’t think that will be possible without some additional information about the similarity measure. It doesn’t seem like a probability to me. So I don’t think the math of probability will apply.
 
  • #18
Anyone want to try their hand at being amazing?
 
  • #19
I'm getting a bizarre result: ##261 = 211##
 
  • #20
Eturnal said:
Anyone want to try their hand at being amazing?
Oh we are all amazing here, but this is more a problem in combinatorics than probability or statistics.

And I am afraid you won't find an answer in combinatorics either, because there is not enough information there to provide a unique answer. To demonstrate this without using 10,000 word examples I will use something simpler:

  • Strings are ordered sequences of 10 letters.
  • 70% of the letters of string A are identical to the corresponding letters of string Z.
  • 70% of the letters of string B are identical to the corresponding letters of string Z.
  • 80% of the letters of string A are identical to the corresponding letters of string B.

[code lang=Python title="Example 1"]
A = "ZZZZZZZAAA"
B = "ZZZZZZZABB"
Z = "ZZZZZZZZZZ"
[/code]
Here when A disagrees with B (in the 9th and 10th positions), neither A nor B ever agrees with Z. Now consider

[code lang=Python title="Example 2"]
A = "ZZZZZZAAAZ"
B = "ZZZZZZAAZA"
Z = "ZZZZZZZZZZ"
[/code]
Here when A disagrees with B (again in the 9th and 10th positions), either A or B always agrees with Z.
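The two examples can be checked mechanically. The sketch below recomputes the three agreement percentages and then, for the positions where A and B differ, the fraction at which at least one of them matches Z. The helper names `agreement` and `rescued_fraction` are made up for this check.

```python
def agreement(a, b):
    """Fraction of positions at which two equal-length strings match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def rescued_fraction(a, b, z):
    """Among positions where a and b differ, the fraction where a or b matches z."""
    diffs = [(x, y, w) for x, y, w in zip(a, b, z) if x != y]
    if not diffs:
        raise ValueError("a and b never disagree")
    return sum(w in (x, y) for x, y, w in diffs) / len(diffs)

examples = {
    "Example 1": ("ZZZZZZZAAA", "ZZZZZZZABB", "ZZZZZZZZZZ"),
    "Example 2": ("ZZZZZZAAAZ", "ZZZZZZAAZA", "ZZZZZZZZZZ"),
}

for name, (a, b, z) in examples.items():
    # Prints A~Z, B~Z, A~B agreement, then the fraction described above.
    print(name, agreement(a, z), agreement(b, z), agreement(a, b),
          rescued_fraction(a, b, z))
# Example 1 0.7 0.7 0.8 0.0
# Example 2 0.7 0.7 0.8 1.0
```

Both examples satisfy the same 70% / 70% / 80% constraints, yet the quantity being asked about comes out as 0% in one and 100% in the other.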

Now consider the question in your opening post:

Eturnal said:
When 01 and 03 disagree, there is an 87% chance that one of them agrees with the MT.

Wouldn't you expect that number to be lower if these disagreements were random?

The problem with this question is that you have already specified that the disagreements are not (or rather very unlikely to be) random by stating that
Eturnal said:
Text A & B are 92% similar

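It is this number that you would expect to be smaller if the differences between texts A and Z and the differences between texts B and Z were "random", or rather uncorrelated, which is the correct term here. In fact you would expect it to be around ## 0.87^2 ##, or 76%: the chance that A and B both match Z at a given place, plus a small contribution from places where both depart from Z in the same way.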
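A quick sketch of that expectation follows. It assumes each text matches Z independently with probability p at every place, and that a departure from Z picks uniformly among the remaining k - 1 options. The function name `expected_agreement` and the values of k tried are illustrative only.

```python
def expected_agreement(p, n_options):
    """Expected A~B agreement when A and B each match Z independently with probability p.

    Assumes a departure from Z picks uniformly among the other n_options - 1 symbols.
    """
    if n_options < 2:
        raise ValueError("need at least two options per place")
    q = 1 - p
    both_match_z = p * p
    both_depart_same_way = q * q / (n_options - 1)
    return both_match_z + both_depart_same_way

for k in (2, 5, 24):
    print(k, round(expected_agreement(0.87, k), 4))
```

For any number of options the result stays between roughly 76% and 77%, well short of the reported 92%.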

Finally, I assume you are referring to religious texts here and you should note:
  • we don't discuss religion here
  • you don't need statistics to tell you that it is unlikely that differences between different versions of similar texts are uncorrelated, because the process that gives rise to these differences is clearly not random
  • the content of these posts is copyright PhysicsForums and you must not publish it elsewhere other than in accordance with this site's terms and conditions. In particular do NOT post on some religious crackpot site that Science has proved that the Codex Sinaiticus is the One True Word of God or whatever.
 
  • Haha
Likes   Reactions: Agent Smith
