Tough Textual Criticism math statistics question

Eturnal · Dec 16, 2023

Okay, let me rephrase this math question and frame it. It is math dealing with ancient Biblical texts and textual criticism.

Codex 01 (350 AD) agrees with the MT (majority text) about 87% of the time.

Codex 03 (350 AD) agrees with the MT about 87% of the time.

01 and 03 agree with each other about 92% of the time.

When 01 and 03 disagree, there is an 87% chance that one of them agrees with the MT.

Wouldn't you expect that number to be lower if these disagreements were random?

Please help math geniuses. Thank you!

FactChecker · Dec 16, 2023

If this is a homework-type problem then there is a specific format for that. You have to show some work and then we can give hints and guidance. Can you express each problem statement in terms of conditional probabilities or other formal mathematical expressions?

PeroK · Dec 17, 2023

Eturnal said:

TL;DR Summary: If text A is 87% similar to text Z

And text B is 87% similar to text Z

Text A & B are 92% similar

Each text is 10000 words (approx)
What are the odds that when Text A and Text B disagree one of them will agree with Text Z?

Okay, let me rephrase this math question and frame it. It is math dealing with ancient Biblical texts and textual criticism.

Codex 01 (350 AD) agrees with the MT (majority text) about 87% of the time.

Codex 03 (350 AD) agrees with the MT about 87% of the time.

01 and 03 agree with each other about 92% of the time.

When 01 and 03 disagree, there is an 87% chance that one of them agrees with the MT.

Wouldn't you expect that number to be lower if these disagreements were random?

Please help math geniuses. Thank you!

I can't make complete sense of what you are assuming and what you want to calculate based on those assumptions. There is also, perhaps, a missing factor about how many options there are for a text, which determines the underlying probability that two random texts will agree on something. Finally, you are more in the realm of statistical inference (hypothesis testing) here than in simple probabilities. More detail on this follows:

It seems you could model the situation by considering each text to be a list of things - possibly statements in this case. Perhaps each text makes one hundred statements about some things. If these are true/false statements, then each text is modelled by a binary string one hundred characters long. But, if these statements have more options than just true/false or A/B - let's say each statement has five options - then each is a string of 1/2/3/4/5 or A/B/C/D/E one hundred characters long. The first thing you need is an appropriate model like this. This is technically called a sample space.

Once you have an appropriate sample space for your problem, then you can start doing some hypothesis testing. That means you need to frame (precisely) a hypothesis to test!
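As a minimal sketch of the string model described above (it is not taken from any post in this thread): each text is treated as a sequence of positions, and agreement is simply the fraction of positions that carry the same symbol. The function name `agreement` and the three toy texts are hypothetical, and the five-option alphabet A to E is only the illustrative choice mentioned above.

```python
def agreement(text_a: str, text_b: str) -> float:
    """Fraction of positions at which two equal-length texts carry the same symbol."""
    if len(text_a) != len(text_b):
        raise ValueError("texts must have the same number of positions")
    if not text_a:
        raise ValueError("texts must not be empty")
    matches = sum(a == b for a, b in zip(text_a, text_b))
    return matches / len(text_a)


# Hypothetical toy texts: 10 positions, each holding one of five options A-E.
text_1 = "ABCDEABCDE"
text_2 = "ABCDEABCEE"
text_3 = "ABCDAABCDD"

print(agreement(text_1, text_2))  # 0.9
print(agreement(text_1, text_3))  # 0.8
print(agreement(text_2, text_3))  # 0.7
```

A number like 0.9 here is a measured proportion for these particular strings. Turning it into a probability still needs a model of how the strings were produced.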

Dale · Dec 17, 2023

Eturnal said:

TL;DR Summary: If text A is 87% similar to text Z

And text B is 87% similar to text Z

Text A & B are 92% similar

Each text is 10000 words (approx)
What are the odds that when Text A and Text B disagree one of them will agree with Text Z?

Wouldn't you expect that number to be lower if these disagreements were random?

I don’t think that we can answer this. The similarity measure doesn’t seem like a probability. So we can’t really have any expectations on what the similarity measure should be in unmeasured situations.

Eturnal · Dec 17, 2023

Thanks for your reply. Having a hard time getting started on this problem and my math is rusty. The answer is not 87% of the

Eturnal · Dec 17, 2023

So there are about 7,000 words in each text in the tested section.

But we should be able to work with the percentages, so that shouldn't matter, right? We have 8% of each text agreeing to disagree. Then we take a random splatter of 26% between the two texts (13% each text). But oh yeah, we would have to account for all the letter/word possibilities in those slots as well, wouldn't we?

PeroK · Dec 17, 2023

Eturnal said:

So there are about 7,000 words in each text in the tested section.

But we should be able to work with the percentages, so that shouldn't matter, right? We have 8% of each text agreeing to disagree. Then we take a random splatter of 26% between the two texts (13% each text). But oh yeah, we would have to account for all the letter/word possibilities in those slots as well, wouldn't we?

This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!

Eturnal · Dec 17, 2023

PeroK said:

This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!

Tell me if I'm heading in the right direction here.

Take two texts of 100 characters and highlight 8% representing the disagreement between 01 and 03.

Now randomly select 13% of each text representing the disagreements between 01/03 & the MT.

What are the odds those 13 places overlap the 8%?

Now, should we account for each character having 24 different options (letters in the Greek alphabet)? Or should we just pretend it is an A/B in each character slot for simplicity's sake?
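Taken literally, the overlap question above is a counting problem. A rough sketch, assuming the 13 positions are a uniformly random subset of the 100 (a hypergeometric model, which is an assumption and not something the manuscripts are known to satisfy); the function and parameter names are hypothetical:

```python
from math import comb


def overlap_probabilities(total: int, highlighted: int, drawn: int) -> list[float]:
    """P(exactly k drawn positions are highlighted) for k = 0..min(highlighted, drawn).

    Assumption: the drawn positions form a uniformly random subset of the
    `total` positions (hypergeometric model).
    """
    ways = comb(total, drawn)
    return [
        comb(highlighted, k) * comb(total - highlighted, drawn - k) / ways
        for k in range(min(highlighted, drawn) + 1)
    ]


# 100 positions, 8 highlighted (01/03 disagreements), 13 drawn at random.
probs = overlap_probabilities(total=100, highlighted=8, drawn=13)
expected_overlap = sum(k * p for k, p in enumerate(probs))

print(f"P(no overlap)       = {probs[0]:.3f}")
print(f"P(at least one)     = {1 - probs[0]:.3f}")
print(f"expected overlap    = {expected_overlap:.2f} positions")
```

This only answers the counting question for one text under uniform random selection. It says nothing yet about which reading sits in each slot, or about whether real scribal differences behave like a random draw.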

Eturnal · Dec 17, 2023

PeroK said:

This is even less clear than your original post. Mathematics and statistics require a well-defined problem - even if the definition includes uncertainties and probabilities. This is different from the humanities where you can argue endlessly over ill-defined concepts!

Someone posited that there is an 87% chance that the disagreements between 01 and 03 land on MT just because each agrees with MT 87% of the time. I don't feel like their math is correct. Thank you for your help!!!
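One way to probe that claim, offered as a sketch under stated assumptions rather than a definitive answer: read each percentage as the share of positions at which two texts carry the same reading, and let x be the unknown share of positions where 01, 03 and the MT all agree. The three pairwise figures only confine x to a range, and with it the share of 01/03 disagreements in which exactly one of them matches the MT. The function name, the parameter names and the position-level reading of the percentages are all assumptions introduced for illustration.

```python
def one_matches_range(p_az: float, p_bz: float, p_ab: float) -> tuple[float, float]:
    """Smallest and largest possible share of A/B disagreements in which exactly
    one of A, B matches Z, given only the three pairwise agreement rates.

    Assumption: each rate is a share of positions, and at least three distinct
    readings are possible per position.
    """
    if not p_ab < 1.0:
        raise ValueError("A and B never disagree, so the share is undefined")
    # x = share of positions where A, B and Z all agree.
    # x cannot exceed any pairwise agreement rate.
    x_max = min(p_az, p_bz, p_ab)
    # Positions where all three differ have share 1 - p_az - p_bz - p_ab + 2x,
    # which must be non-negative.
    x_min = max(0.0, (p_az + p_bz + p_ab - 1.0) / 2.0)
    if x_min > x_max:
        raise ValueError("the three rates are mutually inconsistent")

    def share(x: float) -> float:
        # (A matches Z only) + (B matches Z only), divided by P(A != B).
        return (p_az + p_bz - 2.0 * x) / (1.0 - p_ab)

    return share(x_max), share(x_min)


low, high = one_matches_range(0.87, 0.87, 0.92)
print(f"share can range from {low:.2f} to {high:.2f}")
```

With 87%, 87% and 92% as inputs, the range runs from 0 to 1, so the pairwise percentages alone neither confirm nor rule out a figure of 87%.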

PeroK · Dec 17, 2023

Eturnal said:

Take two texts of 100 characters and highlight 8% representing the disagreement between 01 and 03.

Now randomly select 13% of each text representing the disagreements between 01/03 & the MT.

What are the odds those 13 places overlap the 8%?

Now, should we account for each character having 24 different options (letters in the Greek alphabet)? Or should we just pretend it is an A/B in each character slot for simplicity's sake?

You can only produce probabilities from a large sample of data, or by understanding where the source material came from so that you have some underlying assumptions to work with.

It's a common misconception that probabilities can be conjured from one sample of data without underlying assumptions. I think this is what you are trying to do here: that somehow these percentages themselves will reveal something mathematically robust.

You can determine from the texts how correlated they are (and, indeed, that's what your percentages are trying to show). But, there is no magic wand that will tell you how likely that correlation was. The probability of a given correlation is not inherent in the data. It can only be calculated when you have a model for how the data was generated. The same correlations might be almost inevitable in one case and highly unlikely in another - even in cases where the raw data is the same.

Eturnal · Dec 17, 2023

PeroK said:

You can only produce probabilities from a large sample of data, or by understanding where the source material came from so that you have some underlying assumptions to work with.

It's a common misconception that probabilities can be conjured from one sample of data without underlying assumptions. I think this is what you are trying to do here: that somehow these percentages themselves will reveal something mathematically robust.

You can determine from the texts how correlated they are (and, indeed, that's what your percentages are trying to show). But, there is no magic wand that will tell you how likely that correlation was. The probability of a given correlation is not inherent in the data. It can only be calculated when you have a model for how the data was generated. The same correlations might be almost inevitable in one case and highly unlikely in another - even in cases where the raw data is the same.

Humor me. I'm sure someone can draw up a useful piece of math on this although yes everything will have some assumptions plugged in.

PeroK · Dec 17, 2023

Eturnal said:

Humor me. I'm sure someone can draw up a useful piece of math on this although yes everything will have some assumptions plugged in.

Being an inveterate frequentist, I'll leave that to the Bayesians!

Eturnal · Dec 17, 2023

PeroK said:

Being an inveterate frequentist, I'll leave that to the Bayesians!

It's challenging I know

Dale · Dec 17, 2023

Eturnal said:

Someone posited that there is an 87% chance that the disagreements between 01 and 03 land on MT just because each agrees with MT 87% of the time. I don't feel like their math is correct. Thank you for your help!!!

I don’t think that these percentages for agreement are probabilities. Probabilities are between zero and one, so it is common to write them as percentages. But that doesn’t imply that everything that is written as a percentage is a probability.

In particular, a probability is always a measure on some space of events. For example, if you are rolling a single die then the space of events could be “a 1 is rolled”, “a 2 is rolled”, …, “a 6 is rolled”.

Here, I cannot see that there is a space of events. So I don’t think that “text A is 87% similar to text Z” is a probability. If it is a probability then what exactly is the event space and what sample of events is described by the statement?

Eturnal said:

Humor me. I'm sure someone can draw up a useful piece of math on this although yes everything will have some assumptions plugged in.

I don’t think that will be possible without some additional information about the similarity measure. It doesn’t seem like a probability to me. So I don’t think the math of probability will apply.

PeroK · Dec 17, 2023

Dale said:

I don’t think that will be possible without some additional information about the similarity measure. It doesn’t seem like a probability to me. So I don’t think the math of probability will apply.

And if a Bayesian can't do it, then nobody can!

Dale · Dec 17, 2023

PeroK said:

Being an inveterate frequentist, I'll leave that to the Bayesians!

I am a Bayesian, so I am happy to assign probabilities without a lot of data. But I still need an event space, just like the frequentists.

Eturnal · Dec 17, 2023

Anyone want to try their hand at being amazing?

Agent Smith · Oct 13, 2024

I'm getting a bizarre result: ##261 = 211##

pbuk · Oct 14, 2024

Eturnal said:

Anyone want to try their hand at being amazing?

Oh we are all amazing here, but this is more a problem in combinatorics than probability or statistics.

And I am afraid you won't find an answer in combinatorics either, because there is not enough information there to provide a unique answer. To demonstrate this without using 10,000 word examples I will use something simpler:

Strings are ordered sequences of 10 letters.
70% of the letters of string A are identical to the corresponding letters of string Z.
70% of the letters of string B are identical to the corresponding letters of string Z.
80% of the letters of string A are identical to the corresponding letters of string B.

[code lang=Python title="Example 1"]
A = "ZZZZZZZAAA"
B = "ZZZZZZZABB"
Z = "ZZZZZZZZZZ"
[/code]
Here when A disagrees with B (in the 9th and 10th positions), neither A nor B ever agrees with Z. Now consider

[code lang=Python title="Example 2"]
A = "ZZZZZZAAAZ"
B = "ZZZZZZAAZA"
Z = "ZZZZZZZZZZ"
[/code]
Here when A disagrees with B (again in the 9th and 10th positions), either A or B always agrees with Z.
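The two examples above can be checked mechanically. A small sketch (the helper names `agreement` and `one_matches_z_share` are hypothetical) that recomputes the three pairwise agreement rates and the share of A/B disagreements in which at least one of the two matches Z:

```python
def agreement(s: str, t: str) -> float:
    """Fraction of positions where two equal-length strings match."""
    return sum(a == b for a, b in zip(s, t)) / len(s)


def one_matches_z_share(a: str, b: str, z: str) -> float:
    """Among positions where A and B differ, the share where A or B equals Z."""
    disagreements = [(x, y, w) for x, y, w in zip(a, b, z) if x != y]
    hits = sum(x == w or y == w for x, y, w in disagreements)
    return hits / len(disagreements)


examples = {
    "Example 1": ("ZZZZZZZAAA", "ZZZZZZZABB", "ZZZZZZZZZZ"),
    "Example 2": ("ZZZZZZAAAZ", "ZZZZZZAAZA", "ZZZZZZZZZZ"),
}

for name, (a, b, z) in examples.items():
    print(
        name,
        agreement(a, z),               # A vs Z
        agreement(b, z),               # B vs Z
        agreement(a, b),               # A vs B
        one_matches_z_share(a, b, z),  # share of A/B disagreements matching Z
    )
# Example 1 0.7 0.7 0.8 0.0
# Example 2 0.7 0.7 0.8 1.0
```

Both examples give identical pairwise percentages (70%, 70%, 80%) and opposite answers (0% and 100%) to the question being asked.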

Now consider the question in your opening post:

Eturnal said:

When 01 and 03 disagree, there is an 87% chance that one of them agrees with the MT.

Wouldn't you expect that number to be lower if these disagreements were random?

The problem with this question is that you have already specified that the disagreements are not (or rather very unlikely to be) random by stating that

Eturnal said:

Text A & B are 92% similar

It is this number that you would expect to be smaller if the differences between texts A and Z and the differences between B and Z were "random", or rather uncorrelated, which is the correct term here. In fact you would expect it to be around ## 0.87^2 \approx 0.76 ##, or 76%.
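To put a number on the uncorrelated case, here is a sketch of one toy model. The model itself is an assumption, not something established in this thread: at every position A matches Z with probability p, B independently does the same, and a text that departs from Z picks uniformly among the remaining readings. The name `independent_model` and the parameter `options` are hypothetical, and the value 24 is the Greek-alphabet figure suggested earlier in the thread.

```python
def independent_model(p: float, options: int) -> tuple[float, float]:
    """Return (expected A-B agreement, share of A/B disagreements where
    exactly one of A, B matches Z) under the assumed independence model.

    Assumptions: A and B each match Z independently with probability p;
    a text that differs from Z picks uniformly among the other
    `options - 1` readings.
    """
    if options < 2:
        raise ValueError("need at least two possible readings per position")
    both_match_z = p * p
    # Both differ from Z and happen to land on the same alternative reading.
    both_differ_same = (1 - p) ** 2 / (options - 1)
    ab_agree = both_match_z + both_differ_same
    exactly_one_matches = 2 * p * (1 - p)
    return ab_agree, exactly_one_matches / (1 - ab_agree)


for k in (2, 24):
    ab_agree, share = independent_model(0.87, k)
    print(f"options={k:2d}  A-B agreement={ab_agree:.4f}  share={share:.4f}")
```

Under these assumptions the expected A-B agreement stays between roughly 76% and 77%, well short of the observed 92%. For what it is worth, the same toy model puts the share of disagreements in which one text matches Z above 90%, so a figure like 87% is not obviously high or low without a model to compare it against.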

Finally, I assume you are referring to religious texts here and you should note:

we don't discuss religion here
you don't need statistics to tell you that it is unlikely that differences between different versions of similar texts are uncorrelated because the process that gives rise to these differences is clearly not random
the content of these posts is copyright PhysicsForums and you must not publish it elsewhere other than in accordance with this site's terms and conditions. In particular do NOT post on some religious crackpot site that Science has proved that the Codex Sinaiticus is the One True Word of God or whatever.
