# Any good references on DNA profiling statistical methods?

I'm looking for papers or other references on DNA profiling statistics. Something comprehensive that covers the whole shebang.

I'm kind of troubled by some of the match probabilities being linked to "race", e.g. Caucasian.
I'm not even sure what Caucasian means scientifically.
What happens when you combine a subjective measure with what is supposedly a random set?

Is the result still random?

It seems like the entire process of DNA profiling has probabilities involved, and then those preceding probabilities (errors etc.) are discarded when making the final judgment that, say, one in a million people will match some reference sample.

This final judgment is based on FBI tables which are subdivided by race.

## Answers and Replies

Ok ... kind of disturbing that no one can answer this.

How about this:
Does anyone have an opinion on the mathematical rigor of DNA profiling?

WWGD
Science Advisor
Gold Member
2019 Award
Why don't you give people some more time? The thread is just 2 days old.

WWGD
Science Advisor
Gold Member
2019 Award
Are you referring to, or looking for, fallacies and biases in profiling? Then there is the fallacy of using DNA matching without considering the context: a match should be considered only among people who are already suspects.

I don't want to get into a specific instance.
I will spell out the issue.
Here's a statement from a DNA report:

The probability of randomly selecting an unrelated individual
that could have contributed to this mixture is 1 person in 1
billion in the Caucasian and Southwestern Hispanic populations
and 1 person in 2 billion in the African American and
Southeastern Hispanic populations.
These conclusions are based on population statistics derived from
a database of unrelated Caucasian, African American, Southeastern
Hispanic and Southwestern Hispanic populations obtained from the
Federal Bureau of Investigation (FBI)

The problem is that Caucasian, African American, Southeastern Hispanic and Southwestern Hispanic are arbitrary categories. They are social constructs, i.e. there is no "race" test they can give.

What is the effect of an arbitrary choice on the statistics?
Say they wanted to profile pizza lovers. Now there is no correlation between pizza and DNA (I think :D).
However, you can still generate a table and probabilities out of this arbitrary choice and start convicting people by saying in a DNA report:

The probability of randomly selecting an unrelated individual
that could have contributed to this mixture is 1 person in 1
billion in the pizza lover population.
These conclusions are based on food preference derived from
a database of unrelated food eaters from the
Federal Bureau of Investigation (FBI).

A couple of take-out slips in evidence showing the suspect ordering pizza and you're golden.

The bottom line is: what effect does an arbitrary choice have on the presumption of randomness? It reminds me of post-selection in QP in a way.

Also, what do cumulative error rates in the physical processes (PCR, mixed samples, electrophoresis, analyzer sensitivities to dye fluorescence) do to the final probability?

All I can really find are things from law enforcement and gov't.
I just want like maybe a primer that joins the math to the process of actually doing it.

It seems to me there are a lot of assumptions and

.

WWGD
Science Advisor
Gold Member
2019 Award
Sorry, I can't think of anything other than searching for info in the FBI's database. Maybe they obtained this data by studying separate 'populations' (where by 'population' I refer to the construct used by the FBI). As to accumulated errors, I would hope many samples are taken from the same person and a test is made for each to avoid mistakes. I don't see how to say something more specific without knowing the details of how the samples are taken and how the analysis is done, sorry.
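
On the accumulated-error question, here is a minimal sketch of how a process error rate could combine with the quoted match probability. It assumes a deliberately simple model that is not from the thread: a non-contributor gets reported as a match either by coincidence (the random match probability) or, failing that, through some error in the process (one lumped rate standing in for PCR, mixture interpretation, handling and so on). The function name and the error rates tried are hypothetical, chosen only for illustration.

```python
def false_report_probability(random_match_prob, process_error_rate):
    """Chance that a non-contributor is reported as a match.

    Simplified model (an assumption): the person either matches by
    coincidence, or does not match but an error in the process still
    produces a reported match.
    """
    return random_match_prob + (1.0 - random_match_prob) * process_error_rate


# 1 in 1 billion, the figure quoted from the DNA report in this thread.
rmp = 1e-9

# Hypothetical lumped error rates, not measured values.
for error_rate in (0.0, 1e-6, 1e-3):
    p = false_report_probability(rmp, error_rate)
    print(f"error rate {error_rate:g}: false report probability {p:.3g}")
```

Under this model, any error rate much larger than the random match probability dominates the result, so the headline "1 in a billion" only describes the coincidence term.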

mfb
Mentor
What is the effect of an arbitrary choice on the statistics?
As long as the choice is consistent, this is no problem.
If (!) there is no correlation between DNA and love of pizza, you'll get the same number for both groups (within statistical uncertainties), therefore you cannot make any statement about pizza consumption just based on the DNA.
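
The point above can be checked with a small simulation. This is only a sketch under stated assumptions: a single made-up genetic marker, a label ("pizza lover" or not) assigned by coin flip, and hypothetical allele frequencies. The function name and all numbers are illustrative and are not taken from any FBI table.

```python
import random


def simulate_group_frequencies(n_people=20000, freq_in=0.3, freq_out=0.3,
                               p_group=0.5, seed=0):
    """Estimate one allele's frequency separately in two labelled groups."""
    rng = random.Random(seed)
    # label -> [copies of the allele seen, chromosomes sampled]
    counts = {True: [0, 0], False: [0, 0]}
    for _ in range(n_people):
        # The label is assigned without looking at the DNA.
        in_group = rng.random() < p_group
        # The true allele frequency may or may not depend on the label.
        freq = freq_in if in_group else freq_out
        # Each person carries two chromosomes, hence two draws.
        copies = sum(rng.random() < freq for _ in range(2))
        counts[in_group][0] += copies
        counts[in_group][1] += 2
    return {label: seen / total for label, (seen, total) in counts.items()}


# Label unrelated to DNA: both tables estimate the same frequency.
uncorrelated = simulate_group_frequencies()
# Label correlated with DNA (hypothetical frequencies): the tables differ.
correlated = simulate_group_frequencies(freq_in=0.3, freq_out=0.1)

print("uncorrelated:", uncorrelated)
print("correlated:  ", correlated)
```

With an uncorrelated label, the two estimated frequencies differ only by sampling noise, so splitting the table by that label changes nothing. The per-group numbers diverge only when the label actually tracks a difference in allele frequencies.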