# Is DNA unique?

N468989
Hello, how unique is DNA? What I mean is: can there exist, or ever exist, two people with the same DNA? Let's imagine there are "only" 100 trillion codes; after this number they will start repeating. Is this correct?

Also, I was wondering whether the DNA collected from a person at 5 years and at 90 years of age is the same. I guess that it remains very similar but changes with age.

With cloning, do the clones have the same DNA as the original? (Does the same apply to identical twins?)

thanks

Homework Helper
Hello, how unique is DNA? What I mean is: can there exist, or ever exist, two people with the same DNA? Let's imagine there are "only" 100 trillion codes; after this number they will start repeating. Is this correct?

If there were only 100 trillion codes and there were more than 100 trillion individuals, at least two would have the same codes.

In fact, thanks to the birthday paradox, if there were only 100 trillion codes you'd expect to get repeats at around sqrt(100 trillion) = 10 million people.

But humans have about 3 billion base pairs, for a total of about 4^3000000000 different DNA strands of human size. To get a repeat (if selecting uniformly at random) you'd need around 2^3000000000 people. (In all time there will never be nearly this many people.)
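As a rough check on the figures above, here is a minimal Python sketch of the birthday estimate. It assumes every code is equally likely and drawn independently, which real genomes are not, and the function name `birthday_threshold` is made up for illustration. The second case works in logarithms because 4^3000000000 is far too large to write out.

```python
import math

def birthday_threshold(num_codes: float, p: float = 0.5) -> float:
    """Approximate number of uniform random draws needed before a repeat
    occurs with probability p (standard birthday approximation)."""
    return math.sqrt(2.0 * num_codes * math.log(1.0 / (1.0 - p)))

# Case 1: "only" 100 trillion equally likely codes.
codes = 100e12
people_for_even_odds = birthday_threshold(codes)  # same order as sqrt(codes) = 1e7

# Case 2: every sequence of 3 billion base pairs, 4 choices per position.
base_pairs = 3_000_000_000
log10_sequences = base_pairs * math.log10(4)   # log10 of 4**base_pairs
log10_people = log10_sequences / 2             # a square root halves the exponent

print(f"100 trillion codes: repeat likely after ~{people_for_even_odds:.3g} people")
print(f"4^{base_pairs} is about 10^{log10_sequences:.3g}")
print(f"repeat expected only after ~10^{log10_people:.3g} people")
```

The constant in front of the square root depends on the probability you ask for, so "around sqrt(N)" is an order-of-magnitude statement rather than an exact count.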

With cloning, do the clones have the same DNA as the original? (Does the same apply to identical twins?)

These would be identical other than replication errors.

Phrak
If there were only 100 trillion codes and there were more than 100 trillion individuals, at least two would have the same codes.

In fact, thanks to the birthday paradox, if there were only 100 trillion codes you'd expect to get repeats at around sqrt(100 trillion) = 10 million people.

But humans have about 3 billion base pairs, for a total of about 4^3000000000 different DNA strands of human size. To get a repeat (if selecting uniformly at random) you'd need around 2^3000000000 people. (In all time there will never be nearly this many people.)

But not all sequences of base pairs code to anything useful. So we should be counting viable genomes instead--or more accurately, genome pair combinations that are both viable and expressed, and factoring these. Do you have any numbers for these? Or does anybody for that matter?

Homework Helper
But not all sequences of base pairs code to anything useful.

Right, that's why I wrote "DNA strands of human size" rather than "human DNA strands".

So we should be counting viable genomes instead--or more accurately, genome pair combinations that are both viable and expressed, and factoring these.

I would have interpreted the OP's intent more like "viable human DNA strands" rather than "functionally different human DNA strands", but I don't have numbers on either.

Do you have any numbers for these? Or does anybody for that matter?

No, but I was hoping someone here who actually knows about this stuff would post here with some rough estimates.

Phrak
No, but I was hoping someone here who actually knows about this stuff would post here with some rough estimates.

You and me both

Staff Emeritus
Gold Member
I don't think anyone could give a good estimate which part of the DNA is functional and which part is not. Over the years we have found functions for regions that were otherwise considered 'junk'.

You must also take into account that the actual DNA code is only one layer of information. Above that layer is something we call epigenetics: a code on top of the DNA, made up of histone modifications.

http://www.sciencemag.org/cgi/content/abstract/293/5532/1074
Chromatin, the physiological template of all eukaryotic genetic information, is subject to a diverse array of posttranslational modifications that largely impinge on histone amino termini, thereby regulating access to the underlying DNA. Distinct histone amino-terminal modifications can generate synergistic or antagonistic interaction affinities for chromatin-associated proteins, which in turn dictate dynamic transitions between transcriptionally active or transcriptionally silent chromatin states. The combinatorial nature of histone amino-terminal modifications thus reveals a "histone code" that considerably extends the information potential of the genetic code. We propose that this epigenetic marking system represents a fundamental regulatory mechanism that has an impact on most, if not all, chromatin-templated processes, with far-reaching consequences for cell fate decisions and both normal and pathological development.

Homework Helper
I don't think anyone could give a good estimate which part of the DNA is functional and which part is not. Over the years we have found functions for regions that were otherwise considered 'junk'.

But we *can* give results in the other direction, right? My understanding is that more than 1% of the base pairs in human DNA are involved in gene coding, so by any definition we have less than 99% junk/noncoding DNA. That gives a lower bound of 30 million functional base pairs, or 4^30000000 possibilities.
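The lower bound can be put in numbers with a short sketch. The 1% figure is the one quoted in the post; the variable names are only illustrative, and uniformly random sequences are still assumed.

```python
import math

total_base_pairs = 3_000_000_000
coding_fraction = 0.01  # "more than 1%" from the post, used here as a lower bound

functional_base_pairs = round(total_base_pairs * coding_fraction)

# 4**functional_base_pairs is far too large to print, so count its decimal digits.
decimal_digits = math.floor(functional_base_pairs * math.log10(4)) + 1

print(f"{functional_base_pairs:,} functional base pairs")
print(f"4^{functional_base_pairs} has about {decimal_digits:,} decimal digits")
```

Even this deliberately conservative count is a number with millions of digits, so the conclusion does not hinge on the exact functional fraction.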

Phrak
You must also take into account that the actual DNA code is only one layer of information. Above that layer is something we call epigenetics: a code on top of the DNA, made up of histone modifications.

http://www.sciencemag.org/cgi/content/abstract/293/5532/1074 ...

Something has to regulate how the various different cells operate. Not that I understand one complete sentence in the above quote. If histones are mainly responsible for cell diversification, then perhaps they don't have a lot to do, if anything, with individual people.

BoomBoom
Hello, how unique is DNA? What I mean is: can there exist, or ever exist, two people with the same DNA? Let's imagine there are "only" 100 trillion codes; after this number they will start repeating. Is this correct?

Also, I was wondering whether the DNA collected from a person at 5 years and at 90 years of age is the same. I guess that it remains very similar but changes with age.

With cloning, do the clones have the same DNA as the original? (Does the same apply to identical twins?)

You kind of answered your first question with your last question. Identical twins and clones would have an identical copy of each other's DNA code.

The chance of this occurring in two individuals of independent descent is practically nil, I would think. You could try and calculate the odds using math, but this would not be accurate since your genetic code is not just a craps roll of a certain number of bases, it is a combination of the code from your mother and father. (So, maybe if the same mother and father had a trillion babies?)

As for the age thing, you would still have the same code from when you are young or old (minus a few accumulated mutations here and there), but that code is expressed differently...which goes into epigenetics as Monique was mentioning. Epigenetics is also the reason why identical twins are different even though their code is the same, and this would also apply to a clone. It would be impossible to have an exact identical copy of a person as the movies often portray.

Phrak
You could try and calculate the odds using math, but this would not be accurate since your genetic code is not just a craps roll of a certain number of bases, it is a combination of the code from your mother and father.

How, again, does the last nullify the first?

Moridin
I don't think anyone could give a good estimate which part of the DNA is functional and which part is not. Over the years we have found functions for regions that were otherwise considered 'junk'.

I thought I would just throw this in for clarification. When one uses the colloquial and rather unfortunate term "junk DNA" today, one is usually referring not to DNA that is actually non-functional, but to non-coding DNA, which of course can be functional or non-functional. Similarly, when we use terms like "big bang" or "selfish genes", we are not actually claiming that the universe was big and loud in the past or speculating about the secret mental life of genes. Naturally, all of these concepts have been, and continue to be, greatly abused in popular science.

Homework Helper
You could try and calculate the odds using math, but this would not be accurate since your genetic code is not just a craps roll of a certain number of bases, it is a combination of the code from your mother and father.

This seriously limits the number of outcomes. Humans have 23 pairs of chromosomes, so a person can produce only 2^23 distinct gametes.* Therefore a given mother and father could produce only 2^46 distinct gene patterns (modulo transcription errors and mutations).

* Assuming each chromosome is distinct, which is highly likely.

(So, maybe if the same mother and father had a trillion babies?)

After about sqrt(2^46) = 2^23 children or so, you'd expect a repeat. This is small enough that the constant factor 1.177... can be noticed! After 9,876,831 children (not including identical twins), a given mother and father have a 50% chance of having produced two genetically identical children (not including mutations or transcription errors).
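Those numbers can be reproduced with a minimal sketch, assuming no crossover, no mutation, and every chromosome combination equally likely (the same simplifications as in the post). The helper name `repeat_probability` is hypothetical.

```python
import math

chromosome_pairs = 23
gametes_per_parent = 2 ** chromosome_pairs      # one chromosome chosen from each pair
combos_per_couple = gametes_per_parent ** 2     # 2**46 mother/father combinations

# Children needed for a 50% chance of a repeat:
# sqrt(2 ln 2) * sqrt(N), i.e. about 1.177 * sqrt(N).
children_for_even_odds = math.sqrt(2 * math.log(2) * combos_per_couple)

def repeat_probability(children: int, combos: int) -> float:
    """Birthday approximation: P(at least one repeat) = 1 - exp(-n(n-1) / (2N))."""
    return -math.expm1(-children * (children - 1) / (2 * combos))

print(f"{combos_per_couple:,} combinations per couple")
print(f"about {children_for_even_odds:,.0f} children for even odds")
print(f"P(repeat) at 9,876,831 children: "
      f"{repeat_probability(9_876_831, combos_per_couple):.4f}")
```

The 1.177 factor is sqrt(2 ln 2), which is where the figure just under 9.9 million comes from instead of 2^23 = 8,388,608.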

BoomBoom
How, again, does the last nullify the first?

A mother and father are going to produce offspring that are a combination of their two genomes...or 4 sets of chromosomes rather. Crossover will produce variation based on combinations of each parent's two sets of chromosomes. The variation in genetic code would be different for another set of parents. So two different sets of parents could never produce offspring with identical genetic codes.

So, in that sense, it is not like saying "a human has X number of bases, which makes Y number of possible combinations, so the chances are Z that the same code would appear twice". Many combinations would be impossible with a certain set of parents.

In other words, the chance is so astronomically small, that it is essentially 0%.

People aren't made from scratch. We are assembled from our parents' pre-existing blueprints.

BoomBoom
Humans have 23 pairs of chromosomes, so a person can produce only 2^23 distinct gametes.* Therefore a given mother and father could produce only 2^46 distinct gene patterns (modulo transcription errors and mutations).

You are neglecting crossover. There is much more variation than what you suggest.

Homework Helper
You are neglecting crossover. There is much more variation than what you suggest.

I included crossover with the transcription errors (rightly or wrongly). But on the whole, I don't think 70 trillion is that small -- I mean, having 20 kids would be really unusual today, and the chance of two 'identical twins' being produced through chance at that size is one in 370 billion, small enough that it's probably not happened to anyone on Earth (even if everyone had families this large!).
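For the 20-child family, here is a hedged sketch of the same calculation, again assuming 2^46 equally likely outcomes and ignoring crossover. `family_size` and the other names are only illustrative.

```python
import math

combos_per_couple = 2 ** 46   # about 70 trillion
family_size = 20

# Probability that at least two of the children match under this model,
# computed in log space to avoid floating-point cancellation.
log_no_match = sum(math.log1p(-i / combos_per_couple) for i in range(family_size))
p_match = -math.expm1(log_no_match)

# Quick estimate: number of sibling pairs divided by the number of outcomes.
p_estimate = math.comb(family_size, 2) / combos_per_couple

print(f"P(match) = {p_match:.3e}, i.e. about 1 in {1 / p_match:,.0f}")
print(f"pair-count estimate: 1 in {1 / p_estimate:,.0f}")
```

With 20 children there are 190 sibling pairs, and 2^46 / 190 is roughly 370 billion, which matches the figure in the post.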

Phrak
GH, we can use your number of base pairs at 3×10^9. BTW, is this the full set or half?

None of the objections I've heard amount to anything. And crossover actually randomizes things nicely so we don't have to partition probabilities by chromosomes as long as we are counting by genomes.

We need two more numbers. An average number of base pairs per genome, and the percentage of base pairs that aren't so-called trash.

I don't know any better, so I'll assume most genomes code for a nominally sized protein molecule of 150(?) amino acids--or 300 DNA base pairs. Is it two base pairs per amino acid? Call the average number of base pairs to code a molecule = N. What do you think?

I think there is less trash DNA than people might suppose, since some of it is used in fetal development. So say 5% of DNA is used for something: M = 0.05. What do you think of the 5% value?

Oh... One other number might be needed. There are 3 different combinations of 2 chromosome pairs given two possible genomes. On average there are perhaps 2.2 viable competing genomes. This is a number to pin down better, too. But it gets a little sticky here. Do we treat Pp the same as PP? That is, do we care about actual genes, or expressed characteristics?

Homework Helper
And crossover actually randomizes things nicely so we don't have to partition probabilities by chromosomes as long as we are counting by genomes.

But the chance of any particular crossover is vastly lower than the chance of receiving an identical copy of a parent's chromosome -- in fact, it's more likely you'll get an unaltered copy than that any of the crossovers will occur. (I don't have hard numbers on this; anyone?) So if we care about the probability we should think about chromosomes.

Oh... One other number might be needed. There are 3 different combinations of 2 chromosome pairs given two possible genomes. On average there are perhaps 2.2 viable competing genomes. This is a number to pin down better, too. But it gets a little sticky here. Do we treat Pp the same as PP? That is, do we care about actual genes, or expressed characteristics?

If you're throwing out noncoding DNA, you may as well lump Pp with PP (except in cases of incomplete dominance).

Staff Emeritus
Gold Member
But the chance of any particular crossover is vastly lower than the chance of receiving an identical copy of a parent's chromosome -- in fact, it's more likely you'll get an unaltered copy than that any of the crossovers will occur. (I don't have hard numbers on this; anyone?) So if we care about the probability we should think about chromosomes.

For humans it is about 2-3 crossovers per pair of homologous chromosomes (genetic distance is measured in cM; 1 cM is a 1% recombination frequency).

Phrak
But the chance of any particular crossover is vastly lower than the chance of receiving an identical copy of a parent's chromosome -- in fact, it's more likely you'll get an unaltered copy than that any of the crossovers will occur. (I don't have hard numbers on this; anyone?) So if we care about the probability we should think about chromosomes.

What I have in mind is counting combinations of genomes, rather than chromosomes. So the counting is simplified when the probability of one genome occurring is not dependent upon the probability of another genome occurring. Maybe my logic is goofy...

If you're throwing out noncoding DNA, you may as well lump Pp with PP (except in cases of incomplete dominance).

There is a lot of gray area to contend with, isn't there?

In a more complete case, we'd have a set (P1, P2,...,Pn). Then we'd want to know the average value of n for all genomes. If we decide that the P characteristic is just as pronounced whether there are one or two P's, it simplifies things. In that case there are two expressed types. Other times three. Should the weighted average be about 2.2?

Phrak
There's one other thing. We really have two populations to deal with. Oddly enough I'd forgotten there's a difference between men and women...