Comparing DNA. What does it mean?

  • Thread starter: Goodies
  • Tags: DNA, Mean
AI Thread Summary
The discussion centers on the complexities of comparing DNA between species, particularly the claim that humans share 50% of their DNA with bananas. It highlights that such percentages often refer to protein-coding genes rather than the entire genome, which can be misleading due to the significant amount of non-coding "junk" DNA. The conversation emphasizes that the definition of genetic similarity varies based on the methods used for comparison, with factors like chromosomal location and gene regulation playing crucial roles. Additionally, it points out that the number of genes does not directly correlate with an organism's complexity, challenging simplistic interpretations of genetic relatedness. Overall, the nuances of DNA comparison reveal that simplistic statements about genetic similarity lack significant meaning without context.
Goodies
When I see something like "Humans and Bananas share 50% the same DNA with one another!", I have several questions.

First of all, a banana has 530 million base pairs whereas a human has approximately 3 billion. Even if we took the first 530 million base pairs of the human genome, this would only be about 18% of the size of the human genome. Obviously this is not at all how the gene similarities are measured. I would assume that it is talking about the similarities of protein-coding genes, but if someone would be able to elaborate on the comparison techniques of DNA I'd be very appreciative!
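To make the size argument concrete, here is a quick back-of-the-envelope check. It only uses the two genome sizes quoted above (530 million and roughly 3 billion base pairs); the variable names are just for illustration.

```python
# Genome sizes as quoted in the post above (approximate figures).
banana_bp = 530_000_000
human_bp = 3_000_000_000

# Even a perfect base-for-base match of the whole banana genome
# could cover at most this fraction of the human genome.
max_fraction = banana_bp / human_bp

print(f"Upper bound on whole-genome overlap: {max_fraction:.1%}")
```

So a "50% shared DNA" figure cannot be a raw base-pair count over the whole human genome; it has to refer to some subset, such as genes.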
 
You are correct in your assumption. The reference is to protein-coding nucleotides and not the entire genome of each.

In the past the non-coding segments of DNA were considered "junk", with the implication being that no consideration should be given to these components. Hence DNA = protein-coding sequences.
 
Thank you very much for that simple, quick answer! To clarify, is it necessarily correct that 50% of the banana's protein-coding genes are also present in humans? Or is there another comparison mechanism that is used?
 
One would expect you to do a simple Google search. This comes up - http://message.snopes.com/showthread.php?t=51513

Any reference to a "percent similarity" between two species has all kinds of problems.

Like Nick said, if you just pick a random place on both genomes and start comparing "sequences" you'll get about a 25% match.
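The ~25% figure is just what chance gives you with a four-letter alphabet. A minimal sketch, assuming both sequences are random with all four bases equally likely (real genomes have skewed base composition, so treat this as an idealization); the helper names are made up for the example:

```python
import random

def random_sequence(length, rng):
    """Build a random DNA string with A, C, G, T equally likely."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def fraction_identical(a, b):
    """Fraction of positions where two ungapped sequences carry the same base."""
    compared = min(len(a), len(b))
    matches = sum(x == y for x, y in zip(a, b))
    return matches / compared

rng = random.Random(0)  # fixed seed so the run is repeatable
seq_a = random_sequence(100_000, rng)
seq_b = random_sequence(100_000, rng)

match_fraction = fraction_identical(seq_a, seq_b)
print(f"Position-by-position match: {match_fraction:.1%}")
```

With sequences this long the result lands very close to one in four, which is why any quoted similarity only means something relative to that baseline.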

But DNA "sequence" is much more than just the pattern of A, T, G and C. The chromosomal location, nearby genes, methylation, amount of "junk" DNA in the vicinity and a bunch of other things contribute to what a particular sequence actually does (which is a much more relevant thing to compare than just raw sequence).

The genomes of most higher species are roughly 90% "junk". That "junk" is very poorly conserved across species. "Non-junk" DNA, which actually codes for a protein or acts as a regulatory element, is much more highly conserved across species, but at only ~10% of the genome it really doesn't contribute all that much to the overall similarity.

You often see quotes saying how similar the DNA is between two species. In most situations the number is meaningless, since how similar the genomes are depends on how you define similarity. There are a couple of ways to define similarity, and each method is used to examine different aspects of the DNA sequences. Quotes like "a chimp's DNA is 95% identical to humans" are very misleading. That particular comparison was almost certainly done on just the coding sequences, which are only ~10% of the genome. The noncoding DNA is much less conserved across species, and even between individuals of the same species.

If you abstract the comparison one more level, to protein sequence, the similarities between species increase. Indeed, I suspect most comparisons you see between humans and other primates are actually at the protein level.

But back at the genome-wide level the similarities aren't all that great, and measuring the similarity at that level really doesn't have all that much value.

Another thing to consider when comparing genomes is that nobody really knows the relationship between the number of genes and the "complexity" of the organism, or how a small change in a single gene can significantly change the gene's behavior, and potentially the complexity of the organism. The simple-minded thought that "complexity" is a linear function of the number of genes is certainly wrong, even though scientists who should know better often treat gene number as a linear measure of complexity. When the human genome was nearing initial completion, a fair number of scientists thought there was something seriously wrong with the methods used because the number of genes was turning out to be only ~30,000. That isn't all that many, something like 6x more than bacteria. "Certainly humans are more than 6x more complex than bacteria" was a fairly common thought, but it is erroneous because gene number is not a valid measure of an organism's complexity.

The fact is "complexity" isn't linear in the number of genes and isn't even constant for a fixed number of genes. So even if humans and bananas share 50% of their sequences that does not really say anything about the actual relatedness, or the relative complexity, of the two species.

"Relatedness" can be determined by comparing the genomes of two species but that is a much more complex analysis than simply "we share 50% of our DNA with bananas." Any quote like "we share XX% of our DNA with {insert species name}" really has no significance, especially if you don't know what was actually being compared.

****So it is not the number of chromosomes or nucleotides or proteins that determines the complexity of an organism. It is something else.****

****But even more interesting is that the same stuff is used over and over again, in all kinds of organisms. With random this's and that's, one would expect more diversity. Not so!****
 
In addition to what has already been said, similarity between DNA sequences does not necessarily even need to mean that the sequences are identical. The definition of similarity is really up to the person reporting it. There are many factors that go into deciding on a scoring function to determine similarity.

Many amino acids can be exchanged with similar amino acids without much effect on protein function (isoleucine and leucine, for example), so that even if the code gives a different amino acid, the resulting protein can still be 'similar' in chemical function. Also, proteins can have a very low level of sequence similarity between two species overall while the actual structure of the protein is nevertheless conserved. There are many proteins that have low sequence similarity and yet crystal structures of the proteins have revealed them to be very similar.

Going back to the definition of 'similarity'... In terms of actually comparing sequences, nobody has agreed on a definitive "Scoring Function" to compare sequence similarity. What helps define similarity between one set of species may not work when comparing a set of other species. In bioinformatics, the amount of "similarity" you get really depends on the scoring function you set up to compute sequence identity between two species. Whether this actually reveals the true nature of evolutionary history or homology requires close scrutiny.
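To show how much the reported number depends on the scoring function, here is a toy sketch that scores the same pair of already-aligned peptides two ways: strict identity, and a looser "similarity" that also counts conservative swaps such as isoleucine for leucine. The residue groupings and both example sequences are invented for illustration; real tools use substitution matrices and gapped alignment, which this does not attempt.

```python
# Hypothetical groups of amino acids treated as interchangeable.
# These are a simplified assumption, not a standard substitution matrix.
CONSERVATIVE_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("DE"),    # acidic
    set("KR"),    # basic
    set("ST"),    # small hydroxyl
    set("FYW"),   # aromatic
    set("NQ"),    # amide
]

def is_similar(x, y):
    """True if residues are identical or fall in the same conservative group."""
    return x == y or any(x in g and y in g for g in CONSERVATIVE_GROUPS)

def percent_identity(a, b):
    """Fraction of aligned positions with exactly the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def percent_similarity(a, b):
    """Fraction of aligned positions that are identical or conservative swaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(is_similar(x, y) for x, y in zip(a, b)) / len(a)

# Made-up peptides, assumed to be already aligned without gaps.
seq1 = "MKLIVDES"
seq2 = "MRLLVEEG"

print(f"identity:   {percent_identity(seq1, seq2):.1%}")
print(f"similarity: {percent_similarity(seq1, seq2):.1%}")
```

The two sequences come out 50% identical but 87.5% "similar", from the same data, purely because of which substitutions the scoring function chooses to forgive.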
 