Gene Identification: How Are Genes Identified?

AI Thread Summary
Genes are sparsely located within DNA strands: the human genome contains roughly 20,000 protein-coding genes among about 3 billion bases. Identifying genes involves distinguishing them from non-coding regions, which make up the majority of DNA. Key indicators for genes include start and stop codons that signal the beginning and end of protein-coding sequences. Geneticists use computational tools to predict potential genes, known as open reading frames (ORFs), by analyzing sequenced DNA. Programs like ORFinder help identify these ORFs, while BLAST allows comparison against databases of known proteins. Further analysis of the surrounding non-coding sequences helps confirm a gene's identity, followed by experimental validation that the predicted protein and RNA are actually produced. The discussion highlights the complexity and systematic nature of gene identification, likening DNA to a biological computer that organizes information through structured processes.
nesp
The way I understand it, genes are rather sparse within a strand of DNA. The human genome, for example, has roughly 20,000 protein-coding genes in a strand of about 3 billion A, C, G, T bases. I also understand that genes code for proteins through RNA but, again, most of the DNA is noncoding. My question is, how do we tell genes from non-genes? I've searched through Google and found some useful sites, such as this one: http://www.genomenewsnetwork.org/articles/06_00/sequence_primer.shtml

but they are overly complex for my question. I'm impressed that the entire genome can be decoded, but is there a simple explanation of how a geneticist determines that this particular section of DNA is a gene and that one is not?
 
Here are a few basic concepts. Every protein-coding gene has a three-base sequence (a codon) that signals the start of the protein. There are several different start codons, and which ones are used depends on the organism. There is also a stop codon at the end of the gene that signals where the protein ends.

http://en.wikipedia.org/wiki/Start_codon
http://en.wikipedia.org/wiki/Stop_codon#Start.2Fstop_codons

Once you sequence and assemble a certain region, you can run programs that predict potential genes and their locations. These potential genes are called open reading frames (ORFs): stretches that begin with a start codon and continue, codon by codon, to the next in-frame stop codon.

ORFinder is an example
http://www.ncbi.nlm.nih.gov/gorf/gorf.html
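To make the start/stop idea concrete, here is a minimal sketch of what an ORF scan does. It assumes the standard genetic code, treats only ATG as a start codon, and scans only the three forward reading frames (tools like ORFinder also scan the reverse complement, six frames in total). The function name `find_orfs` and the `min_codons` cutoff are hypothetical, not ORFinder's actual interface.

```python
# Stop codons of the standard genetic code, written in the DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: only the canonical ATG start codon is considered.
START_CODON = "ATG"


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Scan the three forward reading frames of `seq` for ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon
    (stop included). Returns (start, end, orf) tuples with 0-based,
    end-exclusive coordinates. The reverse strand is NOT scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None  # close this ORF and look for the next ATG
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 'ATGAAATTTTAG')]
```

Real gene finders go much further (introns, codon-usage statistics, minimum lengths in the hundreds of bases), so an ORF found this way is only a candidate.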

Once you identify a potential ORF, you can use a computer program to compare the predicted protein sequence against a database of other predicted and/or sequenced proteins.

BLAST is such a program
http://www.ncbi.nlm.nih.gov/BLAST/
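Roughly, the comparison step means translating the ORF into a protein and scoring how well it lines up with known sequences. BLAST itself is a fast heuristic that scores protein alignments with substitution matrices such as BLOSUM62; the sketch below instead uses an exact Smith-Waterman local alignment with made-up match/mismatch/gap scores, and the two-entry "database" is invented purely for illustration. Translation assumes the standard genetic code.

```python
# Standard genetic code: codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate an ORF codon by codon, stopping at the first stop ('*')."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        amino_acid = CODON_TABLE[orf[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0 so a local alignment can start anywhere.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query: str, database: dict[str, str]) -> tuple[str, int]:
    """Return (name, score) of the database entry that aligns best."""
    scores = {name: local_alignment_score(query, seq)
              for name, seq in database.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Made-up "database" for illustration only.
toy_database = {"protein_A": "MKFLV", "protein_B": "GGGG"}
query = translate("ATGAAATTTTAG")  # "MKF"
print(best_hit(query, toy_database))  # ('protein_A', 6)
```

An exact alignment like this is too slow to run against millions of proteins, which is why BLAST first looks for short exact "word" matches and only extends alignments around them.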

Once you think your prediction is good, you can look at the sequence before and after the gene for markers: non-coding sequences before and after the gene, such as promoters, that are required for the gene to be transcribed into RNA. These are also important when you are trying to identify a gene.
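As a toy illustration of looking for such markers, the sketch below searches the bases just upstream of a predicted start for a TATA-box-like motif, one promoter element found upstream of many eukaryotic genes (many promoters lack it, so a miss proves nothing). The consensus pattern, the 100-base window, and the function name `find_upstream_motifs` are all assumptions for illustration; real promoter prediction relies on statistical models rather than a single regex.

```python
import re

# Assumption: a TATA-box-like consensus, TATA(A/T)A(A/T), used as the marker.
TATA_LIKE = re.compile(r"TATA[AT]A[AT]")


def find_upstream_motifs(seq: str, gene_start: int, window: int = 100) -> list[int]:
    """Return 0-based positions of TATA-like motifs found (non-overlapping)
    in the `window` bases immediately upstream of `gene_start`."""
    region_start = max(0, gene_start - window)
    upstream = seq[region_start:gene_start].upper()
    return [region_start + m.start() for m in TATA_LIKE.finditer(upstream)]


sequence = "GGGTATAAAAGGCCATGAAATTTTAG"
print(find_upstream_motifs(sequence, gene_start=14))  # [3]
```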

The last step would be to do experimental work to confirm that the gene produces the predicted protein and the predicted RNA.
 
Thanks for that explanation and those links, they are very helpful. My field is mathematics, not biology, and if I didn't know what you were referring to, I would think you were describing a computer program, especially the start and stop signals. In essence, DNA is a biological computer. It's amazing the kind of order and logical structures that can be produced through randomness and chaos.
 