1. May 10, 2015

### Suraj M

Could someone tell me what exactly an open reading frame (ORF) is?
What I know about it is that its sequence starts with ATG, codes for amino acids, and ends with a terminator codon. What does that mean? Is it like an mRNA (it can't be, as it has T), a part of the DNA that will form the mRNA (a cistron), or direct DNA-to-protein synthesis (unlikely)?
Thank you

2. May 10, 2015

### Ygggdrasil

The concept of an open reading frame comes from the early days of genome sequencing, when we had the full DNA sequence of an organism like a bacterium and wanted to figure out where the protein-coding genes were. One feature of protein-coding genes is that they contain an open reading frame: a DNA sequence beginning with ATG, followed by a long stretch of amino acid-coding codons, and ending with a stop codon. Because three of the 64 codons are stop codons, a random sequence of DNA hits a stop about once every 64/3 ≈ 21 codons, so it would have open reading frames only ~20 codons long. Thus, open reading frames with hundreds of codons are statistically unlikely to occur by chance and are putatively annotated as protein-coding genes.
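To put rough numbers on that statistical argument, here is a minimal sketch. It assumes every codon is drawn independently and uniformly from the 64 possibilities, which real genomes with biased base composition do not quite obey, so treat the output as order-of-magnitude only. The function names are made up for illustration.

```python
# Assumption: codons are independent and uniformly distributed, and the
# three stops are TAA, TAG and TGA (the standard genetic code).
STOP_FRACTION = 3 / 64

def expected_codons_between_stops():
    """Mean spacing between stop codons in random sequence (geometric model)."""
    return 1 / STOP_FRACTION

def prob_no_stop(n_codons):
    """Chance that n_codons consecutive random codons contain no stop codon."""
    return (1 - STOP_FRACTION) ** n_codons

print(round(expected_codons_between_stops(), 1))  # mean spacing in codons
for n in (20, 100, 300):
    print(n, prob_no_stop(n))
```

Under this model a stop-free run of 20 codons is unremarkable, while a run of 300 codons comes out at less than one in a million, which is why long ORFs stand out.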

Does every ORF encode a protein? No. As mentioned above, short ORFs can occur by chance in the genome, so while there are short ORFs encoding things like peptide hormones, many short ORFs do not encode proteins. Do all protein-encoding genes contain ORFs? The answer is again no. For example, in many species, there are non-canonical start sites that do not begin with ATG so these ORFs might not be recognized. Furthermore, the presence of introns in eukaryotic genes can make the identification of ORFs tricky.

Modern means of identifying protein-coding regions extend beyond the concept of ORFs and make use of other sources of information (such as identifying transcription and translation start sites, looking for splice sites in eukaryotes, and making use of evolutionary information, for example, whether the ORF is conserved across species).

3. May 11, 2015

### Suraj M

In prokaryotes, how is it different from a transcription unit?

4. May 11, 2015

### Ygggdrasil

The transcription unit is the segment of DNA extending from the transcription start site of a gene to the transcription stop site. It contains the ORF as well as other regulatory sequences upstream and downstream of the ORF that are transcribed but not translated into protein. These regulatory regions of the transcription unit encode the 5' untranslated region (UTR) and 3' UTR of the mRNA, and these regions are important for regulating translation and mRNA stability. For example, prokaryotes require a ribosome binding sequence (the Shine-Dalgarno sequence) in the 5' UTR in order to allow the ribosome to bind to the mRNA and initiate translation. Many eukaryotic mRNAs contain microRNA (miRNA) binding sites in their 3' UTRs that regulate the mRNAs, for example, by degrading them.
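As a rough illustration of that layout, the sketch below splits a toy transcript into 5' UTR, ORF, and 3' UTR. The sequence is made up, the function name `split_transcript` is hypothetical, and the start position is handed in by the caller rather than predicted, since choosing the true start codon is a separate problem.

```python
STOPS = {"TAA", "TAG", "TGA"}

def split_transcript(transcript, start):
    """Split a transcript (written with DNA letters) into 5' UTR, ORF, 3' UTR.

    `start` is the 0-based index of the A in the start codon and is assumed
    to be known already. The returned ORF includes the stop codon.
    """
    if transcript[start:start + 3] != "ATG":
        raise ValueError("no ATG at the given start position")
    # Walk codon by codon in the frame set by the start codon.
    for i in range(start + 3, len(transcript) - 2, 3):
        if transcript[i:i + 3] in STOPS:
            end = i + 3
            return transcript[:start], transcript[start:end], transcript[end:]
    raise ValueError("no in-frame stop codon found")

# Made-up toy transcript with a Shine-Dalgarno-like AGGAGG in the 5' UTR.
toy = "GGAGGAGGTTAACATGAAACGCATTTAAGCCTTT"
five_utr, orf, three_utr = split_transcript(toy, toy.index("ATG"))
print(five_utr, orf, three_utr)
```

Note that the TAA sitting inside the toy 5' UTR is ignored: stop codons only matter once translation has started and only in the frame set by the start codon.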

5. May 12, 2015

### Suraj M

This might be a stupid question after all your explanation, but if it's just a reading frame, why is it called an OPEN reading frame? Just curious.
So are ORFs not of any use now, after the identification of transcription units?

6. May 12, 2015

### Ygggdrasil

I'm not sure why it's called an open reading frame, but if I were to guess, it's because it contains a long expanse of codons lacking a stop codon. The concept of an ORF is different from the concept of a reading frame, because an ORF refers to a specific segment within one reading frame.
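To make the reading frame versus ORF distinction concrete, here is a minimal sketch that scans the three forward reading frames of a made-up sequence and reports ATG-to-stop segments in each. The function `find_orfs` and its `min_codons` cutoff are assumptions for illustration; the reverse strand, non-ATG starts, and introns are all ignored.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for ATG...stop stretches on the forward strand.

    frame is 0, 1 or 2; start and end are 0-based with end exclusive, and the
    stop codon is included. min_codons counts the ATG but not the stop.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens a candidate
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs

toy = "ATGGCCTAACATGCGTAAATGA"  # made-up sequence
print(find_orfs(toy))  # → [(0, 0, 9), (1, 10, 22)]
```

The same 22 bases are read three different ways, and each reported ORF lives entirely inside one of those frames. In practice `min_codons` would be set far higher (on the order of 100) to filter out chance ORFs.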

The concept of an ORF is still useful now and it is a term that is still used often in the literature. At least in the fields I work in, I hear the term ORF much more than I hear the term transcription unit (possibly because biology tends to be more protein centric and the ORF is the portion of a gene that corresponds to the protein).

7. May 13, 2015

### Suraj M

8. May 13, 2015

### Ygggdrasil

Well, on average you would expect to see a stop codon every ~ 20 codons, but in most ORFs you'll have stretches of much greater than 20 codons w/o a stop codon (there is a stop codon at the end but none in the middle).

9. May 13, 2015

### Suraj M

So they are just very long, because of the no. of amino acids in the protein?

10. May 13, 2015

### Ygggdrasil

11. Jul 10, 2015

### stabu

I'd like to venture another possible explanation for the word "open" in the term ORF. Again, it's a bit of a guess, but it's useful because it highlights something not mentioned here. ATG is both the start codon and the codon for internal methionines, so the frames are open at the start end, not the stop-codon end.

There's more certainty attached to the end of the ORF, because stop codons are (or at least tend to be) unambiguous. However, how do we know whether a given ATG is an internal methionine or a start codon? In that way, you can say the RF is open, and therefore a proper ORF.

Sure, if you have an ORF strongly associated with a certain protein or transcription unit, you might tend to say, "well, it's closed now, because we have strong evidence from the protein side that tells us where the start codon is". However, in those circumstances, you have typically advanced further and left the ORF behind.

To my detriment, I should also say that trying to attach word-by-word meaning to genetics terms is not often a very good idea.

12. Jul 10, 2015

### stabu

I also noticed there is a thread coming up as being associated with this one, called something like "Open Reading Frame?" It cannot be added to any longer, but in it, it is stated that start and end codons are associated with exons. That is incorrect and will lead to severe inference problems in eukaryotes. Maybe not so much in prokaryotes, where the exon concept isn't necessary due to the lack of alternative splicing.

13. Jul 12, 2015

### stabu

OK, I want to retract my interpretation of the word "open" here. As you recall, I thought it may refer to the open start end of a reading frame.

From reading around, it appears that this is unlikely. Doolittle's 1986 book, "Of URFs and ORFs", is probably an early enough reference, and it agrees with Ygggdrasil's interpretation.

The point is, the word "open" has various meanings. It does not necessarily mean "with an open end" or "open-ended", which is what I was aiming for. Often "open" means "unresolved", as in "open for investigation". So we get a long stretch of codons without a stop codon, and we realise it's statistically unlikely, so we classify the region as open, in that it's a candidate for a possible gene.

I am, however, still a bit mystified as to why there is not much talk about deciding when an M is a start codon, but that may be because I'm not looking in the right places.

14. Jul 12, 2015

### Ygggdrasil

There are some context clues surrounding the ATG to determine whether it is a start codon or not. For example, in bacteria, true start codons are often directly downstream of a Shine-Dalgarno sequence (which directs ribosome binding for translation initiation). Similarly, in eukaryotes, true start codons often fall within a Kozak consensus sequence. Of course, methods of identifying start codons are not always perfect and new experimental techniques are often finding translation on parts of mRNAs that we did not think to be translated previously (http://www.sciencedirect.com/science/article/pii/S2211124714006299).
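As a crude illustration of the bacterial context clue, the sketch below flags ATGs that have a Shine-Dalgarno-like motif a short distance upstream. The exact-match motif `GGAGG`, the 4 to 12 base gap window, and the function name `atg_with_sd` are all illustrative assumptions; real gene finders score degenerate motifs and spacing statistically rather than by exact string match.

```python
def atg_with_sd(dna, motif="GGAGG", min_gap=4, max_gap=12):
    """List 0-based ATG positions that have `motif` a short distance upstream.

    gap = number of bases between the end of the motif and the A of the ATG.
    The motif and gap window are illustrative assumptions, not fitted values.
    """
    hits = []
    pos = dna.find("ATG")
    while pos != -1:
        for gap in range(min_gap, max_gap + 1):
            m_start = pos - gap - len(motif)
            if m_start >= 0 and dna[m_start:pos - gap] == motif:
                hits.append(pos)
                break
        pos = dna.find("ATG", pos + 1)
    return hits

toy = "TTGGAGGTTAACCATGAAACCCATGTTT"  # made-up sequence with two ATGs
print(atg_with_sd(toy))  # → [13]
```

Only the first ATG (6 bases downstream of the motif) is flagged; the second one, 15 bases away, falls outside the window and would be treated as a likely internal methionine under these assumptions.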

15. Jul 13, 2015