Introduction
Small open reading frames (sORFs) can be defined as open reading frames of 300 nucleotides (100 amino acids) or fewer. Although sORFs are present in all genomes, they have historically been ignored in gene annotation studies on the assumption that they lack coding potential. Their exclusion emerged as a side effect of the development of (gene prediction) tools in bioinformatics, genomics and proteomics, which tried to reduce the noise imposed by technological limitations. However, recent studies have revealed the coding potential of several sORFs with clinical significance, indicating their importance [1, 2, 4]. In particular, the advent of ribosome profiling [5] (RIBO-seq), a next-generation deep sequencing technique that provides a genome-wide snapshot of the translating machinery in a cell, has provided evidence of translation in sORFs. The value and importance of sORFs are becoming widely recognized [6, 7], and ribosome profiling data are becoming more abundant. A public repository for sORFs, providing information resulting from various tools and metrics, is therefore needed to aid functional research in the micropeptide field.
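To make the length criterion concrete, the sketch below scans a DNA sequence for ORFs and keeps those of at most 300 nucleotides. It is a minimal illustration under several assumptions that are not stated in the text above: only the forward strand is scanned (all three frames), only the canonical ATG start codon and the standard stop codons TAA, TAG and TGA are used, each ORF runs from the first ATG after a stop to the next in-frame stop, and the length excludes the stop codon so that 300 nt corresponds to 100 amino acids. The function names `find_orfs` and `find_sorfs` are hypothetical and do not reflect how sORF.org itself identifies sORFs.

```python
# Minimal sketch of the sORF length criterion (hypothetical helpers, not the sORF.org pipeline).
# Assumptions: forward strand only, ATG starts only, standard stop codons,
# ORF length measured from the ATG up to (but excluding) the stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SORF_NT = 300  # 100 amino acids * 3 nucleotides per codon


def find_orfs(seq: str) -> list[tuple[int, int]]:
    """Return 0-based (start, end) pairs; end is the index of the stop codon (excluded)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens an ORF
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i))  # in-frame stop closes the ORF
                start = None
    return orfs


def find_sorfs(seq: str, max_nt: int = MAX_SORF_NT) -> list[tuple[int, int]]:
    """Keep only ORFs whose coding length (start codon to stop, exclusive) is <= max_nt."""
    return [(s, e) for s, e in find_orfs(seq) if e - s <= max_nt]


if __name__ == "__main__":
    print(find_sorfs("CCATGAAATAGCC"))  # one 2-codon ORF in frame 2
```

Running the example prints `[(2, 8)]`. A production annotation pipeline would also scan the reverse strand and might consider near-cognate start codons; those choices are left out here to keep the length rule itself visible.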
What does the database hold?
With this in mind, we introduce sORF.org, a public repository for sORFs. Its main purpose is to allow researchers to examine individual sORFs or to search on several criteria for further large-scale studies. Data are collected from different sources, both experimental and in silico (based on various bioinformatics tools). sORF.org currently holds 333,970 sORFs across three species (human, mouse and fruit fly), derived from multiple RIBO-seq experiments, and it is expanding as more data become available. Available datasets can be inspected here.