Is it possible to find out how many bytes of data one chromosome holds?

  • Thread starter Char. Limit
  • #1
Char. Limit
Is it possible to find out how many bytes of data one chromosome (say, Chromosome 1 for humans) carries?

Does the question have any meaning?
 
  • #2
That would depend on how you would define a 'byte' of data as it pertains to genomic information, but chromosome 1 has about 224 million base pairs.
 
  • #3
What kind of information are you counting? The DNA sequence is one part of the information carried on the chromosome, and it's fairly easy to quantify the amount of information present there. However, you could also argue that the way in which the DNA is packaged into chromatin, or the patterns of DNA methylation, also constitute information.
 
  • #4
I'm not sure, I thought information was only in the actual DNA.
 
  • #5
Information is also on top of the DNA, like Ygggdrasil mentioned.

I'm not sure why you want to express it as bytes (I have a worm database, which contains all the genomic sequence and specifies locations of genetic elements, which is 1520 MB in size). It is better to formulate it as functional units, such as genes or CNVs or SNPs etc: the database contains 24114 curated coding sequences.
 
  • #6
I want to express it in bytes for the same reason I like to convert distances from light-years to kilometers: it gives me a reference.
 
  • #7
You don't express length or weight in bytes, do you?
 
  • #8
You might want to express it in bytes if you are looking at it from an information theory (http://en.wikipedia.org/wiki/Information_theory) perspective.
 
  • #9
Proton Soup said:
You might want to express it in bytes if you are looking at it from an information theory (http://en.wikipedia.org/wiki/Information_theory) perspective.

The raw number of base pairs is still not necessarily the best information theory perspective; but that's a quibble.

The simplest answer for the human chromosome 1 is a bit under 62 Mbytes. There are about 247 million base pairs, and each base pair has four possibilities. So if you just take it as a sequence of units, each unit being two bits, you end up with nearly 62 million bytes.
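The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch of the raw two-bits-per-base count, using the 247 million base pair figure from the post; the function name `sequence_bytes` is made up for illustration, and nothing here accounts for compression, epigenetic marks, or any other layer of information.

```python
def sequence_bytes(base_pairs: int, bits_per_base: int = 2) -> float:
    """Bytes needed to store a sequence at a fixed number of bits per base."""
    return base_pairs * bits_per_base / 8


# Figure quoted in the post for human chromosome 1 (an approximation).
CHR1_BASE_PAIRS = 247_000_000

chr1_bytes = sequence_bytes(CHR1_BASE_PAIRS)
print(f"{chr1_bytes / 1e6:.2f} MB")  # 61.75 MB
```

Four possibilities per base pair means log2(4) = 2 bits each, which is where the default of two bits comes from.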

Cheers -- sylas
 
  • #10
I see. Thank you, Sylas.

Also, to Monique, I would, of course, not express a length or weight with a unit of information.
 
  • #11
You are being smart, but you've still not explained why you would want to know the number of bytes over simply the number of units (kilometers/basepairs/genes). You yourself stated that you questioned whether your question has any meaning.
 
  • #12
I did indeed, because I wasn't sure if DNA could be said to carry data in the sense of bytes.

The bytes are to give me a reference point. Can a chromosome carry more data than, say, an advanced computer?

At nearly 62 MB, no way.
 
  • #13
Alright, fair enough. Just note that it is not a valid comparison. You are comparing an arbitrary unit (one chromosome) to an advanced computer (carrying how many chips?). People are not turning to DNA as the future for computing for nothing: you can contain a lot of information on a very small scale.

A typical diploid human cell has "1.5 GB of data" stored in a nucleus with a diameter of approximately 6 micrometers (only taking into account the base pairing of the DNA).

According to Leonard Adleman, a pioneer in DNA computing:
What about the future? It is clear that molecular computers have many attractive properties. They provide extremely dense information storage. For example, one gram of DNA, which when dry would occupy a volume of approximately one cubic centimeter, can store as much information as approximately one trillion CDs. They provide enormous parallelism. Even in the tiny experiment carried out in one fiftieth of a teaspoon of solution, approximately 10^14 DNA flight numbers were simultaneously concatenated in about one second. It is not clear whether the fastest supercomputer available today could accomplish such a task so quickly.
http://www.cs.virginia.edu/~robins/Computing_with_DNA.pdf
Also note that the biological information in DNA is contained on many layers (for instance epigenetic information): it is not simply a linear sequence of letters.
 
  • #14
I believe the benefit of DNA computing would be the massive amount of parallelization. This can be good for certain types of problems, like cracking encryption by brute force. But for general-purpose computing, where serial calculations are required, it's probably not very useful.
 
  • #15
Monique said:
A typical human cell has "750 MB of data" stored in a nucleus with a diameter of approximately 6 micrometers.

Dear Monique, a nucleus has far, far more data stored than 750 MB. The DNA backbone alone (not considering DNA methylation) has about 3.3 GB in its haploid form for human DNA (one byte being 2 bits of information A=T, G=C and T=A, C=G). Most of the information is stored in non-coding stretches (98%), which constitute the so-called regulatory network of information (the higher an organism, the more non-coding DNA stretches it carries in the genome). If you consider RNA and proteins in the 3D space of a cell, which is usually highly structured, the stored information of a single cell (or nucleus) is easily in the lower TB range.

All life forms are self-encoded, self-assembling, self-propagating nano-structures, and what is written in the DNA is only there to get from one existing nano-structure (the fertilized egg) to a complete mature organism and back to the initial nano-structure. If you mess up a fertilized egg's 3D RNA and protein structure (virtually destroying its information content), micro-injected DNA cannot perform. The DNA is kind of like the software in a robot (the fertilized cell) that gives the instructions on how to change itself for duplication. The fertilized egg with all its components is the hardware. So not everything is encoded in the DNA. That's why organisms (species) and their DNA (RNA) co-evolved over millions of generations.
 
  • #16
Hi bioinfinity, the human genome carries about 3 billion base pairs, and you need two bits to describe each base pair: 6 billion bits divided by 8 is 750 MB. That is indeed haploid (one copy); a diploid genome (two copies) would be 1.5 GB.

I've already mentioned that there is additional information besides the bare genetic code.
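The haploid and diploid figures above, and the gap between them and the 3.3 GB quoted in post #15, can be reproduced with a short sketch. The function name `genome_bytes` is hypothetical, and reading the "3.3" figure as 3.3 billion base pairs is an assumption made here for comparison, not something the posts state.

```python
def genome_bytes(base_pairs: int, bits_per_base: int = 2, copies: int = 1) -> int:
    """Raw sequence size in bytes for a given number of genome copies."""
    return base_pairs * bits_per_base * copies // 8


# Round figure used in post #16.
haploid = genome_bytes(3_000_000_000)
diploid = genome_bytes(3_000_000_000, copies=2)
print(f"{haploid / 1e6:.0f} MB")  # 750 MB
print(f"{diploid / 1e9:.1f} GB")  # 1.5 GB

# Assumption: treat the "3.3" in post #15 as 3.3 billion base pairs.
two_bit = genome_bytes(3_300_000_000)                    # two bits per base
one_byte = genome_bytes(3_300_000_000, bits_per_base=8)  # one byte per base
print(f"{two_bit / 1e6:.0f} MB vs {one_byte / 1e9:.1f} GB")  # 825 MB vs 3.3 GB
```

Under that assumption, a figure of 3.3 GB corresponds to spending a full byte on each base (as in a plain text file of A, C, G, T), while two bits per base gives roughly a quarter of that.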
 

1. What is a chromosome and what does it contain?

A chromosome is a thread-like structure found in the nucleus of a cell that carries genetic information in the form of DNA. Together, an organism's chromosomes carry the genetic information it needs to develop and function.

2. How many bytes of data does a chromosome hold?

The number of bytes of data held by a chromosome varies by organism and by chromosome. For example, a human chromosome of about 150 million base pairs, at two bits per base pair, corresponds to about 37.5 million bytes (37.5 MB) of raw sequence data.

3. How is the amount of data in a chromosome determined?

The amount of data in a chromosome is determined by the number of base pairs it contains. Base pairs are the building blocks of DNA, and each one can be represented by two bits of data, so the total in bits is twice the number of base pairs and the total in bytes is the number of base pairs divided by four.
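To make "two bits per base" concrete, here is a minimal sketch that packs a DNA string into bytes, four bases per byte, and unpacks it again. The particular code assignment (A=00, C=01, G=10, T=11) and the names `pack` and `unpack` are arbitrary choices for illustration; any assignment of the four 2-bit codes would work, and the sketch assumes the input contains only the letters A, C, G and T.

```python
# Assumed mapping; the choice of which base gets which code is arbitrary.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index in this string equals the 2-bit code


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpack reads from the high bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed data."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])


packed = pack("ACGTAC")
print(len(packed))        # 2 bytes for 6 bases
print(unpack(packed, 6))  # ACGTAC
```

The sequence length has to be stored separately, because the padding bits in the last byte are indistinguishable from extra A bases.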

4. Can the amount of data in a chromosome change?

Yes. Mutations that insert, delete, or duplicate stretches of DNA change the number of base pairs and therefore the amount of data in a chromosome. Point mutations, which swap one base for another, change the sequence but not its length.

5. Why is it important to know the amount of data in a chromosome?

Understanding the amount of data in a chromosome is important for studying and understanding genetics, as it can help scientists identify specific genes and their functions. It can also provide insight into genetic disorders and diseases, as well as aid in genetic engineering and gene therapy research.
