
Question about the encoding of information

  1. May 8, 2017 #1
    What is the difference between how digital information is encoded on a hard drive and how information is encoded in DNA? It just seems strange to me that you are able to encode information/data into a physical object.
  3. May 8, 2017 #2


    Science Advisor

    Does it seem strange to you to write information on a piece of paper?
  4. May 9, 2017 #3


    Gold Member

    A closer cousin to DNA, imo, than a hard drive is a lock. Information is encoded in a lock by means of its mechanical construction. Pins and tumblers are in a certain position, and only a specific key will put the pins in the right locations to open the lock. If you can make the conceptual jump that a lock contains information, then you are pretty close to knowing how DNA contains information. The information that is encoded into a lock is the specific key that will open it. Similarly, information is encoded into DNA directly by means of the physical structure of the DNA. It's a tiny machine that causes specific things to happen in other tiny machines via its specific physicality.

    One can describe a hard drive in the same way, that the physical existence of magnetic 1's and 0's causes a sequence of events in the computer that eventually result in user-visible things, but one doesn't usually think of a hard drive as a mechanically encoded device, or at least I don't.
  5. May 13, 2017 #4


    Staff: Mentor

    From my perspective (that is, not as a biologist), there is very little difference between the coding of DNA sequences and sequences of numbers stored on a hard drive.

    In my limited understanding, in DNA there are two intertwined helixes that have billions of bases connecting them. The two connecting points on the two helixes are called base pairs. Between two base pairs is a sequence of nucleotides, of which there are four types: adenine, cytosine, guanine, and thymine. These nucleotides are commonly abbreviated by the letters A, C, G, and T.

    Here is one base-pair sequence: ATCGATTGAGCTCTAGCG (from the wiki article on base pairs).

    Since there are four symbols used, biologists could have used a base-4 numbering system, using the digits 0, 1, 2, and 3 instead of the letters A, C, G, and T.

    On a hard drive or in the memory of a computer, there are strings of 0s and 1s that can represent letters or numeric data of various types. It's a simple matter to translate from one number system (base-4 or the letters A, C, G, and T) to a binary system (base-2), so the base pairs in DNA could be represented as strings of binary numbers, and likewise, the binary strings in a computer could be represented by strings of A, C, G, and T symbols. The reason computers don't use these symbols is that it's much easier to make a device that can be in one of two states (on/off or high voltage/low voltage) than a device that can be in one of four states.
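
    To make the base-4-to-binary translation concrete, here is a minimal Python sketch. The particular two-bit assignment (A=00, C=01, G=10, T=11) and the function names are assumptions chosen for illustration; any one-to-one mapping of the four letters onto two-bit patterns would work just as well.

```python
# Hypothetical 2-bit assignment for the four nucleotide letters.
# Nothing in biology fixes this choice; it is only for illustration.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_bits(seq: str) -> str:
    """Translate a string of A/C/G/T letters into a string of 0s and 1s."""
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def bits_to_dna(bits: str) -> str:
    """Translate a string of 0s and 1s back into A/C/G/T letters."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even (2 bits per base)")
    # Read the bit string two characters at a time.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    seq = "ATCGATTGAGCTCTAGCG"  # the example sequence quoted above
    bits = dna_to_bits(seq)
    print(bits)
    print(bits_to_dna(bits) == seq)  # the round trip recovers the original
```

    Each base costs exactly two bits, so the 18-letter example sequence becomes a 36-bit string, and the round trip loses nothing.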

    BTW, back in the 80s or so there was a movie titled "GATTACA" that had something to do with the nucleotides in DNA.

    Exactly. When I write C-A-T, readers who are able to read English know that these symbols represent a feline animal (as one possible meaning).
  6. May 13, 2017 #5



    Staff: Mentor

    You can encode information in almost any way you want. You can carve it into a tree, paint it on a canvas, mold it into shapes, change the polarity of magnetic domains, and many more. Note that 'information' is actually more nuanced than you might imagine. See this article: https://en.wikipedia.org/wiki/Information

    You're a few years off. It was 1997: http://tvtropes.org/pmwiki/pmwiki.php/Film/Gattaca
    Since you're within an order of magnitude, we'll call it 'good enough'. :biggrin:
  7. May 14, 2017 #6



    Last edited: May 14, 2017
  8. May 14, 2017 #7
    If I recall correctly DNA uses Excess4 encoding, which used to be used ( :-) ) in encoder wheels for shaft angles. Excess4 gives one bit error correction.
  9. May 14, 2017 #8



    Staff: Mentor

    If that's a joke, it's over my head. If not, it's still over my head. :rolleyes:
  10. May 15, 2017 #9
    It's an old-fashioned (I am old-fashioned) name for a Gray code, I think. I don't think it actually corrects 1-bit errors, but the code only changes by 1 bit at a time. They were sometimes called genetic codes.
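
    For the "only changes by 1 bit at a time" property, here is a minimal Python sketch of the standard reflected binary Gray code. It only illustrates that property; it is not a claim about how DNA encodes anything, and the function names are made up for this example.

```python
def to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Recover the original integer from its Gray code."""
    n = 0
    # XOR together g, g >> 1, g >> 2, ... until nothing is left.
    while g:
        n ^= g
        g >>= 1
    return n


if __name__ == "__main__":
    for i in range(8):
        # Print each value next to its 3-bit Gray code.
        print(i, format(to_gray(i), "03b"))
```

    Consecutive values differ in exactly one bit position, which is why such codes were popular on shaft encoder wheels: a reading taken mid-transition is off by at most one step.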