Is information theory useful in biology?


Discussion Overview

The discussion explores the application of information theory in biology, particularly its relevance to evolutionary processes, genetic coding, and the understanding of biological systems. Participants share various resources and perspectives on how information theory intersects with biological concepts, including species diversity and genomic compression.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants note that information theory is often referenced in discussions about evolution, particularly in creationist critiques, but seek legitimate scientific applications.
  • Others provide links to articles and papers that discuss the role of information theory in understanding the evolution of life and genetic coding.
  • One participant mentions that species and genetic diversity calculations utilize methods from Claude Shannon's work in information theory.
  • There are claims that information-theoretic approaches model the genetic code as an error-prone information channel, raising questions about noise and redundancy in genetic information.
  • Some participants discuss the potential for genomic compression algorithms to manage large amounts of genomic data, suggesting that noise and redundancy may have functional importance.
  • A later reply raises concerns about the misuse of information theory in various fields, indicating a need for caution in its application.
  • One participant references Integrated Information Theory in the context of consciousness, noting that it has faced significant criticism.

Areas of Agreement / Disagreement

Participants express a range of views on the utility of information theory in biology, with some asserting its clear relevance while others highlight potential misapplications. The discussion does not reach a consensus on the extent or nature of its usefulness.

Contextual Notes

Some claims rely on specific definitions and assumptions about information theory and its applications in biology, which may not be universally accepted. The discussion includes references to various papers and articles that may have differing interpretations or conclusions.

BWV
Here is what is written on Wikipedia:
In addition to mathematics, computer science and telecommunications, the theoretical consideration of communication through information theory is also used to describe communication systems in other areas (e.g. media in journalism, the nervous system in neurology, DNA and protein sequences in molecular biology, knowledge in information science and documentation).

I guess the answer is a clear Yes.
 
Yup, it is a clear yes. Ex: Species diversity and genetic diversity calculations use methods that date back to Claude Shannon.
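As a concrete illustration of that Shannon lineage, the Shannon diversity index applies the entropy formula to species proportions. A minimal sketch (the species counts here are made up for illustration):

```python
import math

def shannon_diversity(counts):
    """Shannon diversity index H' = -sum(p_i * ln p_i), where p_i is the
    proportion of individuals belonging to species i."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# A community of four equally abundant species maximizes H' (= ln 4),
# while a community dominated by one species scores close to 0.
even = shannon_diversity([25, 25, 25, 25])
skewed = shannon_diversity([97, 1, 1, 1])
```

The same formula, with base-2 logarithms, is exactly Shannon's entropy of a message source; ecologists simply reinterpret symbol probabilities as species abundances.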
 
Excerpt from https://en.wikipedia.org/wiki/Genetic_code:
  • Information channels: Information-theoretic approaches model the process of translating the genetic code into corresponding amino acids as an error-prone information channel.[87] The inherent noise (that is, the error) in the channel poses the organism with a fundamental question: how can a genetic code be constructed to withstand noise[88] while accurately and efficiently translating information? These "rate-distortion" models[89] suggest that the genetic code originated as a result of the interplay of the three conflicting evolutionary forces: the needs for diverse amino acids,[90] for error-tolerance[85] and for minimal resource cost. The code emerges at a transition when the mapping of codons to amino acids becomes nonrandom. The code's emergence is governed by the topology defined by the probable errors and is related to the map coloring problem.[91]
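The error tolerance described above can be made concrete by counting how often a single-base substitution leaves the encoded amino acid unchanged. A minimal sketch using the standard genetic code (the compact amino-acid string follows the conventional T, C, A, G codon enumeration; '*' marks a stop codon):

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

total = synonymous = 0
for codon, aa in CODON_TABLE.items():
    for pos in range(3):                      # try every single-base mutation
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            total += 1
            synonymous += CODON_TABLE[mutant] == aa

# Roughly a quarter of all single-base substitutions are synonymous,
# mostly at the third codon position: redundancy acting as error tolerance.
print(f"{synonymous}/{total} substitutions are synonymous ({synonymous / total:.0%})")
```

This is the redundancy the rate-distortion models treat as a designed-by-selection buffer against channel noise.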
 
It is quite easy: all parts of stochastics are useful in any natural science. That includes applying information theory to biology.
 
jedishrfu said:
Here's a 2015 Quanta magazine article on it:

https://www.quantamagazine.org/the-information-theory-of-life-20151119/

and Adami's related paper:

https://arxiv.org/ftp/arxiv/papers/1112/1112.3867.pdf

and this one from NCBI

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/

I wasn't able to find any uses other than applying information theory to understand how life evolved.
From Adami:

we should think of the human genome — or the genome of any organism — as a repository of information about the world gathered in small bits over time through the process of evolution. The repository includes information on everything we could possibly need to know, such as how to convert sugar into energy, how to evade a predator on the savannah, and, most critically for evolution, how to reproduce or self-replicate.

Curious whether, say, a spider whose behavior is 'hard-coded' rather than learned has more information than a human infant under this definition, or whether it is more appropriate to think in terms of the species in aggregate?
 
sysprog said:
Excerpt from https://en.wikipedia.org/wiki/Genetic_code:
  • Information channels: Information-theoretic approaches model the process of translating the genetic code into corresponding amino acids as an error-prone information channel.[87] The inherent noise (that is, the error) in the channel poses the organism with a fundamental question: how can a genetic code be constructed to withstand noise[88] while accurately and efficiently translating information? These "rate-distortion" models[89] suggest that the genetic code originated as a result of the interplay of the three conflicting evolutionary forces: the needs for diverse amino acids,[90] for error-tolerance[85] and for minimal resource cost. The code emerges at a transition when the mapping of codons to amino acids becomes nonrandom. The code's emergence is governed by the topology defined by the probable errors and is related to the map coloring problem.[91]
Right, so there should be some equivalent of a compression algorithm to eliminate noise and redundant information in a genome?
 
BWV said:
Right, so there should be some equivalent of a compression algorithm to eliminate noise and redundant information in a genome?
Some of the noise and redundancy may have important function:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5065233/

There's a lot of work being done regarding genomic compression:
https://spectrum.ieee.org/computing/software/the-desperate-quest-for-genomic-compression-algorithms

Here's an example:

PDF direct download link: https://www.mdpi.com/1999-4893/13/4/99/pdf

A New Lossless DNA Compression Algorithm Based on A Single-Block Encoding Scheme
Deloula Mansouri, Xiaohui Yuan* and Abdeldjalil Saidani
School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
Received: 24 March 2020; Accepted: 17 April 2020; Published: 20 April 2020
Abstract:
With the emergent evolution in DNA sequencing technology, a massive amount of genomic data is produced every day, mainly DNA sequences, craving for more storage and bandwidth. Unfortunately, managing, analyzing and specifically storing these large amounts of data become a major scientific challenge for bioinformatics. Therefore, to overcome these challenges, compression has become necessary. In this paper, we describe a new reference-free DNA compressor abbreviated as DNAC-SBE. DNAC-SBE is a lossless hybrid compressor that consists of three phases. First, starting from the largest base (Bi), the positions of each Bi are replaced with ones and the positions of other bases that have smaller frequencies than Bi are replaced with zeros. Second, to encode the generated streams, we propose a new single-block encoding scheme (SEB) based on the exploitation of the position of neighboring bits within the block using two different techniques. Finally, the proposed algorithm dynamically assigns the shorter length code to each block. Results show that DNAC-SBE outperforms state-of-the-art compressors and proves its efficiency in terms of special conditions imposed on compressed data, storage space and data transfer rate regardless of the file format or the size of the data.
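For a sense of scale, even the naive fixed-length baseline beats plain-text storage: a four-letter alphabet needs only 2 bits per base, so four bases fit in one byte. A minimal sketch of that baseline (this is not the DNAC-SBE scheme from the paper, just the simplest possible bit-packing):

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes at 2 bits per base."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack_2bit(data, n_bases):
    """Recover the first n_bases bases from packed bytes."""
    bases = "ACGT"
    return "".join(
        bases[(data[i // 4] >> (2 * (3 - i % 4))) & 0b11] for i in range(n_bases)
    )

seq = "ACGTACGTAC"
packed = pack_2bit(seq)          # 10 bases in 3 bytes instead of 10 ASCII bytes
assert unpack_2bit(packed, len(seq)) == seq
```

Real genomic compressors exploit repeats and statistical structure to get well below 2 bits per base, which is exactly where information theory's entropy bounds become relevant.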
 
jedishrfu said:
Here's a 2015 Quanta magazine article on it:

https://www.quantamagazine.org/the-information-theory-of-life-20151119/

and Adami's related paper:

https://arxiv.org/ftp/arxiv/papers/1112/1112.3867.pdf

and this one from NCBI

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/

I wasn't able to find any uses other than applying information theory to understand how life evolved.
I read the first paper. One thing to note is that one of the main examples in the paper, Integrated Information Theory (as a theory of consciousness), has taken some heavy criticism, and some would argue it has been debunked, notably by Scott Aaronson, who concluded:

But let me end on a positive note. In my opinion, the fact that Integrated Information Theory is wrong—demonstrably wrong, for reasons that go to its core—puts it in something like the top 2% of all mathematical theories of consciousness ever proposed. Almost all competing theories of consciousness, it seems to me, have been so vague, fluffy, and malleable that they can only aspire to wrongness.

https://www.scottaaronson.com/blog/?p=1799

With these kinds of very high-level theories that aim to find the key to a mystery, you have to go back to Shannon's warning about the bandwagon. Using the word "information" in science is exciting. But what counts as information in information theory is usually not what people intuitively take information to be. Since it is easy to conflate one's intuitive idea of information with information in the Shannon sense, it is fairly easy to write papers that sound both quantitative and deep/promising. Like Aaronson's description of most theories of consciousness, many of them are "so vague, fluffy, and malleable that they can only aspire to wrongness."
 
Base pairs are essentially a base-4 system; how could information theory not be relevant? There's also a lot of overlap with thermodynamics and Gibbs free energy.
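That base-4 observation can be made concrete: a four-symbol alphabet carries at most log2(4) = 2 bits per symbol, and biased base usage carries less. A quick sketch (the sequences are made up for illustration):

```python
import math
from collections import Counter

def bits_per_base(seq):
    """Empirical Shannon entropy of a nucleotide sequence, in bits per base."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Uniform base usage attains the 2-bit maximum of a base-4 alphabet;
# a sequence dominated by one base carries less information per symbol.
uniform = bits_per_base("ACGT" * 10)
biased = bits_per_base("AAAAAAAACG")
```

This single-symbol entropy ignores correlations between neighboring bases; real genomes have such structure, which is what the compression work above exploits.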
 
