Is information theory useful in biology?


Discussion Overview

The discussion explores the application of information theory in biology, particularly its relevance to evolutionary processes, genetic coding, and the understanding of biological systems. Participants share various resources and perspectives on how information theory intersects with biological concepts, including species diversity and genomic compression.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants note that information theory is often referenced in discussions about evolution, particularly in creationist critiques, but seek legitimate scientific applications.
  • Others provide links to articles and papers that discuss the role of information theory in understanding the evolution of life and genetic coding.
  • One participant mentions that species and genetic diversity calculations utilize methods from Claude Shannon's work in information theory.
  • There are claims that information-theoretic approaches model the genetic code as an error-prone information channel, raising questions about noise and redundancy in genetic information.
  • Some participants discuss the potential for genomic compression algorithms to manage large amounts of genomic data, suggesting that noise and redundancy may have functional importance.
  • A later reply raises concerns about the misuse of information theory in various fields, indicating a need for caution in its application.
  • One participant references Integrated Information Theory in the context of consciousness, noting that it has faced significant criticism.

Areas of Agreement / Disagreement

Participants express a range of views on the utility of information theory in biology, with some asserting its clear relevance while others highlight potential misapplications. The discussion does not reach a consensus on the extent or nature of its usefulness.

Contextual Notes

Some claims rely on specific definitions and assumptions about information theory and its applications in biology, which may not be universally accepted. The discussion includes references to various papers and articles that may have differing interpretations or conclusions.

BWV
Here is what is written on Wikipedia:
In addition to mathematics, computer science and telecommunications, the theoretical consideration of communication through information theory is also used to describe communication systems in other areas (e.g. media in journalism, the nervous system in neurology, DNA and protein sequences in molecular biology, knowledge in information science and documentation).

I guess the answer is a clear Yes.
 
Yup, it is a clear yes. Ex: Species diversity and genetic diversity calculations use methods that date back to Claude Shannon.
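As a concrete illustration of that Shannon lineage, the Shannon diversity index applies the entropy formula to species proportions. A minimal sketch (the species counts here are made up for illustration):

```python
import math

def shannon_diversity(counts):
    """Shannon diversity index H' = -sum(p_i * ln p_i), where p_i is the
    proportion of individuals belonging to species i."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# A community of four equally abundant species maximizes H' (= ln 4),
# while a community dominated by one species scores close to 0.
even = shannon_diversity([25, 25, 25, 25])
skewed = shannon_diversity([97, 1, 1, 1])
```

The same formula, with base-2 logarithms, is exactly Shannon's entropy of a message source; ecologists simply reinterpret symbol probabilities as species abundances.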
 
Excerpt from https://en.wikipedia.org/wiki/Genetic_code:
  • Information channels: Information-theoretic approaches model the process of translating the genetic code into corresponding amino acids as an error-prone information channel.[87] The inherent noise (that is, the error) in the channel poses the organism with a fundamental question: how can a genetic code be constructed to withstand noise[88] while accurately and efficiently translating information? These "rate-distortion" models[89] suggest that the genetic code originated as a result of the interplay of the three conflicting evolutionary forces: the needs for diverse amino acids,[90] for error-tolerance[85] and for minimal resource cost. The code emerges at a transition when the mapping of codons to amino acids becomes nonrandom. The code's emergence is governed by the topology defined by the probable errors and is related to the map coloring problem.[91]
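The error tolerance described above can be made concrete by counting how often a single-base substitution leaves the encoded amino acid unchanged. A minimal sketch using the standard genetic code (the compact amino-acid string follows the conventional T, C, A, G codon enumeration; '*' marks a stop codon):

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

total = synonymous = 0
for codon, aa in CODON_TABLE.items():
    for pos in range(3):                      # try every single-base mutation
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            total += 1
            synonymous += CODON_TABLE[mutant] == aa

# Roughly a quarter of all single-base substitutions are synonymous,
# mostly at the third codon position: redundancy acting as error tolerance.
print(f"{synonymous}/{total} substitutions are synonymous ({synonymous / total:.0%})")
```

This is the redundancy the rate-distortion models treat as a designed-by-selection buffer against channel noise.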
 
It is quite easy: all parts of stochastics are useful in any natural science. That includes applying information theory to biology.
 
jedishrfu said:
Here's a 2015 Quanta magazine article on it:

https://www.quantamagazine.org/the-information-theory-of-life-20151119/

and Adami's related paper:

https://arxiv.org/ftp/arxiv/papers/1112/1112.3867.pdf

and this one from NCBI

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/

I wasn't able to find any uses other than applying information theory to understand how life evolved.
From Adami:

we should think of the human genome — or the genome of any organism — as a repository of information about the world gathered in small bits over time through the process of evolution. The repository includes information on everything we could possibly need to know, such as how to convert sugar into energy, how to evade a predator on the savannah, and, most critically for evolution, how to reproduce or self-replicate.

Curious whether, say, a spider whose behavior is 'hard-coded' rather than learned has more information than a human infant under this definition, or whether it is more appropriate to think in terms of the species in aggregate?
 
sysprog said:
Excerpt from https://en.wikipedia.org/wiki/Genetic_code:
  • Information channels: Information-theoretic approaches model the process of translating the genetic code into corresponding amino acids as an error-prone information channel.[87] The inherent noise (that is, the error) in the channel poses the organism with a fundamental question: how can a genetic code be constructed to withstand noise[88] while accurately and efficiently translating information? These "rate-distortion" models[89] suggest that the genetic code originated as a result of the interplay of the three conflicting evolutionary forces: the needs for diverse amino acids,[90] for error-tolerance[85] and for minimal resource cost. The code emerges at a transition when the mapping of codons to amino acids becomes nonrandom. The code's emergence is governed by the topology defined by the probable errors and is related to the map coloring problem.[91]
Right, so there should be some equivalent of a compression algorithm to eliminate noise and redundant information in a genome?
 
BWV said:
Right, so there should be some equivalent of a compression algorithm to eliminate noise and redundant information in a genome?
Some of the noise and redundancy may have important function:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5065233/

There's a lot of work being done regarding genomic compression:
https://spectrum.ieee.org/computing/software/the-desperate-quest-for-genomic-compression-algorithms

Here's an example:

PDF direct download link: https://www.mdpi.com/1999-4893/13/4/99/pdf

A New Lossless DNA Compression Algorithm Based on A Single-Block Encoding Scheme
Deloula Mansouri, Xiaohui Yuan* and Abdeldjalil Saidani
School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
Received: 24 March 2020; Accepted: 17 April 2020; Published: 20 April 2020
Abstract:
With the emergent evolution in DNA sequencing technology, a massive amount of genomic data is produced every day, mainly DNA sequences, craving for more storage and bandwidth. Unfortunately, managing, analyzing and specifically storing these large amounts of data become a major scientific challenge for bioinformatics. Therefore, to overcome these challenges, compression has become necessary. In this paper, we describe a new reference-free DNA compressor abbreviated as DNAC-SBE. DNAC-SBE is a lossless hybrid compressor that consists of three phases. First, starting from the largest base (Bi), the positions of each Bi are replaced with ones and the positions of other bases that have smaller frequencies than Bi are replaced with zeros. Second, to encode the generated streams, we propose a new single-block encoding scheme (SEB) based on the exploitation of the position of neighboring bits within the block using two different techniques. Finally, the proposed algorithm dynamically assigns the shorter length code to each block. Results show that DNAC-SBE outperforms state-of-the-art compressors and proves its efficiency in terms of special conditions imposed on compressed data, storage space and data transfer rate regardless of the file format or the size of the data.
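For a sense of scale, even the naive fixed-length baseline beats plain-text storage: a four-letter alphabet needs only 2 bits per base, so four bases fit in one byte. A minimal sketch of that baseline (this is not the DNAC-SBE scheme from the paper, just the simplest possible bit-packing):

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_2bit(seq):
    """Pack an A/C/G/T string into bytes at 2 bits per base."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack_2bit(data, n_bases):
    """Recover the first n_bases bases from packed bytes."""
    bases = "ACGT"
    return "".join(
        bases[(data[i // 4] >> (2 * (3 - i % 4))) & 0b11] for i in range(n_bases)
    )

seq = "ACGTACGTAC"
packed = pack_2bit(seq)          # 10 bases in 3 bytes instead of 10 ASCII bytes
assert unpack_2bit(packed, len(seq)) == seq
```

Real genomic compressors exploit repeats and statistical structure to get well below 2 bits per base, which is exactly where information theory's entropy bounds become relevant.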
 
jedishrfu said:
Here's a 2015 Quanta magazine article on it:

https://www.quantamagazine.org/the-information-theory-of-life-20151119/

and Adami's related paper:

https://arxiv.org/ftp/arxiv/papers/1112/1112.3867.pdf

and this one from NCBI

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3220916/

I wasn't able to find any uses other than applying information theory to understand how life evolved.
I read the first paper. One thing to note is that one of the main examples in the paper, Integrated Information Theory (as a theory of consciousness), has taken some heavy criticism, and some would argue it has been debunked, notably by Scott Aaronson, who concluded:

But let me end on a positive note. In my opinion, the fact that Integrated Information Theory is wrong—demonstrably wrong, for reasons that go to its core—puts it in something like the top 2% of all mathematical theories of consciousness ever proposed. Almost all competing theories of consciousness, it seems to me, have been so vague, fluffy, and malleable that they can only aspire to wrongness.

https://www.scottaaronson.com/blog/?p=1799

With these kinds of very high-level theories that aim to find the key to a mystery, you have to go back to Shannon's warning about the bandwagon. Using the word "information" in science is exciting. But what counts as information in information theory is usually not what people intuitively take information to be. Since it is easy to conflate one's intuitive idea of information with information in the Shannon sense, it is fairly easy to write papers that sound both quantitative and deep/promising. Like Aaronson's description of most theories of consciousness, many of them are "so vague, fluffy, and malleable that they can only aspire to wrongness."
 
Base pairs are essentially a base-4 system; how could information theory not be relevant? There's also a lot of overlap with thermodynamics and Gibbs free energy.
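That base-4 observation can be made concrete: a four-symbol alphabet carries at most log2(4) = 2 bits per symbol, and biased base usage carries less. A quick sketch (the sequences are made up for illustration):

```python
import math
from collections import Counter

def bits_per_base(seq):
    """Empirical Shannon entropy of a nucleotide sequence, in bits per base."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Uniform base usage attains the 2-bit maximum of a base-4 alphabet;
# a sequence dominated by one base carries less information per symbol.
uniform = bits_per_base("ACGT" * 10)
biased = bits_per_base("AAAAAAAACG")
```

This single-symbol entropy ignores correlations between neighboring bases; real genomes have such structure, which is what the compression work above exploits.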
 
