Discussing Information Theory with non-scientists

  • #1
Anko
TL;DR Summary
Since the early 20th century the relationship between physics and information has been a subject of much contention.
Do you have an opinion about my summary above?

Do you understand the relation between irreversible logic and irreversible process?
According to Landauer, logical irreversibility implies physical irreversibility. This still seems to be a topic of debate. Is the debate also about what 'logical' means, or what 'physical' means?

Or dare I ask what meaning means?
 
  • #2
Welcome to PF.

It would be best if you could post some links to peer-reviewed journal articles that have spawned your questions. That will help others to understand your level of understanding of the subjects, and provide a foundation for the discussion in this thread. Thank you.
 
  • #3
Abstract
In 1961, Rolf Landauer argued that the erasure of information is a dissipative process [1]. A minimal quantity of heat, proportional to the thermal energy and called the Landauer bound, is necessarily produced when a classical bit of information is deleted. A direct consequence of this logically irreversible transformation is that the entropy of the environment increases by a finite amount. Despite its fundamental importance for information theory and computer science [2,3,4,5], the erasure principle has not been verified experimentally so far, the main obstacle being the difficulty of doing single-particle experiments in the low-dissipation regime.
https://www.nature.com/articles/nature10872?ref=dtf.ru
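To give a sense of scale, here is a quick sketch (mine, assuming a room temperature of 300 K) evaluating the Landauer bound k_B T ln 2:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)
T = 300.0           # assumed room temperature in kelvin

# Landauer bound: minimum heat dissipated when one classical bit is erased
E_min = k_B * T * math.log(2)

print(f"Landauer bound at {T} K: {E_min:.3e} J per bit")  # roughly 2.9e-21 J
```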
 
  • #4
Ok well. My next question is, is there a connection between information and physics?

Or is information purely logical, even though a physical pattern isn't necessarily? That is, when we decide a pattern 'has information in it', are we only applying a certain logic?
Conversely, are all physical patterns informational, even the ones we aren't interested in?

Lastly, I encountered the opinion in another forum that ideas are not physical. I pointed out that if true, theories in human minds or brains aren't physical. How does a theory get published in physical form?
At the same forum I was informed that information must have meaning.
Lewis Carroll should have known better I suppose.

I'm fairly sure both those ideas are just wrong. That forum is a place I've decided to avoid.
 
  • #5
I've read anorlunda's post this morning, and it certainly looks like it touches on some of my questions. I found the style enjoyable, and I like the way it implies there are still a few unanswered questions.

I'll go out on a limb here, and suggest that the unanswered questions about the nature of information, of the kind that Seth Lloyd and Leonard Susskind lecture about, are one of the reasons we don't understand the nature of dark energy.
 
  • #6
You have to realize there is no universal definition of information that captures all of our intuition about what we think the word means. There is Shannon entropy; well, that's a measure of a probability distribution, which happens to have a lot of uses as a tool for specific applications. You've also got Kolmogorov complexity, and other measures from algorithmic information theory. These are actually just a few examples of a broader class of measures, which we call measures of complexity. Each complexity measure can potentially tell us something, or help us think about a problem in a particular way. But they are all limited, and each only sheds some light on something much deeper.

Shannon information in particular is considered problematic as a standalone measure of what we think information is. The main point people make is that pure randomness gives the most information. Why should that be? How much meaning is there in a perfect coin flip? So people try to come up with ways to measure not just information, but the usefulness of it, or the other aspects of it which deal with structure and other facets of the underlying objects in question. But as far as I'm concerned, all have failed so far to end the debate or even come close, and really, it is quite an open ended quest.
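To illustrate the "pure randomness gives the most information" point, here is a small sketch of my own computing the Shannon entropy of a few simple distributions; the uniform (maximally random) one comes out highest:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin: 1.0 bit
print(shannon_entropy([0.9, 0.1]))   # biased coin: ~0.469 bits
print(shannon_entropy([1.0]))        # certain outcome: 0.0 bits
print(shannon_entropy([1/8] * 8))    # uniform over 8 outcomes: 3.0 bits
```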
 
  • #7
Jarvis323 said:
You have to realize there is no universal definition of information that captures all of our intuition about what we think the word means.
Yes, there it is in the last part of your sentence. What the word 'information' means is already a problem. Information, in whatever form, does not 'contain' meaning. The meaning is what we decide.

For example, the CMB is microwave radiation. Where it came from and why it's everywhere isn't information contained in the radiation itself.
Jarvis323 said:
people try to come up with ways to measure not just information, but the usefulness of it, or the other aspects of it which deal with structure and other facets of the underlying objects in question.
And there it is again. Information by itself doesn't contain any of those other things. We posit the questions, and we decide on usefulness. That principle, I think, partly explains why digital computers exist.

Thanks for your comments.
P.s. this post was made thanks to some physical thinking. 😂
 
  • #8
An apology might be in order about the title I used for this thread.

However, I've read some of the replies to anorlunda's post, and I think I can see a bit of smoke, although no fire.

My explanation of the difference between information and knowledge, such as an observer might get "from" a system, is by way of an example, a sort of thought experiment.

Suppose I send a Shannon message to a receiver, and to verify the message has been received I make a phone call to someone with access to the message in a memory.

I receive confirmation, so I have the knowledge the message was transmitted.
I don't know though if it was altered during transmission, or if the received message was 'conserved'.

Suppose further that this message is a binary string which looks completely random. So I and the receiver both know it's incompressible.

What does it say though? That depends, right?
Say I know what it says because I know how it was encoded. Then the receiver needs that knowledge too. Maybe it's encoded words in some language. Maybe it's a string of numbers, or a list of instructions for a particular computer.

So no. Information and knowledge in this sender/receiver context are not the same.
 
  • #9
Jarvis323 said:
Shannon information in particular is considered problematic as a standalone measure of what we think information is. The main point people make is that pure randomness gives the most information. Why should that be?
The why is I think as follows.
Assume Alice has access to a sender and a reliable transmission channel.

She also knows there's a receiver which will reliably store the message she wants to send, but she otherwise has no knowledge that it will be seen by Bob. So Alice knows she has to use an alternative method of communication to relay some clues about the message she hasn't sent yet.

Suppose Bob knows he has a receiver with a memory. He looks at this memory and sees it has a random number in it, so he resets the memory placing it and the receiver device into a no-message state.

If Alice now sends another random number and Bob reads the memory again, he now has information he interprets as a message from Alice, provided he knows who Alice is and that Alice and Bob have prearranged a transmission protocol.

Does Bob need to know what the message is?
If he does he will need to apply an algorithm to the Shannon information content.

Sorry about the double post; a snafu I can't correct.
 
  • #10
Anko said:
The why is I think as follows.
Assume Alice has access to a sender and a reliable transmission channel.

She also knows there's a receiver which will reliably store the message she wants to send, but she otherwise has no knowledge that it will be seen by Bob. So Alice knows she has to use an alternative method of communication to relay some clues about the message she hasn't sent yet.

Suppose Bob knows he has a receiver with a memory. He looks at this memory and sees it has a random number in it, so he resets the memory placing it and the receiver device into a no-message state.

If Alice now sends another random number and Bob reads the memory again, he now has information he interprets as a message from Alice, provided he knows who Alice is and that Alice and Bob have prearranged a transmission protocol.

Does Bob need to know what the message is?
If he does he will need to apply an algorithm to the Shannon information content.

If Alice sends Bob truly random numbers, then she's going to have to just send them over in their raw, uncompressed form, because there is no way to losslessly compress them. That's because the generator never uses any predictable patterns that could lend themselves to guessing one digit based on the previous ones. At the other extreme, if Alice always sends the same kind of thing, e.g. just blocks of 1's of different lengths separated by single 0's, then she really doesn't need to send all of those bits. For example, instead of ({1}^256)0, where {1}^256 means 1 repeated 256 times, she can just send 100000000, which is 256 in binary. That's a simple example. So the amount of information is, using the best possible encoding scheme, the minimum number of bits per character you must send, on average, for Bob to decode Alice's messages.

Note that this isn't a property of an individual message; it's really a property of the language, and they are allowed to share some finite encoding scheme that doesn't count when calculating the information amount. For example, the language could be over an alphabet of just 3 characters, {a,b,c}, and contain only two words, both incompressible strings (in an unrestricted language) of length 10000, except that the first is [ b, ..., a ] with no c's, and the second is [ c, ..., a ] with no b's. Well, then all Alice has to do is send either b (which she can encode as 0) or c (which she can encode as 1). Then you need 1 bit of information to encode each word, which is 1/10000 bits per character.
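To illustrate both points with a toy sketch of my own (nothing Alice and Bob would literally use): the run of 256 ones can be replaced by its length in binary, while a (pseudo)random string gets essentially no such savings, here demonstrated crudely with zlib:

```python
import os, zlib

# Highly redundant message: 256 ones followed by a 0
redundant = "1" * 256 + "0"
# Run-length style idea: just send the count of ones in binary
encoded = format(256, "b")                  # '100000000' -> 9 bits instead of 257
print(len(redundant), "->", len(encoded), "bits")

# A (pseudo)random byte string barely compresses at all; a repetitive one collapses
random_bytes = os.urandom(4096)
repetitive_bytes = b"\x01" * 4096
print("random:    ", len(zlib.compress(random_bytes)), "bytes")      # ~4096 or more
print("repetitive:", len(zlib.compress(repetitive_bytes)), "bytes")  # a few dozen
```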
 
  • #11
I've left out a few things.
I was told in a course that Shannon information entropy is related to the expectation of receiving a given message. This applies to messages in ordinary language. These aren't generally random strings, but the principle still holds that an unexpected message carries more Shannon information (a larger surprisal) than an expected one.

A message with a low probability of being received, that is.
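Put slightly more formally (my paraphrase of the course material), the 'unexpectedness' of a particular message is its surprisal, -log2 of its probability:

```python
import math

def surprisal_bits(p):
    """Self-information (surprisal) of an event with probability p, in bits."""
    return -math.log2(p)

print(surprisal_bits(1/2))     # an expected message: 1 bit
print(surprisal_bits(1/1024))  # an unexpected message: 10 bits
```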
 
  • #12
Jarvis323 said:
If Alice sends Bob truly random numbers, then she's going to have to just send them over in their raw, uncompressed form, because there is no way to losslessly compress them. That's because the generator never uses any predictable patterns that could lend themselves to guessing one digit based on the previous ones.
Yep, that's all correct. It doesn't alter expectation though. Or what Alice or Bob might know about a protocol.
 
  • #13
Anko said:
I've left out a few things.
I was told in a course that Shannon information entropy is related to the expectation of receiving a given message. This applies to messages in ordinary language. These aren't generally random strings, but the principle still holds that an unexpected message carries more Shannon information (a larger surprisal) than an expected one.

A message with a low probability of being received, that is.
Some messages will require that you observe more letters to determine the message than others. And the reason for that would be that there are other words that share the same prefixes that need to be ruled out.

Mathematically, though, a process model based on Markov models is used: you are always in some state given what you have observed thus far, there are transition probabilities for going to some new state from the current one, and when you take a transition, you output the character associated with it. There is an entropy associated with each state, based on its possible transitions and their probabilities. The entropy rate is computed by averaging those per-state entropies, weighted by how often each state is visited, which is basically the average uncertainty in which character will come next given the state you're in. Something like that.
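For a concrete (made-up) two-state example, here's roughly how that calculation goes; the transition matrix is just an arbitrary illustration of mine:

```python
import numpy as np

# Example transition matrix: P[i, j] = probability of going from state i to state j,
# where each transition emits one character.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Stationary distribution pi satisfies pi = pi @ P (left eigenvector for eigenvalue 1)
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.isclose(eigvals, 1)][:, 0])
pi = pi / pi.sum()

# Entropy of each state's outgoing transitions, in bits
state_entropy = -np.sum(P * np.log2(P), axis=1)

# Entropy rate: average uncertainty about the next character, per step
entropy_rate = float(pi @ state_entropy)
print(pi, entropy_rate)  # pi ~ [0.833, 0.167], rate ~ 0.56 bits per character
```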
 
  • #14
Jarvis323 said:
There is an entropy associated with each state, based on its possible transitions and their probabilities. The entropy rate is computed by averaging those per-state entropies, weighted by how often each state is visited, which is basically the average uncertainty in which character will come next given the state you're in. Something like that.
Sure. But what I see is an algorithmic method there; the receiver only receives, and Bob reads what's in its memory. Beyond that, it's a matter of applying algorithms to analyse what's in a memory register, and Bob doesn't do that, say. But notice, your algorithm fails with a truly random string.
 
  • #15
Anko said:
Sure. But what I see is an algorithmic method there; the receiver only receives and Bob reads what's in its memory. Beyond that is applying algorithms to analyse what's in a memory register. Bob doesn't do that, say.
Bob just decodes the message, by waiting until he sees complete encoded blocks of letters come in that he can be sure he can translate uniquely into decoded blocks of letters. And once he's seen the whole message, it needs to be such that there is only one unique way he can break it up into blocks, where each block is a code word.

Ultimately it comes down to finding encodings in which the code words are as small as possible, and that any valid sequence of code words can be uniquely broken down into code words.

For example, { 00, 001, 1} doesn't work as a set of code words, because if you see 001, it can be either 00 followed by 1, or just 001.

In a low information case, the language is such that small blocks decode into long ones.

Anyway, any meaning or knowledge in the message depends on what the language means to Alice and Bob. They could be sending random numbers, a process requiring lots of bits (information) per character, or they could be sending predictable words, like blocks of 1's, requiring fewer bits. In both cases, the words they're sending could be meaningless aside from the fact that they are valid words according to their encoding scheme.
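To make the uniquely-decodable point concrete, here's a small sketch (mine) that checks the prefix condition and the Kraft sum for the example above; {1, 01, 00} is just an assumed alternative set that does work:

```python
def is_prefix_free(code_words):
    """True if no code word is a prefix of another (sufficient for unique decodability)."""
    for w in code_words:
        for v in code_words:
            if w != v and v.startswith(w):
                return False
    return True

def kraft_sum(code_words):
    """Kraft inequality: a binary prefix code with these lengths exists iff the sum <= 1."""
    return sum(2 ** -len(w) for w in code_words)

print(is_prefix_free(["00", "001", "1"]), kraft_sum(["00", "001", "1"]))  # False, 0.875
print(is_prefix_free(["1", "01", "00"]), kraft_sum(["1", "01", "00"]))    # True, 1.0
```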
 
  • #16
Anko said:
But notice, your algorithm fails with a truly random string.

I should point out that the Markov model is not the decoder. It's a probabilistic model which generates strings that look statistically identical to the kind that the source would generate. It's used for applying information theory to analyze the process generating the sequences. The source could be observations of a physical process converted into sequences of discrete symbols, for example.

And I should also point out that when choosing the code words, it is not as though you need to be able to break up the decoded message into meaningful words. For example, if the original message is 001, then that should be the output. And if { 00, 1, and 001 } each mean something to Alice and Bob, then Alice just chose a bad message to send. The coding scheme just considers the process of being able to take any given message and break it up into blocks that can be represented by a set of other, smaller blocks.

The decoder is not concerned with interpreting the message, only recovering it. If the source is just sending random numbers, then the entire message is one block every time, and it decodes into itself.
 
  • #17
In the course I took we started with messages in English. Shannon entropy applies because in the theory it isn't so much what the message is as what the expectation of receiving a particular message is.

Alice might have a killer encoding algorithm that encrypts any message so it looks random; it might be partly compressible, but that fact about the Shannon entropy is not useful.

Alice increases the entropy of the message, so there is more information to decrypt, and unless Bob the receiver has the right decryption algorithm the meaning stays hidden. This is what, say, the RSA algorithm does. Or attempts to do.
 
  • #18
Anko said:
In the course I took we started with messages in English. Shannon entropy applies because in the theory it isn't so much what the message is as what the expectation of receiving a particular message is.

Alice might have a killer encoding algorithm that encrypts any message so it looks random; it might be partly compressible, but that fact about the Shannon entropy is not useful.

Alice increases the entropy of the message, so there is more information to decrypt, and unless Bob the receiver has the right decryption algorithm the meaning stays hidden. This is what, say, the RSA algorithm does. Or attempts to do.
That makes sense. But I wouldn't say the particular message has more information to decrypt. It's not a property of a particular message. It's about the space of possible messages, and the probabilities for the different possibilities. Alice's killer encryption algorithm can, for some input, generate a message which is just a block of all 1's, and that's fine, but a slightly different message ought not to look similar to a block of 1's when encrypted.

A typical encrypted message will look random though, meaning you will not expect to see many patterns in it.
 
  • #19
So the skinny version is: Shannon entropy, aka the information content of messages, can be meaningful, in the sense that the physical 'string' can be successfully interpreted (it's the string, not the entropy, that gets interpreted), but that isn't a rule; it can also be completely uninterpretable. It might be a regular expression in a formal language, with the receiver a finite state automaton.

Shannon realized how this is useful to know.
Then what does information entropy mean, if the message doesn't have to mean anything? I asked myself that question at some point during the IT course. 😀
 
  • #20
Anko said:
So the skinny version is: Shannon entropy, aka the information content of messages, can be meaningful, in the sense that the physical 'string' can be successfully interpreted (it's the string, not the entropy, that gets interpreted), but that isn't a rule; it can also be completely uninterpretable. It might be a regular expression in a formal language, with the receiver a finite state automaton.

Shannon realized how this is useful to know.
Then what does information entropy mean, if the message doesn't have to mean anything? I asked myself that question at some point during the IT course. 😀

You might find Bennett's logical depth idea interesting, even though it pretty much seems like a dead end.

We propose depth as a formal measure of value. From the earliest days of information theory it has been appreciated that information per se is not a good measure of message value. For example, a typical sequence of coin tosses has high information content but little value; an ephemeris, giving the positions of the moon and planets every day for a hundred years, has no more information than the equations of motion and initial conditions from which it was calculated, but saves its owner the effort of recalculating these positions. The value of a message thus appears to reside not in its information (its absolutely unpredictable parts), nor in its obvious redundancy (verbatim repetitions, unequal digit frequencies), but rather in what might be called its buried redundancy—parts predictable only with difficulty, things the receiver could in principle have figured out without being told, but only at considerable cost in money, time, or computation. In other words, the value of a message is the amount of mathematical or other work plausibly done by its originator, which its receiver is saved from having to repeat.

Of course, the receiver of a message does not know exactly how it originated; it might even have been produced by coin tossing. However, the receiver of an obviously non-random message, such as the first million bits of π, would reject this "null" hypothesis, on the grounds that it entails nearly a million bits worth of ad-hoc assumptions, and would favor an alternative hypothesis that the message originated from some mechanism for computing pi. The plausible work involved in creating a message, then, is the amount of work required to derive it from a hypothetical cause involving no unnecessary, ad-hoc assumptions. It is this notion of the message value that depth attempts to formalize.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.70.4331&rep=rep1&type=pdf
 
  • #21
More comments.

What information means is down to choices humans make. The universe around us contains meaningful information for the (apparently) simple reason that we choose what a reset condition is.

I snuck in Landauer's principle with the Alice/Bob scenario above when Bob sees a random message and clears the memory register. It might be that the register now contains a string of zeros. It might be that it contains a string of ones. It might be any string though, Bob gets to choose.

Together Alice and Bob choose a protocol which necessarily fixes a few things. What gets fixed then determines certain meaning, for the information.

As we do in forums or social media. We choose to send and receive Shannon messages in an agreed protocol with a chosen encoding.
 
  • #22
Anko said:
More comments.

What information means is down to choices humans make. The universe around us contains meaningful information for the (apparently) simple reason that we choose what a reset condition is.

I snuck in Landauer's principle with the Alice/Bob scenario above when Bob sees a random message and clears the memory register. It might be that the register now contains a string of zeros. It might be that it contains a string of ones. It might be any string though, Bob gets to choose.

Together Alice and Bob choose a protocol which necessarily fixes a few things. What gets fixed then determines certain meaning, for the information.

As we do in forums or social media. We choose to send and receive Shannon messages in an agreed protocol with a chosen encoding.

Pretty much yeah.

To think about the meaning of information you can also look at conditional information, or mutual information.

The way I like to think about it, I would say that the messages have meaning to Bob and Alice because there is mutual information between Bob and Alice's mental models, and then there is mutual information between Bob and Alice's mental models and the language.

Basically, it means that the information in one signal gives you predictive power to guess the output of another. Then you have a universe, with all of its parts and the correlations and causation between them, and so between any pair of signals (or subsets of signals) generated by them, the information takes on some basic intrinsic meaning upon statistical analysis. Our minds then interpret that with our mental models, which they're able to do because there is some mutual information between the abstractions in our mental models and the things we're measuring. After that, you get into philosophy territory.
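If it helps, here's a rough toy sketch (entirely mine, with made-up numbers) of that 'predictive power' reading: Y is a noisy copy of X, and the estimated mutual information in bits says how much knowing one tells you about the other.

```python
import random
from collections import Counter
from math import log2

random.seed(0)

# X is a fair bit; Y copies X but gets flipped 10% of the time
pairs = []
for _ in range(100_000):
    x = random.randint(0, 1)
    y = x if random.random() > 0.1 else 1 - x
    pairs.append((x, y))

pxy = Counter(pairs)
px = Counter(x for x, _ in pairs)
py = Counter(y for _, y in pairs)
n = len(pairs)

# Estimated mutual information I(X;Y) = sum p(x,y) log2( p(x,y) / (p(x) p(y)) )
mi = sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
         for (x, y), c in pxy.items())
print(f"I(X;Y) ~ {mi:.3f} bits")  # close to 1 - H(0.1), about 0.53 bits
```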

If you relax how strict you are about the formality of Shannon information, you can apply the concepts to talk about things besides 1D observable discrete signals from stationary random processes (which is the domain of information theory). E.g., we could pretend that information theory applies to systems of mathematics or logic, and then say that a proof of a theorem has meaning in relation to the theorem, or in relation to the whole system of mathematics. Then you can talk about say number theory, or math problems, or the physical object itself.

To formalize those things, though, you need something else, like algorithmic information theory. Kolmogorov complexity is basically the algorithmic analog of Shannon entropy. In that setting, a string over an alphabet, in and of itself, has an intrinsic complexity, which is the size of the minimal computer program that prints it. It can apply to stand-alone objects/strings, because it's not a statistical measure like Shannon entropy, and also because the computer program doesn't need to be the same one that prints other objects; it only needs to print the one object. Then you can maybe begin looking at conditional Kolmogorov complexity as a formal notion of meaning between discrete objects, or in a shared environment.

Ultimately, everything interesting is relational I think.
 
  • #23
Jarvis323 said:
Ultimately, everything interesting is relational I think.
Yep, that sounds congruent with what I've been trying to understand. I did some graph theory, and poset graphs seemed to me to be another type of information.

This is what's usually called a context, I think.
I've looked at how information is lost in a Rubik's cube. If Alice sends Bob a randomly permuted one of these, he might apply some algorithm and see a pattern that means something. Solving a "scrambled" puzzle also, abstractly, resets it.

It's just an example of complexity or a register with three dimensions. I know it's a polytope, and in fact several other kinds of algebraic object.
 
  • #24
Another thing I've realized about information entropy in Rubik's puzzle:

If you 'receive' a cube which is scrambled, you see a random pattern. The information that would tell you how it was permuted isn't accessible. You can achieve this particular 'lost information' state by applying permutations without looking at the puzzle or memorizing the sequence.

So now you solve the puzzle by trying to find the inverse permutation, and this means you need to store other information by deciding how much of it is in a solved state.

You reset the state to the one generally accepted as the solution. But again that's a choice. If you characterize your solution as a memory reset, then if you also write down all the 'moves' it took, is there an erasure?

If instead you have a computer program that models the puzzle, and it has a reset function that doesn't compute individual moves but instead overwrites it with the (accepted) solution, then again information is lost.
So Landauer's limit must hold.

Or what?
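One rough way to put a number on the 'lost' scramble information: the standard 3x3x3 cube has about 4.3 x 10^19 reachable states, so specifying the exact scrambled state you received takes roughly 65 bits. A quick sketch of mine:

```python
import math

# Number of reachable positions of a standard 3x3x3 Rubik's cube
N_STATES = 43_252_003_274_489_856_000

# Bits needed to specify one particular scrambled state
bits = math.log2(N_STATES)
print(f"{bits:.1f} bits")  # about 65.2 bits
```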
 

