It's becoming clearer now. Wikipedia has an explanation that frames entropy as the unpredictability of a string of questions, measured per question. Their formulas share the same relationships between the ideas, but I guess there isn't as much of a relationship between what they stand for as I thought there would be.
The first paragraph describes entropy as average unpredictability: a function of how much remains unknown after some number of questions have been answered, out of the full range the problem allows. The second paragraph describes it as that average unpredictability diminishing as more and more of the subsequent questions get answered within the framework of the problem. So what I'll pull from this is that Shannon entropy is named after thermodynamic entropy because the two ideas share the same form, and obviously not because their individual constituent ideas match. Maybe you could describe it in terms of thermodynamic entropy: the number of unknown micro-states, relative to what is known, diminishes within a particular system as you start to answer what each individual micro-state is, given the total range of possible values... you know, this reminds me of when I learned about degrees of freedom in statistics.
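To make the "bits per question" picture concrete, here's a minimal Python sketch (my own illustration, not from the article) that computes Shannon entropy as the average number of bits, i.e. ideal yes/no questions, needed per outcome:

import math

def shannon_entropy(probs):
    # Average unpredictability in bits per outcome:
    # H = -sum(p * log2(p)) over outcomes with nonzero probability.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))    # fair coin: 1.0 bit, one question
print(shannon_entropy([0.25] * 4))    # four equal outcomes: 2.0 bits, two questions
print(shannon_entropy([0.99, 0.01]))  # heavily biased coin: ~0.08 bits, almost predictable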
...
wikipedia:
"Now consider the example of a coin toss. Assuming the probability of heads is the same as the probability of tails, then the entropy of the coin toss is as high as it could be. This is because there is no way to predict the outcome of the coin toss ahead of time: if we have to choose, the best we can do is predict that the coin will come up heads, and this prediction will be correct with probability 1/2. Such a coin toss has one bit of entropy since there are two possible outcomes that occur with equal probability, and learning the actual outcome contains one bit of information. In contrast, a coin toss using a coin that has two heads and no tails has zero entropy since the coin will always come up heads, and the outcome can be predicted perfectly. Analogously, one binary-outcome with equiprobable values has a Shannon entropy of {\displaystyle \log _{2}2=1}
bit. Similarly, one
trit with equiprobable values contains {\displaystyle \log _{2}3}
(about 1.58496) bits of information because it can have one of three values.
English text, treated as a string of characters, has fairly low entropy, i.e., is fairly predictable. Even if we do not know exactly what is going to come next, we can be fairly certain that, for example, 'e' will be far more common than 'z', that the combination 'qu' will be much more common than any other combination with a 'q' in it, and that the combination 'th' will be more common than 'z', 'q', or 'qu'. After the first few letters one can often guess the rest of the word. English text has between 0.6 and 1.3 bits of entropy per character of the message."
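The numbers in that quote are easy to check with the same formula. Here's a quick sketch (mine, not Wikipedia's); note the English estimate at the end only uses single-character frequencies of one short sample string, ignoring the context effects ('qu', 'th', guessing the rest of the word) the article mentions, so it lands well above the 0.6 to 1.3 bits per character figure:

import math
from collections import Counter

def shannon_entropy(probs):
    # Same formula as the earlier sketch: H = -sum(p * log2(p)).
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))       # fair coin: 1.0 bit
print(shannon_entropy([1.0, 0.0]))       # two-headed coin: 0.0 bits
print(shannon_entropy([1/3, 1/3, 1/3]))  # trit: ~1.58496 bits, i.e. log2(3)

# Crude per-character estimate from single-letter counts of a tiny sample.
sample = "the quick brown fox jumps over the lazy dog"
counts = Counter(sample)
total = sum(counts.values())
print(shannon_entropy([c / total for c in counts.values()]))  # roughly 4+ bits/char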