Shannon et alia
Tim Maudlin
NYU & John Bell Institute
The Nature of Entropy I
Saig, Germany
July 23, 2019
The Open Vista
Thermodynamic Entropy
Statistical Mechanical Entropy
1) Boltzmann Entropy
2) Gibbs Entropy
Von Neumann Entropy
Shannon Entropy
The Open Vista
Thermodynamic Entropy: dS_Th = δQ/T
Statistical Mechanical Entropy
1) Boltzmann Entropy: S_B = k ln W
2) Gibbs Entropy: S_G = −k Σ p_i ln(p_i)
Von Neumann Entropy: S_vN = −k Tr(ρ ln ρ)
Shannon Entropy: S_S = −Σ p_i ln(p_i)
Key Questions
In those definitions which invoke a probability measure, what is it? What determines the right measure?
Since the thermodynamic entropy does not invoke such a measure, why do so many of the others?
Why does Boltzmann's definition look so different?
The Engineers
We began with Carnot, who was trying to maximize the amount of thermal energy that could be converted to mechanical energy in a system, and now meet Claude Shannon, who was trying to maximize the information that can be sent across a channel from the transmitter to the receiver. Ironically, while a maximized S_Th was Carnot's worst nightmare, Shannon is working as hard as he can to maximize S_S. And while systems spontaneously evolve toward the maximum S_B, they must be designed to maximize S_S.
What was Shannon up to?
Shannon wanted to figure out how to send plain-text messages over communication channels with maximum efficiency. One of his main concerns was dealing with noisy channels, which requires adding some (inefficient) redundancy to the message, such as bit-checks. These sorts of error-correction codes are not our concern; we will assume a noiseless channel. Imagine, for a moment, that you want to transmit English plain text over such a channel as efficiently as possible (i.e., with messages as short as possible). What do you do?
Signal Compression
How efficiently one can compress a signal depends on the statistical profile of the activity of the transmitter. For example, if there are only 2 possible texts, then you can send the signal with 1 bit. But this might lead to trouble….
Signal Compression: Good Example
Suppose you are sending many telegrams in English and want to save on letters. You might adopt this convention. Instead of "qu" you always write "q". If the word has a "q" not followed by a "u", then write "qu". Thus "There are quintillions of quick, quarrelling, quacking ducks in Qatar." becomes "There are qintillions of qick, qarrelling, qacking ducks in Quatar." That saves 3 letters.
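To make the convention concrete, here is a minimal Python sketch (my own illustration, not from the talk; the name toggle_qu is hypothetical). Since the rule simply swaps "qu" and a bare "q", the same function both encodes and decodes.

```python
import re

def toggle_qu(text: str) -> str:
    """Apply the telegram convention: 'qu' -> 'q', and a bare 'q' -> 'qu'.
    The rule is its own inverse, so the same function also decodes."""
    def swap(match: re.Match) -> str:
        q = match.group(1)                      # keeps the original case of the 'q'
        has_u = match.group(2) is not None      # was the 'q' followed by a 'u'?
        return q if has_u else q + "u"
    return re.sub(r"([qQ])([uU])?", swap, text)

original = ("There are quintillions of quick, quarrelling, "
            "quacking ducks in Qatar.")
coded = toggle_qu(original)
print(coded)                              # "There are qintillions of qick, ..."
assert toggle_qu(coded) == original       # decoding recovers the original text
print(len(original) - len(coded))         # 3 letters saved
```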
Code and Decode Shannon wanted to figure out how to code messages (strings of “letters”) to maximize the efficiency. That clearly depends on the source of the messages. The “qu” trick works well in English, but would be a disaster in latinized Arabic. In short, Shannon needed to be able to characterize his message source to do his work. He did so via a probability distribution over the possible messages, derived from empirical statistics.
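For instance, one crude way to extract such a distribution at the single-letter level (my own sketch, not Shannon's actual procedure; letter_frequencies is a hypothetical name) is to tabulate letter frequencies from a sample of text:

```python
from collections import Counter

def letter_frequencies(sample: str) -> dict[str, float]:
    """Estimate a probability distribution over letters from a text sample."""
    letters = [c for c in sample.lower() if c.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}

sample = "There are quintillions of quick, quarrelling, quacking ducks in Qatar."
probs = letter_frequencies(sample)
for letter, p in sorted(probs.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{letter}: {p:.3f}")   # the five most frequent letters in this sample
```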
Morse Code
These considerations were applied in the construction of Morse code. In English, the letters that most frequently occur are "e" and "t" (etaoinshrdlu seems to have become etaoinshrdlc!), and the least frequent are "q" and "z" (cf. Scrabble). In Morse code, the code for "e" is dot and for "t" is dash. The code for "q" is dash dash dot dash and for "z" is dash dash dot dot. (But "m" is dash dash, while "s" is dot dot dot and "o" is dash dash dash.)
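For reference, here are the International Morse codes for the letters just mentioned, written with "." for dot and "-" for dash (a small illustrative snippet of my own, not from the talk):

```python
# International Morse codes for the letters discussed above.
morse = {"e": ".", "t": "-", "m": "--", "s": "...", "o": "---",
         "q": "--.-", "z": "--.."}

for letter, code in morse.items():
    # e and t, the most frequent English letters, get one-symbol codes;
    # q and z, the least frequent, get four-symbol codes.
    print(f"{letter}: {code}  ({len(code)} symbols)")
```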
The Formula
Shannon Entropy: S_S = −Σ p_i ln(p_i)
Suppose there are only 2 letters in the alphabet, "1" and "0". At the simplest level, we take a set of texts and calculate the frequency of "1" (pr(1)) and of "0" (pr(0)). The Shannon entropy relative to this probability measure is
−(pr(1) ln(pr(1)) + pr(0) ln(pr(0))) = −(pr(1) ln(pr(1)) + (1 − pr(1)) ln(1 − pr(1)))
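Here is a minimal sketch of that two-letter calculation (my own code, not Shannon's; the name binary_shannon_entropy is hypothetical), using the natural log as in the formula above:

```python
import math

def binary_shannon_entropy(text: str) -> float:
    """Shannon entropy (in nats) of a string of '0's and '1's,
    computed from the empirical single-letter frequencies."""
    p1 = text.count("1") / len(text)
    p0 = 1.0 - p1
    # By convention, 0 * ln(0) is taken to be 0, so zero-probability terms are skipped.
    return -sum(p * math.log(p) for p in (p0, p1) if p > 0)

print(binary_shannon_entropy("0101010101"))   # ln 2 ≈ 0.693, the maximum
print(binary_shannon_entropy("0001000100"))   # ≈ 0.500: the source is biased toward '0'
```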
But….
Looking at just the frequency of individual letters is not sufficient, as the "qu" example shows. If every text were "010101010101010101" we would have the maximum Shannon entropy at the single-letter scale, so there would be no call to try to compress. But if we chunk the text into 2-bit parts, it would be clear that the Shannon entropy relative to that scale is very low.
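The same point in a short sketch (again my own illustration; block_entropy is a hypothetical helper): chunk the text into blocks of a chosen size and compute the entropy of the empirical block frequencies.

```python
import math
from collections import Counter

def block_entropy(text: str, block_size: int) -> float:
    """Shannon entropy (in nats) of the empirical distribution of
    non-overlapping blocks of the given size."""
    blocks = [text[i:i + block_size] for i in range(0, len(text), block_size)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((n / total) * math.log(n / total) for n in counts.values())

text = "010101010101010101"
print(block_entropy(text, 1))   # ln 2 ≈ 0.693: maximal at the single-bit scale
print(block_entropy(text, 2))   # 0.0: every 2-bit chunk is "01", so no uncertainty
```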
Scale
For Shannon's purposes, then, there is an art to deciding on the scale at which the Shannon entropy should be calculated to determine how much Shannon entropy there is in a source. Where the Shannon entropy of a source is low, one seeks a coding scheme so that the Shannon entropy of the coded text is high. One can say that the Shannon entropy (at a scale) measures the randomness or disorder or unpredictability of the source at that scale. The value is determined by a probability measure, and that measure in turn is derived from an empirical frequency.
Connection to Boltzmann
We have seen that S_B is the odd man out in our menagerie of entropies.
Boltzmann Entropy: S_B = k ln W
That is actually not a formula that Boltzmann ever wrote down or would recognize! Despite this…
The Open Vista
Thermodynamic Entropy: dS_Th = δQ/T
Statistical Mechanical Entropy
1) Boltzmann Entropy: S_B = k ln W
2) Gibbs Entropy: S_G = −k Σ p_i ln(p_i)
Von Neumann Entropy: S_vN = −k Tr(ρ ln ρ)
Shannon Entropy: S_S = −Σ p_i ln(p_i)
Why ln?
The natural log appears in the formula for the simple reason that one wants the defined quantity to be extensive, i.e. you want the entropy of the joint system A + B to be S_A + S_B.
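Spelled out for the Boltzmann form (a standard textbook step, not a quotation from the talk): for independent subsystems the microstate counts multiply, and the logarithm turns that product into a sum.

```latex
% For independent subsystems A and B, the microstate counts multiply:
%   W_{A+B} = W_A \, W_B
% so taking the logarithm makes the entropy additive (extensive):
\begin{align}
  S_{A+B} &= k \ln W_{A+B} = k \ln (W_A W_B) \\
          &= k \ln W_A + k \ln W_B = S_A + S_B.
\end{align}
% The probabilistic definitions behave the same way: for independent sources
% p_{ij} = p_i q_j, and
%   -\sum_{ij} p_{ij} \ln p_{ij} = -\sum_i p_i \ln p_i - \sum_j q_j \ln q_j.
```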
Boltzmann and Entropy
Boltzmann is famous for "proving" something or other about a quantity he represented by the letter H. Boltzmann "proved" that the quantity represented by H would always decrease, or at least not increase, in virtue of the interactions of molecules in a gas. If you add a minus sign, you get a quantity that will always increase, or at least not decrease. This quantity is maximized when the distribution of positions of the molecules is flat and the distribution of velocities is the Maxwell-Boltzmann distribution. Question: What does the "H" in the H-theorem stand for?
The Strategy
Boltzmann (following Maxwell and Lorentz) set out to prove that "Maxwell's velocity distribution is the only possible one" by showing that 1) given the Stosszahlansatz, collisions will only decrease (or leave unchanged) the value of H, and 2) the minimal value of H is given by the Maxwell distribution. So the negative of the value of H will never decrease (under the assumptions) and the velocity distribution will approach that of Maxwell.
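For reference, in standard modern notation (not a quotation from Boltzmann), the quantity at issue and the theorem read:

```latex
% Boltzmann's H for a velocity distribution f(\vec{v}, t):
\begin{equation}
  H(t) = \int f(\vec{v}, t)\, \ln f(\vec{v}, t)\, d^3v
\end{equation}
% The H-theorem: under the Stosszahlansatz (molecular chaos), collisions give
%   \frac{dH}{dt} \le 0,
% with equality exactly when f is the Maxwell velocity distribution.
% Note the formal resemblance of -H to the -\sum p_i \ln p_i expressions above.
```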
Maxwell and Shannon
Given the formal similarities in the mathematical definitions of S_S and H, what deep conceptual or physical relation is there between the Shannon entropy and the thermodynamic entropy or the Boltzmann entropy or the Gibbs entropy? As far as I can tell, absolutely none. Zero. Nada. Zilch. One can ask after the Boltzmann entropy of a particular box of gas at a particular time. To ask after its Shannon entropy is literally complete nonsense. Similarly, the Shannon entropy of a source tells you exactly zero about its thermodynamic or statistical mechanical entropy.
Moral
Shannon was up to something with a clear practical purpose, and he introduced a quantity relevant to his analysis which, on von Neumann's advice, he called "entropy". That particular nomenclature was a complete and total disaster. Thermodynamic conclusions cannot be drawn from information-theoretic descriptions, or vice versa. Be on guard when a physicist slips between talking about thermodynamic entropy and Shannon entropy without giving a clear explanation of why any such inference is warranted.