Information and uncertainty
Manipulating symbols • Last class • Typology of signs • Sign systems • Symbols • Tremendously important distinctions for informatics and computational sciences • Computation = symbol manipulation • Symbols can be manipulated without reference to content (syntactically), due to the arbitrary nature of convention • Allows computers to operate! • All signs rely on a certain amount of convention, as all signs have a pragmatic (social) dimension, but symbols are the only signs which require exclusively a social convention, or code, to be understood.
Symbol manipulation • The four letters a, e, d, l can be arranged in 4! = 4 x 3 x 2 x 1 = 24 permutations: adel adle aedl aeld alde aled dael dale deal dela dlae dlea eadl eald edal edla elad elda lade laed ldae ldea lead leda • Some have meaning (in some language) • The relation between symbols and meaning is arbitrary • Example: the cut-up method for generating poetry, pioneered by Brion Gysin and William Burroughs and often used by artists such as David Bowie, or the use of samples in electronic music
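The permutation count above can be checked mechanically. The following is a minimal Python sketch that enumerates every ordering of the four letters without any reference to meaning; the small `lexicon` set is a hand-picked, hypothetical stand-in for "some language" and is not part of the original slide.

```python
from itertools import permutations

letters = "aedl"

# Purely syntactic manipulation: every ordering of the four distinct letters.
words = sorted("".join(p) for p in permutations(letters))

print(len(words))        # number of permutations, 4! = 24
print(" ".join(words))

# Hypothetical mini-lexicon (an assumption for illustration only):
# meaning is assigned by convention, not by the symbols themselves.
lexicon = {"deal", "lead", "dale", "lade"}
meaningful = [w for w in words if w in lexicon]
print(meaningful)
```

Swapping in a lexicon from another language changes which strings count as meaningful, while the 24 strings themselves stay the same.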
Information theory • “The mathematical theory of communication”, Claude Shannon (1948) • Efficiency of information transmission in electronic channels • Key concept: information quantity that can be measured unequivocally (objectively) • Does not deal at all with the subjective aspects of information semantics and pragmatics. • Information is defined as a quantity that depends on symbol manipulation alone
What’s an information quantity? • How to quantify a relation? • Information is a relation between an agent, a sign and a thing, rather than simply a thing. • The most palpable element in the information relation is the sign, symbols • But which symbols do we use to quantify the information contained in messages? • Several symbol systems can be used to convey the same message • We must agree on the same symbol system for all messages!
What’s an information quantity? • Both sender and receiver must use the same code, or convention, to encode and decode symbols from and to messages. • We need to fix the language used for communication • Set of symbols allowed (an alphabet) • The rules to manipulate symbols (syntax) • The meaning of the symbols (semantics). • A language specifies the universe of all possible messages = Set of all possible symbol strings of a given size. • Shannon Information is thus defined as “a measure of the freedom from choice with which a message is selected from the set of all possible messages” • DEAL is 1 out of 4! = 4×3×2×1 = 24 choices. [Figure: the 24 permutations of the letters A, D, E, L, with DEAL singled out]
What’s an information quantity? • Information is defined as “a measure of the freedom from choice with which a message is selected from the set of all possible messages” • A bit (short for binary digit) is the most elementary choice one can make between two items: “0” and “1”, “heads” or “tails”, “true” or “false”, etc. • A bit is equivalent to the choice between two equally likely alternatives. • For example, if we know that a coin is to be tossed but are unable to see it as it falls, a message telling whether the coin came up heads or tails gives us one bit of information.
Decision-making • Decision-making: • Perhaps the most fundamental capability of human beings • Decision always implies uncertainty • Choice • Lack of information, randomness, noise, Error “The highest manifestation of life consists in this: that a being governs its own actions. A thing which is always subject to the direction of another is somewhat of a dead thing. ” “A man has free choice to the extent that he is rational.” (St. Thomas Aquinas) “In a predestinate world, decision would be illusory; in a world of perfect foreknowledge, empty; in a world without natural order, powerless. Our intuitive attitude to life implies non-illusory, non-empty, non-powerless decision… Since decision in this sense excludes both perfect foresight and anarchy in nature, it must be defined as choice in face of bounded uncertainty” (George Shackle)
Uncertainty-based information: original contributions • Information is transmitted through noisy communication channels: Ralph Hartley and Claude Shannon (at Bell Labs), the fathers of Information Theory, worked on the problem of efficiently transmitting information; i.e. decreasing the uncertainty in the transmission of information. • R. V. L. Hartley, "Transmission of information," Bell System Technical Journal, July 1928, p. 535. • C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October 1948.
Choices: multiplication principle • “If some choice can be made in M different ways, and some subsequent choice can be made in N different ways, then there are M x N different ways these choices can be made in succession” [Paulos] • 3 shirts and 4 pants = 3 x 4 = 12 outfit choices
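The multiplication principle can be illustrated by listing the combinations explicitly. This is a small Python sketch; the item names are placeholders invented for the example.

```python
from itertools import product

# Placeholder item names (hypothetical); only the counts come from the slide.
shirts = ["shirt_1", "shirt_2", "shirt_3"]
pants = ["pants_1", "pants_2", "pants_3", "pants_4"]

# Each outfit is one choice of shirt followed by one choice of pants.
outfits = list(product(shirts, pants))

print(len(outfits))  # M x N = 3 x 4 = 12
```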
Hartley uncertainty • Nonspecificity: Hartley measure • The amount of uncertainty associated with a set of alternatives (e.g. messages) is measured by the amount of information needed to remove the uncertainty • A type of ambiguity • H(A) = log2(|A|), where A = set of alternatives {x1, x2, x3, …, xn}, |A| = number of choices, and H(A) is measured in bits • Quantifies how many yes-no questions need to be asked to establish what the correct alternative is
Hartley uncertainty • Menu choices • A = 16 entrees • B = 4 desserts • How many dinner combinations? • 16 x 4 = 64 • H(AxB) = log2(16x4) = log2(16) + log2(4) = 4 + 2 = 6 bits • Quantifies how many yes-no questions need to be asked to establish what the correct alternative is
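The menu calculation can be reproduced in a few lines. This is a minimal Python sketch of the Hartley measure as the base-2 logarithm of the number of alternatives; the function name `hartley` is chosen here for illustration.

```python
import math

def hartley(n_alternatives: int) -> float:
    """Hartley measure in bits: log2 of the number of alternatives."""
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_alternatives)

entrees, desserts = 16, 4

# Uncertainty of the combined choice versus the sum of the separate choices.
h_joint = hartley(entrees * desserts)
h_sum = hartley(entrees) + hartley(desserts)

print(h_joint, h_sum)  # both equal 6 bits
```

Because the logarithm turns products into sums, the uncertainty of independent choices adds up, which is why multiplying the number of choices corresponds to adding bits.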
Hartley uncertainty: decision trees (number of choices, measured in bits) [Figure: decision tree]
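The link between yes-no questions and bits can be sketched as a decision tree that halves the set of candidates at every step. The code below is an illustrative Python sketch, not the tree from the slide; the alternatives are simply numbered 0 to n-1 and the function name is an assumption.

```python
import math

def questions_needed(n_alternatives: int, target: int) -> int:
    """Count the yes-no questions ("is it in the lower half?") needed
    to isolate `target` among the alternatives 0 .. n_alternatives-1."""
    low, high = 0, n_alternatives  # remaining candidates are low .. high-1
    questions = 0
    while high - low > 1:
        mid = (low + high) // 2
        questions += 1
        if target < mid:
            high = mid   # answer "yes": keep the lower half
        else:
            low = mid    # answer "no": keep the upper half
    return questions

# 64 dinner combinations: every alternative is reached in log2(64) = 6 questions.
print({questions_needed(64, t) for t in range(64)})

# 24 permutations: the worst case is ceil(log2(24)) = 5 questions.
print(max(questions_needed(24, t) for t in range(24)))
```

When the number of alternatives is not a power of two, some alternatives are reached with one question fewer, which is why the Hartley measure log2(24) ≈ 4.58 falls between 4 and 5.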
What about probability? • Some alternatives may be more probable than others! • A different type of ambiguity • Higher frequency alternatives: less information required • Measured by Shannon’s entropy measure • The amount of uncertainty associated with a set of alternatives (e.g. messages) is measured by the average amount of information needed to remove the uncertainty Probability distribution of letters in English text (Orwell’s 1984 in fact):
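The idea that frequent alternatives require less information can be sketched by estimating letter probabilities from a text and computing -log2(p) for each letter. This is a minimal Python sketch under stated assumptions: the one-sentence `sample` is only a stand-in for a full text, so the numbers it produces are illustrative and are not the distribution shown on the slide.

```python
from collections import Counter
import math

# Short stand-in sample (assumption); a real estimate would use a long text.
sample = "it was a bright cold day in april and the clocks were striking thirteen"

# Count letters only, ignoring spaces and punctuation.
counts = Counter(ch for ch in sample if ch.isalpha())
total = sum(counts.values())

# Relative frequency as an estimate of probability.
probs = {ch: c / total for ch, c in counts.items()}

# Information needed to identify each letter, in bits: -log2(p).
info_bits = {ch: -math.log2(p) for ch, p in probs.items()}

for ch, _ in counts.most_common(3):
    print(ch, round(probs[ch], 3), round(info_bits[ch], 2))
```

Letters with higher estimated probability come out with fewer bits, and rare letters with more.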
Shannon’s entropy • Shannon’s measure • The average amount of uncertainty associated with a set of weighted alternatives (e.g. messages) is measured by the average amount of information needed to remove the uncertainty • HS = -Σi pi.log2(pi), where A = set of weighted alternatives {x1, x2, x3, …, xn}, pi = probability of alternative xi, and HS is measured in bits
Entropy of a message • Message encoded in an alphabet of n symbols, for example: • English = 26 characters + space • Morse code = dots, dashes and spaces • DNA: A, T, G, C
What it measures • missing information, how much information is needed to establish what the symbol is, or • uncertainty about what the symbol is, or • on average, how many yes-no questions need to be asked to establish what the symbol is. • Extremes: with one alternative (a single symbol with probability 1), HS = 0; with a uniform distribution over n symbols, HS = log2(n)
Example: Morse code (symbols: dot, dash, space, with probabilities p1, p2, p3)
1) All dots: p1 = 1, p2 = p3 = 0. Take any symbol: it is a dot. No uncertainty, no question needed, no missing information. HS = -1.log2(1) = 0 bits
2) 50-50 dots and dashes: p1 = p2 = 1/2, p3 = 0. Given the probabilities, we need to ask one question: one piece of missing information. HS = -(1/2.log2(1/2) + 1/2.log2(1/2)) = -1.log2(1/2) = -(log2(1) - log2(2)) = log2(2) = 1 bit
3) Uniform: all symbols equally likely, p1 = p2 = p3 = 1/3. Given the probabilities, we need to ask as many as 2 questions: up to 2 pieces of missing information. HS = -log2(1/3) = -(log2(1) - log2(3)) = log2(3) ≈ 1.585 bits
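The three Morse cases can be recomputed directly from the probabilities. This is a minimal Python sketch of Shannon's entropy in bits; the function name and the convention that zero-probability symbols contribute nothing are stated assumptions (the latter is the usual convention 0.log2(0) = 0).

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution.
    Symbols with probability 0 contribute nothing (0.log2(0) taken as 0)."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Probabilities of (dot, dash, space) in the three cases.
all_dots = shannon_entropy([1.0, 0.0, 0.0])
half_half = shannon_entropy([0.5, 0.5, 0.0])
uniform = shannon_entropy([1 / 3, 1 / 3, 1 / 3])

print(all_dots, half_half, round(uniform, 3))
```

The uniform case gives the largest value for three symbols, log2(3), and any skew in the probabilities lowers the entropy toward 0.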
Bits, entropy and Huffman codes • Given a symbol set {A,B,C,D,E} and occurrence probabilities PA, PB, PC, PD, PE, the Shannon entropy corresponds to the average minimum number of bits needed to represent a symbol • Huffman coding: variable-length coding for messages whose symbols occur with different frequencies, which minimizes the average number of bits per symbol • Entropy: H = -(0.250*log2(0.250) + 0.375*log2(0.375) + 0.167*log2(0.167) + 0.125*log2(0.125) + 0.083*log2(0.083)) = 2.135 bits per symbol • Huffman code: average #bits per symbol = 0.375*1 + 0.250*2 + 0.167*3 + 0.125*4 + 0.083*4 = 2.208
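The comparison between entropy and Huffman code length can be reproduced with a short Python sketch. It uses the five probabilities that appear in the entropy calculation; assigning them to the labels A to E in that order is an assumption, since the slide's table did not survive. The construction is the standard greedy merge of the two least probable nodes, written with the standard-library `heapq` module.

```python
import heapq
import math

# Assumed assignment of the slide's probabilities to the labels A..E.
probs = {"A": 0.250, "B": 0.375, "C": 0.167, "D": 0.125, "E": 0.083}

def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a binary Huffman code."""
    # Heap entries: (probability, tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)   # least probable node
        p2, _, d2 = heapq.heappop(heap)   # second least probable node
        # Merging pushes every symbol under the new node one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tie, merged))
        tie += 1
    return heap[0][2]

lengths = huffman_code_lengths(probs)
avg_bits = sum(probs[s] * lengths[s] for s in probs)
entropy = sum(-p * math.log2(p) for p in probs.values())

print(lengths)
print(round(entropy, 3), round(avg_bits, 3))
```

Ties between equally probable nodes can be broken either way, so the individual code lengths may differ from those on the slide, but the average length of a Huffman code is the same for every tie-breaking choice and never falls below the entropy.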
Critique of Shannon’s communication theory • The entropy formula as a measure of information is arbitrary • Shannon’s theory measures quantities of information, but it does not consider information content • In Shannon’s theory, the semantic aspects of information are irrelevant to the engineering problem
Other forms of uncertainty • Vagueness or fuzziness • Simultaneously being “True” and “False” • Fuzzy Logic and Fuzzy Set Theory
From crisp to fuzzy sets • Fuzziness: Being and Not Being • Laws of Contradiction and Excluded Middle are broken [Figure: two panels, each showing the set "Tall People" within the "Set of all People", one as a crisp set and one as a fuzzy set]
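The contrast between crisp and fuzzy membership can be sketched numerically. The Python example below is illustrative only: the height thresholds (180 cm for the crisp set, a ramp from 160 cm to 190 cm for the fuzzy set) are hypothetical values chosen for the example, and the operators used are the standard fuzzy-set ones (complement 1 - μ, intersection min, union max).

```python
def tall_crisp(height_cm: float, threshold_cm: float = 180.0) -> float:
    """Crisp membership: a person is either tall (1) or not tall (0).
    The threshold is a hypothetical value."""
    return 1.0 if height_cm >= threshold_cm else 0.0

def tall_fuzzy(height_cm: float, low_cm: float = 160.0, high_cm: float = 190.0) -> float:
    """Fuzzy membership: degree of tallness rises linearly from 0 to 1.
    The ramp endpoints are hypothetical values."""
    if height_cm <= low_cm:
        return 0.0
    if height_cm >= high_cm:
        return 1.0
    return (height_cm - low_cm) / (high_cm - low_cm)

height = 175.0
for name, membership in (("crisp", tall_crisp), ("fuzzy", tall_fuzzy)):
    mu = membership(height)
    tall_and_not_tall = min(mu, 1.0 - mu)   # law of contradiction expects 0
    tall_or_not_tall = max(mu, 1.0 - mu)    # law of excluded middle expects 1
    print(name, mu, tall_and_not_tall, tall_or_not_tall)
```

For the crisp set the two laws hold at every height. For the fuzzy set, a person of intermediate height is partly tall and partly not tall, so "tall and not tall" is above 0 and "tall or not tall" is below 1.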
This week’s discussion • Papers: • 1) danah boyd and Kate Crawford (2011). Six Provocations for Big Data. A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, September 21, 2011. • 2) Ryuji Suzuki, John R. Buck and Peter L. Tyack (2006). Information entropy of humpback whale songs. J. Acoust. Soc. Am., 119(3), March. • 3) David A. Huffman (1952). A method for the construction of minimum-redundancy codes. Proceedings of the I.R.E., September.