How much information does a language have? Shannon, C., Prediction and Entropy of Printed English, Bell System Technical Journal, 1951. Motivation/Skills. Redundancy.
How much information does a language have? • Shannon, C., Prediction and Entropy of Printed English, Bell System Technical Journal, 1951
Redundancy
The redundancy of ordinary English, not considering statistical structure over greater distances than about eight letters, is roughly 50%. This means that when we write
En_ _ _sh ha_f o_ w_ _t w_ w_ _te i_ dete_ _ _ _e_ b_ t_e str_ct_r_ _f _ _ _ lang_ _ _ _ a_d h_ _f i_ c_os_n fre_ _ _
Redundancy = 1 - H/Hmax, where H is the entropy per letter and Hmax is its largest possible value (all letters equally likely).
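The formula Redundancy = 1 - H/Hmax can be sketched in a few lines of Python. This is only an illustration: the function name `redundancy` and the default 27-symbol alphabet (26 letters plus the space) are assumptions made here, not something the slide specifies.

```python
from math import log2


def redundancy(entropy_bits, alphabet_size=27):
    """Return 1 - H/Hmax for an entropy given in bits per letter.

    alphabet_size=27 (26 letters plus space) is an assumption of this sketch.
    """
    h_max = log2(alphabet_size)  # entropy when every symbol is equally likely
    return 1 - entropy_bits / h_max


# Entropy per letter that corresponds to 50% redundancy under this assumption
half = log2(27) / 2
print(round(half, 2), redundancy(half))
```

Under the 27-symbol assumption, 50% redundancy corresponds to roughly 2.4 bits per letter and 75% to roughly 1.2 bits per letter.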
Entropy: how much information is produced, on average, by each letter.
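As a rough illustration of "information per letter", the sketch below estimates the entropy of a text from single-letter frequencies only, H = -sum p(x) log2 p(x). The function name `letter_entropy` is hypothetical, and a one-sentence sample is far too small for a serious estimate.

```python
from collections import Counter
from math import log2


def letter_entropy(text):
    """Entropy in bits per symbol, using single-symbol frequencies only."""
    counts = Counter(text)
    total = sum(counts.values())
    # Each symbol contributes -p * log2(p) to the average
    return -sum(n / total * log2(n / total) for n in counts.values())


sample = "THERE IS NO REVERSE ON A MOTORCYCLE"
print(letter_entropy(sample))
```

This estimate ignores all dependence between neighbouring letters, so it overstates the information each letter really carries.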
A French text written without the letter e:
Saisi par l'inspiration, il composa illico un lai, qui, suivant la tradition du Canticum Canticorum Salomonis, magnifiait l'illuminant corps d'Anastasia : Ton corps, un grand galion où j'irai au long-cours, un sloop, un brigantin tanguant sous mon roulis, Ton front, un fort dont j'irai à l'assaut, un bastion, un glacis qui fondra sous l'aquilon du transport qui m'agit,
A French text in which e is the only vowel:
‘L’Evêqe en effet est très streect: le clergé, de temps en temps, se permet de révéler ses préférences envers des ‘événements’ frenchement débreedés, mets l’évêqe hème qe ses fêtes respectent des règles sévères et les trensgresser, c’est fréqemment reesqer de se fère relegger’.
How much information is obtained by adding one letter? S E E SE
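One way to make this question concrete is the conditional entropy of a letter given the letters before it, which Shannon writes as F_N: the entropy of N-letter blocks minus the entropy of (N-1)-letter blocks. The sketch below estimates it from raw counts; the function name `ngram_entropy` is an assumption of this example, and estimates from short texts are unreliable.

```python
from collections import Counter
from math import log2


def ngram_entropy(text, n):
    """Estimate F_n: bits added by one letter when the previous n-1 are known."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    joint = Counter(grams)                    # counts of (context + next letter)
    context = Counter(g[:-1] for g in grams)  # counts of the context alone
    total = len(grams)
    # -sum p(context, letter) * log2 p(letter | context)
    return -sum(c / total * log2(c / context[g[:-1]]) for g, c in joint.items())


text = "ABABABAB"
print(ngram_entropy(text, 1))  # each letter alone is a coin flip
print(ngram_entropy(text, 2))  # the previous letter fixes the next one
```

In the alternating string, a single letter carries one bit, but once the previous letter is known the next one adds nothing.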
Third-order approximation (each letter chosen according to the two letters before it): IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
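A third-order approximation of this kind can be generated by choosing each letter according to how often it follows the preceding two letters in some sample text. The sketch below is one simple way to do that, not necessarily how the example above was produced. The function name `third_order_sample`, the restart rule for dead ends, and the tiny corpus are all assumptions of this illustration.

```python
import random
from collections import defaultdict


def third_order_sample(corpus, length, seed=0):
    """Generate text whose letters follow the corpus's trigram statistics."""
    if len(corpus) < 3:
        raise ValueError("corpus must contain at least three characters")
    rng = random.Random(seed)
    followers = defaultdict(list)
    for i in range(len(corpus) - 2):
        # Repeated entries make frequent continuations more likely to be drawn
        followers[corpus[i:i + 2]].append(corpus[i + 2])
    out = corpus[:2]
    while len(out) < length:
        choices = followers.get(out[-2:])
        if not choices:
            # Dead end (pair never followed by anything): restart from the opening pair
            out += corpus[:2]
            continue
        out += rng.choice(choices)
    return out[:length]


corpus = ("THERE IS NO REVERSE ON A MOTORCYCLE A FRIEND OF MINE FOUND THIS OUT "
          "RATHER DRAMATICALLY THE OTHER DAY")
print(third_order_sample(corpus, 60))
```

With a corpus this small the output mostly replays fragments of the source sentence; a large body of English text is needed before word-like inventions start to appear.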
Is English trying to warn us?
992-995: America ensure oil opportunity
2629-2634: bush admit specifically agents smell denied
16047-16048: arafat unhealthy
How to continue? Aoccdrnig to rseearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoatnt tihng is that the frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.
Revealing the statistics of the language • Q…..: 2034 words start with q • ….q: 8 words end with q (for example, Iraq)
Revealing the statistics of the language
THERE IS NO REVERSE ON A MOTORCYCLE
1 1 1 5 1 1 2 1 1 2 1 1 15 1 17 1 1 1 2 1 3 2 1 2 2 7 1 1 1 1 4 1 1 1 1 1 3 1
FRIEND OF MINE FOUND THIS OUT
8 6 1 3 1 1 1 1 1 1 1 1 1 1 1 6 2 1 1 1 1 1 1 2 1 1 1 1 1 1
RATHER DRAMATICALLY THE OTHER DAY
4 1 1 1 1 1 1 11 5 1 1 1 1 1 1 1 1 1 1 1 6 1 1 1 1 1 1 1 1 1 1 1 1 1
R R R R
1 1 4 1
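The numbers under each sentence record how many guesses were needed before each letter was identified. The experiment can be imitated with a simple statistical guesser in place of the human subject. In the sketch below, a guesser that knows only letter-pair frequencies from a training text proposes letters in order of likelihood, and the position of the correct letter in that order is recorded. The names `ALPHABET` and `guess_numbers` and the tie-breaking rule are assumptions of this illustration.

```python
from collections import Counter, defaultdict

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 26 letters plus space (assumed)


def guess_numbers(training, text):
    """For each character of text, return how many guesses a bigram guesser needs.

    Every character of text must belong to ALPHABET.
    """
    overall = Counter(training)
    by_context = defaultdict(Counter)
    for prev, nxt in zip(training, training[1:]):
        by_context[prev][nxt] += 1
    numbers = []
    prev = None
    for ch in text:
        seen = by_context[prev] if prev is not None else overall
        # Guess order: most frequent after prev, then most frequent overall, then alphabetical
        order = sorted(ALPHABET, key=lambda c: (-seen[c], -overall[c], c))
        numbers.append(order.index(ch) + 1)
        prev = ch
    return numbers


training = "THERE IS NO REVERSE ON A MOTORCYCLE"
print(guess_numbers(training, "THE REVERSE"))
```

A human guesser uses far more context than one preceding letter, which is why the recorded sequences above are dominated by 1s.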
Figure axes: number of times guessed; position of the guessed letter; number of guesses.
What is the probability of finding the number 1 in the third position?
THE 1 1 1
REV 1 1 5
ERS 1 1 2
MOT 1 1 2
THA 1 1 2
Probability of finding the number i in position N
THE 1 1 1
ANT 3 1 3
ERS 1 1 2
MOT 1 1 2
HER 2 2 2
THA 1 1 2
HEN 1 1 3
ERS 1 1 2
TH_ 1 1 3
AN_ 3 1 2
HE_ 2 2 1
REV 1 1 5
ERS 1 1 2
MOT 1 1 2
AND 3 1 1
LASCU
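The probability of finding guess number i in position N can be estimated by simple counting over the three-letter groups listed on this slide. The sketch below does that. The names `ROWS` and `guess_distribution` are hypothetical, and fifteen groups are of course far too few for a reliable estimate.

```python
from collections import Counter

# Three-letter groups and their guess numbers, as listed on the slide
ROWS = [
    ("THE", (1, 1, 1)), ("ANT", (3, 1, 3)), ("ERS", (1, 1, 2)),
    ("MOT", (1, 1, 2)), ("HER", (2, 2, 2)), ("THA", (1, 1, 2)),
    ("HEN", (1, 1, 3)), ("ERS", (1, 1, 2)), ("TH_", (1, 1, 3)),
    ("AN_", (3, 1, 2)), ("HE_", (2, 2, 1)), ("REV", (1, 1, 5)),
    ("ERS", (1, 1, 2)), ("MOT", (1, 1, 2)), ("AND", (3, 1, 1)),
]


def guess_distribution(rows, position):
    """Relative frequency of each guess number at a 1-based position."""
    counts = Counter(numbers[position - 1] for _, numbers in rows)
    total = sum(counts.values())
    return {guess: n / total for guess, n in sorted(counts.items())}


for position in (1, 2, 3):
    print(position, guess_distribution(ROWS, position))
```

In this small sample the number 1 appears in the third position in 3 of the 15 groups.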
Bounds
THERE IS NO REVERSE ON A MOTORCYCLE
1 1 1 5 1 1 2 1 1 2 1 1 15 1 17 1 1 1 2 1 3 2 1 2 2 7 1 1 1 1 4 1 1 1 1 1 3 1
Letters: F0 (all the letters have the same probability); F1 (each letter has its own probability); F2 (correlation of two letters)
Numbers: F0 (all the numbers have the same probability); F1 (each number has its own probability); F2 (correlation of two numbers)
FN
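Shannon's 1951 paper bounds the entropy using only the frequencies q_i with which guess number i occurs: the upper bound is -sum q_i log2 q_i and the lower bound is sum i (q_i - q_(i+1)) log2 i. The sketch below applies both formulas to the guess sequence recorded for the motorcycle sentence. It is illustrative only: the name `entropy_bounds` is hypothetical, 38 guesses are a tiny sample, and the frequencies are sorted into non-increasing order, which the lower bound assumes and which ideal guessing would produce anyway.

```python
from collections import Counter
from math import log2

# Guess numbers recorded for the first sentence on the slide
GUESSES = [1, 1, 1, 5, 1, 1, 2, 1, 1, 2, 1, 1, 15, 1, 17, 1, 1, 1, 2,
           1, 3, 2, 1, 2, 2, 7, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 3, 1]


def entropy_bounds(guesses):
    """Return (lower, upper) entropy bounds in bits per letter."""
    counts = Counter(guesses)
    total = len(guesses)
    # Sorting is an assumption of this sketch: it enforces q_1 >= q_2 >= ...
    q = sorted((n / total for n in counts.values()), reverse=True)
    q.append(0.0)  # q_(k+1) = 0 closes the sum
    upper = -sum(p * log2(p) for p in q if p > 0)
    lower = sum(i * (q[i - 1] - q[i]) * log2(i) for i in range(1, len(q)))
    return lower, upper


lower, upper = entropy_bounds(GUESSES)
print(lower, upper)
```

When every guess number is equally frequent the two bounds coincide, and when the first guess is always right both are zero.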
Bounds: redundancy ≈ 75%