Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World’s Languages. Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy Ganguly, Department of Computer Science & Engg., Indian Institute of Technology, Kharagpur.
Redundancy in Natural Systems
• Redundancy reduces the risk of information loss – fault tolerance
• Examples of redundancy:
  • Biological systems – codons, genes, proteins, etc.
  • Linguistic systems – synonymous words
  • Human brain – perhaps the biggest example of neuronal redundancy
Redundancy in Sound Systems
• Like any other natural system, human speech sound systems are expected to show redundancy in the information they encode
• In this work we attempt to
  • mathematically formulate this redundancy, and
  • unravel the interesting patterns (if any) that result from this formulation
Feature Economy: An Age-old Principle
• Sounds, especially consonants, tend to occur in pairs that are highly correlated in terms of their features
• Languages tend to maximize the combinatorial possibilities of a few features to produce many consonants
• Example:

  plosive     voiceless   voiced
  bilabial    /p/         /b/
  dental      /t/         /d/

• If a language has three of these four plosives in its inventory, then it will also tend to have the fourth
Mathematical Formulation
• We use concepts from information theory to quantify feature economy (assuming features are Boolean)
• The basic idea is to compute the number of bits required to pass the information of an inventory of size $N$ over a transmission channel
• Ideal scenario: the inventory of size $N$ is sent over a noiseless channel and the information arrives undistorted; $\log_2 N$ bits are required for lossless transmission
• General scenario: the inventory of size $N$ is sent over a noisy channel and the information is distorted; more than $\log_2 N$ bits are required for lossless transmission
Feature Entropy
• The actual number of bits required can be estimated by calculating the binary entropy as follows
• $p_f$ – number of consonants in the inventory in which feature $f$ is present
• $q_f$ – number of consonants in the inventory in which feature $f$ is absent
• The probability that a consonant chosen at random from the inventory has $f$ is $\frac{p_f}{N}$, and the probability that it does not have $f$ is $\frac{q_f}{N}$ ($= 1 - \frac{p_f}{N}$)
Feature Entropy
• If $F$ denotes the set of all features,
  $F_E = -\sum_{f \in F} \left( \frac{p_f}{N} \log_2 \frac{p_f}{N} + \frac{q_f}{N} \log_2 \frac{q_f}{N} \right)$
• Redundancy Ratio (RR): $RR = \frac{F_E}{\log_2 N}$
• RR reflects the excess number of bits required to represent the inventory
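The feature entropy and redundancy ratio defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each consonant is represented as a set of the Boolean features it has, the feature names and the four-consonant toy inventory are hypothetical, and the usual convention 0 · log2(0) = 0 is applied when a feature is present in all or none of the consonants.

```python
import math


def feature_entropy(inventory, features):
    """Sum of binary entropies, one term per Boolean feature.

    inventory: list of sets; each set holds the features present in one consonant.
    features:  iterable of all feature names (the set F).
    """
    n = len(inventory)
    total = 0.0
    for f in features:
        p_f = sum(1 for consonant in inventory if f in consonant)
        q_f = n - p_f
        for count in (p_f, q_f):
            if count > 0:  # convention: 0 * log2(0) = 0
                prob = count / n
                total -= prob * math.log2(prob)
    return total


def redundancy_ratio(inventory, features):
    """RR = F_E / log2(N); needs at least two consonants so that log2(N) > 0."""
    n = len(inventory)
    if n < 2:
        raise ValueError("inventory must contain at least two consonants")
    return feature_entropy(inventory, features) / math.log2(n)


# Toy inventory /p/, /b/, /t/, /d/ with hypothetical Boolean features.
FEATURES = ["plosive", "voiced", "bilabial", "dental"]
TOY = [
    {"plosive", "bilabial"},            # /p/
    {"plosive", "bilabial", "voiced"},  # /b/
    {"plosive", "dental"},              # /t/
    {"plosive", "dental", "voiced"},    # /d/
]

print(feature_entropy(TOY, FEATURES))   # 3.0
print(redundancy_ratio(TOY, FEATURES))  # 1.5
```

In this toy case "plosive" is shared by every consonant and contributes nothing, while the other three features each split the inventory in half and contribute one bit, so 3 bits are used where log2(4) = 2 would suffice.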
Experimentation
• Data source: UCLA Phonological Segment Inventory Database (UPSID)
  • Samples data uniformly from almost all linguistic families
  • Hosts phonological systems of 317 languages
  • Number of consonants: 541
  • Number of vowels: 151
RR: Consonant Inventories
• The slope of the line fit is -0.0178, so RR is almost invariant with respect to the inventory size
• The result means that consonant inventories are organized to have similar redundancy irrespective of their size; this is important because no such explanation exists yet
[Figure: redundancy ratio plotted against inventory size for consonant inventories, with the line fit]
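The slope quoted above comes from fitting a straight line to (inventory size, RR) points. The slide does not say how the fit was computed; the sketch below assumes an ordinary least-squares fit. The function name `fit_line` and the data points are hypothetical placeholders, not values from UPSID.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (inventory size, RR) pairs, for illustration only.
sizes = [10, 20, 30, 40]
rrs = [2.0, 2.1, 1.9, 2.0]

slope, intercept = fit_line(sizes, rrs)
print(slope, intercept)
```

A slope close to zero means RR changes little as the inventory grows, which is the sense in which the slide calls RR "almost invariant".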
The Invariance is not “by chance”
• The invariance in the distribution of RRs for consonant inventories did not emerge by chance
• This can be validated by a standard test of hypothesis
• Null hypothesis: the invariance in the distribution of RRs observed across the real consonant inventories is also prevalent across randomly generated inventories
Generation of Random Inventories
• Model I – Purely random model
  • The distribution of the consonant inventory size is assumed to be known a priori
  • Conceive of 317 bins corresponding to the languages in UPSID
  • Pick a bin and fill it by randomly choosing consonants (without repetition) from the pool of 541 available consonants
  • Repeat the above step until all the bins are packed
[Figure: a pool of phonemes (/t/, /d/, /n/, /b/, /k/, /p/, /m/, ...) from which Bin 1, Bin 2, ..., Bin 317 are filled randomly; the example bins are marked 4, 6 and 2]
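The steps of Model I can be sketched as follows. This is a minimal illustration, not the original experimental code: the function name `model_one`, the eight-phoneme pool and the bin sizes are hypothetical stand-ins for the 541 consonants and 317 inventory sizes of UPSID.

```python
import random


def model_one(pool, sizes, rng):
    """Fill one bin per entry of `sizes` with consonants drawn
    uniformly at random, without repetition inside a bin."""
    return [set(rng.sample(pool, size)) for size in sizes]


# Hypothetical small pool and bin sizes, for illustration only.
POOL = ["t", "d", "n", "b", "k", "p", "m", "g"]
SIZES = [4, 6, 2]  # inventory sizes assumed to be known a priori

rng = random.Random(0)  # seeded so the run is repeatable
bins = model_one(POOL, SIZES, rng)
for i, inventory in enumerate(bins, start=1):
    print(f"Bin {i}: {sorted(inventory)}")
```

Because every bin is drawn independently, the same consonant can appear in several bins, but never twice within one bin.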
Generation of Random Inventories
• Model II – Random model based on occurrence frequency
  • For each consonant $c$, let its frequency of occurrence in UPSID be denoted by $f_c$
  • Let there be 317 bins, each corresponding to a language in UPSID
  • $f_c$ bins are then chosen uniformly at random and the consonant $c$ is packed into these bins without repetition
[Figure: a pool of phonemes with occurrence frequencies, e.g. /t/ (25), /n/ (12), /p/ (100); 25 bins among Bin 1, Bin 2, ..., Bin 317 are chosen randomly and filled with /t/]
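Model II can be sketched in the same style. Again this is only an illustration under assumptions: the function name `model_two` is hypothetical, and the frequency table contains just the three example values shown on the slide (/t/ 25, /n/ 12, /p/ 100) rather than the full UPSID counts.

```python
import random


def model_two(frequencies, n_bins, rng):
    """Place each consonant c into f_c distinct bins chosen uniformly at random."""
    bins = [set() for _ in range(n_bins)]
    for consonant, f_c in frequencies.items():
        # sample() picks f_c different bin indices, so c never repeats in a bin
        for index in rng.sample(range(n_bins), f_c):
            bins[index].add(consonant)
    return bins


# Example frequencies taken from the slide's illustration.
FREQUENCIES = {"t": 25, "n": 12, "p": 100}
N_BINS = 317

rng = random.Random(0)  # seeded so the run is repeatable
bins = model_two(FREQUENCIES, N_BINS, rng)
for consonant, f_c in FREQUENCIES.items():
    occurrences = sum(1 for b in bins if consonant in b)
    print(consonant, occurrences, f_c)
```

Unlike Model I, this procedure preserves how often each consonant occurs across languages but does not fix the size of any individual bin, so with a small frequency table some bins stay empty.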
Results
• Model I – the t-test indicates that the null hypothesis can be rejected with (100 - 9.29e-15)% confidence
• Model II – once again, the t-test shows that the null hypothesis can be rejected with (100 - 2.55e-3)% confidence
• Occurrence frequency governs the organization of the consonant inventories at least to some extent
[Figure: average redundancy ratio plotted against inventory size for Model I, Model II and the real inventories]
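The slides do not state which variant of the t-test was used or exactly which samples were compared. As one possible illustration, the sketch below computes Welch's unequal-variance t statistic and its approximate degrees of freedom for two samples of RR values. The choice of Welch's test, the function name `welch_t` and the sample values are all assumptions made for this example.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    var_b = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical RR samples, for illustration only.
real_rr = [2.0, 2.1, 1.9, 2.0, 2.05]
random_rr = [3.1, 2.6, 3.4, 2.9, 3.2]

t_stat, dof = welch_t(real_rr, random_rr)
print(t_stat, dof)
```

Turning the statistic into a confidence level additionally requires the cumulative distribution of Student's t, which is not part of the Python standard library and is left out of this sketch.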
The Case of Vowel Inventories
• The slope of the line fit is -0.125
• For small inventories RR is not invariant, while for larger ones (size > 12) it is
• Smaller inventories appear to be shaped by perceptual contrast, larger inventories by feature economy
• A t-test shows that we can be 99.93% confident that the two kinds of inventories (vowel and consonant) differ in terms of RR
[Figure: redundancy ratio plotted against inventory size for vowels and consonants]
Error Correcting Capability
• For most of the consonant inventories the average Hamming distance between two consonants is 4, which corresponds to a 1-bit error correcting capability
• Vowel inventories do not indicate any such fixed error correcting capability
[Figure: average Hamming distance plotted against inventory size for consonants and vowels]
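The average Hamming distance of an inventory can be computed as sketched below, assuming each consonant is encoded as a tuple of Boolean feature values. The three-feature encoding of /p/, /b/, /t/, /d/ is a hypothetical toy. The helper `correctable_errors` applies the standard coding-theory bound that a code with distance d can correct floor((d - 1) / 2) bit errors; that bound is normally stated for the minimum distance, whereas the slide applies it to the average.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def average_hamming(vectors):
    """Mean Hamming distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def correctable_errors(distance):
    """Bit errors correctable at a given distance: floor((d - 1) / 2)."""
    return (distance - 1) // 2


# Hypothetical encoding: (voiced, bilabial, dental)
TOY_VECTORS = [
    (0, 1, 0),  # /p/
    (1, 1, 0),  # /b/
    (0, 0, 1),  # /t/
    (1, 0, 1),  # /d/
]

print(average_hamming(TOY_VECTORS))  # 2.0
print(correctable_errors(4))         # 1
```

With a distance of 4, the bound gives (4 - 1) // 2 = 1, matching the 1-bit error correcting capability stated on the slide.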
Conclusions
• Redundancy ratio is almost an invariant property of the consonant inventories with respect to the inventory size
• This invariance is a direct consequence of the fixed error correcting capabilities of the consonant inventories
• Unlike the consonant inventories, the vowel inventories do not all show such an invariance
Discussions
• Possible causes of the origin of redundancy in a linguistic system
  • Fault tolerance: redundancy acts as a failsafe mechanism against random distortion
  • Evolutionary cause: redundancy allows a speaker to successfully communicate with speakers of neighboring dialects – “linguistic junk” as pointed out by Lass (Lass, 1997)