CSA4050: Advanced Topics in NLP
Computational Morphology II
Introduction
2 Level Morphology
The Problem
• So far we have assumed that words are formed by strict concatenation of component morphemes, as in the following example: en + large + ment + s
• This assumption is convenient because it imposes a 1:1 correspondence between segmentation of the string and lookup of lexical items (which may be of different types, e.g. roots, affixes, particles etc.)
• The problem is that this is an unrealistic assumption to make.
CSA405 Lecture 2lev
English Spelling Rules
• Final consonant doubling: begin + ing = beginning
• s to es: church + s = churches
• y to i: carry + ed = carried
• Final e deletion: rake + ing = raking
• n to m: in + practical = impractical
Semitic Languages
Non-concatenative morphology
• Examples: dħalt, dħalt, daħal, daħlet, dħalna, dħaltu, daħlu
• Deletion of vowel
• Changes or insertion of vowel
Handling Spelling Rules
• Such phenomena usually occur at morpheme boundaries, and prevent direct lookup of the surface string in the lexicon.
• The solution is to suppose that two strings are involved:
  • The surface string: that which appears on the page
  • The lexical string: that which is used to index items in the lexicon.
• What kind of mapping exists between the two strings?
Lexical Transformations
[Figure: transformations mapping between the SURFACE STRING and the LEXICAL STRING]
Phonological Rules
• Morphological rules are a reflection of phonological changes.
• Assumption: the lexical/surface transformation is rule governed.
• Phonological rule systems were studied extensively within generative linguistics under Chomsky during the 1970s.
Typical Phonological Rule
• A typical rule has the following shape: Phon1 -> Phon2 // Lcontext __ Rcontext
• Meaning: phoneme Phon1 is transformed to phoneme Phon2 if it occurs between left context Lcontext and right context Rcontext
• Example: [B] -> [P] // __ #
• B is pronounced like P if it is word final (cf. kelb)
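The rule shape above can be illustrated with a small regular-expression sketch. This is only an approximation for intuition, not how phonological rule systems are normally implemented: the function name `apply_rule`, its parameters, and the use of plain letters in place of phonemes are all assumptions made for this example. The symbol # marks the word boundary, as in the rule notation.

```python
import re

def apply_rule(word, phon1, phon2, left="", right=""):
    """Rewrite phon1 as phon2 wherever it stands between left and right.

    Hypothetical helper: contexts are literal strings, and '#' in a
    context stands for the word boundary.
    """
    marked = f"#{word}#"  # make both word boundaries explicit
    pattern = re.escape(phon1)
    if left:
        pattern = f"(?<={re.escape(left)})" + pattern   # left context
    if right:
        pattern = pattern + f"(?={re.escape(right)})"   # right context
    rewritten = re.sub(pattern, lambda match: phon2, marked)
    return rewritten.strip("#")

# [B] -> [P] // __ #   (word-final b is realized as p)
print(apply_rule("kelb", "b", "p", right="#"))
```

With the word-final rule, `kelb` comes out as `kelp`, while a form in which b is not final is left unchanged.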
Properties of Phonological Rules within the Generative Tradition
• Rules are rewrite rules
• Rules apply sequentially
• Rules are ordered
• Rules may act upon their own output (cyclic rules)
• Effects of rules are not always reversible
• Collections of rules have Turing power
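Why ordering matters, and why effects are not always reversible, can be seen in a toy cascade. The two rules below are purely hypothetical (they are not rules of any real language), and `apply_cascade` is a name invented for this sketch; each rule is a context-free replacement applied to the output of the previous one.

```python
def apply_cascade(form, rules):
    """Apply ordered rewrite rules one after another (sequential cascade)."""
    for source, target in rules:
        form = form.replace(source, target)
    return form

# Two hypothetical rules: the first "feeds" the second.
rule_a = ("a", "b")  # a -> b
rule_b = ("b", "c")  # b -> c

print(apply_cascade("a", [rule_a, rule_b]))  # rule_a first, then rule_b
print(apply_cascade("a", [rule_b, rule_a]))  # the opposite order
print(apply_cascade("b", [rule_a, rule_b]))  # a different input, same output
```

In the first ordering, "a" ends up as "c"; in the second it stops at "b". Since both "a" and "b" yield "c" under the first ordering, the output alone does not determine the input.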
C. Douglas Johnson (1972)
• A theory of phonology with the right properties could be implemented using only finite state machinery.
• Each rule is associated with a finite state transducer (FST).
• All rules operate simultaneously, thus eliminating the delicate problems of ordering associated with sequential cascades of rules.
• The collection of FS rules operating in parallel is mathematically equivalent to a single FST representing the intersection of the component FSTs.
• Johnson's work was mainly theoretical. He was not involved with computational issues, in particular the issue of computing the intersection of multiple FSTs.
Finite State Machinery
• FS Automaton
  • For recognition and generation of regular languages.
  • All operations over regular languages have corresponding operations over the corresponding FSAs.
• FS Transducer
  • Like FSAs but with output as well as input.
  • For recognition and generation of regular relations.
  • Some operations over regular languages do not have corresponding operations over the corresponding FSTs.
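To make "output as well as input" concrete, here is a minimal sketch of a hand-written transducer for the n to m spelling rule (in + practical = impractical). The state names, the use of "+" as the morpheme boundary symbol in the lexical string, and the functions `step` and `transduce` are assumptions of this example rather than part of any particular toolkit. The machine holds back an "n" until it knows whether "+p" follows.

```python
def step(state, ch):
    """One transition: return (output, next_state) for an input character."""
    if state == "start":
        if ch == "n":
            return "", "n"          # hold the n back until we see more
        if ch == "+":
            return "", "start"      # boundaries never surface
        return ch, "start"
    if state == "n":                # an n is pending
        if ch == "+":
            return "", "n+"
        if ch == "n":
            return "n", "n"         # emit the old n, hold the new one
        return "n" + ch, "start"
    # state == "n+": an n followed by a boundary is pending
    if ch == "p":
        return "mp", "start"        # n becomes m before +p
    if ch == "+":
        return "", "n+"
    if ch == "n":
        return "n", "n"
    return "n" + ch, "start"

# Output to flush if the input ends while an n is still pending.
FINAL_OUTPUT = {"start": "", "n": "n", "n+": "n"}

def transduce(lexical):
    """Map a lexical string (with '+' boundaries) to its surface string."""
    state, out = "start", []
    for ch in lexical:
        emitted, state = step(state, ch)
        out.append(emitted)
    out.append(FINAL_OUTPUT[state])
    return "".join(out)

print(transduce("in+practical"))
print(transduce("en+large+ment+s"))
```

The first call yields `impractical`; the second, where the rule's context never arises, simply removes the boundaries and yields `enlargements`.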
Kimmo Koskenniemi (1983)
• Worked on the morphology of Finnish and came up with a system of finite state transducers.
• Came up with a computational framework for executing collections of finite state transducers in parallel.
Koskenniemi’s Model
[Figure: FST1, FST2, FST3, …, FSTn sit in parallel between the SURFACE STRING and the LEXICAL STRING. The interpreter executes them round-robin, keeping the FSTs in lock-step before moving the head.]
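The lock-step idea can be sketched as follows. This is a simplified illustration, not Koskenniemi's actual system: every rule is a small automaton over lexical:surface symbol pairs, and a pairing is accepted only if all rules accept it. The rule functions, the state names, and the use of "0" as the surface symbol for "nothing" are assumptions made for this example.

```python
NULL = "0"  # assumed surface symbol for an unrealized lexical symbol

def identity_rule(state, pair):
    """Only n:m and +:0 may differ; every other pair must be x:x."""
    lex, surf = pair
    return "ok" if lex == surf or pair in {("n", "m"), ("+", NULL)} else None

def boundary_rule(state, pair):
    """Lexical '+' must surface as NULL, and NULL may only realize '+'."""
    lex, surf = pair
    good = (surf == NULL) if lex == "+" else (surf != NULL)
    return "ok" if good else None

def nasal_rule(state, pair):
    """n:m is allowed only before +:0 p:p, and is required there."""
    if state == "need+":
        return "need_p" if pair == ("+", NULL) else None
    if state == "need_p":
        return "free" if pair == ("p", "p") else None
    if state == "n_then+" and pair == ("p", "p"):
        return None                 # n stayed n before +p: forbidden
    if pair == ("n", "m"):
        return "need+"
    if pair == ("n", "n"):
        return "n"
    if state == "n" and pair == ("+", NULL):
        return "n_then+"
    return "free"

# (transition function, start state, accepting states) for each rule
RULES = [
    (identity_rule, "ok", {"ok"}),
    (boundary_rule, "ok", {"ok"}),
    (nasal_rule, "free", {"free", "n", "n_then+"}),
]

def accepts(lexical, surface):
    """Advance every rule on each pair before moving the head onward."""
    if len(lexical) != len(surface):
        return False
    states = [start for _, start, _ in RULES]
    for pair in zip(lexical, surface):
        for i, (rule, _, _) in enumerate(RULES):
            states[i] = rule(states[i], pair)
            if states[i] is None:   # one rule blocks, so the pairing fails
                return False
    return all(s in finals for s, (_, _, finals) in zip(states, RULES))

print(accepts("in+practical", "im0practical"))
print(accepts("in+practical", "in0practical"))
```

Because no rule rewrites another rule's output, the order in which the rules are listed does not affect the result; each rule merely vetoes pairings it disallows.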
Martin Kay and Ron Kaplan (1981)
• Kay and Kaplan (both at Xerox PARC) were very interested in the computational issues underlying morphological processing.
• In particular, they studied the problems of
  • How to combine FSTs in parallel (computing the intersection of regular relations)
  • How to combine FSTs in series (computing the composition of FSTs).
• Restrictions on rules have pleasant consequences
Restrictions on Rules
• With the restriction that a rule shall not apply to its own output, Kaplan and Kay showed that the result of combining the corresponding relations under the operations of intersection, composition and union remains within a closed subclass of those computable by FSTs.
• They then spent many years designing and implementing a calculus for describing and combining FSTs based upon regular expressions.
Summary
[Figure: diagram of the lines of work covered. Labels: Chomsky, Generative Tradition, Generative Phonology; Multilevel Cascades of Rules; Johnson, Parallel Rules; Koskenniemi, Parallel Rules, KIMMO, PC-Kimmo; Kaplan/Kay, Calculus, Xerox Tools (xfst/twolc/lexc)]