ParaMor: Finding Paradigms Across Morphology • Christian Monson
Turkish Morphology – Beads on a String • götür (take) + ül (passive) + m (negative) + üyor (present progressive) + sun (2nd person singular) • götürülmüyorsun: You are not being taken
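The bead-on-a-string picture can be sketched in a few lines of Python. This is only an illustration: the morphemes and glosses are the ones on the slide, while the names `MORPHEMES`, `build_word`, and `gloss` are made up for this example.

```python
# Morphemes and glosses as listed on the slide, in surface order.
MORPHEMES = [
    ("götür", "take"),
    ("ül", "passive"),
    ("m", "negative"),
    ("üyor", "present progressive"),
    ("sun", "2nd person singular"),
]


def build_word(morphemes):
    """Concatenate the surface forms in order, like beads on a string."""
    return "".join(form for form, _ in morphemes)


def gloss(morphemes):
    """Return the segmented form and the aligned gloss line."""
    forms = " + ".join(form for form, _ in morphemes)
    labels = " + ".join(label for _, label in morphemes)
    return forms, labels


print(build_word(MORPHEMES))  # götürülmüyorsun
for line in gloss(MORPHEMES):
    print(line)
```

Each morpheme contributes exactly one piece of meaning, which is what makes the word easy to take apart once the morpheme boundaries are known.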
Applications of Computational Morphology • Machine Translation • Turkish-English (Oflazer, 2007) • Czech-English (Goldwater and McClosky, 2005) • Speech Recognition • Finnish (Creutz, 2006) • Information Retrieval
Challenges of Computational Morphology • Time Consuming • Kemal Oflazer estimates 3-4 months to build a basic Turkish analyzer • Plus lexicon development and maintenance • Expertise Needed • Greenlandic • Official language of Greenland • Agglutinative Inuit language • 50,000 speakers • Per Langaard
The Solution • Unsupervised Morphology Induction from Raw Text
Paradigms – The Structure of Morphology • Stem: götür (take) • Voice: ül (passive) • Polarity: m (negative) • Tense & Mood: üyor (present progressive), yecek (future) • Person & Number: sun (2nd person singular), um (1st person singular), Ø (3rd person singular), uz (1st person plural)
Paradigms – The Structure of Morphology • Paradigm: a set of mutually replaceable strings • e.g. üyor, yecek (Tense & Mood) • e.g. sun, um, Ø, uz (Person & Number)
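"Mutually replaceable" can be made concrete with a small sketch: swapping one Person & Number suffix for another on the same base yields another word form. The suffixes and glosses come from the slides; the base `götürülmüyor` is the slide's stem plus its passive, negative, and present progressive suffixes, and the names `PERSON_NUMBER` and `swap_suffix` are hypothetical. Plain concatenation is an assumption that holds for this example only; it ignores phonological changes such as vowel harmony.

```python
# Person & Number paradigm from the slides; "" stands for Ø, the empty suffix.
PERSON_NUMBER = {
    "sun": "2nd person singular",
    "um": "1st person singular",
    "": "3rd person singular",
    "uz": "1st person plural",
}


def swap_suffix(base, paradigm):
    """Attach every suffix of the paradigm to the same base."""
    return {base + suffix: label for suffix, label in paradigm.items()}


forms = swap_suffix("götürülmüyor", PERSON_NUMBER)
for form, label in forms.items():
    print(f"{form:<18}{label}")
```

The four resulting forms differ only in the final slot, which is the sense in which the suffixes of one paradigm replace each other.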
Overview • ParaMor • Unsupervised morphology induction system • Evaluation Methodology • Results
The ParaMor Algorithm • Paradigm discovery in 3 steps • Search for candidate paradigms • Cluster candidates modeling the same paradigm • Filter • Segment words • using the discovered paradigms
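The slide does not spell out how discovered paradigms are turned into segmentations, so the following is only one plausible sketch, not the system's actual rule. It assumes a word is split into stem + suffix when the suffix belongs to a discovered paradigm and the same stem also appears in the vocabulary with a different suffix of that paradigm. The function `segment`, the toy `VOCAB`, and the choice of the `a as o os` suffix set are all illustrative.

```python
def segment(word, paradigm, vocabulary):
    """Split word into (stem, suffix) if the paradigm licenses the split.

    A split is accepted only when the stem also occurs in the vocabulary
    with another suffix of the same paradigm (assumed rule, see lead-in).
    """
    # Try longer suffixes first; the empty suffix never produces a split.
    for suffix in sorted((s for s in paradigm if s), key=len, reverse=True):
        if not word.endswith(suffix):
            continue
        stem = word[: len(word) - len(suffix)]
        if not stem:
            continue
        if any(stem + other in vocabulary for other in paradigm if other != suffix):
            return stem, suffix
    return word, ""


PARADIGM = ("a", "as", "o", "os")  # one suffix set shown in the search lattice
VOCAB = {"blanca", "blancas", "blanco", "blancos", "casa"}  # hypothetical toy data

print(segment("blancos", PARADIGM, VOCAB))  # ('blanc', 'os')
print(segment("casa", PARADIGM, VOCAB))     # ('casa', '')
```

Requiring a second attested suffix keeps the sketch from splitting every word that merely happens to end in a frequent letter.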
Search for Candidate Paradigms (each candidate suffix set is followed by its count) • ra rada radas rado rados ran rar raron ró: 23 • a ada adas ado ados an ar aron ó: 1786 • Ø da das do dos n ndo r ron: 118 • strada stradas strado strar stró: 7 • a an ar ó: 353 • a as o os: 892 • strada strado strar stró: 8 • rada radas rado rados: 53 • Ø do n r: 354 • strada strado stró: 9 • rada rado rados: 67 • a an ar: 413 • Ø n r: 509 • a o os: 1410 • Ø r s: 287 • strada strado: 12 • rada rado: 89 • a an: 1049 • Ø es: 874 • Ø n: 1874 • a o: 2304 • Ø s: 5501 • strado: 15 • rado: 167 • an: 1786 • es: 2751 • n: 6051 • a: 8981 • s: 10662 • ...
Search for Candidate Paradigms [Figure: search paths through the lattice, linking single suffixes (strado, trado, rado, an, es, n, a, s) to progressively larger suffix sets such as "a ada adas ado ados an ar aron ó" and "Ø da das do dos n ndo r ron"]
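A minimal sketch of such a search over suffix sets, under stated assumptions: the count attached to a suffix set is taken to be the number of stems that occur in the vocabulary with every suffix in the set, and a path grows greedily from a single suffix by adding whichever suffix keeps the most stems. The stopping threshold `min_ratio`, the function names, and the toy vocabulary are all hypothetical; the slides do not state the actual search criteria.

```python
def stems_for(suffix_set, vocabulary):
    """Stems that form a vocabulary word with every suffix in the set."""
    suffixes = sorted(suffix_set)
    first = suffixes[0]
    stems = set()
    for word in vocabulary:
        if not word.endswith(first):
            continue
        stem = word[: len(word) - len(first)]
        if stem and all(stem + s in vocabulary for s in suffixes):
            stems.add(stem)
    return stems


def grow(seed, candidates, vocabulary, min_ratio=0.5):
    """Greedily enlarge a suffix set while enough stems survive.

    min_ratio is an illustrative threshold, not a value from the slides.
    """
    current = frozenset(seed)
    while True:
        current_count = len(stems_for(current, vocabulary))
        best, best_count = None, 0
        for suffix in sorted(candidates - current):
            count = len(stems_for(current | {suffix}, vocabulary))
            if count > best_count:
                best, best_count = suffix, count
        if best is None or best_count < min_ratio * current_count:
            return current
        current = current | {best}


# Hypothetical toy vocabulary; real input would be the word types of a corpus.
VOCAB = {
    "habla", "hablan", "hablar", "habló",
    "canta", "cantan", "cantar", "cantó",
    "pasa", "pasan", "casa",
}
SUFFIXES = {"a", "an", "ar", "ó"}

print(len(stems_for({"a", "an"}, VOCAB)))       # 3
print(sorted(grow({"an"}, SUFFIXES, VOCAB)))    # ['a', 'an', 'ar', 'ó']
```

On this toy data the path runs an, then a an, then a an ar, then a an ar ó, with the stem count shrinking as the set grows, which is the same pattern as the an column on the slide.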
• 17 Suffixes: a aba aban ada adas ado ados an ando ar ara aron arse ará arán aría ó; Cosine Similarity: 0.715; 532 Covered Types • 16 Suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó; Cosine Similarity: 0.664; 451 Covered Types • 15 Suffixes: a aba aban ada adas ado ados an ando ar aron arse ará arán ó; 25 Stems: anunci, aplic, apoy, celebr, consider, …; 375 Covered Types • 15 Suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó; 22 Stems: anunci, aplic, apoy, celebr, concentr, …; 330 Covered Types • 15 Suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó; 23 Stems: anunci, apoy, confirm, consider, declar, …; 345 Covered Types
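The slide does not give the similarity formula, but its numbers are consistent with a cosine over sets of covered types, treated as binary membership vectors: the size of the overlap divided by the square root of the product of the set sizes. The sketch below checks that reading against the slide's counts. Which candidates were merged is inferred from the suffix lists (the 16-suffix set is the union of the 22-stem and 23-stem candidates, and the 17-suffix set adds the 25-stem candidate), and the function names are made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sets viewed as binary membership vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def cosine_from_sizes(size_a, size_b, size_union):
    """Same measure when only the set sizes and the union size are known."""
    overlap = size_a + size_b - size_union
    return overlap / math.sqrt(size_a * size_b)


# Leaf candidates cover suffixes x stems types: 15 * 22 = 330 and 15 * 23 = 345.
# Their merged cluster covers 451 types on the slide.
print(round(cosine_from_sizes(330, 345, 451), 3))  # 0.664

# Merging that cluster (451 types) with the 15 * 25 = 375 candidate gives 532.
print(round(cosine_from_sizes(451, 375, 532), 3))  # 0.715
```

Both values match the similarities printed on the slide, which supports the set-based reading but does not prove it is the exact measure used.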
[Figure: F1 for four systems: ParaMor & Morfessor, Morfessor, ParaMor, Bernhard 2]
ParaMor • Identify paradigms: Search, Cluster, Filter • Segment
Morphology in NLP • götür (take) + ül (passive) + m (negative) + üyor (present progressive) + sun (2nd person singular) • You are not being taken