Morpho Challenge competition 2005-2010 Evaluations and results Authors Mikko Kurimo

Morpho Challenge competition 2005-2010: Evaluations and results. Authors: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus

Introduction • Started in 2005. • Open to all. • The organizers selected the evaluation tasks, data and metrics, and performed all the evaluations. • Covers unsupervised and semi-supervised approaches. • The semi-supervised approach was introduced in Morpho Challenge 2010.

Aim • To develop language-independent algorithms that discover morphemes from text material. • Morpheme: the smallest grammatical unit in a language. • To promote research in machine learning and NLP.

Evaluation tasks & languages # From: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. 2010. Morpho Challenge 2005-2010: Evaluations and Results.

Word Segmentation • In 2005: • Segment the text into morphemes. • In 2007: • Locate the surface forms (word segmentation). • Identify which surface forms are allomorphs of the same underlying morpheme.

Principles for segmentation: The evaluation is based on a subset of the word forms given as training data. The frequency of a word form plays no role in the evaluation. The evaluation score is the balanced F-measure, the harmonic mean of precision and recall. If the linguistic gold standard has several alternative analyses for one word, it is enough for full precision that one of the alternatives is equivalent to the proposed analysis.
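The scoring principles above can be illustrated with a small sketch. It assumes a boundary-based comparison (a proposed morpheme boundary counts as correct if the chosen gold analysis has a boundary at the same position) and, for words with several gold alternatives, picks the alternative that shares the most boundaries with the proposal. Both choices, as well as the function names and the toy data, are assumptions made for illustration; the organizers' exact counting procedure is not given in these slides.

```python
def boundaries(segmentation):
    """Return the set of character offsets at which a morpheme boundary falls."""
    cuts, pos = set(), 0
    for morph in segmentation[:-1]:
        pos += len(morph)
        cuts.add(pos)
    return cuts


def evaluate(proposed, gold):
    """Score proposed segmentations against a gold standard.

    proposed: word -> list of morphs
    gold:     word -> list of alternative analyses (each a list of morphs)
    Each word form is counted once, so its corpus frequency plays no role.
    """
    hits = n_proposed = n_gold = 0
    for word, alternatives in gold.items():
        pred = boundaries(proposed[word])
        # Assumption: use the alternative that agrees best with the proposal,
        # preferring the one with fewer boundaries when agreement is tied.
        best = max((boundaries(a) for a in alternatives),
                   key=lambda g: (len(pred & g), -len(g)))
        hits += len(pred & best)
        n_proposed += len(pred)
        n_gold += len(best)
    precision = hits / n_proposed if n_proposed else 1.0
    recall = hits / n_gold if n_gold else 1.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy data (hypothetical, not taken from the competition).
proposed = {
    "hands": ["hand", "s"],
    "handful": ["handful"],
    "lefthanded": ["left", "hand", "ed"],
}
gold = {
    "hands": [["hand", "s"]],
    "handful": [["hand", "ful"]],
    "lefthanded": [["left", "hand", "ed"], ["left", "handed"]],
}
precision, recall, f_measure = evaluate(proposed, gold)
print(precision, recall, f_measure)
```

In this toy run every proposed boundary is correct, but the boundary in "handful" is missed, so precision is higher than recall and the F-measure falls between the two.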

Information retrieval • The algorithms were tested by using their morpheme segmentations for text retrieval. • A stemming algorithm is normally used to reduce inflected words to base words. • Problem: stemming is language-specific. • Challenges • Choosing the correct weighting method. • The number of queries was limited.
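As a rough illustration of how morpheme segmentations can stand in for stemming in text retrieval, the sketch below indexes documents by morphemes instead of full word forms. The segmentation table, the documents and all function names are hypothetical, and the index is a plain Boolean inverted index with no term weighting, so it does not address the weighting challenge listed above.

```python
from collections import defaultdict

# Hypothetical segmentations, standing in for the output of a morpheme
# analysis algorithm; words not listed are kept whole.
SEGMENTATIONS = {
    "hands": ["hand", "s"],
    "handful": ["hand", "ful"],
}


def to_terms(text):
    """Replace each word by its morphemes to obtain the index terms."""
    terms = []
    for word in text.lower().split():
        terms.extend(SEGMENTATIONS.get(word, [word]))
    return terms


def build_index(docs):
    """Build an inverted index: morpheme -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in to_terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that share at least one morpheme with the query."""
    hits = set()
    for term in to_terms(query):
        hits |= index.get(term, set())
    return hits


docs = {
    1: "a handful of nuts",
    2: "both hands raised",
    3: "the left turn",
}
index = build_index(docs)
print(search(index, "hand"))
```

The query "hand" matches documents 1 and 2 even though neither contains that exact word form, which is the effect a stemmer would otherwise provide.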

Machine translation • Two stages • Alignment of parallel sentences in both languages. • Training a language model. • In Morpho Challenge 2009, the focus was on the alignment problem.

Some Algorithms • Bernhard (Bernhard, 2006): • Best in the Finnish, English and German linguistic evaluations. • First, a list of prefixes and suffixes is extracted. • Segmentations are generated using this list. • The best segmentation is selected on the basis of a cost function.
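The three steps listed above (affix list, candidate segmentations, selection by cost) can be sketched as follows. This is not Bernhard's actual method: the affix lists and stem counts are given by hand rather than extracted from data, and the cost function (negative log of a smoothed stem frequency) is a stand-in chosen only to make the selection step concrete.

```python
import math

# Hypothetical affix lists; in the real algorithm these are extracted from data.
PREFIXES = ("re", "un")
SUFFIXES = ("ed", "ful", "ing", "s")

# Hypothetical stem counts used by the illustrative cost function.
STEM_COUNTS = {"hand": 50, "do": 30, "play": 20}


def candidate_segmentations(word):
    """Yield (prefix, stem, suffix) splits permitted by the affix lists.

    An empty prefix or suffix is always allowed, so the unsplit word is
    among the candidates.
    """
    for prefix in ("",) + tuple(p for p in PREFIXES if word.startswith(p)):
        rest = word[len(prefix):]
        for suffix in ("",) + tuple(s for s in SUFFIXES if rest.endswith(s)):
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            if stem:
                yield (prefix, stem, suffix)


def cost(segmentation):
    """Assumed cost: frequent stems are cheap, unseen stems are expensive."""
    _, stem, _ = segmentation
    total = sum(STEM_COUNTS.values())
    # Add-one smoothing keeps the cost finite for unseen stems.
    return -math.log((STEM_COUNTS.get(stem, 0) + 1) / (total + 1))


def segment(word):
    """Pick the lowest-cost candidate and drop empty affixes."""
    best = min(candidate_segmentations(word), key=cost)
    return [morph for morph in best if morph]


print(segment("hands"))
print(segment("redoing"))
```

With these toy lists, "hands" is split as hand+s and "redoing" as re+do+ing, because "hand" and "do" are the only candidate stems with non-zero counts.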

Some Algorithms • Morfessor algorithm: • Aims to discover the most basic and compact description of the data. • Substrings that occur frequently in the training set are also considered morphemes. • Ex. hand, hand+s, hand+ful, left+hand+ed. • Gives better results than other algorithms for Finnish and Turkish. • # From: Mathias Creutz and Krista Lagus. 2006. Morfessor in the Morpho Challenge.
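The idea of a "compact description of the data" can be made concrete with a simplified two-part cost: the cost of listing every distinct morph in a lexicon plus the cost of encoding the corpus as a sequence of morphs. The sketch below is an assumption-laden illustration, not the actual Morfessor model: the flat per-letter cost, the function name and the toy corpus (built from the slide's hand example) are all hypothetical.

```python
import math
from collections import Counter

BITS_PER_LETTER = 5  # assumed flat cost of spelling out one letter in the lexicon


def description_length(segmented_corpus):
    """Return a simplified two-part cost (in bits) of a segmented corpus.

    segmented_corpus: list of words, each given as a list of morphs.
    """
    counts = Counter(m for word in segmented_corpus for m in word)
    total = sum(counts.values())
    # Lexicon cost: spell out each distinct morph once, plus an end marker.
    lexicon_cost = sum((len(m) + 1) * BITS_PER_LETTER for m in counts)
    # Corpus cost: code each morph token with -log2 of its relative frequency.
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_cost + corpus_cost


unsplit = [["hand"], ["hands"], ["handful"], ["lefthanded"]]
split = [["hand"], ["hand", "s"], ["hand", "ful"], ["left", "hand", "ed"]]

print(description_length(unsplit), description_length(split))
```

Under these assumptions the split analysis is cheaper, because the frequent substring "hand" is stored once in the lexicon and reused, which is the intuition behind treating frequent substrings as morphemes.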

Results, Morpho Challenge 2010 • S = semi-supervised algorithm • P = unsupervised algorithm with supervised parameter tuning • # From: http://research.ics.aalto.fi/events/morphochallenge2010

Open Challenges • What is the best analysis algorithm? • What is the meaning of the morphemes? • How to evaluate the alternative analyses? • How to improve the analysis using context? • How to effectively apply semi-supervised learning?

References • Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. 2010. Morpho Challenge 2005-2010: Evaluations and Results. Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology. • Mathias Creutz and Krista Lagus. 2006. Morfessor in the Morpho Challenge. Proceedings of the PASCAL Challenge Workshop on Unsupervised Segmentation of Words into Morphemes. • Official site of Morpho Challenge: http://research.ics.aalto.fi/events/morphochallenge2010/ • Wikipedia: http://en.wikipedia.org/

Thank You
