Generality and Openness in Enabling Methodologies for Morphology and Text Processing
Anssi Yli-Jyrä
Department of General Linguistics, University of Helsinki
Tools to make tools...
• Annotated resources are tools for machine learning, for theory developers, and for building applications.
• Morphological annotation of morphologically complex languages is difficult. Computational lexicons are tools for producing that annotation.
• Finite-state compilers are among the most useful tools for building computational word-form lexicons.
• Open sourcing and collaboration are tools for making methods widely available.
Limited availability of finite-state tools
• Existing proprietary tools for morphology and shallow processing:
  • Finite-state tools are expensive to develop (e.g. many man-years), but very useful.
  • Can users get support in the future? Will the tools run on tomorrow's machines?
  • Who may use the compilers, lexicons and corpora?
• The open source alternatives:
  • diversity of alternative tools (Unitex, SFST, ...)
  • low interoperability
  • much more limited functionality
  • few standardized interfaces and formats
  • rejection of finite-state technologies (e.g. in Hebrew)
Current Challenges
• Less-studied, morphologically rich languages still need new, professional, fully functional tools.
• Descriptions without free compilers and run-time implementations are not free in practice!
• Ad-hoc tools reduce the productivity of basic resource development.
• Confusion among the users.
• Effects on corpus resource creation in any language.
• Many technologically appropriate but proprietary tools limit the distribution of the linguistic models and applications developed with them.
• Proprietary compiler tools may induce restrictions on the corpora analysed with the descriptions.
• Many proprietary analysers hinder the development of widely available treebanks, even in well-studied languages.
• Closed, non-extendible tools hinder the long-term, incremental development of OS tools.
Initiative: Interoperable FS tools
Initial surveys
• Yli-Jyrä et al. (2006), Infrastructures WS, 2006, Genova.
• Another paper in Nordic Journal of African Studies, 2005.
• Purpose: to increase collaboration between tool providers and satisfaction among users.
Complementary tools:
• interoperability, user interfaces, standard file formats, converters etc., to get more out of the existing tools
• free APIs for integration into various end-user applications
• web-based services that apply methods on demand
The evolution of tools enabled by an OS solution
• extensibility of finite-state compilers and related formalisms
• finite-state methods for machine learning and active learning
• help in implementing BLARK for various languages
• increased quality of lexicons and taggers