
Generality and Openness in Enabling Methodologies for Morphology and Text Processing




Presentation Transcript


  1. Generality and Openness in Enabling Methodologies for Morphology and Text Processing
     Anssi Yli-Jyrä, Department of General Linguistics, University of Helsinki

  2. Tools to make tools...
     • Annotated resources are tools for machine learning and theory developers, and for building applications.
     • Morphological annotation of morphologically complex languages is difficult; computational lexicons are tools for producing that annotation.
     • Finite-state compilers are among the most useful tools for building computational word-form lexicons.
     • Open sourcing and collaboration are tools for making methods widely available.
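To make the phrase "finite-state compilers ... make computational word-form lexicons" concrete, here is a minimal sketch of what such a compiler does: it turns a stem list and suffix classes into a transducer that maps surface word forms to analyses. The lexicon layout (loosely inspired by lexc-style continuation classes), the class name `NOUN`, and the tag strings are hypothetical illustrations, not the input format or API of any real compiler such as lexc or SFST; the toy compiler also skips determinisation and minimisation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon in the spirit of continuation classes:
# each stem maps to (lemma + tag, name of the suffix class it continues with).
STEMS: Dict[str, Tuple[str, str]] = {
    "talo": ("talo+N", "NOUN"),    # Finnish 'house'
    "kissa": ("kissa+N", "NOUN"),  # Finnish 'cat'
}
# Each suffix class lists (surface suffix, tags emitted for it).
SUFFIXES: Dict[str, List[Tuple[str, str]]] = {
    "NOUN": [("", "+Sg+Nom"), ("ssa", "+Sg+Ine"), ("t", "+Pl+Nom")],
}

Arc = Tuple[str, str, int]  # (input symbol or "" for epsilon, output string, target state)


class LexiconFST:
    """Non-deterministic transducer compiled from a stem/suffix lexicon (no minimisation)."""

    def __init__(self) -> None:
        self.arcs: Dict[int, List[Arc]] = {0: []}  # state 0 is the start state
        self.finals: Set[int] = set()

    def _new_state(self) -> int:
        state = len(self.arcs)
        self.arcs[state] = []
        return state

    def _add_path(self, src: int, chars: str, emit: str, dst: int) -> None:
        # One arc per input character (empty output), then an epsilon arc emitting `emit`.
        state = src
        for ch in chars:
            nxt = self._new_state()
            self.arcs[state].append((ch, "", nxt))
            state = nxt
        self.arcs[state].append(("", emit, dst))

    def compile(self, stems: Dict[str, Tuple[str, str]],
                suffixes: Dict[str, List[Tuple[str, str]]]) -> "LexiconFST":
        final = self._new_state()
        self.finals.add(final)
        # Each continuation class becomes one sub-automaton shared by all its stems.
        class_start = {name: self._new_state() for name in suffixes}
        for name, entries in suffixes.items():
            for surface, tags in entries:
                self._add_path(class_start[name], surface, tags, final)
        for surface, (lemma, cls) in stems.items():
            self._add_path(0, surface, lemma, class_start[cls])
        return self

    def analyse(self, word: str) -> List[str]:
        # Depth-first search over (state, input position, output so far); this toy
        # compiler never creates epsilon cycles, so the search always terminates.
        results: List[str] = []
        stack = [(0, 0, "")]
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word) and state in self.finals:
                results.append(out)
            for sym, emit, nxt in self.arcs[state]:
                if sym == "":
                    stack.append((nxt, pos, out + emit))
                elif pos < len(word) and word[pos] == sym:
                    stack.append((nxt, pos + 1, out + emit))
        return sorted(results)


fst = LexiconFST().compile(STEMS, SUFFIXES)
print(fst.analyse("kissassa"))  # ['kissa+N+Sg+Ine']
print(fst.analyse("talot"))     # ['talo+N+Pl+Nom']
```

The point of the sketch is that a linguist writes only the compact stem and suffix description; the compiler expands it into a lexicon covering every licensed word form, which is what makes such compilers useful for producing morphological annotation.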

  3. Limited availability of finite-state tools
     • Existing proprietary tools for morphology and shallow processing:
         • finite-state tools are expensive to develop (e.g. many man-years), but very useful
         • Can users get support in the future? Will the tools run on tomorrow's machines?
         • Who may use the compilers, lexicons and corpora?
     • The open-source alternatives:
         • diversity of alternative tools (Unitex, SFST, ...)
         • low interoperability
         • much more limited functionality
         • few standardized interfaces and formats
         • rejection of finite-state technologies (e.g. in Hebrew)

  4. Current challenges
     • Less-studied, morphologically rich languages still need new professional, fully functional tools.
     • Descriptions without free compilers and run-time implementations are not free in practice!
     • Ad hoc tools reduce the productivity of basic resource development.
     • Confusion among users.
     • Effects on corpus resource creation in any language.
     • Many technologically appropriate but proprietary tools limit the distribution of the linguistic models and applications developed with them.
     • Proprietary compiler tools may impose restrictions on the corpora analysed with the descriptions.
     • Many proprietary analysers hinder the development of widely available treebanks, even for well-studied languages.
     • Closed, non-extensible tools hinder the long-term, incremental development of OS tools.

  5. Initiative: Interoperable FS tools
     Initial surveys:
     • Yli-Jyrä et al. (2006), Infrastructures WS, 2006, Genova.
     • Another paper in Nordic Journal of African Studies, 2005.
     • Purpose: to increase collaboration between tool providers and satisfaction among users.
     Complementary tools:
     • interoperability, user interfaces, standard file formats, converters etc., to get more out of the existing tools
     • free APIs for integration into various end-user applications
     • web-based services that apply methods on demand
     The evolution of tools enabled by an OS solution:
     • extensibility of finite-state compilers and related formalisms
     • finite-state methods for machine learning and active learning
     • help in implementing a BLARK (Basic Language Resource Kit) for various languages
     • higher quality of lexicons and taggers
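Standard file formats and converters are listed as one way to get more out of existing tools. As a hedged illustration of that idea (not a format proposed in the initiative itself), the sketch below reads and writes a transducer as a tab-separated arc list modelled on the AT&T text format, which, with variations, is imported and exported by tools such as OpenFst (`fstcompile`, `fstprint`) and HFST (`hfst-txt2fst`, `hfst-fst2txt`). The epsilon spelling `@0@`, the function names, and the treatment of weights (parsed but ignored) are assumptions of this sketch.

```python
from typing import Dict, List, Set, Tuple

EPS = "@0@"  # assumed epsilon spelling (as in HFST's AT&T output); other tools differ

Arcs = Dict[int, List[Tuple[str, str, int]]]  # state -> [(input, output, target)]


def read_att(text: str) -> Tuple[Arcs, Set[int]]:
    """Parse tab-separated lines 'src dst in out [weight]' (arcs) or 'state [weight]' (finals).

    Weights are accepted but ignored in this sketch.
    """
    arcs: Arcs = {}
    finals: Set[int] = set()
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) in (4, 5):
            src, dst, isym, osym = int(fields[0]), int(fields[1]), fields[2], fields[3]
            arcs.setdefault(src, []).append((isym, osym, dst))
        elif len(fields) in (1, 2):
            finals.add(int(fields[0]))
        else:
            raise ValueError(f"unrecognised line: {line!r}")
    return arcs, finals


def write_att(arcs: Arcs, finals: Set[int]) -> str:
    """Serialise arcs (sorted by source state) followed by final states."""
    lines = [f"{src}\t{dst}\t{isym}\t{osym}"
             for src in sorted(arcs) for isym, osym, dst in arcs[src]]
    lines += [str(state) for state in sorted(finals)]
    return "\n".join(lines)


def transduce(arcs: Arcs, finals: Set[int], word: str) -> List[str]:
    """Return all outputs for `word`, starting from state 0 (assumes no epsilon cycles)."""
    results: List[str] = []
    stack = [(0, 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in finals:
            results.append(out)
        for isym, osym, dst in arcs.get(state, []):
            emit = "" if osym == EPS else osym
            if isym == EPS:
                stack.append((dst, pos, out + emit))
            elif word.startswith(isym, pos):
                stack.append((dst, pos + len(isym), out + emit))
    return sorted(results)


# A tiny transducer in the assumed format: talo -> talo+N+Sg, talot -> talo+N+Pl.
SAMPLE = "\n".join("\t".join(row) for row in [
    ("0", "1", "t", "t"), ("1", "2", "a", "a"), ("2", "3", "l", "l"),
    ("3", "4", "o", "o"), ("4", "5", EPS, "+N"),
    ("5", "6", "t", "+Pl"), ("5", "6", EPS, "+Sg"),
    ("6",),
])

arcs, finals = read_att(SAMPLE)
print(transduce(arcs, finals, "talot"))  # ['talo+N+Pl']
assert write_att(arcs, finals) == SAMPLE  # lossless round trip for this input
```

A plain-text arc list like this is deliberately simple: any tool that can print and parse it can exchange compiled lexicons with any other, which is the kind of low-cost interoperability the complementary tools aim for.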
