Språkbanken i Finland / Kielipankki / Language Bank of Finland
Nordic Treebank Network
Fefor, September 17, 2003
AEB/Yleisesittely

Vem, kuka, who?
• The Language Bank of Finland is a service provided by CSC
• CSC is owned by the Finnish Ministry of Education
• CSC provides HPCN (high-performance computing and networking) services to all universities
• CSC maintains scientific applications and databases
• CSC focuses on providing shared services
• Services are free of charge for universities and non-profit for companies
• The Language Bank serves the linguistic community in Finland
  • Server: corpus.csc.fi (Linux)
  • Text collections (Finnish and Finland-Swedish)
  • Taggers
  • Web-based corpus query tool

Varför, miksi, why?
• There is no treebank of Finnish at present
• … and it is a shame, so
• The Language Bank wants to bring about its creation
• Infrastructure programme by the Academy of Finland in 2004
• The plan is to use Finnish Dependency Grammar by Connexor
• Without query and analysis tools the treebank is just a large heap of files
• We need information on tools and technology in order to create a nice service for linguists and language technology professionals

At present the Language Bank offers... (1)
• Text collection of Finnish
  • 180 million words
  • 60% with MSD tags (TextMorfo 2.0)
• Text collection of Finland-Swedish
  • 32 million words
  • 100% with MSD tags (SWECG)
• Swedish PAROLE
  • 19 million words (courtesy of Språkbanken, Gothenburg)
• Other:
  • Le Monde 1990, German PAROLE, FISC, Susanne, OTA, Middle French, Oulu

At present the Language Bank offers... (2)
• WWW Lemmie 2.0 (screenshot on next slide)
  • Easy-to-use corpus query tool developed at CSC
• Taggers
  • Fi-lite (Connexor)
  • En-lite (Connexor)
  • ENGCG (Lingsoft)
  • SWECG (Lingsoft)
  • FINTWOL (Lingsoft)
  • TextMorfo (Kielikone)
  • Morfo (Kielikone)

In the past the Language Bank has been active in...
• Preparing ground for research programmes
  • Preliminary survey on language technology 1998
  • Preliminary survey on spoken language research 2001
• Participating in programmes with universities
  • Enlargement of text collections 1999-2001
  • Integrated resources for speech technology and spoken language research 2002-2004

In the future the Language Bank will offer...
• Spoken language data
  • Academy of Finland funding
  • Work is in progress
• Annotation editor for spoken language data (screenshot on next slide)
  • Annotation interchange format in RDF
  • Supports collaborative annotation
• Treebank of Finnish ;-)
  • Just need some money…
• Better tools for querying and processing research data

More information
• http://www.csc.fi/kielipankki/
• manne.miettinen@csc.fi