
NooJ International Conference, Komotini, May 2010: Portability of Armenian Corpus by NooJ




  1. NooJ International Conference, Komotini, May 2010: Portability of Armenian Corpus by NooJ. Anaid Donabedian & Victoria Khurshudian, Institut National des Langues et Civilisations Orientales (INALCO), Paris

  2. Armenian: preliminaries. Armenian is • an Indo-European language • right-branching • of the accusative type • typically SOV in word order • predominantly agglutinative in its morphology

  3. Historical Armenia

  4. Republic of Armenia

  5. Periodization • prealphabetical period • alphabetical period (405 A.D. to the present): 1. Old Armenian, or Grabar (V–XI c.); 2. Middle Armenian (XII–XVI c.); 3. Modern Armenian (XVII c. to the present), comprising Western Armenian (based on the Constantinople dialect) and Eastern Armenian (based on the Ararat dialect), each with its own dialects…

  6. Objective: provide data compatibility and portability between NooJ and the Eastern Armenian National Corpus (EANC) platform

  7. What is the Eastern Armenian National Corpus? www.eanc.net. Corpus Technologies: Michael Daniel, Victoria Khurshudian, Dmitri Levonian, Vladimir Plungian, Alexey Polyakov, Sergey Rubakov

  8. Annotation workflow: • source texts • grammatical dictionary • PARSER • annotation algorithm • annotated texts
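The chain on slide 8 (source texts, grammatical dictionary, parser, annotated texts) can be pictured as a dictionary-lookup annotator. The sketch below is a minimal illustration under stated assumptions, not the EANC parser: the dictionary contents, the transliterated toy forms, and the names `GRAMMATICAL_DICTIONARY`, `tokenize`, `parse` and `annotate` are all hypothetical. Keeping every analysis returned for a wordform is one simple way to preserve homonymy for later filtering.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy grammatical dictionary (transliterated forms, not EANC data):
# wordform -> list of (lemma, grammemes); several entries would mean homonymy.
GRAMMATICAL_DICTIONARY: dict[str, list[tuple[str, frozenset[str]]]] = {
    "tun": [("tun", frozenset({"N", "sg", "nom"}))],
    "tner": [("tun", frozenset({"N", "pl", "nom"}))],
    "tesa": [("tesnel", frozenset({"V", "aor", "1sg"}))],
}


@dataclass
class AnnotatedToken:
    wordform: str
    analyses: list[tuple[str, frozenset[str]]] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    """Split source text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def parse(token: str, dictionary: dict) -> AnnotatedToken:
    """Look up the lowercased token; unknown words receive no analysis."""
    return AnnotatedToken(token, list(dictionary.get(token.lower(), [])))


def annotate(text: str, dictionary: dict = GRAMMATICAL_DICTIONARY) -> list[AnnotatedToken]:
    """Source text -> tokens -> dictionary-based analyses (annotated text)."""
    return [parse(tok, dictionary) for tok in tokenize(text)]
```

For example, `annotate("Tner tesa.")` yields three tokens, the last being the full stop with an empty analysis list.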

  9. EANC history (Moscow, Russia) • March 2006: project launch • July 2007: 1st release • May 2008: 2nd release • March 2009: 3rd release

  10. The Eastern Armenian National Corpus (EANC) offers: • about 110 million tokens • morphological and other markup • English translations for frequent tokens • coverage of SEA from the mid-19th century to the present • both written and oral discourse • full-text view of over 100 classic Armenian titles • open Internet access

  11. Written discourse • over 106 mln. tokens • 510 authors (1841–2009) • 1039 fiction texts (including 206 translated texts) • 7858 press issues • non-fiction (scientific and other) texts

  12. Oral discourse (3.5 mln. tokens) • spontaneous discourse • polylogues • task-oriented discourse • TV show transcripts • movies … • The entire EANC oral corpus was recorded and transcribed by the project itself.

  13. EANC Functionality

  14. Search Functionality • token queries • context queries • subcorpus selection

  15. Search Functionality: simple token queries • lexeme search • wordform search • gram search • translation search • lexeme + gram search
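A lexeme + gram query amounts to checking a token's lemma together with a subset test on its grammemes. The sketch below assumes a hypothetical `Token` record and toy transliterated data; it illustrates the idea and does not reproduce the EANC query engine or its tag names. Translation search would add one more field compared the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    wordform: str
    lemma: str
    grams: frozenset[str]


def token_matches(
    token: Token,
    lexeme: Optional[str] = None,
    wordform: Optional[str] = None,
    grams: frozenset[str] = frozenset(),
) -> bool:
    """True if the token satisfies every criterion that was supplied.

    Lexeme search compares lemmas, wordform search compares surface forms,
    and gram search requires all requested grammemes; passing both lexeme
    and grams gives a lexeme + gram search.
    """
    if lexeme is not None and token.lemma != lexeme:
        return False
    if wordform is not None and token.wordform.lower() != wordform.lower():
        return False
    return grams <= token.grams


def search(corpus: list[Token], **criteria) -> list[Token]:
    return [t for t in corpus if token_matches(t, **criteria)]


# Toy transliterated corpus (hypothetical data).
CORPUS = [
    Token("tun", "tun", frozenset({"N", "sg", "nom"})),
    Token("tner", "tun", frozenset({"N", "pl", "nom"})),
    Token("tesa", "tesnel", frozenset({"V", "aor", "1sg"})),
]
```

Here `search(CORPUS, lexeme="tun", grams=frozenset({"pl"}))` keeps only the plural form.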

  16. Search Functionality: advanced options for token queries • case sensitivity • punctuation marks • position in the sentence • wildcard (*) • logical operators (e.g. ‘or’, written |) • negated features • inclusion/exclusion of grammatical/lexical homonymy
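Several of these options (case sensitivity, the `*` wildcard, `|` as OR, negated features) can be combined in a small matcher. The syntax below (`-gram` for negation, `a|b` for alternatives) is a hypothetical convention chosen for illustration, not the actual EANC query language; the wildcard matching relies on Python's standard `fnmatch` module.

```python
import fnmatch


def wordform_matches(wordform: str, pattern: str, case_sensitive: bool = False) -> bool:
    """'|' separates alternatives (logical OR); '*' matches any run of characters.

    fnmatch also gives '?' and '[...]' a special meaning, which a real
    query language might not.
    """
    if not case_sensitive:
        wordform, pattern = wordform.lower(), pattern.lower()
    return any(fnmatch.fnmatchcase(wordform, alt) for alt in pattern.split("|"))


def grams_match(token_grams: set[str], query: list[str]) -> bool:
    """Every query item must hold: 'a|b' needs a or b, '-a' forbids a (negation)."""
    for item in query:
        if item.startswith("-"):
            if item[1:] in token_grams:
                return False
        elif not any(alt in token_grams for alt in item.split("|")):
            return False
    return True
```

With these helpers, `grams_match({"N", "pl", "gen"}, ["N", "nom|gen", "-sg"])` accepts a plural genitive noun while rejecting any singular form.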

  17. Search Functionality: subcorpus selection by • time • author(s) / title(s) • genre • type of text (translated vs. original) • any combination (superposition) of the above
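Subcorpus selection can be read as filtering document metadata, with several criteria superposed as a logical AND. The `Document` fields and the `select_subcorpus` function below are hypothetical stand-ins for whatever metadata schema EANC actually uses.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Document:
    title: str
    author: str
    year: int
    genre: str
    translated: bool


def select_subcorpus(
    docs: Iterable[Document],
    years: Optional[tuple[int, int]] = None,
    authors: Optional[set[str]] = None,
    genres: Optional[set[str]] = None,
    translated: Optional[bool] = None,
) -> list[Document]:
    """Keep documents passing every criterion given; None means 'no restriction'.

    Supplying several criteria at once superposes them (logical AND).
    """
    selected = []
    for doc in docs:
        if years is not None and not years[0] <= doc.year <= years[1]:
            continue
        if authors is not None and doc.author not in authors:
            continue
        if genres is not None and doc.genre not in genres:
            continue
        if translated is not None and doc.translated != translated:
            continue
        selected.append(doc)
    return selected
```

For instance, `select_subcorpus(docs, genres={"fiction"}, translated=False)` would restrict a search to original (non-translated) fiction.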

  18. Search Functionality: display options • context expansion • ‘sort by’ (time, lexeme, wordform, etc.) • Latin transliteration • glossed display • KWIC (key word in context)
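A KWIC display prints each hit with a fixed window of left and right context, aligning the keywords in one column. The `kwic` function below is a minimal illustrative sketch; its `width` and `sort_by_right` parameters are assumptions, with the latter standing in for one possible ‘sort by’ option.

```python
def kwic(tokens: list[str], keyword: str, width: int = 3, sort_by_right: bool = False) -> list[str]:
    """One line per occurrence of keyword, with `width` tokens of context per side.

    Left contexts are right-aligned in a 40-character field so that the
    bracketed keywords line up in a single column.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    if sort_by_right:
        # Alphabetical order of the right-hand context.
        hits.sort(key=lambda hit: hit[2])
    return [f"{left:>40} [{tok}] {right}" for left, tok, right in hits]
```

Right-aligning the left context is what makes the keyword column readable when many concordance lines are printed together.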

  19. Transliterated samples
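Latin transliteration is a letter-by-letter mapping of the Armenian alphabet. The table below follows the Hübschmann-Meillet convention as an assumption; EANC's own transliteration scheme may differ, and the `ARM_TO_LATIN` and `transliterate` names are hypothetical. Under this strict scheme the digraph ու comes out as "ow", and the ligature և is left untouched.

```python
# Lowercase Armenian letters -> Latin (Hübschmann-Meillet style; an assumption,
# not necessarily the scheme EANC uses). The ligature "և" is not mapped.
ARM_TO_LATIN = {
    "ա": "a", "բ": "b", "գ": "g", "դ": "d", "ե": "e", "զ": "z", "է": "ē",
    "ը": "ə", "թ": "tʻ", "ժ": "ž", "ի": "i", "լ": "l", "խ": "x", "ծ": "c",
    "կ": "k", "հ": "h", "ձ": "j", "ղ": "ł", "ճ": "č", "մ": "m", "յ": "y",
    "ն": "n", "շ": "š", "ո": "o", "չ": "čʻ", "պ": "p", "ջ": "ǰ", "ռ": "ṙ",
    "ս": "s", "վ": "v", "տ": "t", "ր": "r", "ց": "cʻ", "ւ": "w", "փ": "pʻ",
    "ք": "kʻ", "օ": "ō", "ֆ": "f",
}


def transliterate(text: str) -> str:
    """Map each Armenian letter to Latin, keeping capitalization."""
    out = []
    for ch in text:
        latin = ARM_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # punctuation, digits and unmapped characters pass through
        elif ch.isupper():
            out.append(latin[0].upper() + latin[1:])
        else:
            out.append(latin)
    return "".join(out)
```

For example, `transliterate("Հայաստան")` gives "Hayastan".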

  20. Glossed samples

  21. KWIC samples

  22. Main current tasks: • make the NooJ-based Western Armenian morphological annotation compatible with the structure of the EANC grammatical dictionary • make the EANC and NooJ Western Armenian platforms mutually portable • achieve full mutual coverage of NooJ and EANC capabilities (e.g. NooJ's syntactic annotation)
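One concrete piece of the first task is converting NooJ-style annotations into EANC-style grammatical tags. The sketch below assumes a simplified NooJ-like entry format (`wordform,lemma,CATEGORY+feature+...`, or `lemma,CATEGORY+...` when the wordform equals the lemma) and an invented tag correspondence table; the real NooJ Western Armenian and EANC tag sets are not reproduced here. Codes with no counterpart are collected separately so that coverage gaps remain visible.

```python
# Hypothetical tag correspondence; the actual NooJ and EANC tag sets may differ.
NOOJ_TO_EANC = {"N": "N", "V": "V", "A": "A", "s": "sg", "p": "pl", "Nom": "nom", "Gen": "gen"}


def parse_nooj_entry(entry: str) -> tuple[str, str, list[str]]:
    """Split 'wordform,lemma,CAT+feat+...' (or 'lemma,CAT+...') into its parts."""
    fields = entry.split(",")
    if len(fields) == 2:  # the wordform is the lemma itself
        fields = [fields[0]] + fields
    if len(fields) != 3:
        raise ValueError(f"unexpected entry format: {entry!r}")
    wordform, lemma, codes = fields
    return wordform, lemma, codes.split("+")


def to_eanc(entry: str) -> dict:
    """Convert one entry, keeping codes that have no EANC counterpart apart."""
    wordform, lemma, codes = parse_nooj_entry(entry)
    grams, unmapped = [], []
    for code in codes:
        if code in NOOJ_TO_EANC:
            grams.append(NOOJ_TO_EANC[code])
        else:
            unmapped.append(code)
    return {"wordform": wordform, "lemma": lemma, "grams": grams, "unmapped": unmapped}
```

Reporting the `unmapped` list per entry is one way to measure how far the two annotation schemes still are from full mutual coverage.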
