Mātāpuna Dictionary Database System: The Open Source, Multi-user, Web-based Dictionary Writing System. Dave Moskovitz, DWS 2004, Brno
Outline Mātāpuna – Dave Moskovitz – www.thinktank.co.nz • Background info • Design criteria • Functions • Database structure • Future development • Lab • Call for collaboration
Background – New Zealand / Aotearoa / Māori • 4 million people; 268,000 km² • 15% Māori; 1 in 4 of those speak Māori • Māori median age 22; median income NZD 14,000 (compared to 35 and NZD 18,500 for pākehā) • Māori is an official language and belongs to the Polynesian language group • Uses the standard Roman character set with macrons
Background – New Zealand in the Pacific
Background – The Mātāpuna project • First monolingual dictionary of Māori, written from a Māori cultural perspective • Target of 20,000 entries (1 entry = 1 definition) • Designed for language learners with some proficiency • 3+ year project under the auspices of Te Taura Whiri i te Reo Māori / The Māori Language Commission • 4 writers, one editor, one lexicographer, one project manager, admin support, one geek
Background – The Mātāpuna team • In photo: Pou Temara, Phil Matthews, Te Waireka Walker, Ruka Broughton, Wiha Te Rakihawea, Hēni Jacobs, Sharon Armstrong • Not in photo: Te Awanuiārangi Black, Te Haumihiata Mason, Dave Moskovitz
Background – Dave • BA (Hons) Comp Sci, Univ. California Berkeley • Began PhD in Applied Linguistics – NZ Sign Language phonology • 25 years in IT industry • Background in Application Development, Systems Architecture, System Performance, Internet • 3rd lexicography project, after Dictionary of NZ Sign Language and Oxford NZ Dictionary • Open Source bigot
Background – Software • Free Software – Open Source – GPL • Uses Linux, Apache, mod_perl, Postgres; runs on any old hardware (e.g. Pentium 600) • Browser based • About 4,000 lines of Perl code • Won Computerworld excellence award for use of IT in Government
Open Source is Good for Lexicography • Free • Market is too small to support proprietary software • Everyone’s needs are unique – and you can modify the source code to suit • Open source programmers are not hard to find • Low risk and futureproof: no vendor lock-in • Everyone helps each other • Software is open, but data is not (necessarily)
Design Criteria • Easy to use by untrained lexicographers • Support workflow and management as well as entry • End-to-end processing • Produce printed output as well as web access • Multiuser • Multilingual interface, easy to add languages • Unicode-based, allows any character set to be used
Sample Output
Functions • Add • Search • Edit • Corpus search • Reports
Functions – Add
Functions – Search
Functions – Edit
Functions – Corpus search
Functions – Reports
Functions – Validation • Field-based, including: orthography; punctuation; blank fields; undefined words (not in the defining vocabulary); synonym rules
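A minimal sketch of what field-based validation of this kind could look like. It is written in Python for illustration rather than in the system's own Perl, and every name and rule in it is an assumption: the entry fields (`headword`, `definition`), the `validate_entry` function, the permitted letter set, and the rule that a definition ends with a full stop are hypothetical, not taken from Mātāpuna.

```python
import re
import unicodedata

# Assumption: headwords use only these letters (including macron vowels),
# hyphens and spaces. The real orthography rules are not given in the slides.
HEADWORD_PATTERN = re.compile(r"[aeiouāēīōūhkmnprtwg\- ]+")


def validate_entry(entry, defining_vocab):
    """Return a list of (field, problem) pairs for a hypothetical entry dict."""
    problems = []
    # Normalise to NFC so a macron vowel is a single code point.
    headword = unicodedata.normalize("NFC", entry.get("headword", "")).strip()
    definition = unicodedata.normalize("NFC", entry.get("definition", "")).strip()

    # Blank and orthography checks on the headword.
    if not headword:
        problems.append(("headword", "blank"))
    elif not HEADWORD_PATTERN.fullmatch(headword.lower()):
        problems.append(("headword", "orthography"))

    if not definition:
        problems.append(("definition", "blank"))
    else:
        # Punctuation check (assumed rule: definitions end with a full stop).
        if not definition.endswith("."):
            problems.append(("definition", "punctuation"))
        # Every word of the definition must be in the defining vocabulary.
        for word in re.findall(r"[^\W\d_]+", definition.lower()):
            if word not in defining_vocab:
                problems.append(("definition", f"undefined word: {word}"))
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets the interface show a writer everything that needs fixing in a single pass.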
Functions – Workflow • Basic workflow: Add → Self check → Editor 1 → Editor 2 • Editor can make minor changes, or send the entry back to the owner • Owner is notified of any changes by email • You can always view the history of an entry
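The workflow above can be read as a small state machine. The following is a sketch in Python, not the system's actual Perl code. The class `Entry`, the stage names, the `notify` callback (a stand-in for email) and the choice to return a sent-back entry to the self-check stage are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Stage names are hypothetical labels for Add -> Self check -> Editor 1 -> Editor 2.
STAGES = ["add", "self_check", "editor_1", "editor_2"]


@dataclass
class Entry:
    headword: str
    owner: str
    stage: str = "add"
    history: list = field(default_factory=list)

    def _log(self, user, action):
        # Every change is recorded so the history can always be viewed.
        self.history.append((user, action, self.stage))

    def advance(self, user):
        """Move the entry to the next stage of the workflow."""
        position = STAGES.index(self.stage)
        if position == len(STAGES) - 1:
            raise ValueError("entry is already at the final stage")
        self.stage = STAGES[position + 1]
        self._log(user, "advance")

    def send_back(self, editor, notify):
        """An editor returns the entry to its owner and the owner is notified."""
        if self.stage not in ("editor_1", "editor_2"):
            raise ValueError("only an editor stage can send an entry back")
        # Assumption: a returned entry resumes at the owner's self check.
        self.stage = "self_check"
        self._log(editor, "send_back")
        notify(self.owner, f"{self.headword} was sent back by {editor}")
```

Passing `notify` in as a function keeps the sketch free of any mail setup; a real deployment would presumably send an email at that point.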
Functions – Synonym handling • Entries allow for synonym ‘families’ • Master – slave (tuakana – teina) relationship • Masters can’t have masters and slaves can’t have slaves • Slave definitions printed from master • All cross-references managed
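The two constraints above keep every synonym family exactly one level deep. A minimal Python sketch of how such a rule might be enforced follows. The class `Headword`, its attributes and the method names are hypothetical, and the sketch is not the system's own implementation.

```python
class SynonymError(ValueError):
    """Raised when a link would break the one-level family rule."""


class Headword:
    def __init__(self, word, definition=None):
        self.word = word
        self.definition = definition
        self.tuakana = None  # the master entry, if this entry is a slave
        self.teina = []      # slave entries, if this entry is a master

    def link_to(self, master):
        """Make this entry a teina (slave) of the given tuakana (master)."""
        if master is self:
            raise SynonymError("an entry cannot be its own master")
        if self.tuakana is not None:
            raise SynonymError("entry already has a master")
        if master.tuakana is not None:
            raise SynonymError("masters can't have masters")
        if self.teina:
            raise SynonymError("slaves can't have slaves")
        self.tuakana = master
        master.teina.append(self)

    def printed_definition(self):
        # A slave entry is printed with its master's definition.
        return self.tuakana.definition if self.tuakana else self.definition
```

Because the definition is stored only on the master, editing it once changes what is printed for every member of the family.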
Functions – Multilingual interface • 186 text snippets • Can add additional languages
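One common way to build an interface from text snippets is a lookup table keyed by snippet and language. The Python sketch below illustrates that idea only. The snippet keys, the sample translations, the fallback to English and all function names are assumptions, not the system's actual data or code.

```python
DEFAULT_LANG = "en"  # assumed fallback language

# Hypothetical snippets; the real system has 186 of them.
SNIPPETS = {
    "button.search": {"en": "Search", "mi": "Rapu"},
    "button.reports": {"en": "Reports"},
}


def snippet(key, lang):
    """Return the snippet in the requested language, falling back to the default."""
    translations = SNIPPETS[key]
    return translations.get(lang, translations[DEFAULT_LANG])


def add_language(lang, new_translations):
    """Add a language by supplying translations for existing snippet keys."""
    for key, text in new_translations.items():
        SNIPPETS[key][lang] = text


def missing(lang):
    """List snippet keys that still lack a translation for the language."""
    return sorted(key for key, texts in SNIPPETS.items() if lang not in texts)
```

With this layout, adding a language means supplying translations only, with no change to the program, and `missing` reports which snippets a translator still has to cover.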
Functions – Multilingual interface
Database Structure • wordclass • category • examplesource • headword • qastatus • hwarchive • matapunauser • activityjournal
Future development • Multiple citations • Bilingual / multilingual • More corpus material (and better corpus performance) • Advanced search • Better user administration • XML / SGML export • More languages • … what do you want or need?
Lab – Words from the Olympics • 15 users, 15 categories of words • Rawiri is the editor • Practise entering definitions, linking synonyms, playing with major and minor senses, searching, breaking validation rules … • Be nice to Rawiri, he can send work back to you to get fixed
Call for collaboration • This is Free Software • Use it and contribute enhancements • It’s robust and capable of producing a major lexicographical work • We are interested in your feedback and participation
Call for collaboration • Contact: Dave Moskovitz, Thinktank Consulting Limited, PO Box 15-212, Wellington, New Zealand, dave@thinktank.co.nz, +64 27 220 2202