Kirrkirr: Software for browsing and visual exploration of a structured Warlpiri dictionary. Kevin Jansz kjansz@sultry.arts.usyd.edu.au Department of Linguistics, University of Sydney, Australia Christopher Manning Departments of Computer Science and Linguistics, Stanford University, USA
Kirrkirr: Software for browsing and visual exploration of a structured Warlpiri dictionary Kevin Jansz kjansz@sultry.arts.usyd.edu.au Department of Linguistics, University of Sydney, Australia Christopher Manning Departments of Computer Science and Linguistics, Stanford University, USA Nitin Indurkhya School of Applied Science, Nanyang Technological University, Singapore
Objectives • Provide innovative ways for representing a dictionary, through creative use of web technology • Provide practical, educationally useful access to information that can be customised to suit the needs of many users (at low labour cost) • Examine the richness of lexical structure Initial target: the Warlpiri dictionary.
Research Program: Lexicon • A language is more than individual words with a definition • it is a vast network of associations between words and within and across the concepts represented by words • Aim to provide people with a better understanding of this conceptual map. • Traditional paper dictionaries offer very limited ways for making such networks visible • There are no such limitations on a computer
Research: Computational Lexicography • Dictionaries on computers are now commonplace • Few utilise the potential of the new medium • Many present a plain, search-oriented representation of the paper version • Goal: fun dictionary tools that are effective for language learning, browsing • Like flicking through pages of a paper dictionary • Words are grouped by their meaning and their association with each other • Key to the effectiveness of this browsing is that the user has control over the way this is presented.
Initial focus: Warlpiri • Warlpiri is an Australian Aboriginal language spoken in the Tanami desert (north-west of Alice Springs) • There are a number of factors influencing this choice: • One of the most comprehensive lexical databases for any Australian language (Laughren & Nash 1983) • Relatively large community of people interested in learning their traditional language • Until now, results haven’t been produced in a format usable by the community (only raw printouts)
Educational goals • Dictionary structure and usability are often dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second language learners) are not met • The low level of literacy in the region makes an e-dictionary potentially more useful than a paper edition • less dependent on good knowledge of spelling and alphabetical order. • Making it fun and easy to use, and providing multimedia content and the pronunciations of words, are a considerable help as well
Kirrkirr: A Warlpiri dictionary browser (Jansz 1998; Jansz, Manning and Indurkhya 1999) • An environment for the interactive exploration of dictionaries. • Current work has been with Warlpiri only, but the design is general (Arrernte coming soon!) • Attempts to more fully utilise graphical interfaces, hypertext, multimedia, and different ways of indexing and accessing information • It can either be run over the web [high bandwidth] or run locally (here Java’s main advantage is cross-platform support).
Overview • Animated graph layout of word relationships • Formatted entries • A Notes facility for ‘jotting in the margin’ • Multimedia: audio, pictures • Advanced searching interfaces • Semantic domain browsing • Others in planning: formatting (XSL) editing, figuration patterns. • These attempt to cater to users with different interests and competence levels
MRD Structure • The internal structures of current Machine Readable Dictionaries (MRDs) usually merely mimic the structure of the printed form (Boguraev 1990) • Some work, notably WordNet (Miller 1995), has involved a fundamental rethinking of dictionary content and organisation (in WordNet, organisation via “synsets”, which are related via links of part, subkind, opposite) • But there has been little in the way of software to make such research truly usable by different communities of users.
The lexical database • Original materials stored in an ad hoc markup format using backslash codes with some (rather odd) nesting of structural tags • These were converted to XML using an error-correcting stack-based parser (written in Perl). • The inconsistency and flexibility of dictionary entries actually made this a surprisingly difficult task. • But the parser tries to impose data integrity • Use of XML gives a clear structure to the lexical data, and makes available many (free) tools • Result remains a portable, tangible text file
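The conversion step can be illustrated with a minimal sketch of a stack-based, error-correcting converter. This is not the original Perl parser: the source format is not shown in these slides, so the code syntax used here (`\tag` opens an element, `\/tag` closes it) and the tag name `gl` are purely hypothetical. The sketch only shows the general idea of using a stack to repair missing or stray closing codes so that the output is always well-formed XML.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical backslash-code syntax (an assumption, not the real format):
#   \tag   opens an element,   \/tag   closes it.
TOKEN = re.compile(r"\\(/?)(\w+)")


def backslash_to_xml(text):
    """Convert backslash-coded text to well-formed XML, repairing nesting errors."""
    out = []      # pieces of the XML output
    stack = []    # names of the elements that are currently open
    pos = 0
    for m in TOKEN.finditer(text):
        out.append(escape(text[pos:m.start()].strip()))
        pos = m.end()
        closing = m.group(1) == "/"
        tag = m.group(2).upper()
        if not closing:
            stack.append(tag)
            out.append(f"<{tag}>")
        elif tag in stack:
            # Error correction: close anything left open inside this element.
            while True:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == tag:
                    break
        # Otherwise: a stray closing code with no matching open; drop it.
    out.append(escape(text[pos:].strip()))
    # Error correction: close whatever is still open at the end of the input.
    while stack:
        out.append(f"</{stack.pop()}>")
    return "".join(out)


print(backslash_to_xml(r"\entry \hw jaja \/hw \gl gloss text \/entry"))
```

In this example the `gl` element is never closed explicitly; the stack lets the converter close it when the enclosing entry ends.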
XML indexing - challenges • Few XML parsers make single entries retrievable from the file • Typically, the entire XML document is put in memory • This is not practical when parsing significant XML databases (e.g., the Warlpiri dictionary is approx. 10 MB).
XML Dictionary Indexing (XDI) • Hierarchical structure of XML lends itself to indexing • Each entry in the XML file can be considered as a separate entity • To make the Warlpiri dictionary usable for Kirrkirr an ad hoc indexing system was developed • Uses a slightly modified Ælfred XML parser • Entries indexed by headword in a separate index file • The system returns an XML document object containing the single dictionary entry, facilitating: • processing for related words (Graph layout) • XSL processing to HTML
Kirrkirr’s XML Index Process [Diagram: the Kirrkirr dictionary browser looks up a headword in an in-memory index of (headword, file position) pairs; the XML parser reads the matching <ENTRY> ... </ENTRY> from the XML-formatted Warlpiri dictionary file (<DICTIONARY> <ENTRY> ... </ENTRY> ... </DICTIONARY>), across the file system or web, and returns an XML document object; the XSL processor combines this with an XSL file to produce an HTML document.]
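The indexing idea can be sketched roughly as follows. This is an illustration in Python rather than Kirrkirr's Java and Ælfred-based implementation; the function names `build_index` and `fetch_entry` are hypothetical, and the sketch assumes entries are delimited by plain `<ENTRY>` and `<HW>` tags without attributes. For brevity the one-off index scan reads the whole file, whereas each later lookup seeks to and parses a single entry.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumption: entries and headwords use bare <ENTRY> and <HW> tags.
ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.DOTALL)
HW_RE = re.compile(rb"<HW>(.*?)</HW>", re.DOTALL)


def build_index(data):
    """Map each headword to the (byte offset, length) of its entry."""
    index = {}
    for m in ENTRY_RE.finditer(data):
        hw = HW_RE.search(m.group(0))
        if hw:
            headword = hw.group(1).decode("utf-8").strip()
            index[headword] = (m.start(), m.end() - m.start())
    return index


def fetch_entry(f, index, headword):
    """Seek to one entry in a binary file object and parse only that entry."""
    offset, length = index[headword]
    f.seek(offset)
    return ET.fromstring(f.read(length))


# Tiny stand-in for the dictionary file; the contents are placeholders.
sample = (b"<DICTIONARY>"
          b"<ENTRY><HW>jaja</HW><GL>first gloss</GL></ENTRY>"
          b"<ENTRY><HW>placeholder</HW><GL>second gloss</GL></ENTRY>"
          b"</DICTIONARY>")
index = build_index(sample)
entry = fetch_entry(io.BytesIO(sample), index, "placeholder")
print(entry.findtext("GL"))
```

In practice the index could be written to a separate file once and reloaded at start-up, which matches the separate index file described above.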
XDI in Kirrkirr • The XML indexing process considerably improves efficiency as only requested entries are parsed • Parsed entries are kept temporarily in a cache • Thus Kirrkirr uses XML as a middle ground between the structure and indexing of a relational database and the freedom and functionality of text.
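A cache of parsed entries might look something like the sketch below. The slides do not say how Kirrkirr's cache decides what to discard, so the least-recently-used policy, the class name `EntryCache` and the `capacity` parameter are all assumptions made for illustration.

```python
from collections import OrderedDict


class EntryCache:
    """Keep recently used parsed entries; evict the least recently used.

    The LRU policy is an assumption, not something stated in the slides.
    """

    def __init__(self, loader, capacity=100):
        self._loader = loader          # function: headword -> parsed entry
        self._capacity = capacity
        self._entries = OrderedDict()

    def get(self, headword):
        if headword in self._entries:
            # Cache hit: mark as most recently used, skip parsing.
            self._entries.move_to_end(headword)
            return self._entries[headword]
        entry = self._loader(headword)  # cache miss: parse the entry
        self._entries[headword] = entry
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the oldest entry
        return entry
```

The `loader` argument would be whatever function fetches and parses a single entry from the indexed file.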
XQL - Potential • An alternative to investigate for the future is using a standard query language – such as XQL – to get material out of the XML dictionary, rather than using our ad hoc index. • At the moment not a huge issue since most retrieval is focussed on components of a particular word
XQL - Optimizations • Revamp data structure • reduce redundancy, amount to load at start-up • PDOM (Persistent Document Object Model) • represents XML document as a collection of objects in a tree-like model • XQL (Extensible Query Language) • query language for XML • e.g. /DICTIONARY/ENTRY[9] • DICTIONARY/ENTRY[HW='jaja']
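The two example queries can be tried out with a small sketch. It does not use an XQL engine: it uses the limited XPath subset in Python's standard `xml.etree.ElementTree`, which happens to support both a positional predicate and a child-text predicate. The headwords other than `jaja` are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET

# Build a ten-entry stand-in dictionary; "jaja" is deliberately the ninth entry.
headwords = [f"w{i}" for i in range(1, 9)] + ["jaja", "w10"]
root = ET.fromstring(
    "<DICTIONARY>"
    + "".join(f"<ENTRY><HW>{hw}</HW></ENTRY>" for hw in headwords)
    + "</DICTIONARY>"
)

# root is the DICTIONARY element, so "./ENTRY[9]" plays the role of
# /DICTIONARY/ENTRY[9]: the ninth entry (positions start at 1).
ninth = root.find("./ENTRY[9]")

# Counterpart of DICTIONARY/ENTRY[HW='jaja']: the entry whose HW child
# has exactly the text "jaja".
by_headword = root.find("./ENTRY[HW='jaja']")

print(ninth.findtext("HW"), by_headword is ninth)
```

Note that this loads the whole document into memory, which is exactly what the ad hoc index avoids; a persistent model such as PDOM is meant to address that.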
Performance - Startup time • Impact on Startup time.
Customised Presentation of Dictionary Content • Produced dynamically from the XML by using XSL (via James Clark’s XT) • XSL allows easy modelling of some user preferences. • This is useful as many users find information overload quite confusing and demotivating • Can produce bilingual or monolingual dictionary • Opportunities for various output styles, and formats such as RTF or TeX for printing.
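The effect of a user preference on the formatted entry can be sketched as follows. Kirrkirr does this with XSL stylesheets processed by XT; the plain Python function below is only a stand-in that mimics the outcome. The element names `DEF` and `GL` and the `show_english` option are hypothetical, chosen to illustrate switching between a bilingual and a monolingual view.

```python
import html
import xml.etree.ElementTree as ET


def render_entry(entry, show_english=True):
    """Render one entry as HTML; hide English glosses for a monolingual view.

    GL (English gloss) and DEF (definition) are assumed element names.
    """
    parts = [f"<h2>{html.escape(entry.findtext('HW', default=''))}</h2>"]
    for child in entry:
        if child.tag == "HW":
            continue  # already used as the heading
        if child.tag == "GL" and not show_english:
            continue  # user preference: monolingual dictionary
        text = html.escape(child.text or "")
        parts.append(f'<p class="{child.tag.lower()}">{text}</p>')
    return "\n".join(parts)


entry = ET.fromstring(
    "<ENTRY><HW>jaja</HW><DEF>definition text</DEF><GL>gloss text</GL></ENTRY>"
)
print(render_entry(entry))                      # bilingual view
print(render_entry(entry, show_english=False))  # monolingual view
```

With a real XSL processor the same preference would typically be passed in as a stylesheet parameter rather than a function argument.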
Performance - XSL Presentation • Creates minimal load on the application • Requires file creation permission for the applet • Takes load off file system (no need for 9000+ pre-generated files) • Gives the user the opportunity to customise the formatting.
User study Mim Corris & Jane Simpson • User testing with Warlpiri children (primary and secondary students), adults and teachers. • Purely qualitative observational study of dictionary use. (Doing anything much else would be difficult) • Teachers using a domain-specific dictionary extract still found the interface more efficient to use for language tasks.
Initial reactions - enthusiastic • Despite teachers’ concerns that the system would be too hard for children, primary students used the software with relative ease. • Students were given the opportunity to spend ‘free time’ with Kirrkirr • time was spent looking up unfamiliar words from the day before.
Conclusions • While we have focused our research on Warlpiri, the system can be easily applied to other languages • The key to the effectiveness of the browsing interfaces is that the user has the ability to customise their functionality, due to the flexibility of the XML & Kirrkirr technology • Throughout this research, the educational interests of the user have been the highest priority. • We hope to better understand the usefulness & practicality of innovative dictionary browsing environments.
Links • Kirrkirr homepage: www.sultry.arts.usyd.edu.au/kirrkirr • Kevin’s Thesis Homepage: www.sultry.arts.usyd.edu.au/kjansz/thesis