An interactive environment for creating and validating syntactic rules
Panagiotis Bouros*, Aggeliki Fotopoulou, Nicholas Glaros
Institute for Language and Speech Processing (ILSP)
{pbour, afotop, nglaros}@ilsp.gr
* Current affiliation: National and Kapodistrian University of Athens, Department of Informatics and Telecommunications
Outline
• Introduction
• Motivation
• Architecture
• Working Environment
• Functionality
• Real-World scenario
• Conclusion
Introduction (1)
• Checking free text written by humans is a challenge
• Word-by-word approach
  • Efficient for automatic checking of spelling errors
  • Prominent in languages with poor morphology
• Phrase-by-phrase approach
  • No misspelled words but still incorrect syntax, e.g. “I listens to the music.”
  • Rule-based syntactic analysis
  • Highly inflectional languages, e.g. Greek
• Need for advanced spelling checkers
Introduction (2)
• Building advanced spelling checkers
  • Statistical approaches
    • N-grams
    • Smoothing techniques
  • Syntactic analysis framework
    • Morphological lexicon
    • Set of syntactic rules
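As a rough illustration of the statistical route, the sketch below scores word pairs with a bigram model and add-one (Laplace) smoothing. The toy corpus, the function names, and the choice of Laplace smoothing are assumptions made for illustration; the slides do not say which n-gram order or smoothing technique is meant.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count unigrams and bigrams over tokenized sentences, with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the observed vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus (hypothetical): two correct sentences.
corpus = [
    ["i", "listen", "to", "the", "music"],
    ["she", "listens", "to", "the", "radio"],
]
unigrams, bigrams = train_bigrams(corpus)

# A pair seen in the corpus scores higher than an unseen one,
# but smoothing keeps the unseen pair above zero.
p_seen = bigram_prob("i", "listen", unigrams, bigrams)
p_unseen = bigram_prob("i", "listens", unigrams, bigrams)
```

A model like this can flag "I listens" as unlikely without any explicit grammar, which is the contrast with the rule-based framework that the rest of the slides focus on.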
Motivation
• Focus on syntactic analysis
• Support of ILSP’s advanced spelling checker (Symfonia)
• Interactive environment:
  • User-friendly for language specialists: no computer programming knowledge needed
  • Enables the user to easily create, edit, view and test syntactic rules
    • Graphical tree representation
    • XML storage mechanism, independent of the targeted speller
    • Ready-to-execute code for the targeted speller
  • Supports monitoring and validation of how syntactic rules apply and interact
    • Text corpora
    • Check all or a subset of syntactic rules
    • Identification and handling of possible conflicts
    • Generation of detailed reports with rich monitoring information
Architecture (1)
• Graphical Rule Creator
• Rule Handler
• Rules Kernel
• Lexicon
• Rules Kernel Monitor
Working Environment
[Screenshot of the working environment; callout labels:]
• Edit rule
• Remove rule
• Disable rule
• Enable rule
• Export rule
• Export Rules Kernel
• Create rule
• Monitor procedure
• Rules integrated into Rules Kernel
• Rule status
Create rule
• Focus on LexiX
• Specify properties
• Specify the rule context using the tree representation
• Valid grammatical characterizations of LexiX
• Specify each lexi, i.e. the grammatical characteristics of a word
• Rule result
• Restriction to specific words
• Inheritance of grammatical characteristics from adjacent words
• Alternative rule environments
• Set of lexis
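One way to picture what the rule creator collects is the data structure sketched below: a rule holds one or more alternative environments, each an ordered list of lexi slots, plus the position of LexiX and the result assigned to it. All class, field, and method names here are hypothetical; the slides do not describe the tool's internal model, and inheritance of characteristics from adjacent words is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lexi:
    """Constraints on one word slot (hypothetical model of a 'lexi')."""
    categories: frozenset = frozenset()  # allowed grammatical categories; empty = any
    words: frozenset = frozenset()       # restriction to specific words; empty = any

    def accepts(self, word, category):
        ok_category = not self.categories or category in self.categories
        ok_word = not self.words or word in self.words
        return ok_category and ok_word

@dataclass
class Rule:
    name: str
    environments: list   # alternative environments, each a list of Lexi slots
    focus: int           # index of the LexiX slot inside an environment
    result: str          # characterization given to LexiX when the rule fires
    enabled: bool = True

    def matches(self, tokens, start):
        """True if any alternative environment matches tokens[start:]."""
        if not self.enabled:
            return False
        for env in self.environments:
            window = tokens[start:start + len(env)]
            if len(window) == len(env) and all(
                slot.accepts(word, cat) for slot, (word, cat) in zip(env, window)
            ):
                return True
        return False

# Example rule: article + ambiguous word + adjective/noun/adverb.
rule = Rule(
    name="decide_pio",
    environments=[[
        Lexi(categories=frozenset({"article"})),
        Lexi(words=frozenset({"πιο", "ποιο"})),
        Lexi(categories=frozenset({"adjective", "noun", "adverb"})),
    ]],
    focus=1,
    result="adverb",
)
tokens = [("το", "article"), ("πιο", "adverb"), ("μεγάλο", "adjective")]
fires = rule.matches(tokens, 0)
```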
Edit rule
• Similar to rule creation
• The XML rule file is parsed to fill in the tree representation
• User modifications on:
  • Rule properties
  • Rule context
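The parsing step can be sketched with Python's standard `xml.etree.ElementTree` module. The XML layout below (element and attribute names such as `rule`, `context`, `lexi`, `result`) is invented for illustration; the slides do not show the actual rule file schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file; the real schema is not given in the slides.
RULE_XML = """
<rule name="decide_pio" enabled="true">
  <context>
    <lexi id="Lexi1" category="article"/>
    <lexi id="LexiX" words="πιο ποιο"/>
    <lexi id="Lexi2" category="adjective noun adverb"/>
  </context>
  <result target="LexiX" category="adverb"/>
</rule>
"""

def load_rule(xml_text):
    """Parse one rule into a plain dict that an editor could display as a tree."""
    root = ET.fromstring(xml_text.strip())
    slots = []
    for node in root.find("context").findall("lexi"):
        slots.append({
            "id": node.get("id"),
            "categories": node.get("category", "").split(),
            "words": node.get("words", "").split(),
        })
    result = root.find("result")
    return {
        "name": root.get("name"),
        "enabled": root.get("enabled", "true") == "true",
        "context": slots,
        "result": (result.get("target"), result.get("category")),
    }

rule = load_rule(RULE_XML)
```

Keeping the stored form this close to the tree shown on screen is one plausible reason for choosing XML: the file stays independent of whichever speller eventually consumes the exported rule.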
More functionalities
• Remove rule
• Disable/Enable rule
• Export rule
  • To a high-level programming language
  • E-mail to the programmers of the targeted syntactic speller
• Export Rules Kernel
Monitor procedure (1)
• Generation and selection of rules
  • Optimized performance of the spelling check engine
  • Consistent set of rules
• Need to check one or more rules against the others
  • Identify and minimize possible conflicts and insufficiencies
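Checking rules against each other can be approximated as follows: record every rule firing on a corpus, then flag any token for which different rules propose different results. This definition of a conflict, the tuple layout, and the function name are assumptions for illustration, not the tool's documented behavior.

```python
from collections import defaultdict

def find_conflicts(firings):
    """Group firings by token and keep tokens whose rules disagree.

    firings: iterable of (sentence_id, token_index, rule_name, result).
    Returns {(sentence_id, token_index): sorted list of (rule_name, result)}.
    """
    by_token = defaultdict(set)
    for sentence_id, token_index, rule_name, result in firings:
        by_token[(sentence_id, token_index)].add((rule_name, result))

    conflicts = {}
    for key, hits in by_token.items():
        distinct_results = {result for _, result in hits}
        if len(distinct_results) > 1:
            conflicts[key] = sorted(hits)
    return conflicts

# Hypothetical log: two rules disagree on token 1 of sentence 0.
firings = [
    (0, 1, "decide_pio", "adverb"),
    (0, 1, "decide_poio", "pronoun"),
    (1, 0, "decide_poio", "pronoun"),
]
conflicts = find_conflicts(firings)
```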
Monitor procedure (2)
• Two kinds of checking
  • Interactive: when a spelling error occurs, the user picks one of the automatically generated spelling suggestions
  • Automatic: the system picks the first suggestion in the list by default
• But first of all
  • Specify the input text
  • Specify the set of syntactic rules
• Report
  • Document with erroneous sentences
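The two checking modes differ only in who chooses among the suggestions, which the sketch below makes explicit. The function names, the `find_errors` callback, and the report layout are hypothetical; the toy error table stands in for the real speller.

```python
def choose_suggestion(suggestions, mode="automatic", ask=input):
    """Pick a suggestion: the first one automatically, or the user's choice."""
    if not suggestions:
        return None
    if mode == "automatic":
        return suggestions[0]
    for number, suggestion in enumerate(suggestions, start=1):
        print(f"{number}. {suggestion}")
    choice = int(ask("Pick a suggestion: "))
    return suggestions[choice - 1]

def run_monitor(sentences, find_errors, mode="automatic", ask=input):
    """Return a report listing each erroneous sentence and the chosen corrections.

    find_errors(sentence) must return a list of (word, suggestions) pairs.
    """
    report = []
    for sentence in sentences:
        errors = find_errors(sentence)
        if not errors:
            continue  # only erroneous sentences go into the report
        corrections = {
            word: choose_suggestion(suggestions, mode, ask)
            for word, suggestions in errors
        }
        report.append({"sentence": sentence, "corrections": corrections})
    return report

# Toy stand-in for the speller (hypothetical data).
TOY_ERRORS = {"listens": ["listen", "listened"]}

def toy_find_errors(sentence):
    return [(w, TOY_ERRORS[w]) for w in sentence.split() if w in TOY_ERRORS]

report = run_monitor(["I listens to the music", "I like music"], toy_find_errors)
```

Passing `ask` as a parameter keeps the interactive mode testable: a script can supply a function that returns a fixed choice instead of waiting for keyboard input.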
Real-World scenario
• Resolve the ambiguity “πιο” (more) vs. “ποιο” (which)
  • Same phonetic transcription /pjo/
  • Different grammatical category: adverb vs. pronoun
• Two syntactic rules needed
  • Decision “πιο”
  • Decision “ποιο”
Real-World scenario: decision “πιο”
• Rule environment: Lexi1 LexiX Lexi2
  • LexiX is characterized by the ambiguity “πιο” vs. “ποιο”
  • Lexi1: article
  • Lexi2: an adjective, a noun or an adverb
• Then LexiX is an adverb: “πιο”
Real-World scenario: decision “ποιο”
• Rule environment: LexiX Lexi1 Lexi2 Lexi3 Lexi4 Lexi5
  • Some or all of Lexi1, Lexi2, Lexi3, Lexi4 may be missing
  • LexiX is characterized by the ambiguity “πιο” vs. “ποιο”
  • Lexi1: article
  • Lexi2: adjective
  • Lexi3: noun
  • Lexi4: particle
  • Lexi5: verb
• Then LexiX is a pronoun: “ποιο”
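The two rule environments can be sketched over part-of-speech-tagged tokens as shown below. The function name, the tag labels, the order in which the rules are tried, and the example phrases are assumptions for illustration; in the real system the rules live in the Rules Kernel and the tags come from the morphological lexicon.

```python
def decide_pio_poio(tokens, i):
    """Decide the spelling of the ambiguous word at position i.

    tokens: list of (word, category) pairs; the category of the ambiguous
    word itself is ignored. Returns "πιο", "ποιο", or None if no rule fires.
    """
    cats = [category for _, category in tokens]

    # Rule "decision πιο": Lexi1 (article) LexiX Lexi2 (adjective/noun/adverb).
    if (0 < i < len(tokens) - 1
            and cats[i - 1] == "article"
            and cats[i + 1] in {"adjective", "noun", "adverb"}):
        return "πιο"

    # Rule "decision ποιο": LexiX [article] [adjective] [noun] [particle] verb,
    # where each bracketed slot may be missing.
    j = i + 1
    for optional in ("article", "adjective", "noun", "particle"):
        if j < len(tokens) and cats[j] == optional:
            j += 1
    if j < len(tokens) and cats[j] == "verb":
        return "ποιο"

    return None

# "το ? μεγάλο σπίτι" (the biggest house): article before, adjective after.
phrase_a = [("το", "article"), ("?", None), ("μεγάλο", "adjective"), ("σπίτι", "noun")]
# "? βιβλίο θα διαβάσεις" (which book will you read): noun, particle, verb follow.
phrase_b = [("?", None), ("βιβλίο", "noun"), ("θα", "particle"), ("διαβάσεις", "verb")]

choice_a = decide_pio_poio(phrase_a, 1)
choice_b = decide_pio_poio(phrase_b, 0)
```

Because both rules inspect the same ambiguous word, an input could in principle satisfy both environments; trying "decision πιο" first is an arbitrary choice in this sketch, and it is exactly the kind of interaction the monitor procedure is meant to surface.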
Conclusion
• Focus on syntactic analysis
• Support of ILSP’s advanced spelling checker (Symfonia)
• Interactive, user-friendly environment for
  • Fast generation of syntactic rules: create, edit, view
  • Real-time monitoring and validation of their application to existing text corpora