A natural-language approach to modeling structured documents. Yves MARCOUX, GRDS – EBSI, Université de Montréal. Why is some XML so difficult to write?
A natural-language approach to modeling structured documents
Yves MARCOUX
GRDS – EBSI
Université de Montréal
Yves Marcoux - OLST-RALI - 21 March 2007
A natural-language approach to modeling
Why is some XML so difficult to write?
<http://www.idealliance.org/papers/extreme/proceedings/html/2006/Marcoux01/EML2006Marcoux01.html>
Structure of the talk
• The problem
• Proposed direction for solution
• Conclusion
• Question period
Writing well-formed XML: author’s choices
• <sex><male /></sex>
• <is-female>FALSE</is-female>
• <gender gender="♂" />
• <note>It's a boy!</note>
(♂ is U+2642, the male sign)
Writing valid XML is collaborative work
• The modeler has chosen the markup (the container)
• The author supplies the contents
• Much like a form
• Collaborative work implies communication between the parties: modeler and author
• But the modeler is gone…
Problem
• Authoring environments are:
  • good at conveying the syntactic intentions (or decisions) of the modeler
  • not as good at conveying the semantic intentions of the modeler
• Often, all there is is a generic identifier (genID) or some slightly more developed form
  • Ex.: “date” in a memo
What is available?
• More or less developed forms of genIDs (and attribute names)
• General documentation of the model
• Per-element (or per-attribute) documentation
  • OK for tooltips or popups
• Could we do better?
• (Applications / stylesheets are not appropriate)
Could we aim at…
• Having a semantic conversation right in the editing window?
• In the same way that there is currently a syntactic conversation?
• Yes…
Structure of the talk
• The problem
• Proposed direction for solution
• Conclusion
• Question period
Key idea
• Have the modeler prepare bits of natural-language (NL) prose
• These can be intertwined with author-supplied contents to give them meaning
• Allows “fill-in”-like sentences
• And thus, a semantic conversation in the editing window
• NB: modeler segments can contain hyperlinks
Example: facts about some US cities
Raw XML
<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
  <city>
    <name>Rochester</name>
    <population>240,000</population>
    <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
  </city>
  ...
</facts-about-US-cities>
Prose equivalent
Here are facts about some US cities.
The city of Denver has a population of 850,000 and an annual snowfall of 23 inches.
The city of Rochester has a population of 240,000 and an annual snowfall of 88 inches.
The city of Palm Springs has a population of 48,000 and an annual snowfall of 0 inches.
Modeler prepares “peritext” segments
Possible “semantic” view
Here are facts about some US cities.
The city named Denver has a population of 850,000 and an annual snowfall of 23 inches.
The city named Rochester has a population of 240,000 and an annual snowfall of 88 inches.
The city named Palm Springs has a population of 48,000 and an annual snowfall of 0 inches.
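The semantic view can be sketched as a simple recursive computation: each element contributes a text-before segment, its own contents, and a text-after segment. The following is a minimal sketch, not the implementation described in the paper. The `PERITEXTS` table and the `semantic_view` function are hypothetical names, and the peritext wording is reconstructed from the sentences in the semantic view rather than taken from the modeler's actual table.

```python
import xml.etree.ElementTree as ET

# Hypothetical peritext table: element name -> (text-before, text-after).
# The wording is an assumption, reconstructed from the semantic view.
PERITEXTS = {
    "facts-about-US-cities": ("Here are facts about some US cities.", ""),
    "city": (" The city", "."),
    "name": (" named ", ""),
    "population": (" has a population of ", ""),
    "annual-snowfall-in-inches": (" and an annual snowfall of ", " inches"),
}


def semantic_view(element):
    """Interleave the modeler's peritexts with the author-supplied contents."""
    before, after = PERITEXTS.get(element.tag, ("", ""))
    # Whitespace used only for indentation is dropped; real contents are kept.
    own_text = (element.text or "").strip()
    children = "".join(semantic_view(child) for child in element)
    return before + own_text + children + after


RAW_XML = """
<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
  <city>
    <name>Rochester</name>
    <population>240,000</population>
    <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
  </city>
</facts-about-US-cities>
"""

if __name__ == "__main__":
    print(semantic_view(ET.fromstring(RAW_XML)))
```

Because the peritexts are fixed per element name, the whole view is a pure function of the document, which is what makes it usable as an editing view.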
What it allows during editing (in semantic view)
• Peritexts convey the semantic intentions of the modeler
• A semantic conversation takes place in the editing window (instead of a syntactic one)
• Fill-in sentences:
  • Make “tag abuse” embarrassing…
  • Are likely to reduce some kinds of errors
• Other views / fragment viewing / hyperlinks
Discussion
• This is not like defining an application
• Not a stylesheet mechanism
• Peritexts (fixed here) could be allowed to vary with some parameters:
  • position among siblings
  • attribute value
  • etc.
• (Attributes should also be handled)
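Peritexts that vary with parameters can be sketched by turning the text-before segment into a small function of the element and its position. This is only an illustration under stated assumptions: the element names `cities` and `snowfall`, the `approximate` attribute, and the function names are all hypothetical and do not come from the talk.

```python
import xml.etree.ElementTree as ET


def before_city(element, position):
    # Peritext varies with the position among siblings.
    return "The first city, " if position == 0 else " The next city, "


def before_snowfall(element, position):
    # Peritext varies with an attribute value ("approximate" is hypothetical).
    if element.get("approximate") == "yes":
        return ", gets roughly "
    return ", gets "


# Text-before segments are computed; text-after segments stay fixed.
BEFORE = {"city": before_city, "snowfall": before_snowfall}
AFTER = {"city": ".", "snowfall": " inches of snow a year"}


def render(element, position=0):
    """Build the prose for an element, passing each child its sibling index."""
    rule = BEFORE.get(element.tag)
    before = rule(element, position) if rule else ""
    text = (element.text or "").strip()
    children = "".join(render(child, i) for i, child in enumerate(element))
    return before + text + children + AFTER.get(element.tag, "")


DOC = ET.fromstring(
    "<cities>"
    "<city><name>Denver</name><snowfall>23</snowfall></city>"
    '<city><name>Rochester</name><snowfall approximate="yes">88</snowfall></city>'
    "</cities>"
)

if __name__ == "__main__":
    print(render(DOC))
```

The peritexts remain modeler-written prose; only the choice among a few prepared variants is computed.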
Why does it work?
• Sometimes tricky (see paper), but…
• NL has very high affordance
• NL can act as its own metalanguage
• XML contents + NL usually mix pretty well
Intertextual semantics
• The meaning of a text fragment is given by placing it in a network of other texts
• That network can simply consist of a sentence (or “quasi-sentence”)
• Or it can have a more elaborate topology: peritexts can contain hyperlinks, determining sense-making / learning paths
• Too much hyperlinking can spoil the idea!
Interpretation workflow
d → [S] → S(d) → [H] → actual “meaning” of d for H
• d is a document or fragment, H is a human
• S(d) is the intertextual semantics of d
• S(d) is in NL
• S is machine computable
• The actual meaning of d for H may vary:
  • with H
  • for the same H, from one “reading” of S(d) to another
Interpretation workflow
[Diagram: in one panel, H1, H2 and H3 each read d directly; in the other, d is mapped to S(d), which H1, H2 and H3 then read.]
Suggests a modeling process
• Modeler starts with the prose
• Identify peritexts
• Work out more and more abbreviated forms
  • Will correspond to different “views” in the editor
• Tersest level gives markup
• Increase model usability?
Mixed content question revisited
• Known: mixed content can be eliminated with <!ELEMENT text (#PCDATA)>
  Example: <!ELEMENT (e1 | e2 | … | #PCDATA)*>
  becomes: <!ELEMENT (e1 | e2 | … | text)*>
• Why does it feel bad?
• “text” tags are not abbreviations of any reasonable peritexts!
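On the instance side, the rewrite above amounts to wrapping every run of character data in a `text` element. Here is a minimal sketch of that transformation for one level of an element's content. The function name `wrap_text_nodes` and the sample paragraph are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def wrap_text_nodes(parent, wrapper_tag="text"):
    """Return a copy of `parent` whose direct character data sits in <text> children.

    Only the direct content of `parent` is rewritten; nested elements are
    copied unchanged to keep the sketch short.
    """
    result = ET.Element(parent.tag, dict(parent.attrib))

    def add_text(chunk):
        # Empty runs produce no wrapper element.
        if chunk:
            ET.SubElement(result, wrapper_tag).text = chunk

    add_text(parent.text)
    for child in parent:
        clone = copy.deepcopy(child)
        # ElementTree stores the text that follows a child in its `tail`.
        tail, clone.tail = clone.tail, None
        result.append(clone)
        add_text(tail)
    return result


MIXED = ET.fromstring("<p>See <e1>this</e1> and <e2>that</e2>.</p>")

if __name__ == "__main__":
    print(ET.tostring(wrap_text_nodes(MIXED), encoding="unicode"))
```

The result is no longer mixed content, but the new `text` elements carry no meaning a peritext could spell out, which is the point the slide makes.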
Is NL too much to ask for?
• Relative to some “target” community
• Can go a long way (previous slide)
• Hyperlinks are allowed in peritexts
  • Allows defining “sense-making” or learning paths
• (Almost) anything formal can be turned into NL…
NL as formalism common denominator
[Diagram: expression in artificial formalism + textbook explaining formalism → STAPLER → equivalent expression in NL]
Editing setup without intertextual semantics
[Diagram labels: World; Modeler; NL and presupposed knowledge of target community; Doc. / tr. material; Author; XML EDITOR; XML DTD; Valid XML instance or fragment]
Editing setup with intertextual semantics
[Diagram labels: World; Modeler; NL and presupposed knowledge of target community; Author; XML EDITOR; XML DTD; text-before and text-after segments; Valid XML instance or fragment; NL equivalent]
Structure of the talk
• The problem
• Proposed direction for solution
• Conclusion
• Question period
What it suggests
• Bring some of the discipline of producing “good documents” (manuals of style) into model & interface design
  • E.g., don’t abuse hyperlinking
• Literate modeling, literate interfaces
• Literate interface / interaction design
• Benefit: make explicit prerequisite knowledge & sense-making / learning paths
Other possible uses of intertextual semantics
• Legal documents with multiple renditions
• NLP systems that cannot handle markup
  • Including full-text indexing
  • <ex>Hamlet</ex>
  • “Exit Hamlet”
• Other data models
  • Ex.: relational
  • Normal forms
• A new look at expressivity
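The full-text indexing case can be sketched as follows: an indexer that simply strips tags sees only "Hamlet", while one fed the NL equivalent also sees "Exit". This is a hedged illustration only. The peritext for `ex` is inferred from the example above, and `to_plain_text` and `index_terms` are hypothetical helper names, not part of any real indexing library.

```python
import xml.etree.ElementTree as ET

# Hypothetical peritext table; the entry for <ex> is inferred from the
# "Exit Hamlet" example.
PERITEXTS = {"ex": ("Exit ", "")}


def to_plain_text(element):
    """Flatten an element to markup-free NL, keeping mixed-content text."""
    before, after = PERITEXTS.get(element.tag, ("", ""))
    parts = [before, element.text or ""]
    for child in element:
        parts.append(to_plain_text(child))
        # Text following a child element is stored in its `tail`.
        parts.append(child.tail or "")
    parts.append(after)
    return "".join(parts)


def index_terms(text):
    """A deliberately naive tokenizer standing in for a full-text indexer."""
    return sorted({word.lower() for word in text.split()})


if __name__ == "__main__":
    stage_direction = ET.fromstring("<ex>Hamlet</ex>")
    print(to_plain_text(stage_direction))
    print(index_terms(to_plain_text(stage_direction)))
```

The downstream tool needs no knowledge of the markup vocabulary, since the peritexts have already turned the element name into ordinary words.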
Future work
• Editing:
  • Work out a few existing / new models
  • Properly integrate attributes
  • More powerful peritext computation
  • Implement ideas in a real editor
  • Display peritexts when choosing insertion
  • Hyperlinks in displayed peritexts
  • Experiment with real authors
Future work
• More than peritexts?
• More than NL (icons, sound, …)?
• Compare with other semantic frameworks
  • Downstream semantics: Wrightson, Renear et al.
• Other models
• Tackle literate modeling / interface design
Thank you! Questions?