A natural-language approach to modeling structured documents

Presentation Transcript


  1. A natural-language approach to modeling structured documents. Yves MARCOUX, GRDS – EBSI, Université de Montréal. Yves Marcoux - OLST-RALI - 21 March 2007

  2. A natural-language approach to modeling: Why is some XML so difficult to write? <http://www.idealliance.org/papers/extreme/proceedings/html/2006/Marcoux01/EML2006Marcoux01.html>

  3. Structure of the talk • The problem • Proposed direction for solution • Conclusion • Question period

  4. Writing well-formed XML: author’s choices • <sex><male /></sex> • <is-female>FALSE</is-female> • <gender gender="&#x2642;" /> • <note>It's a boy!</note> &#x2642; = ♂

  5. Writing valid XML is collaborative work • Modeler has chosen the markup (container) • Author supplies the contents • Much like a form • Collaborative work → communication between the parties: modeler and author • But the modeler is gone…

  6. Problem • Authoring environments are: • good at conveying the syntactic intentions (or decisions) of the modeler • not as good at conveying the semantic intentions of the modeler • Often, all the author gets is a generic identifier (GI) or a slightly more developed form • Ex.: “date” in a memo

  7. What is available? • More or less developed forms of genIDs (and attribute names) • General documentation of the model • Per element (attribute) documentation • OK for tooltips or popups • Could we do better? • (Applications / stylesheets are not appropriate)

  8. Could we aim at… • Having a semantic conversation right in the editing window? • Just as there already is a syntactic conversation? • Yes…

  9. Structure of the talk • The problem • Proposed direction for solution • Conclusion • Question period

  10. Key idea • Have modeler prepare bits of NL (prose) • That can be intertwined with author-supplied contents to give them meaning • Allows “fill-in”-like sentences • And thus, a semantic conversation in the editing window • NB: modeler segments can contain hyperlinks

  11. Example: Facts about some US cities

  12. Raw XML
      <facts-about-US-cities>
        <city>
          <name>Denver</name>
          <population>850,000</population>
          <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
        </city>
        <city>
          <name>Rochester</name>
          <population>240,000</population>
          <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
        </city>
        ...
      </facts-about-US-cities>

  13. Prose equivalent: Here are facts about some US cities. The city of Denver has a population of 850,000 and an annual snowfall of 23 inches. The city of Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city of Palm Springs has a population of 48,000 and an annual snowfall of 0 inches.

  14. Modeler prepares “peritext” segments

  15. Possible “semantic” view: Here are facts about some US cities. The city named Denver has a population of 850,000 and an annual snowfall of 23 inches. The city named Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city named Palm Springs has a population of 48,000 and an annual snowfall of 0 inches.
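Because the semantic view is obtained by intertwining fixed modeler segments with author-supplied contents, it can be computed mechanically from the raw XML. The sketch below assumes the simplest possible scheme: one text-before and one text-after segment per element type, concatenated around each element's content. The table name `PERITEXTS`, the function `semantic_view`, and the exact segment strings are hypothetical illustrations, not the format used in the talk or paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical peritext table: element type -> (text-before, text-after).
# The segments are an assumption, chosen to reproduce the semantic view above.
PERITEXTS = {
    "facts-about-US-cities": ("Here are facts about some US cities.", ""),
    "city": (" The city named ", "."),
    "name": ("", ""),
    "population": (" has a population of ", ""),
    "annual-snowfall-in-inches": (" and an annual snowfall of ", " inches"),
}


def semantic_view(elem: ET.Element) -> str:
    """Wrap an element's content in its text-before and text-after segments."""
    before, after = PERITEXTS.get(elem.tag, ("", ""))
    if len(elem):
        # Element content: concatenate the renderings of the children,
        # ignoring the indentation whitespace between them.
        content = "".join(semantic_view(child) for child in elem)
    else:
        # Leaf element: the author-supplied data.
        content = (elem.text or "").strip()
    return before + content + after


doc = ET.fromstring("""
<facts-about-US-cities>
  <city><name>Denver</name><population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches></city>
  <city><name>Rochester</name><population>240,000</population>
    <annual-snowfall-in-inches>88</annual-snowfall-in-inches></city>
</facts-about-US-cities>
""".strip())
print(semantic_view(doc))
```

Keeping the segments in a table, separate from any rendering code, reflects the point made later in the talk that this is not a stylesheet: the modeler writes prose, and the editor only interleaves it with the author's data.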

  16. What it allows during editing (in semantic view) • Peritexts convey the semantic intentions of the modeler • A semantic conversation takes place in the editing window (instead of a syntactic one) • Fill-in sentences: • Make “tag abuse” embarrassing… • Likely to reduce some kinds of errors • Other views / fragment viewing / hyperlinks

  17. Discussion • This is not like defining an application • Not a stylesheet mechanism • Peritexts (fixed here) could be allowed to vary with some parameters: • position among siblings • attribute value • etc. • (Attributes should also be handled)
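A minimal sketch of peritexts that vary with parameters, assuming a hypothetical `<authors>` list: the text-before of each `<author>` depends on its position among its siblings, and the text-after depends on a hypothetical `role` attribute. None of these element names, attribute names, or functions come from the talk; they only illustrate the two parameters the slide lists.

```python
import xml.etree.ElementTree as ET


def text_before(position: int, count: int) -> str:
    """Hypothetical text-before for an <author>, varying with sibling position."""
    if position == 0:
        return "by "
    if position == count - 1:
        return " and "
    return ", "


def text_after(author: ET.Element) -> str:
    """Hypothetical text-after for an <author>, varying with an attribute value."""
    return " (editor)" if author.get("role") == "editor" else ""


def render_authors(authors: ET.Element) -> str:
    children = list(authors)
    return "".join(
        text_before(i, len(children)) + (a.text or "").strip() + text_after(a)
        for i, a in enumerate(children)
    )


doc = ET.fromstring(
    '<authors><author>Ann</author><author role="editor">Bob</author>'
    "<author>Carl</author></authors>"
)
print(render_authors(doc))
```

Here the printed sentence is "by Ann, Bob (editor) and Carl"; with a single author the same peritexts give "by Ann", so the fill-in sentence stays grammatical as the author adds or removes siblings.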

  18. Why does it work? • Sometimes tricky (see paper), but… • NL has very high affordance • NL can act as its own metalanguage • XML contents + NL usually mix pretty well

  19. Intertextual semantics • Meaning of a text fragment is given by placing it in a network of other texts • That network can simply consist of a sentence (or “quasi-sentence”) • Or a more elaborate topology: peritexts can contain hyperlinks, determining sense-making / learning paths • Too much hyperlinking can spoil the idea!

  20. Interpretation workflow H S dS(d) actual “meaning” of d for H • d is document or fragment, H is a human • S(d) is the intertextual semantics of d • S(d) is in NL • S is machine computable • Actual meaning of d for H may vary: • with H • for a same H, from one “reading” of S(d) to another Yves Marcoux - OLST-RALI - 21 mars 2007

  21. Interpretation workflow [diagram: readers H1, H2, H3 interpreting d, first without and then with S(d)]

  22. Suggests a modeling process • Modeler starts with the prose • Identifies peritexts • Works out increasingly abbreviated forms • These correspond to different “views” in the editor • The tersest level gives the markup • Could this increase model usability?

  23. Mixed content question revisited • Known: mixed content can be eliminated by declaring <!ELEMENT text (#PCDATA)> Example: <!ELEMENT e (#PCDATA | e1 | e2 | …)*> becomes: <!ELEMENT e (e1 | e2 | … | text)*> • Why does it feel bad? • Because the “text” tags are not abbreviations of any reasonable peritext!
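At the instance level, the rewrite on this slide amounts to moving every run of character data in a mixed-content element into its own `<text>` child. The sketch below, using a hypothetical function `wrap_pcdata` and hypothetical elements `p`, `e1`, `e2`, shows how mechanical the transformation is. That is part of the slide's point: the result is valid, but no meaningful peritext corresponds to `<text>`.

```python
import xml.etree.ElementTree as ET


def wrap_pcdata(elem: ET.Element, wrapper: str = "text") -> None:
    """Move the loose character data of elem's mixed content into <wrapper> children.

    Only elem's direct content changes; descendants are left as they are.
    Every non-empty run of text is wrapped, whitespace included, because
    whitespace inside mixed content is data.
    """
    def make(text: str) -> ET.Element:
        node = ET.Element(wrapper)
        node.text = text
        return node

    new_children = []
    if elem.text:
        new_children.append(make(elem.text))
    for child in list(elem):
        elem.remove(child)
        new_children.append(child)
        if child.tail:  # text following a child also belongs to elem
            new_children.append(make(child.tail))
            child.tail = None
    elem.text = None
    elem.extend(new_children)


p = ET.fromstring("<p>Hello <e1>big</e1> world<e2/>!</p>")
wrap_pcdata(p)
print(ET.tostring(p, encoding="unicode"))
```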

  24. Is NL too much to ask for? • Relative to some “target” community • Can go a long way (previous slide) • Hyperlinks are allowed in peritexts • Allows defining “sense-making” or learning paths • (Almost) anything formal can be turned into NL…

  25. NL as the common denominator of formalisms [diagram: an expression in an artificial formalism, stapled to a textbook explaining the formalism, gives an equivalent expression in NL]

  26. Editing setup without intertextual semantics [diagram: World; Modeler; NL and presupposed knowledge of target community; Doc. / tr. material; Author; XML editor; XML DTD; valid XML instance or fragment]

  27. Editing setup with intertextual semantics [diagram: World; Modeler; NL and presupposed knowledge of target community; Author; XML editor; XML DTD; text-before and text-after segments; NL equivalent; valid XML instance or fragment]

  28. Structure of the talk • The problem • Proposed direction for solution • Conclusion • Question period

  29. What it suggests • Bring some of the discipline of producing “good documents” (manuals of style) into model & interface design • E.g., don’t abuse hyperlinking • Literate modeling, literate interfaces • Literate interface / interaction design • Benefit: making prerequisite knowledge & sense-making / learning paths explicit

  30. Other possible uses of intertextual semantics • Legal documents with multiple renditions • NLP systems that cannot treat markup • Including full-text indexing • <ex>Hamlet</ex> • “Exit Hamlet” • Other data models • Ex.: relational • Normal forms • A new look at expressivity
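For NLP or full-text indexing, the difference between stripping markup and computing the NL equivalent is visible on the slide's own example. A minimal sketch, assuming a hypothetical peritext "Exit " for the `<ex>` element and a hypothetical function `to_nl` that also handles mixed content: plain text extraction loses the fact that Hamlet exits, while the peritext-enriched rendering keeps it in indexable prose.

```python
import xml.etree.ElementTree as ET

# Hypothetical peritexts: <ex> marks a character's exit in play markup.
PERITEXTS = {"ex": ("Exit ", "")}


def to_nl(elem: ET.Element) -> str:
    """NL equivalent of elem: its content, mixed text included, wrapped in peritexts."""
    before, after = PERITEXTS.get(elem.tag, ("", ""))
    parts = [elem.text or ""]
    for child in elem:
        parts.append(to_nl(child))
        parts.append(child.tail or "")  # text following the child belongs to elem
    return before + "".join(parts) + after


frag = ET.fromstring("<ex>Hamlet</ex>")
print("".join(frag.itertext()))  # markup stripped: Hamlet
print(to_nl(frag))               # NL equivalent: Exit Hamlet
```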

  31. Future work • Editing: • Work out a few existing / new models • Properly integrate attributes • More powerful peritext computation • Implement ideas in a real editor • Display peritexts when choosing an insertion • Hyperlinks in displayed peritexts • Experiment with real authors

  32. Future work • More than peritexts? • More than NL (icons, sound, …)? • Compare with other semantic frameworks • Downstream semantics: Wrightson, Renear et al. • Other models • Tackle literate modeling / interface design

  33. Thank you! Questions?