ESCRIRE: Embedded Structured Content Representation In Repositories • Jérôme Euzenat • INRIA Rhône-Alpes • Jerome.Euzenat@inrialpes.fr
ESCRIRE: Motivations • Embedding a simplified but formal representation of content in documents: • search on structured criteria; • document comparison (genericity, similarity…); • automatic classification and organisation.
Knowledge based queries • (and book (about "Agatha Christie")) • vs. book AND "Agatha Christie" • (and flat (location "Alps")) • …including those in Val d’Isère! • (and bookshop (location "London")) • …bookstore included.
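The contrast on this slide can be sketched in Python. This is only an illustration under stated assumptions: the `PART_OF` and `SAME_CLASS` tables, the `Item` records and the advert texts are hypothetical toy data, not part of ESCRIRE, and the matching rules are a simplification of what a knowledge representation system would do.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge (not from ESCRIRE): geographic containment
# and terms that denote the same class.
PART_OF = {"Val d'Isère": "Alps", "Chamonix": "Alps"}
SAME_CLASS = {"bookstore": "bookshop"}


@dataclass(frozen=True)
class Item:
    doc: str       # document the description is attached to
    kind: str      # class of the described object
    location: str  # where the object is located


def canonical(kind: str) -> str:
    """Map a term to its canonical class name."""
    return SAME_CLASS.get(kind, kind)


def located_in(place: str, region: str) -> bool:
    """True if `place` is `region` or lies inside it (follows PART_OF upward)."""
    current = place
    while current is not None:
        if current == region:
            return True
        current = PART_OF.get(current)
    return False


def knowledge_query(items, kind, region):
    """Rough analogue of (and <kind> (location <region>)): returns documents."""
    return [i.doc for i in items
            if canonical(i.kind) == canonical(kind) and located_in(i.location, region)]


def full_text_query(texts, *words):
    """Plain keyword conjunction: <word> AND <word> ..."""
    return [doc for doc, text in texts.items()
            if all(w.lower() in text.lower() for w in words)]


ITEMS = [
    Item("ad1", "flat", "Val d'Isère"),
    Item("ad2", "flat", "Paris"),
    Item("ad3", "bookstore", "London"),
]
TEXTS = {
    "ad1": "Flat to let in Val d'Isère",
    "ad2": "Flat in Paris, with a view of an Alps poster",
    "ad3": "Bookstore in central London",
}

print(knowledge_query(ITEMS, "flat", "Alps"))       # ['ad1']
print(full_text_query(TEXTS, "flat", "Alps"))       # ['ad2']
print(knowledge_query(ITEMS, "bookshop", "London")) # ['ad3']
print(full_text_query(TEXTS, "bookshop", "London")) # []
```

The keyword search misses the flat in Val d'Isère and returns an irrelevant advert, while the structured query uses the containment and equivalence knowledge to find the intended documents.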
Query languages • level 3 Semiotic • level 2 Semantic (F-logic, Escrire…) • level 1 Structural (SQL, XQL) • level 0 Full-text search
ESCRIRE: Goals • Compare several knowledge representation techniques to find the type of situation to which each is best suited (indexing, classifying, filtering…).
ESCRIRE: Consortium • “Coordinated research action (ARC)” involving • Acacia (Sophia-Antipolis): conceptual graphs • Sherpa/Exmo (Rhône-Alpes): object-based representations • Orpailleur (Lorraine): terminological logics. • Usinor: application.
ESCRIRE: Acquisition • [Diagram; labels: Tr-schema “Ontology”, Global analysis, XML document, Integration, Description, Tr-object, Individual analysis, Document]
ESCRIRE: Queries • [Diagram; labels: Tr-schema “Ontology”, XML document, Tr-query, Troeps, Query helper, XML document]
ESCRIRE: Problem statement • Given: • a set of (HTML) documents annotated with a description of their content in a pivotal language; • an ontology of the domain; • a set of queries about the subject. • Retrieve: the adequate documents.
ESCRIRE: Software variation • Knowledge representation + query evaluation • Translated from a pivotal language into • Conceptual graphs, Object-based representation, Description logic • Translated by hand into CG, OKR, DL
ESCRIRE: Quantitative criteria • Precision: share of returned answers that are correct • Recall: share of correct answers that are returned • Accuracy = (precision + recall)/2 • Performance in time • Coverage of the query language • Ordering of answers
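The first three criteria can be computed as in the following minimal sketch. It assumes the usual information-retrieval reading of precision and recall over sets of document identifiers; the function name `evaluate`, the document identifiers, and the convention of returning 0.0 for an empty set are assumptions made for the example.

```python
def evaluate(returned, relevant):
    """Return (precision, recall, accuracy) for one query.

    returned: documents the system answered with
    relevant: documents that should have been answered
    accuracy follows the slide: (precision + recall) / 2
    """
    returned, relevant = set(returned), set(relevant)
    correct = returned & relevant
    # Convention assumed here: an empty denominator yields 0.0.
    precision = len(correct) / len(returned) if returned else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    accuracy = (precision + recall) / 2
    return precision, recall, accuracy


# Hypothetical query: 4 documents returned, 3 are relevant, 2 overlap.
p, r, a = evaluate(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
print(p)            # 0.5
print(round(r, 3))  # 0.667
print(round(a, 3))  # 0.583
```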
ESCRIRE: Qualitative criteria • Given by external users (query designers): • Naturalness of queries • Adequacy of answers • Overall appreciation (aggregation).
ESCRIRE: Scaling • Multiplying the size by orders of magnitude: • Corpus • Ontology • Queries.
ESCRIRE: Reference comparisons • Dublin core metadata • Full-text search
ESCRIRE: Ontology elements (1)
<esc:ontology>
  <esc:defclass name="gene">
    <esc:classref name="adn-part"/>
    <esc:defattribute name="length">
      <esc:typeref name="integer"/>
    </esc:defattribute>
    <esc:defattribute name="protein">
      <esc:classref name="protein"/>
    </esc:defattribute>
  </esc:defclass>
  …
ESCRIRE: Ontology elements (2)
  <esc:descrelation name="interaction">
    <esc:relref name="bio-process"/>
    <esc:defattribute name="effect">
      <esc:typeref name="string"/>
    </esc:defattribute>…
    <esc:defrole name="promoter">
      <esc:classref name="gene"/>
    </esc:defrole>…
  </esc:descrelation>…
</esc:ontology>
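As an illustration of how the class definitions from the ontology slides might be read by a program, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Several things are assumptions: the slides do not declare the `esc` namespace, so the URI below is a placeholder; the reading of a `classref` placed directly under `defclass` as the superclass is an interpretation; and `read_classes` is a name invented for this example.

```python
import xml.etree.ElementTree as ET

ESC = "http://example.org/esc"  # placeholder namespace URI (assumption)
NS = {"esc": ESC}

# The class definition from the slide, completed only with the namespace
# declaration that a parser needs.
ONTOLOGY = f"""<esc:ontology xmlns:esc="{ESC}">
  <esc:defclass name="gene">
    <esc:classref name="adn-part"/>
    <esc:defattribute name="length">
      <esc:typeref name="integer"/>
    </esc:defattribute>
    <esc:defattribute name="protein">
      <esc:classref name="protein"/>
    </esc:defattribute>
  </esc:defclass>
</esc:ontology>"""


def read_classes(xml_text):
    """Map each class name to its assumed parent and attribute ranges."""
    root = ET.fromstring(xml_text)
    classes = {}
    for cls in root.findall("esc:defclass", NS):
        # A bare tag path only matches direct children, so this does not
        # pick up the classref nested inside a defattribute.
        parent = cls.find("esc:classref", NS)
        attributes = {}
        for attr in cls.findall("esc:defattribute", NS):
            ref = attr.find("esc:typeref", NS)
            if ref is None:  # attribute ranges over a class, not a type
                ref = attr.find("esc:classref", NS)
            attributes[attr.get("name")] = ref.get("name") if ref is not None else None
        classes[cls.get("name")] = {
            "parent": parent.get("name") if parent is not None else None,
            "attributes": attributes,
        }
    return classes


print(read_classes(ONTOLOGY))
# {'gene': {'parent': 'adn-part', 'attributes': {'length': 'integer', 'protein': 'protein'}}}
```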
ESCRIRE: Content descriptions
<esc:content ontology="biointer.xml" url=".">
  <esc:object type="gene" id="bcd"/>
  <esc:relation type="interaction">
    <esc:attribute name="effect">
      inhibition
    </esc:attribute>
    <esc:role name="promoter">
      <esc:objref id="bcd"/>
    </esc:role>
  </esc:relation>…
</esc:content>
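A content description of this shape could be loaded and checked as sketched below. The namespace URI is again a placeholder, and `read_content` and `dangling_refs` are hypothetical helper names. The check compares identifiers exactly, so identifiers differing only in case (such as `Bcd` and `bcd`) would be treated as different objects.

```python
import xml.etree.ElementTree as ET

ESC = "http://example.org/esc"  # placeholder namespace URI (assumption)
NS = {"esc": ESC}

CONTENT = f"""<esc:content xmlns:esc="{ESC}" ontology="biointer.xml" url=".">
  <esc:object type="gene" id="bcd"/>
  <esc:relation type="interaction">
    <esc:attribute name="effect">
      inhibition
    </esc:attribute>
    <esc:role name="promoter">
      <esc:objref id="bcd"/>
    </esc:role>
  </esc:relation>
</esc:content>"""


def read_content(xml_text):
    """Return (objects, relations) found in a content description."""
    root = ET.fromstring(xml_text)
    # id -> type for every declared object
    objects = {o.get("id"): o.get("type") for o in root.findall("esc:object", NS)}
    relations = []
    for rel in root.findall("esc:relation", NS):
        attributes = {a.get("name"): (a.text or "").strip()
                      for a in rel.findall("esc:attribute", NS)}
        roles = {}
        for role in rel.findall("esc:role", NS):
            ref = role.find("esc:objref", NS)
            roles[role.get("name")] = ref.get("id") if ref is not None else None
        relations.append({"type": rel.get("type"),
                          "attributes": attributes,
                          "roles": roles})
    return objects, relations


def dangling_refs(objects, relations):
    """Role fillers that do not match any declared object id (exact match)."""
    return [oid for rel in relations
            for oid in rel["roles"].values() if oid not in objects]


objects, relations = read_content(CONTENT)
print(objects)                            # {'bcd': 'gene'}
print(relations[0]["attributes"])         # {'effect': 'inhibition'}
print(dangling_refs(objects, relations))  # []
```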
ESCRIRE: Knowledge embedding
<html>… <!-- xhtml -->
  <rdf:RDF>
    <rdf:Description about="/">
      <!-- dublin core -->
      <dc:title>…</dc:title>…
      <!-- pivot language -->
      <esc:content>… </esc:content>
      <!-- conceptual graphs -->
      <gc:graphs>…</gc:graphs>
      …
    </rdf:Description>…
  </rdf:RDF>…
</html>
ESCRIRE: Queries • Stated on objects, but results are documents (concerning these topics) • Document similarity by content similarity
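The slides do not say how content similarity is measured. Purely as an illustration, the sketch below substitutes a Jaccard overlap between the sets of statements describing each document; the measure, the `CORPUS` data and the function names are all assumptions, not the ESCRIRE method.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of statements (1.0 if both empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_alike(target, corpus):
    """Rank the other documents by similarity of their content descriptions."""
    scores = [(name, jaccard(corpus[target], facts))
              for name, facts in corpus.items() if name != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical content descriptions, flattened to (kind, value) statements.
CORPUS = {
    "doc1": {("gene", "bcd"), ("gene", "hb"), ("interaction", "inhibition")},
    "doc2": {("gene", "bcd"), ("interaction", "inhibition")},
    "doc3": {("gene", "kr")},
}

for name, score in rank_alike("doc1", CORPUS):
    print(name, round(score, 3))
# doc2 0.667
# doc3 0.0
```

The point of the sketch is only that similarity is computed on the formal descriptions rather than on the words of the text, so two documents can be close even when they share little vocabulary.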
ESCRIRE: Query language • SELECT / FROM / WHERE / ORDERBY • + • AND / OR / NOT / ALL / EXISTS • <path> <relop> <path>|<value> • IN <class> • ALIKE <document>
ESCRIRE: Corpus 1 • Subject: genetic interaction • Text source: MedLine abstracts • Annotations: manual • Ontology: Knife knowledge base + other
ESCRIRE: Corpus 2 • Subject: Psychological stress • Text source: MedLine abstracts • Annotations: manual • Ontology: UMLS/MeSH
ESCRIRE: Where are we? • Building translators from the pivot language to the actual formats • First part of Corpus 1 available (other data will follow quickly)
ESCRIRE: Calls • Other corpora • Natural language technology • Other representation systems • starting from September 2000
For more information… • http://escrire.inrialpes.fr/ • Jerome.Euzenat@inrialpes.fr