MESMUSES methodology: lessons learned and open issues…
Alain Michard
Florence, June 2003

MESMUSES broad vision
• Just like several other projects
• The Semantic Web (SW) is all about semantic interoperability
• Sharing machine-readable terminologies and classification schemes
• Science and culture are collective and international
• Semantic Web methodology should be highly relevant for managing and sharing scientific and cultural information

Some key S&T issues in the project
• Model: is RDFS / OWL-Lite adequate?
• Schema authoring: method and tools needed!
• Metadata: where does it come from?
• Automatic indexing: experiments with a categorizer

The basic SW model
[Figure: three layers labelled Schema, Surrogates and Real-world entities. Labels in the diagram: Person, Dwelling, Artefact, Owner, House, Artwork, Artist; properties Lives-in, Produces, Creates.]
Example of a real-world bibliographic record shown on the slide:
• Type: printed text, monograph
• Author(s): Zola, Émile (1840-1902)
• Title(s): L'assommoir [printed text] / by Emile Zola
• Edition: 50th ed.
• Publication: Paris: G. Charpentier, 1878
• Physical description: 111-569 p.
• Record no.: FRBNF35963044

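To make the schema / surrogate split concrete, here is a minimal Python sketch. It is not the MESMUSES implementation: the class hierarchy (Artist under Person, Artwork under Artefact), the `creates` domain and range, and the `person:` / `work:` identifiers are assumptions made for illustration. Only the record number, author name and title come from the slide.

```python
# Schema layer: hypothetical classes and properties, loosely following the slide labels.
SCHEMA = {
    "subclass_of": {"Artist": "Person", "Artwork": "Artefact"},  # assumed hierarchy
    "properties": {"creates": ("Artist", "Artwork")},            # property: (domain, range)
}

# Surrogate layer: (subject, predicate, object) triples standing in for real-world entities.
SURROGATES = [
    ("person:zola", "type", "Artist"),
    ("person:zola", "name", "Zola, Émile (1840-1902)"),
    ("work:FRBNF35963044", "type", "Artwork"),
    ("work:FRBNF35963044", "title", "L'assommoir"),
    ("person:zola", "creates", "work:FRBNF35963044"),
]


def is_a(cls, ancestor, schema):
    """True if cls equals ancestor or is one of its (transitive) subclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = schema["subclass_of"].get(cls)
    return False


def types_of(resource, triples):
    """Collect the classes asserted for a resource."""
    return {o for s, p, o in triples if s == resource and p == "type"}


def domain_range_errors(triples, schema):
    """Return the triples whose subject or object does not fit the property's domain/range."""
    errors = []
    for s, p, o in triples:
        if p not in schema["properties"]:
            continue  # literals and untyped properties are not checked in this sketch
        domain, range_ = schema["properties"][p]
        subject_ok = any(is_a(t, domain, schema) for t in types_of(s, triples))
        object_ok = any(is_a(t, range_, schema) for t in types_of(o, triples))
        if not (subject_ok and object_ok):
            errors.append((s, p, o))
    return errors
```

Calling `domain_range_errors(SURROGATES, SCHEMA)` on the data above reports no errors; reversing the `creates` statement would be flagged.
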
Model and schema language
• Typed attributes are needed
  • XML-Schema types
  • Derived types (e.g. Celsius temperature, Gregorian date)
  • Enumerated types, thesauri
• Time-stamping
• Cardinality constraints
• Explicit transitivity of properties (e.g. geographic inclusion)

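Two of the requirements above, explicit transitivity and cardinality constraints, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a feature of RDFS or of the project's tools: the `LOCATED_IN` place names and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical "located-in" statements; the place names are illustrative only.
LOCATED_IN = [
    ("Florence", "Tuscany"),
    ("Tuscany", "Italy"),
    ("Italy", "Europe"),
]


def transitive_closure(pairs):
    """Return every (a, c) pair implied by chaining the given (a, b) pairs."""
    successors = defaultdict(set)
    for a, b in pairs:
        successors[a].add(b)
    closure = set()
    for start in list(successors):
        stack = list(successors[start])
        while stack:
            node = stack.pop()
            if (start, node) in closure:
                continue  # already reached; also guards against cycles
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


def violates_max_cardinality(triples, prop, max_count):
    """List the subjects that carry more than max_count values for prop."""
    counts = defaultdict(int)
    for subject, predicate, _ in triples:
        if predicate == prop:
            counts[subject] += 1
    return sorted(s for s, n in counts.items() if n > max_count)
```

With transitivity declared on the property, a query for resources located in "Europe" can also return those indexed only with "Florence", without every surrogate repeating the full chain.
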
Schema authoring issues (1)
• Find the right level of abstraction
  • Is "Glucid" (carbohydrate) a class or an instance?
  • Or is it sometimes a class and sometimes an instance?
• Avoid the "KR" (knowledge representation) attitude and practices!
  • It's all about indexing resources with shared terminologies, not about representing human knowledge!

Schema authoring issues (2)
[Figure: example schema graph. Node labels, translated from French: System, Structure, Process, Complex process, Elementary process, Organism, Apparatus, Organ, Tissue, Cell, Molecule, Major theme, GTANS. Edge labels: ISA, is-composed-of, consumes, transforms, produces, eliminates, involves, triggers, requires, is-regulated-by, is-performed-by, is-documented-by, is-explained-by.]

Schema authoring issues (4)
• Authoring tools are badly needed
  • Graphical representation of the schema
  • Zooming on sub-graphs (hierarchies)
  • Versioning
  • Consider using a UML authoring environment?
• Established methodology and tutorials are needed

Creating surrogates
• Data extraction and fusion from structured sources
  • R-DB, XML-DB, LDAP
• Updating
  • When?
  • Should not create duplicates!
• Detect cross-references
  • Authority lists
  • Thesauri
  • Lexical distance
  • ???

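The "should not create duplicates" and "lexical distance" points can be sketched as a lookup against an authority list before a new surrogate is created. This is a minimal illustration, not the project's method: the 0.9 threshold and the function names are assumptions, and `difflib.SequenceMatcher` from the Python standard library stands in for whatever lexical distance is actually used.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, strip accents and collapse whitespace before comparing."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_or_create(label, authority_list, threshold=0.9):
    """Return (entry, created): reuse a close authority entry or append label as new.

    The threshold is an assumed value and would need tuning on real data.
    """
    best = max(authority_list, key=lambda entry: similarity(label, entry), default=None)
    if best is not None and similarity(label, best) >= threshold:
        return best, False
    authority_list.append(label)
    return label, True
```

For example, with `authority = ["Zola, Émile (1840-1902)"]`, looking up the unaccented spelling "Zola, Emile (1840-1902)" reuses the existing entry instead of creating a second surrogate. A purely lexical test will still miss variants such as pseudonyms, which is presumably what the "???" on the slide leaves open.
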
Automatic categorization
• Automatic indexing
  • By extracting metadata from resources
  • By automatic categorization
• Define hierarchies of "concepts" inside the schema
• Seeding with representative documents
• Machine learning to create categorizers
• Pros: enriched search functionality
• Cons: hierarchies of categories are static
  • Adding a category may change the categorizers of the others

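The seeding step can be illustrated with a deliberately simple categorizer: each category is represented by the summed word counts of its seed documents, and a new document goes to the category with the highest cosine similarity. This is a sketch under stated assumptions, not the categorizer used in the experiments: the seed sentences, category names and function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical seed documents per category (illustrative only).
SEEDS = {
    "Cell": ["the cell membrane surrounds the cell", "cell division and the cell nucleus"],
    "Organ": ["the heart is an organ that pumps blood", "the liver is an organ"],
}


def bag_of_words(text):
    """Count whitespace-separated, lower-cased tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(seeds):
    """Build one summed term-count vector per category from its seed documents."""
    return {cat: sum((bag_of_words(d) for d in docs), Counter()) for cat, docs in seeds.items()}


def categorize(text, model):
    """Return the category whose vector is closest to the document."""
    vec = bag_of_words(text)
    return max(model, key=lambda cat: cosine(vec, model[cat]))
```

The drawback noted on the slide shows up even here. In this sketch the existing category vectors stay the same when a "Molecule" category is added, yet documents previously assigned to "Cell" can move to the new category, so earlier indexing has to be revisited. With learners trained to discriminate between categories, the other categorizers themselves would also have to be retrained.
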
Bottom line…
• RDFS schema authoring may be more difficult than E-R modelling
• Debates on syntactic features are irrelevant
• Should be grounded on real-world implementations and testbeds
• A new query language (e.g. RQL) is not high priority
• We have not addressed the "logical rules" layer
• Semantic Web vs. Community Webs