Representing Temporal Information in Cultural Historical Databases. 19.5.2009 TIES444 Software Engineering Seminar Miika Nurminen (minurmin@jyu.fi) University of Jyväskylä.
Outline
• Motivation
• Representing temporal information in Duo & Arte
• Problems with current representation
• Towards a generic model for representing uncertain temporal information in a relational database
• Alternative approaches
• Conclusion
Motivation
• Cultural historical information provides a rich and challenging domain for data management, from both a temporal and a general perspective
• A multitude of complex metadata can be attached to a given object
• The information is a combination of “well-formed” (relatively static, precise, well-known) and ambiguous (uncertain, imprecise) data
• Standards for representing the information exist (CIDOC-CRM, MuseoSuomi, etc.), but in practice the field is scattered: the databases used in museums are generally not interoperable
• In small museums, paper may still be used for cataloging (and, as a backup, even in museums that have a computer system)
• From a time ontology perspective, flexible, expressive, and easy-to-use structures are needed that allow incomplete and imprecise information but still support querying
Collection management systems in JYU Museum
• JYU Museum uses two database-backed client/server applications for collection management and museology student projects:
  • DUO for photographs, recordings, books and other objects (in use since 2003, includes ~28000 items)
  • ARTE for works of art (in use since 2006, includes ~1000 items)
• Other applications related to collection management (e.g. image processing, web publishing) are also used; next-generation systems (e.g. IDA) are in development
• DUO & ARTE use separate databases, but share most of the code in reusable components (DB management, GUI components, search engine)
• The databases have parts with identical or nearly identical structures (e.g. persons, exhibitions, temporal information)
http://sovellusprojektit.it.jyu.fi/tare/dokumentit/kayttoohje/kayttoohje.html
http://users.jyu.fi/~minurmin/duo/
Representation of temporal information in Duo & Arte
• The standard database DATE type is used for some “certain” dates (e.g. logging and modification information, exhibition dates, check-out dates)
• A custom tblAika table is used for most of the information related to collection metadata
• Depending on the metadata field, the user may see only years and interval marks. For more specific fields, days and months can be edited as well.
• Any field can be left empty
• The interval mark introduces a number of conventions that cannot easily be reflected in searches (e.g. “-”: normal, “n”: about, “-luku”: decade, etc.). If the mark is left empty, only the beginning date should count.
• In practice, the potential semantics of the interval mark are not accounted for in queries
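The record described above can be pictured roughly as follows. This is only a sketch: the slides do not show the columns of tblAika, so the field names and the `describe` helper are hypothetical. Only the three interval marks and the empty-mark rule are taken from the slide.

```python
from dataclasses import dataclass
from typing import Optional

# Meanings of the three interval marks named on the slide.
MARK_MEANING = {"-": "normal", "n": "about", "-luku": "decade"}


@dataclass
class TimeRecord:
    """Hypothetical stand-in for one tblAika row; any field may be empty."""
    start_year: Optional[int] = None
    start_month: Optional[int] = None
    start_day: Optional[int] = None
    mark: Optional[str] = None
    end_year: Optional[int] = None
    end_month: Optional[int] = None
    end_day: Optional[int] = None


def describe(rec: TimeRecord) -> str:
    """Render a record as text, following the conventions on the slide."""
    if not rec.mark:
        # Empty mark: only the beginning date should count.
        return str(rec.start_year) if rec.start_year is not None else "unknown"
    meaning = MARK_MEANING.get(rec.mark)
    if meaning == "about":
        return f"about {rec.start_year}"
    if meaning == "decade":
        return f"{rec.start_year}s"
    if meaning == "normal":
        return f"{rec.start_year}-{rec.end_year}"
    # Any other mark is free text whose semantics are not controlled.
    return f"{rec.start_year} [{rec.mark}] {rec.end_year}"
```

For example, `describe(TimeRecord(start_year=1960, mark="-luku"))` yields `"1960s"`, while a record with an unrecognized mark is passed through with the mark shown verbatim.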
Querying temporal information in Duo/Arte
• For precise information (dates expressed in the DATE datatype), an exact match or a given upper/lower bound can be used
• For imprecise information, three search options are provided:
  • Unbounded interval: matches if either end (or both ends) of the interval is within the query; includes 0-years in the interval
  • End points: like unbounded interval, but a nonzero value must match the query (i.e. does not include 0-years)
  • Bounded interval: matches if both ends of the interval are within the query
• In the result list, the start date, interval mark, and end date are compressed into one field. For technical reasons, unspecified years (interpreted as unbounded) are shown as zeroes.
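One possible reading of the three search options is sketched below. The function names are hypothetical, and the handling of 0-years reflects an interpretation of the slide text rather than the actual Duo/Arte search engine: a 0 year is treated as "unspecified", which matches any query in the unbounded mode and never matches in the end-points mode.

```python
def _within(year: int, q_from: int, q_to: int) -> bool:
    """True if year lies inside the inclusive query range."""
    return q_from <= year <= q_to


def unbounded_match(start: int, end: int, q_from: int, q_to: int) -> bool:
    """Either end within the query; a 0 (unspecified) year also counts."""
    return any(y == 0 or _within(y, q_from, q_to) for y in (start, end))


def end_points_match(start: int, end: int, q_from: int, q_to: int) -> bool:
    """Either end within the query, but only nonzero years can match."""
    return any(y != 0 and _within(y, q_from, q_to) for y in (start, end))


def bounded_match(start: int, end: int, q_from: int, q_to: int) -> bool:
    """Both ends must be within the query."""
    return all(_within(y, q_from, q_to) for y in (start, end))
```

Under this reading, a record stored as 1960 to 0 matches a query for 1990 to 2000 in the unbounded mode only, which illustrates why the difference between the modes may be hard for end users to predict.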
Problems in current approach
Interval mark values in use: - -- - - - ?? - ??- - 1960 - l - l n.- - lahj - n / /huhti /kevät ? -? -> 0 1937 -1998 2001 2002 7 alk -alku -alkup elokuu ennen helmikuu huhtikuu -jälkeen joulukuu kesä kevät kevätlk -l -l ? -l alku -l ap -l n ? -l n. -l vaihde -l.alk -l.lop -l? -loppu -luku -luku- -luku? -lukujen v -luvulta -luvulta? -luvun lop -luvut maaliskuu marraskuu n n- n asti n? noin -noin noin-? syksy syyslukuka tammikuu -vaihde
• Despite a few clean-up attempts, the semantics of interval marks (and the words used) are not easily controlled
• The same query cannot be used for both precise (DATE) and imprecise (tblAika) data fields
• No standard convention is enforced for presenting “points” in time in the table-based approach. By convention, a start date with no interval mark can be interpreted as a point. However, this has not been applied consistently.
• The definitions and user interface for the different types of temporal queries are not intuitive to end users
• Although the time representation in the database is general-purpose, it does not support a lifecycle-based approach to object documentation (i.e. time information is “hard-wired” to specific metadata fields of objects and cannot be used in an extensible way with user-defined roles as in CIDOC)
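The first problem above could be approached by mapping free-text marks onto a small controlled vocabulary. The sketch below is illustrative only: the category names and the `normalize_mark` helper are hypothetical, and only a handful of the spellings from the slide are covered (Finnish "noin" = about, "ennen" = before, "jälkeen" = after, "alku" = beginning, "loppu" = end). An empty mark is treated as a point in time, following the convention mentioned above.

```python
from typing import Optional, Tuple

# Hypothetical controlled vocabulary; the keys are spellings seen on the slide.
CANONICAL = {
    "-": "range",
    "n": "about", "n.": "about", "noin": "about", "-noin": "about",
    "-luku": "decade", "-luvulta": "decade",
    "ennen": "before",
    "-jälkeen": "after",
    "-alku": "beginning",
    "-loppu": "end",
}


def normalize_mark(raw: Optional[str]) -> Tuple[str, bool]:
    """Return (category, uncertain) for a free-text interval mark."""
    text = (raw or "").strip().lower()
    if not text:
        # No mark: by convention the start date stands for a point in time.
        return ("point", False)
    uncertain = "?" in text           # a question mark flags extra uncertainty
    key = text.replace("?", "").strip()
    return (CANONICAL.get(key, "unknown"), uncertain)
```

Marks that fall into the "unknown" category (e.g. "lahj") would still need manual review; the sketch only separates the values that can be interpreted mechanically from those that cannot.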
Requirements for a new temporal model
• The goal is to generalize the temporal model so that both precise and imprecise information, including both points and intervals, are accounted for in the same structure
• Ideas from time ontologies (e.g. query operators) could be utilized, but the representation should be stored physically in a relational database
• Ease of integration with existing applications: minimize 3rd-party component usage to keep the application as self-contained and easy to install as possible
• Performance: temporal information is used in almost all end-user specified queries and reports
• The object lifecycle could be reflected in time information using a new, extensible role table that includes information about the metadata field used
  • A similar approach is already used with manufacturer roles (i.e. a person manufacturing an “item” in the DUO database can be a photographer, artist, director, writer, etc.)
  • Eases integration with CIDOC-CRM metadata
• User interface issues (e.g. a visual component for temporal queries?)
[Figure: database diagram showing tblKappale]
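As a rough illustration of these requirements, the sketch below stores every time value as a pair of bounds in a relational table and links it to objects through a role table, using only the Python standard library (`sqlite3`). The table names, column names, and role values are hypothetical and are not taken from the DUO or ARTE schemas. A precise point is simply a span whose bounds coincide, so one overlap query serves both precise and imprecise data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE time_span (
    id       INTEGER PRIMARY KEY,
    earliest TEXT NOT NULL,  -- ISO date: earliest possible start
    latest   TEXT NOT NULL   -- ISO date: latest possible end
);
CREATE TABLE time_role (
    object_id INTEGER NOT NULL,
    span_id   INTEGER NOT NULL REFERENCES time_span(id),
    role      TEXT NOT NULL  -- user-defined role, e.g. 'created'
);
"""


def find_objects(conn, role, q_from, q_to):
    """Return ids of objects whose span for `role` overlaps [q_from, q_to]."""
    rows = conn.execute(
        "SELECT r.object_id FROM time_role r "
        "JOIN time_span s ON s.id = r.span_id "
        "WHERE r.role = ? AND s.earliest <= ? AND s.latest >= ? "
        "ORDER BY r.object_id",
        (role, q_to, q_from),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO time_span VALUES (?, ?, ?)", [
    (1, "1960-01-01", "1969-12-31"),  # imprecise: "the 1960s"
    (2, "2002-07-01", "2002-07-01"),  # precise point: both bounds equal
])
conn.executemany("INSERT INTO time_role VALUES (?, ?, ?)", [
    (10, 1, "created"),
    (10, 2, "acquired"),
])
```

Here `find_objects(conn, "created", "1965-01-01", "1965-12-31")` finds object 10, while the same query with the role `"acquired"` does not. ISO-formatted date strings are used because they sort correctly as text; a production schema might choose a native date type instead.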
Alternative approaches
• Integration with domain-specific ontologies (CIDOC, MuseoSuomi, etc.)
  • Each object in the DUO database should have at least a partial representation in another datastore
  • Semantic annotation of collection items is time-consuming, and even if only temporal information and ID codes were transferred, the system would become substantially more complex
• Utilization of general-purpose time-based ontologies (OWL Time, etc.)
  • Requires integrating new software components into the application (e.g. an RDF database frontend (Jena), an inference engine (Pellet)), as well as transforming and updating existing data
  • A highly sophisticated approach and ideal for research (especially in the semantic web track), but even more complex than the CIDOC approach
  • Most of the information in a time ontology might not be needed in this particular application
  • RDF databases and query languages are not yet as mature and stable a technology as relational databases
• Utilizing a different computational model for time representation
  • Fuzzy logic or probabilistic models might be effective for representing uncertain temporal information, since they work well with uncertain data in general
  • Might end up as a relatively simple model in theory, but customized processing is needed to specify and represent the time information
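To make the fuzzy-logic alternative concrete, an "about" year could be given a membership degree instead of a yes/no match. The function below is a minimal sketch of a trapezoidal membership function; the name `about_year` and the default widths (`core`, `fade`) are arbitrary assumptions chosen for illustration, not values from the presentation.

```python
def about_year(year: int, center: int, core: int = 2, fade: int = 5) -> float:
    """Degree (0.0 to 1.0) to which `year` fits 'about `center`'.

    Years within `core` of the center fit fully; the degree then
    falls off linearly to zero over the next `fade` years.
    """
    distance = abs(year - center)
    if distance <= core:
        return 1.0
    if distance >= core + fade:
        return 0.0
    return 1.0 - (distance - core) / fade
```

A query could then rank results by degree rather than filter them, but as noted above, this kind of customized processing has to be implemented and maintained outside the database's built-in date handling.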
Conclusion
• Cultural historical information provides a rich and challenging domain for data management, from both a temporal and a general perspective
• Collection management systems in JYU Museum were introduced, and problems with representing and retrieving temporal information were identified
• A new temporal model accounting for different representations, uncertainty, and the object lifecycle was roughly sketched
• The new model should be applied directly in a relational database. Alternative, non-database approaches were evaluated but were considered too complex or immature to be used in a production environment
• The model must be specified in more detail, along with a potential user interface, in cooperation with end users
• The transformation of the production database should be carefully planned