19.5.2009 TIES444 Software Engineering Seminar Miika Nurminen (minurmin@jyu.fi)

Presentation Transcript


  1. Representing Temporal Information in Cultural Historical Databases 19.5.2009 TIES444 Software Engineering Seminar Miika Nurminen (minurmin@jyu.fi) University of Jyväskylä

  2. Outline • Motivation • Representing temporal information in Duo & Arte • Problems with current representation • Towards a generic model for representing uncertain temporal information in a relational database • Alternative approaches • Conclusion

  3. Motivation • Cultural-historical information provides a rich and challenging domain for data management, from both a temporal and a general perspective • A multitude of complex metadata can be attached to a given object • The information combines "well-formed" (relatively static, precise, well-known) and ambiguous (uncertain, imprecise) data • Standards for representing the information exist (CIDOC-CRM, MuseoSuomi, etc.), but in practice the field is fragmented: the databases used in museums are generally not interoperable • In small museums, paper may still be used for cataloging (and, as a backup, even in museums that have a computer system) • From a time ontology perspective, what is needed are flexible, expressive, and easy-to-use structures that allow incomplete and imprecise information but still support querying

  4. Collection management systems in JYU Museum • JYU Museum uses two database-backed client/server applications for collection management and museology student projects: • DUO for photographs, recordings, books and other objects (in use since 2003, includes ~28000 items) • ARTE for works of art (in use since 2006, includes ~1000 items) • Other applications related to collection management (e.g. image processing, web publishing) are also used, and next-generation systems (e.g. IDA) are in development • DUO & ARTE use separate databases, but share most of their code as reusable components (DB management, GUI components, search engine) • The databases have parts with identical or nearly identical structures (e.g. persons, exhibitions, temporal information) http://sovellusprojektit.it.jyu.fi/tare/dokumentit/kayttoohje/kayttoohje.html http://users.jyu.fi/~minurmin/duo/

  5. Representation of temporal information in Duo & Arte • The standard database DATE type is used for some "certain" dates (e.g. logging and modification information, exhibition dates, check-out dates) • A custom tblAika table is used for most of the information related to collection metadata • Depending on the metadata field, the user may see only years and interval marks. For more specific fields, days and months can be edited as well • Any field can be left empty • The interval mark introduces a number of conventions that cannot easily be reflected in searches (e.g. "-": normal, "n": about, "-luku": decade, etc.). If the mark is left empty, only the beginning date should count • In practice, the potential semantics of the interval mark are not taken into account in queries

  6. DUO example form with time interval

  7. Querying temporal information in Duo/Arte • For precise information (dates expressed in the DATE datatype), an exact match or a given upper/lower bound can be used • For imprecise information, three search options are provided • Unbounded interval: matches if either end (or both ends) of the interval is within the query; 0-years are included in the interval • End points: like unbounded interval, but a nonzero value must match the query (i.e. 0-years are not included) • Bounded interval: matches if both ends of the interval are within the query • In the result list, the start date, interval mark, and end date are compressed into one field. For technical reasons, unspecified (interpreted as unbounded) years are shown as zeroes.
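The three search options for imprecise data can be read as three predicates over a (start year, end year) pair. The following is a minimal sketch of one possible reading, not the Duo/Arte implementation: the names `YearInterval`, `matches_unbounded`, `matches_end_points` and `matches_bounded` are hypothetical, and it assumes that an unspecified year is stored as 0 and that a bounded match requires both years to be specified.

```python
from dataclasses import dataclass

UNSPECIFIED = 0  # assumption: an unspecified year is represented as 0


@dataclass(frozen=True)
class YearInterval:
    """Hypothetical stand-in for the year part of a tblAika row."""
    start: int
    end: int


def _within(year: int, q_from: int, q_to: int) -> bool:
    return q_from <= year <= q_to


def matches_unbounded(rec: YearInterval, q_from: int, q_to: int) -> bool:
    # Either end inside the query is enough; a 0-year counts as a match.
    return any(y == UNSPECIFIED or _within(y, q_from, q_to)
               for y in (rec.start, rec.end))


def matches_end_points(rec: YearInterval, q_from: int, q_to: int) -> bool:
    # Like the unbounded search, but only a nonzero year may match.
    return any(y != UNSPECIFIED and _within(y, q_from, q_to)
               for y in (rec.start, rec.end))


def matches_bounded(rec: YearInterval, q_from: int, q_to: int) -> bool:
    # Both ends must be specified (assumption) and inside the query.
    return all(y != UNSPECIFIED and _within(y, q_from, q_to)
               for y in (rec.start, rec.end))
```

Under these assumptions a record with only an end year, such as `YearInterval(0, 1998)`, is found by the unbounded search for 1930 to 1940 but not by the other two, which illustrates why the options can be hard for end users to tell apart.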

  8. Problems in current approach • Interval mark values in use: - -- - - - ?? - ??- - 1960 - l - l n.- - lahj - n / /huhti /kevät ? -? -> 0 1937 -1998 2001 2002 7 alk -alku -alkup elokuu ennen helmikuu huhtikuu -jälkeen joulukuu kesä kevät kevätlk -l -l ? -l alku -l ap -l n ? -l n. -l vaihde -l.alk -l.lop -l? -loppu -luku -luku- -luku? -lukujen v -luvulta -luvulta? -luvun lop -luvut maaliskuu marraskuu n n- n asti n? noin -noin noin-? syksy syyslukuka tammikuu -vaihde • Despite a few clean-up attempts, the semantics of interval marks (and the words used) are not easily controlled • The same query cannot be used for both precise (DATE) and imprecise (tblAika) data fields • No standard convention is enforced for presenting "points" in time in the table-based approach. By convention, a start date with no interval mark can be interpreted as a point. However, this has not been used consistently • The definitions of, and user interface for, the different types of temporal queries are not intuitive to end users • Although the time representation in the db is general-purpose, it does not support a lifecycle-based approach to object documentation (i.e. time information is "hard-wired" to specific metadata fields of objects, and cannot be used in an extensible way with user-defined roles as in CIDOC)
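One way to bring free-text interval marks under control is to map them to a small controlled vocabulary and flag everything else for manual review. The sketch below is illustrative only: the function `normalize_mark` and the vocabulary (`interval`, `about`, `decade`, `point`, `unknown`) are hypothetical, and only the meanings of "-", "n", "-luku" and the empty mark come from the slides; treating "n." and "noin" as "about" and a trailing "?" as an uncertainty flag are assumptions.

```python
# Hypothetical controlled vocabulary for interval marks.
_MARKS = {
    "": "point",       # empty mark: only the beginning date counts
    "-": "interval",   # "normal" interval
    "n": "about",
    "n.": "about",     # assumption: abbreviation variant of "n"
    "noin": "about",   # assumption: spelled-out variant of "n"
    "-luku": "decade",
}


def normalize_mark(raw: str) -> tuple[str, bool]:
    """Return (kind, uncertain) for a raw interval mark.

    A trailing "?" is read as an uncertainty flag (assumption).
    Anything not in the vocabulary is reported as "unknown" so that
    it can be reviewed by hand instead of being silently guessed.
    """
    text = raw.strip().lower()
    uncertain = text.endswith("?")
    core = text.rstrip("?").strip()
    if core == "" and uncertain:
        # A mark made only of question marks carries no usable kind.
        return ("unknown", True)
    return (_MARKS.get(core, "unknown"), uncertain)
```

For example, `normalize_mark("-luku?")` gives `("decade", True)`, while a season name such as `"kevät"` gives `("unknown", False)` and would need a separate decision on how seasons are modelled.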

  9. Requirements for a new temporal model • The goal is to generalize the temporal model so that both precise and imprecise information, including both points and intervals, are accounted for in the same structure • The model could utilize ideas from time ontologies (e.g. query operators), but the data should be physically stored in a relational database • Ease of integration with existing applications: minimize 3rd-party component usage to keep the application as self-contained and easy to install as possible • Performance: temporal information is used in almost all end-user-specified queries and reports • The object lifecycle could be reflected in time information using a new, extensible role table that includes information about the metadata field used • A similar approach is already used with manufacturer roles (i.e. a person manufacturing an "item" in the DUO database can be a photographer, artist, director, writer, etc.) • This eases integration with CIDOC-CRM metadata • User interface issues (e.g. a visual component for temporal queries?) (Figure: tblKappale)
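A rough sketch of how these requirements might look in a relational schema is given below, using Python's built-in sqlite3 module so that no third-party component is needed. It is an assumption-laden illustration, not the proposed model: the table names `time_role` and `time_span`, their columns, the sample rows, and the helper `items_overlapping` are all hypothetical. Each time span is stored as an earliest and a latest possible date in ISO 8601 text, so a precise point is a span whose two bounds are equal and a decade is a ten-year span.

```python
import sqlite3

# Hypothetical schema: one structure for points, intervals, precise and
# imprecise dates, plus an extensible role table.
SCHEMA = """
CREATE TABLE time_role (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    metadata_field TEXT NOT NULL
);
CREATE TABLE time_span (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL,
    role_id INTEGER NOT NULL REFERENCES time_role(id),
    earliest TEXT NOT NULL,  -- ISO 8601 date, earliest possible moment
    latest TEXT NOT NULL,    -- ISO 8601 date, latest possible moment
    CHECK (earliest <= latest)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO time_role (id, name, metadata_field) VALUES (?, ?, ?)",
        [(1, "creation", "created"), (2, "acquisition", "acquired")],
    )
    conn.executemany(
        "INSERT INTO time_span (item_id, role_id, earliest, latest) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, 1, "1937-05-19", "1937-05-19"),  # precise point
            (2, 1, "1960-01-01", "1969-12-31"),  # a decade
            (2, 2, "2001-01-01", "2002-12-31"),  # imprecise interval
        ],
    )
    return conn


def items_overlapping(conn, role, q_from, q_to):
    """Item ids whose span for the given role overlaps [q_from, q_to]."""
    rows = conn.execute(
        "SELECT DISTINCT s.item_id FROM time_span s "
        "JOIN time_role r ON r.id = s.role_id "
        "WHERE r.name = ? AND s.earliest <= ? AND s.latest >= ? "
        "ORDER BY s.item_id",
        (role, q_to, q_from),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout the same overlap query serves precise and imprecise data alike, and a new lifecycle role is one more row in `time_role` rather than a new column; indexing `earliest` and `latest` would be the obvious next step for the performance requirement.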

  10. Alternative approaches • Integration with domain-specific ontologies (CIDOC, MuseoSuomi, etc.) • Each object in the DUO database should have at least a partial representation in another datastore • Semantic annotation of collection items is time-consuming, and even if only temporal information and ID codes were transferred, the system would become substantially more complex • Utilization of general-purpose time-based ontologies (OWL Time, etc.) • Requires integrating new software components into the application (e.g. an RDF database frontend (Jena), an inference engine (Pellet)), as well as transforming and updating existing data • A highly sophisticated approach and ideal for research (especially in the semantic web track), but even more complex than the CIDOC approach • Most of the information in a time ontology might not be needed in this particular application • RDF databases and query languages are not yet as mature and stable a technology as relational databases • Utilizing a different computational model for time representation • Fuzzy logic or probabilistic models might be effective for representing uncertain temporal information, as they work well with uncertain data in general • The model might end up relatively simple in theory, but customized processing is needed to specify and represent the time information
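To make the fuzzy-logic alternative concrete, an uncertain date such as "about 1960" can be described by a membership function that returns 1 for the most plausible years and falls off towards 0 on either side. The sketch below uses a standard trapezoidal membership function; the function names and the five-year tolerance are illustrative assumptions, not values from the presentation.

```python
def trapezoid(year: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear between.

    Requires a <= b <= c <= d.
    """
    if b <= year <= c:
        return 1.0
    if a < year < b:
        return (year - a) / (b - a)
    if c < year < d:
        return (d - year) / (d - c)
    return 0.0


def about(year: float, centre: int, tolerance: int = 5) -> float:
    """Membership of `year` in "about <centre>".

    The tolerance of 5 years is an arbitrary, hypothetical default.
    """
    return trapezoid(year, centre - tolerance, centre, centre,
                     centre + tolerance)
```

A query could then rank items by membership instead of returning a yes/no match, which is the kind of customized processing the last bullet refers to.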

  11. Conclusion • Cultural-historical information provides a rich and challenging domain for data management, from both a temporal and a general perspective • Collection management systems in JYU Museum were introduced, and problems with representing and retrieving temporal information were identified • A new temporal model accounting for different representations, uncertainty, and the object lifecycle was roughly sketched • The new model should be implemented directly in a relational database. Alternative, non-db approaches were evaluated but were considered too complex or immature to be used in a production environment • The model must be specified in more detail, along with a potential user interface, in cooperation with end users • The transformation of the production database should be carefully planned