500 likes | 621 Views
Bela Stantic School of Information and Communication Technology – Griffith University, Australia Institute for Integrated and Intelligent Systems, Griffith University, Australia Visiting Professor - Department of Mathematic and Informatics, Faculty of Science, University of Novi Sad
E N D
Bela Stantic School of Information and Communication Technology – Griffith University, Australia Institute for Integrated and Intelligent Systems, Griffith University, Australia Visiting Professor - Department of Mathematic and Informatics, Faculty of Science, University of Novi Sad Email: B.Stantic@griffith.edu.aiu Institute for Integrated and Intelligent Systems - IIIS
An implicit approach to deal with periodically repeated data Dr Bela Stantic School of Information and Communication Technology – Griffith University, Australia Institute for Integrated and Intelligent Systems, Griffith University, Australia Acknowledgement: Paolo Terenziani - Universita’ del Piemonte Orientale, Alessandria, Italy Abdul Sattar - National Information and Communications Technology, Brisbane , Australia Institute for Integrated and Intelligent Systems - IIIS
Publications • Bela Stantic, Paolo Terenziani, Guido Governatori, AlessioBottrighi, and Abdul Sattar, (2012), ‘ An implicit approach to deal with periodically-repeated medical data’,Journal in Artificial Intelligence in Medicine Volume: 55, doi:10.1016/j.artmed.2012.03.002, Pages: 149 – 162. • Bela Stantic, Paolo Terenziani, Abdul Sattar, AlessioBottrighi and Guido Governatori, 2010, ‘Towards an implicit treatment of periodically-repeated medical data’, in 13th World Congress on Medical and Health Informatics MedInfo, Pages: 1131-1135. • Paolo Terenziani,Bela Stantic,Guido Governatori,Alessio Bottrighi and Abdul Sattar, (2012), ‘An implicit approach for periodic data in relational databases’, in Journal of Intelligent Information Systems: integrating artificial intelligence and database technologies, (under review) Institute for Integrated and Intelligent Systems - IIIS
Overview • Introduction - Periodic data • Related work • Motivation • Implicit vs Explicit approach • TSQL2 data model • Periodic and quasi-periodic granularities • A relational representation of quasi-periodic data • Temporal algebra • Temporal extensions: • Temporal union • Temporal difference • Temporal projection • Temporal Cartesian Project • Algorithm for range queries • Experiment • Results • Conclusion Institute for Integrated and Intelligent Systems - IIIS
Introduction • Periodic events seem to be an intrinsic part of our life, and easier way of perceive reality. • Days repeat at regular periodic patterns, as well as seasons. • Many human activities are scheduled at periodic time (office activities, scheduling airplanes, lessons, medical activities, etc ) • It is widely agreed that adopting a “standard” and fixed menu of granularities (e.g., minutes, hours, days, weeks, years in the Gregorian calendar system) is not enough in order to provide the required expressiveness and flexibility. • Additionally, there are different calendars which depend on the cultural, legal and even purpose of the system. • These calendar systems use user-defined periodic granularities (e.g., the academic, legal, financial year, etc) Institute for Integrated and Intelligent Systems - IIIS
Introduction • The number of repetitions of periodic data may be very large, like in production and in some cases repetitions may also be “open-ended”(e.g., therapies for chronical patients may be repeated all the life long). • In Computer Science literature, in particular, in the areas of Databases, Logics, and Artificial Intelligence, there is a common agreement that formalisms are needed in order to cope with user-defined periodic data in an implicit (also termed intensional) way (i.e., without an explicit storing of all the repetitions). Institute for Integrated and Intelligent Systems - IIIS
Related work • Periodic data play an important role in Databases, so a specific entry has been devoted to such a topic in the new Encyclopedia of Database Systems [Terenziani, 2009]. • In the Encyclopedia three main classes of Database (implicit) approaches to user-defined periodicities have been identified: • Deductive rule-based approaches, using deductive rules (e.g., approaches in classical temporal logics), [Chomicki et al, 1993] • Constraint-based approaches, using mathematical formulae [Kabanza , 1995], • Algebraic (also termed symbolic) approaches, which provide a set of “high-level” and “user-friendly” operators [Bettini et al, 2000, Ning et al, 2002, Terenziani et al, 2003]. • In most approaches in the literature (and, in particular, in symbolic approaches), the focus is on the design of high-level formalisms to model user-defined periodicities in a “commonsense” or at least “user-friendly” way. Institute for Integrated and Intelligent Systems - IIIS
Related work • Despite there seems to be a general agreement within the Database literature that general-purpose implicit approach is needed in order to cope with user-defined periodic data, and despite the fact that a lot of such approaches have been devised in the last two decades (the survey by Tuzhlin and Clifford, focused only on Database area, identified 34 different approaches). • However, none of them focus specifically on the definition of a comprehensive relational approach coping with user-defined periodic data efficiently, considering: • a relational implicit representation, • additional temporal operators, e.g., range queries, • indexing and efficient access. • Also they do not take into account issues such as the definition of relational temporal algebraic operators, extending Codd’s operators to query periodic data, • All such issues are fundamental for the practical applicability of any Database approach considering periodic data. Institute for Integrated and Intelligent Systems - IIIS
Goals • The new comprehensive approach has to be designed in such a way that: • data model has the expressiveness to capture all ”periodic granularities”, as defined in the Database literature, • data model has to be a consistent extension of the TSQL2, • algebraic and temporal query operators has to be correct and complete with respect to conventional explicit approaches • Additionally, extended algebraic operators has to operate in polynomial time, and are a consistent extension of standard non-temporal relational algebraic ones. Institute for Integrated and Intelligent Systems - IIIS
Implicit vs. Explicit approaches • There is an obvious and trivial way to cope with periodic data, by explicitly storing all of them. • Usually called “explicit” (or “extensional”) approach, basically reduces periodic data to standard temporal data. • For instance, to store activity A scheduled each day from 8 to 12 and 14 to 18 in a whole year, an explicit approach simply stores 365 time periods in which the activity takes place. Institute for Integrated and Intelligent Systems - IIIS
Explicit Representation of Periodic data Institute for Integrated and Intelligent Systems - IIIS
Implicit vs. Explicit approaches • There is an obvious and trivial way to cope with periodic data, by explicitly storing all of them. • Usually called “explicit” (or “extensional”) approach, basically reduces periodic data to standard temporal data. • For instance, to store activity A scheduled each day from 8 to 12 and 14 to 18 in a whole year, an explicit approach simply stores 365 time periods in which the activity takes place. • The obvious advantage of such an approach is its simplicity as periodic temporal data are simply coped with as standard temporal data, so that any temporal Database approach in the literature can be used. • It also makes all of the database implementation simpler, from indexing to query processing. Institute for Integrated and Intelligent Systems - IIIS
Implicit vs. explicit approaches • However, there are at least three main disadvantages to the “explicit” approach: • It is not “commonsense” and “human-oriented”, humans usually tend to abstract, so that they usually prefer to manage periodic data in an implicit way, for example every day this year from 8 to 12 and 14 to 18, or each Monday from 15:00 to 17:00, etc. • It is very expensive in term of space complexity, due to the high storage requirements. In many practical application areas, the number of repetitions (at periodic time) of activities is very high, (Recording the activities on an schedule in a production chain). • It is not feasible in the case of “open ended” data (i.e., of data whose valid time end is open and for which there is no known future end). Institute for Integrated and Intelligent Systems - IIIS
TSQL2 data model • Let us restrict our discussion to valid time only and do not consider transaction time. • In many approaches (and, in particular, in TSQL2), valid time is associated to relational tuples, in the form of a pair of timestamps (the starting point of the valid time, and its ending point). • Given any schema R=(A1, …, An) (where A1 , …, Anare standard non-temporal attributes), a valid time relation r in TSQL2 is a relation defined over the schema: • where VTSand VTE are timestamps representing the starting and the ending time of the valid time period respectively. Institute for Integrated and Intelligent Systems - IIIS
Representing periodic granularities • We define quasi-periodic granularity G by a quadruple: • where : • P is an integer representing the duration of the periodic pattern; • IP is the set of the convex periods in the first “periodic pattern” (after the granule “0” of the bottom granularity); • IE is the set of the convex periods constituting the aperiodic part; • FT is the “frame time” i.e., the period containing all repetitions of the periodic pattern. Institute for Integrated and Intelligent Systems - IIIS
Representing periodic granularities • Let us exemplify our representation through an example: • As a working example, let us suppose that, in the year 2012, starting from Monday January 9th and ending on Sunday December 23rd, the “working shift” for a company is from 08:00 to 12:00, and from 14:00 to 18:00 each day from Monday to Friday and from 8 to 12 on Saturday (let us call such a granularity “working shift - WS”). • In addition, let us suppose that the “WS” also includes Saturday evening (from 14 to 18) in two specific days (say on January 13 and 20 (let us call “WS+” the granularity WS with such an addition). Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities • WS and WS+ are represented in our formalism as follows, considering hours as the bottom granularity (notice that our approach is independent of the choice of the bottom granularity). WS = ⟨168,{[08,12], [14,18], [32, 36], ..., [128, 132]}, { }, [0, 8567]⟩ Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities • WS and WS+ are represented in our formalism as follows, considering hours as the bottom granularity (notice that our approach is independent of the choice of the bottom granularity). WS = ⟨168,{[08,12], [14,18], [32, 36], ..., [128, 132]}, { }, [0, 8567]⟩ WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]}, {[460, 464], [620, 624]}; [0, 8567]⟩ • 168 hours (one week) is the duration of the periodic pattern - Per Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities • WS and WS+ are represented in our formalism as follows, considering hours as the bottom granularity (notice that our approach is independent of the choice of the bottom granularity). WS = ⟨168, {[08,12], [14,18], [32, 36], ..., [128, 132]}, { }, [0, 8567]⟩ WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]}, {[460, 464], [620, 624]}; [0, 8567]⟩ • 168 hours (one week) is the duration of the periodic pattern - Per • {[08,12], [14,18], [32, 36], ..., [128, 132]}, is the first instance of the periodic pattern- IP Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities • WS and WS+ are represented in our formalism as follows, considering hours as the bottom granularity (notice that our approach is independent of the choice of the bottom granularity). WS = ⟨168, {[08,12], [14,18], [32, 36], ..., [128, 132]}, { }, [0, 8567]⟩ WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]}, {[460, 464], [620, 624]}, [0, 8567]⟩ • 168 hours (one week) is the duration of the periodic pattern - Per, • {[08,12], [14,18], [32, 36], ..., [128, 132]}, is the first instance of the periodic pattern - IP • {[460, 464], [620, 624]} are the two aperiodic part repetitions - IE Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities • WS and WS+ are represented in our formalism as follows, considering hours as the bottom granularity (notice that our approach is independent of the choice of the bottom granularity). WS = ⟨168, {[08,12], [14,18], [32, 36], ..., [128, 132]}, { }, [0, 8567] ⟩ WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]}, {[460, 464], [620, 624]}, [0, 8567]⟩ • 168 hours (one week) is the duration of the periodic pattern - Per • {[08,12], [14,18], [32, 36], ..., [128, 132]}, is the first instance of the periodic pattern - IP, • {[460, 464], [620, 624]} are the two additional repetitions for WS+ - IE • [0, 8567] is the frame of time containing all the periodical repetitions on the periodic pattern - FT. Institute for Integrated and Intelligent Systems - IIIS
A relational representation of quasi-periodic data • We define the Periodic relation as: • Given any schema R=(A1, …, An) where A1, …, Anare standard non-temporal attributes, a periodic relation r is a relation defined over the schema: RP = (A1, …, An | VTS , VTE , Per, IP) VTS - timestamp representing the starting point of the “frame time” VTE - timestamp representing the ending point of the “frame time” Per - interval, representing the duration of the repetition pattern IP - denoting a set of the periodic patterns (PerID) Institute for Integrated and Intelligent Systems - IIIS
Periodicity Relation • The periodic_pattern relation is defined over the schema: (PerID, Start, End ) in which: • PerID - attribute containing identifiers denoting periodic patterns • Start and End are temporal attributes (timestamps) denoting the starting and the ending points of the periods in the periodic pattern. Institute for Integrated and Intelligent Systems - IIIS
Periodic relations - Activity(Implicit model): Periodic_Pattern relation (Implicit model): Periodic (Explicit model): Relations Institute for Integrated and Intelligent Systems - IIIS
Representations Institute for Integrated and Intelligent Systems - IIIS
Temporal algebra • Codd designated as complete any query language that was as expressive as his set of five relational algebraic operators, relational union (∪), relational difference (−), selection (σ), projection (π), and Cartesian product (×). • We proposed an extension of Codd’s algebraic operators in order to query the new data model which manages implicitly periodic data. • Such a temporal algebra can provide the ground for a proper extension of SQL, or of the temporal language TSQL2 to cope with implicit periodic data. Institute for Integrated and Intelligent Systems - IIIS
Temporal union • As it can be seen temporal relational projection simply operates on the non-temporal part of the input tuples, retaining only the values of the input attributes. The temporal component of the tuples is left unchanged. Institute for Integrated and Intelligent Systems - IIIS
Temporal projection • For example, the query asks for patients (and the time of their treatments): • And it should result in: Institute for Integrated and Intelligent Systems - IIIS
Non-temporal selection • For instance, the query below asks for patients (and the time of their treatments) and requires that the value of Act Type attribute is A: • Result is: Institute for Integrated and Intelligent Systems - IIIS
Temporal Cartesian Project Institute for Integrated and Intelligent Systems - IIIS
Temporal Cartesian Project • lcm, min, and max denote the least common multiple, minimum and maximum functions; • generate id() is a function that generated a new unique identifier; • pattern(id, Periodicity) denotes the set of periods corresponding to the identifier id in the table Periodicity. • The function generate_inters_pattern takes in input two identifiers id1 and id2 of periodic patterns in the Periodicity table, the identifier idnew of a new pattern (the intersection pattern) to be introduced in the table, the time tstart at which the new pattern starts, its duration Per, and the Periodicity table. Institute for Integrated and Intelligent Systems - IIIS
Temporal Cartesian Project • It operates in three steps: • (1) It retrieves from the Periodicity table the patterns corresponding to id1 and id2 and makes such patterns explicit on the whole period [tstart, tstart+Per], let < p11,…, p1k > and < p21,…, p2h > be the lists of temporally ordered periods obtained by making explicit the periodic pattern corresponding to id1 and id2 respectively (the function make explicit accomplishes such tasks). • (2) It evaluates the intersection between < p11,…, p1k > and < p21,…, p2h>,let <p1,…, pj> be the resulting ordered lists of periods • (3) If < p1,…, pj> is empty, the empty set is given as result, otherwise the table Periodicity is updated with the insertion of the new pattern < p1,…, pj> for the new identifier idnew. • The definition of temporal Cartesian product can be extended to temporal definitions of theta join, natural join, outer joins, and outer Cartesian products, in a way similar as presented in Gao et al (2005). Institute for Integrated and Intelligent Systems - IIIS
Temporal difference Institute for Integrated and Intelligent Systems - IIIS
Temporal difference • The function generate_difference_pattern(and generate_union_pattern) is analogous to generate_intersection_pattern, and generates in a frame time whose duration is the duration of the (new) repetition pattern the new pattern obtained by making the difference of the two input patterns (each ones being a set of periods). • Generate_union_patternis a slight generalization of such a process, in which union must be performed on an arbitrary number of such patterns. Institute for Integrated and Intelligent Systems - IIIS
Algorithm for range queries Institute for Integrated and Intelligent Systems - IIIS
Algorithm for range queries Institute for Integrated and Intelligent Systems - IIIS
Representing quaisi-periodic granularities • WS and WS+ are represented in our formalism as follows, considering hours as the bottom granularity (notice that our approach is independent of the choice of the bottom granularity). WS = ⟨168, {[08,12], [14,18], [32, 36], ..., [128, 132]}, { }, [0, 8567]⟩ WS+ = ⟨168 ,{[08,12], [14,18], [32,36], ..., [128, 132]}, {[460, 464], [620, 624]}, [0, 8567]⟩ • 168 hours (one week) is the duration of the periodic pattern - Per, • {[08,12], [14,18], [32, 36], ..., [128, 132]}, is the first instance of the periodic pattern - IP • {[460, 464], [620, 624]} are the two aperiodic part repetitions - IE Institute for Integrated and Intelligent Systems - IIIS
Algorithm for range queries Institute for Integrated and Intelligent Systems - IIIS
Check Periodic Intersection Institute for Integrated and Intelligent Systems - IIIS
Algorithm for range queries • The “circular module” operation is basically a standard module operator which operates on the endpoints of all the input periods, with the caution that, since the pattern is periodic, it must be interpreted in a “circular” fashion. • G_Intersects generate the Intersects predicate, in case of set of periods. Institute for Integrated and Intelligent Systems - IIIS
Experiment – Environment • Our implementation has been carried out on a 8 x UltraSparc III @ 900Hz with 8GB of RAM memory, running Oracle 11 RDBMS, with a database block size of 8K and size of SGA of 1000MB • We used Oracle built-in methods for statistics collection, analytic SQL functions and the PL/SQL procedural runtime environment. For range queries we utilised the RI-tree indexing method. • Data set is generated for the medical informatics domain because many activities are routinely executed at periodic time by nurses on hospitalized patients. • There are also cases of open-ended repeated activities (e.g., dialysis on diabetic patients must usually be performed twice or three times each week, for the life of the patient). Institute for Integrated and Intelligent Systems - IIIS
Sample therapy for multiple mieloma • The therapy for multiple mieloma is made by six cycles of 5-day treatment, each one followed by a delay of 23 days (for a total time of 24 weeks). • Within each cycle of 5 days, 2 inner cycles can be distinguished: • the melphalan treatment, to be provided twice a day, for each of the 5 days and • the prednisone treatment, to be provided once a day for each of the 5 days. • These two treatments must be performed in parallel. • It is important to record such data in the hospital database, in order to schedule hospital personnel activities, and to manage resource allocation. • It is obvious that managing this activities in explicit way is complex and also ity can cause errors Institute for Integrated and Intelligent Systems - IIIS
Experiment – Data set • Most clinical data are naturally temporal. • Number of Patients 16,824 • Average number of periodic activities per patient 8.30 • Average number of periods in a periodic pattern = 4.86 • Average duration of period of periodic patterns = 87.56 • Average duration of the frame time 1169 • As Explicit approach cannot cope with open-ended intervals we could not include them in experiment Institute for Integrated and Intelligent Systems - IIIS
Space Requirements Institute for Integrated and Intelligent Systems - IIIS
Temporal Cartesian Product - Results Institute for Integrated and Intelligent Systems - IIIS
Temporal Join - Results • For Temporal Join experiment we ran a query against temporal tables Activity and Working shifts to check which employees on duty can perform specific actions (ActID), (Working shifts JOIN Emp Capabilities) JOIN Activity. Institute for Integrated and Intelligent Systems - IIIS
Range queries - Results Institute for Integrated and Intelligent Systems - IIIS
Explicit/implicit ratio Institute for Integrated and Intelligent Systems - IIIS
Conclusion • We proposed a new approach to cope with periodic data in relational Databases, specifically: • we have proposed an “implicit” relational data model for user-defined periodic data, which is based on the “consensus” definition of granularity and is a “consistent extension” of TSQL2’s, • we have extended Codd’s algebraic operators of Cartesian product, Union, Projection, non-temporal selection, and Difference, • we have shown that such operators are complete and correct, • finally, in an extensive experimentation of our model we showed that our “implicit” approach outperforms the performance of traditional “explicit” approach. Institute for Integrated and Intelligent Systems - IIIS
Thank you for your attention! Institute for Integrated and Intelligent Systems - IIIS