This is a summary of the original paper by Paul M. Aoki, Computer Science Division, Department of EECS, University of California, Berkeley: Implementation of Extended Indexes in POSTGRES. lord.ataucuri@ucsp.edu.pe. Keywords: IR – Information Retrieval; RDBMS – Relational Database Management System.
Abstract • The vaunted "Spartan simplicity" • There is no natural way to model a keyword index
Abstract • Focusing on two issues • General problems • Features
Section One: Introduction • Technology does not meet the needs • Some new approaches
Introduction • Some extensions don't fit precisely • This paper is a case study of the implementation of one such extension.
Introduction • Mapping • Section 2 describes extended indexing as it was originally proposed, including a discussion of its advantages over other solutions and some implementation difficulties that it presents. • Section 3 gives an overview of the extensibility features of POSTGRES. • Section 4 provides details of an implementation of this type of indexing under POSTGRES, such as the modifications made to the original proposal.
Section Two: Relational System for Information Retrieval • There are two common choices • Inverted-File System • Relational System
Section Two: Relational System for Information Retrieval • Inverted-File System • Stores collections in an ordered data structure • Disadvantages • The user must generate code or queries that make specific use of its properties.
Section Two: Relational System for Information Retrieval • Relational Systems • Present collections of records as tables (relations). • Advantages: • Data independence • Hide the storage structure
Section Two: Relational System for Information Retrieval • The computer searches for the best method
Section Two: Relational System for Information Retrieval • Index: • In DBMS terminology • For example, one might extract the values of a particular field from each record in a table
Section Two: Relational System for Information Retrieval • This means that one can build an index over the column "emp.salary" but not over "taxable_income(emp.salary)". • This limits the usefulness of indexes to certain applications.
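To make this limitation concrete, here is a minimal Python sketch of an index built over a function of a column rather than over the column itself. The `taxable_income` formula, the sample rows, and the `FunctionIndex` class are hypothetical illustrations; none of them comes from the paper or from POSTGRES.

```python
import bisect

def taxable_income(salary):
    # Hypothetical formula, for illustration only: a flat 10,000 deduction.
    return max(salary - 10_000, 0)

class FunctionIndex:
    """Sorted list of (key, row_id) pairs, where key = func(column value)."""

    def __init__(self, rows, column, func):
        self.entries = sorted((func(row[column]), row_id)
                              for row_id, row in enumerate(rows))
        self.keys = [key for key, _ in self.entries]

    def range_lookup(self, low, high):
        # Return the row ids whose computed key lies in [low, high].
        start = bisect.bisect_left(self.keys, low)
        stop = bisect.bisect_right(self.keys, high)
        return [row_id for _, row_id in self.entries[start:stop]]

# Made-up sample table.
emp = [
    {"name": "ana", "salary": 30_000},
    {"name": "luis", "salary": 12_000},
    {"name": "rosa", "salary": 55_000},
]

index = FunctionIndex(emp, "salary", taxable_income)
matches = index.range_lookup(15_000, 50_000)
print([emp[i]["name"] for i in matches])
```

The lookup never calls `taxable_income` at query time: the function is evaluated once per row when the index is built, which is what an index restricted to plain column values cannot offer.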
Section 2.1: Extended Indexing • Users can add new index access methods to a DBMS. • Each access method must be associated with an ordering/partitioning class. • The class information is used by the query optimizer
Section 2.1: Extended Indexing • Example: BOXes • Build a set of binary Boolean operators <, <=, =, >, >= • Define an ordering on Box columns • Associate them with the "box-area-operators" class • Associate the class with the B-tree access method
Section 2.1: Extended Indexing • The query optimizer sees a query that uses "box area operators" • All metadata is stored in the system catalog • Used on the fly
Section 2.1: Extended Indexing • Example: a more realistic case, bibliographic searches
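The BOX example can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Box` class and `boxes_with_area_at_least` are hypothetical names, and a sorted list stands in for the B-tree, since both depend only on the ordering that the operator class defines.

```python
import bisect
from functools import total_ordering

@total_ordering
class Box:
    """Axis-aligned box; comparisons use area only (the "box area operators")."""

    def __init__(self, x1, y1, x2, y2):
        self.corners = (x1, y1, x2, y2)
        self.area = abs(x2 - x1) * abs(y2 - y1)

    def __eq__(self, other):
        return self.area == other.area

    def __lt__(self, other):
        return self.area < other.area

    def __hash__(self):
        return hash(self.area)

# A sorted list stands in for the B-tree: both rely only on the ordering.
boxes = [Box(0, 0, 4, 4), Box(0, 0, 1, 2), Box(1, 1, 3, 4), Box(0, 0, 5, 5)]
index = sorted(boxes)

def boxes_with_area_at_least(index, probe):
    # Answers "box >= probe" under the area ordering with a binary search.
    start = bisect.bisect_left(index, probe)
    return index[start:]

result = boxes_with_area_at_least(index, Box(0, 0, 2, 3))
print([b.area for b in result])
```

Swapping in a different ordering (for example, by perimeter) would only require a different set of comparison operators; the index code itself would not change, which is the point of tying access methods to operator classes.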
Section 3: Extensibility in Postgres • Users can extend the system • Example: "Box" type, "box-equality" function, "box-equality-operator" (=) • R-tree
Section 3: Extensibility in Postgres • Operators and access methods are assigned to classes • Overloaded • No recompilation needed
Section 4: The Implementation • Three stages • Type-function-operator definition • Access method implementation • Modification of Postgres internals
Section 4: The Implementation • Type/Function/Operator definition • Keyword and KeywordList • Functions return a list
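A rough Python sketch of the idea behind list-returning functions: one function maps a text field to a list of keywords, and the index stores one entry per keyword. The `keywords` and `build_keyword_index` functions, the stop-word list, and the second and third sample titles are assumptions made up for this example; they are not the POSTGRES definitions.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}  # hypothetical list

def keywords(text):
    """Return the list of distinct keywords in text, in first-seen order."""
    seen = []
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")
        if word and word not in STOP_WORDS and word not in seen:
            seen.append(word)
    return seen

def build_keyword_index(records):
    # One index entry per (keyword, record id), so a record appears under
    # every keyword the function returns for it.
    index = defaultdict(list)
    for record_id, title in records.items():
        for kw in keywords(title):
            index[kw].append(record_id)
    return index

papers = {
    1: "Implementation of Extended Indexes in POSTGRES",
    2: "Query Processing in POSTGRES",   # invented title
    3: "Indexes for Text Retrieval",     # invented title
}
index = build_keyword_index(papers)
print(keywords(papers[1]))
print(index["postgres"])
```

Because the function returns several values per record, a single record produces several index entries, which is what separates a keyword index from an ordinary one-key-per-row index.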
Section 4: The Implementation • Modifications of Postgres internals • System catalogs modifications
Section 5: Other modifications • Changes to the query optimizer were minimal • No changes to the query processor
Any questions? • lord.ataucuri@ucsp.edu.pe
Identifying Algebraic Properties to Support Optimization of Unary Similarity Queries lord.ataucuri@ucsp.edu.pe
Introduction • In 1970, Codd introduced the relational model, which is the foundation for most current commercial Database Management Systems (DBMS). • It is based on mathematical relation theory: the database is represented as a set of relations, where each relation is a table with tuples (or rows) and attributes (or columns). • Initially, the relational model supported only traditional data, i.e., numeric and string data types. • Elements of these types can be compared by exact match or by order: =, <, <=, >, >=
Introduction • Now, with the advent of multimedia and spatial applications, the Relational DBMS (RDBMS) must be able to support new data types, operators, and kinds of queries. • Thus, similarity emerges as the natural way to compare elements in complex domains, such as images, audio, video, genomic sequences, and time series, and consequently handling operations based on similarity (or distance) between data becomes a must.
Introduction • To illustrate this: • Q1: In a health-care information system: "Given a mammography exam with images of the left and right breast from cranio-caudal (RCC) and medio-lateral oblique (RMLO) views of a patient, show the exams whose texture does not differ by more than 10 units from those in the exam."
Example • Q2: In a health-care information system: "Given a head tomography exam of a patient showing a pathology, retrieve the 5 most similar exams that do not present the pathology and whose texture does not differ by more than 5 units from those in the exam." • Q3: In Geographic Information Systems (GIS): "Find the 15 districts nearest to 'Arequipa' that are no farther than 15 miles away, and where the population aged 21 to 64 is greater than the population aged 65 and over."
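Queries like these combine two kinds of similarity selection: a range query (everything within a given distance) and a k-nearest-neighbor query (the k closest elements). The Python sketch below shows both in a simplified form. The two-dimensional feature vectors, the Euclidean metric, and all names are assumptions chosen for illustration; real image or texture features would be far larger.

```python
import heapq
import math

def distance(a, b):
    # Euclidean distance between two feature vectors (an assumed metric).
    return math.dist(a, b)

def range_query(items, center, radius):
    """All ids whose feature vector lies within radius of center."""
    return [item_id for item_id, vec in items.items()
            if distance(vec, center) <= radius]

def knn_query(items, center, k):
    """The k ids nearest to center, closest first."""
    return heapq.nsmallest(k, items, key=lambda i: distance(items[i], center))

# Made-up exams with made-up 2-D feature vectors.
exams = {
    "e1": (1.0, 1.0),
    "e2": (2.0, 2.0),
    "e3": (9.0, 9.0),
    "e4": (4.0, 5.0),
}
query = (0.0, 0.0)

near = range_query(exams, query, 5.0)    # "does not differ by more than 5 units"
top3 = knn_query(exams, query, 3)        # "the 3 most similar"
both = [i for i in top3 if i in near]    # Q2-style combination of the two
print(near, top3, both)
```

Q2 and Q3 have this combined shape: a k-nearest-neighbor condition and a range condition on the same query element, plus ordinary predicates on traditional attributes.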
Partial Solution • Multi-similarity Algebra (MSA) • It was designed to integrate different interpretations of similarity values • It works at a higher abstraction level and thus does not address the problem of an "operational" algebra.
Introduction • None of these previous works has addressed optimizations based on query rewriting for the similarity-based select operators in complex expressions
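As an illustration of what a rewrite rule for similarity selections can look like, consider two range selections around the same query center: their intersection equals a single range selection with the smaller radius. This example is offered as a simple instance of the idea, not as a rule quoted from the paper; the points and the `range_select` helper are hypothetical.

```python
import math

def range_select(points, center, radius):
    # Similarity selection: keep the points within radius of center.
    return {p for p in points if math.dist(p, center) <= radius}

points = {(0, 1), (2, 2), (3, 4), (6, 8), (1, 0)}
center = (0, 0)

# Original plan: two range selections, results intersected.
two_step = range_select(points, center, 5) & range_select(points, center, 3)

# Rewritten plan: one selection with the smaller radius.
one_step = range_select(points, center, min(5, 3))

print(two_step == one_step)
```

The rewritten plan scans the data once instead of twice, which is the kind of saving an optimizer looks for when it rewrites a query.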