Implementation of Extended Indexes in Postgres

This is a compilation of the original paper by Paul M. Aoki, Computer Science, Department of EECS, University of California, Berkeley: Implementation of Extended Indexes in Postgres. lord.ataucuri@ucsp.edu.pe. Keywords: IR – Information Retrieval; RDBMS – Relational Database Management System.


  1. This is a compilation of the original paper by Paul M. Aoki, Computer Science, Department of EECS, University of California, Berkeley: Implementation of Extended Indexes in Postgres. lord.ataucuri@ucsp.edu.pe

  2. Keywords • IR – Information Retrieval • RDBMS – Relational Database Management System

  3. Abstract • The vaunted "Spartan simplicity" • There is no natural way to model a keyword index

  4. Abstract • Focusing on two issues • General problems • Features

  5. Section One: Introduction • Current technology does not meet the needs • Some new approaches

  6. Introduction • Some extensions don't fit precisely • This paper is a case study of the implementation of one such extension.

  7. Introduction • Outline • Section 2 describes extended indexing as it was originally proposed, including a discussion of its advantages over other solutions and some implementation difficulties that it presents. • Section 3 gives an overview of the extensibility features of POSTGRES. • Section 4 provides details of an implementation of this type of indexing under POSTGRES, such as the modifications made to the original proposal.

  8. Section Two: Relational System for Information Retrieval • There are two common choices • Inverted-file systems • Relational systems

  9. Section Two: Relational System for Information Retrieval • Inverted-file systems • Store collections in an ordered data structure • Disadvantage: the user must write code or queries that make specific use of its properties.
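
To make the inverted-file idea concrete, here is a minimal sketch in Python. It is not taken from the paper: the function names (`build_inverted_file`, `search_all`) and the sample documents are assumptions made for illustration. Note how the caller has to know the structure and query it directly, which is the disadvantage the slide mentions.

```python
from collections import defaultdict


def build_inverted_file(documents):
    """Map each keyword to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    # Keep the keywords themselves in sorted order, as an ordered structure would.
    return {word: sorted(ids) for word, ids in sorted(postings.items())}


def search_all(inverted, keywords):
    """Return the ids of documents that contain every keyword (Boolean AND)."""
    result = None
    for word in keywords:
        ids = set(inverted.get(word.lower(), []))
        result = ids if result is None else result & ids
    return sorted(result or [])


# Hypothetical sample collection.
docs = {
    1: "extended indexes in postgres",
    2: "relational systems hide storage",
    3: "postgres access methods",
}
inverted = build_inverted_file(docs)
print(search_all(inverted, ["postgres"]))             # documents 1 and 3
print(search_all(inverted, ["postgres", "indexes"]))  # document 1 only
```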

  10. Section Two: Relational System for Information Retrieval • Relational systems • Present collections of records as tables (relations) • Advantages: • Data independence • Hide the storage structure

  11. Section Two: Relational System for Information Retrieval • The computer searches for the best access method

  12. Section Two: Relational System for Information Retrieval • Index (in DBMS terminology) • For example, Q1: one might extract the values of a particular field from each record in a table

  13. Section Two: Relational System for Information Retrieval • This means that one can build an index over the column "emp.salary" but not over an expression such as "taxable_income(emp.salary)". • This limits the usefulness of indexes to certain applications.
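
As an illustration of what an index over a function of a column would look like, here is a small Python sketch. Everything in it is assumed for the example: the `FunctionIndex` class, the sample `emp` rows, and in particular the tax rule inside `taxable_income`, which is invented. A sorted list with binary search stands in for a real index structure.

```python
import bisect


def taxable_income(salary):
    # Hypothetical rule, for illustration only: the first 10,000 is exempt.
    return max(0, salary - 10_000)


class FunctionIndex:
    """Sorted (key, row_id) pairs where key = func(value of the column)."""

    def __init__(self, rows, column, func):
        self.entries = sorted(
            (func(row[column]), row_id) for row_id, row in rows.items()
        )
        self.keys = [key for key, _ in self.entries]

    def range_lookup(self, low, high):
        """Return row ids with low <= key <= high, located by binary search."""
        start = bisect.bisect_left(self.keys, low)
        stop = bisect.bisect_right(self.keys, high)
        return [row_id for _, row_id in self.entries[start:stop]]


# Hypothetical sample table.
emp = {
    1: {"salary": 12_000},
    2: {"salary": 45_000},
    3: {"salary": 9_000},
}
index = FunctionIndex(emp, "salary", taxable_income)
print(index.range_lookup(1_000, 40_000))  # rows whose taxable income is in range
```

The index is keyed on the computed value rather than on the stored column, so a predicate on `taxable_income(emp.salary)` can be answered without scanning every row.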

  14. Section 2.1: Extended Indexing • Users can add new index access methods to a DBMS. • Each access method must be associated with an ordering/partitioning class. • The class information is used by the query optimizer.

  15. Section 2.1: Extended Indexing • Example: boxes • Build a set of binary Boolean operators <, <=, =, >, >= • Define an ordering on box columns • Associate the operators with a “box-area-operators” class • Associate the class with the B-tree access method
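
The box example can be sketched in Python as follows. This is only an analogy under stated assumptions: the `Box` class and `boxes_at_least` helper are hypothetical, the comparison operators order boxes by area, and a sorted list searched with `bisect` stands in for the B-tree access method.

```python
import bisect
from functools import total_ordering


@total_ordering
class Box:
    """Axis-aligned box whose comparison operators order boxes by area."""

    def __init__(self, x1, y1, x2, y2):
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2

    def area(self):
        return abs(self.x2 - self.x1) * abs(self.y2 - self.y1)

    # Together with total_ordering, these two methods supply <, <=, =, >, >=.
    def __eq__(self, other):
        return self.area() == other.area()

    def __lt__(self, other):
        return self.area() < other.area()

    def __repr__(self):
        return f"Box(area={self.area()})"


def boxes_at_least(sorted_boxes, probe):
    """Return boxes with area >= probe's area using binary search."""
    start = bisect.bisect_left(sorted_boxes, probe)
    return sorted_boxes[start:]


# The sorted list plays the role of an ordered index over a box column.
boxes = sorted([Box(0, 0, 4, 4), Box(0, 0, 1, 1), Box(0, 0, 2, 3)])
print(boxes_at_least(boxes, Box(0, 0, 2, 2)))
```

The search code never looks inside a box; it only relies on the operators, which is why an ordered access method can be reused once an operator class defines the ordering.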

  16. Section 2.1: Extended Indexing • The query optimizer recognizes a query that uses “box-area-operators” • All metadata is stored in the system catalogs • It can be used on the fly

  17. Section 2.1: Extended Indexing • A more realistic example: bibliographic searches

  18. Section 3: Extensibility in Postgres • Users can extend the system • Example: “Box” type, “box-equality” function, “box-equality-operator” (=) • R-tree

  19. Section 3: Extensibility in Postgres • Operators and access methods are assigned to classes • Operators can be overloaded • No recompilation is needed

  20. Section 4: The Implementation • Three stages • Type-function-operator definition • Access method implementation • Modification of Postgres internals

  21. Section 4: The Implementation • Type/function/operator definition • Keyword and KeywordList types • The function returns a list
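
A function that returns a list of keywords means one record can produce several index entries. The Python sketch below illustrates that idea only; it is not the POSTGRES implementation, and the names (`keywords`, `build_keyword_index`, `lookup`), the stop-word list, and the sample rows are all assumptions.

```python
import bisect

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "and"}


def keywords(text):
    """Return the sorted list of distinct keywords found in a text field."""
    words = {word.strip(".,;:").lower() for word in text.split()}
    return sorted(word for word in words if word and word not in STOP_WORDS)


def build_keyword_index(rows, column):
    """Create one (keyword, row_id) entry per keyword of each row, sorted."""
    entries = []
    for row_id, row in rows.items():
        for keyword in keywords(row[column]):
            entries.append((keyword, row_id))
    entries.sort()
    return entries


def lookup(entries, keyword):
    """Return the ids of rows whose keyword list contains the keyword."""
    # (keyword,) sorts before every (keyword, row_id) entry.
    start = bisect.bisect_left(entries, (keyword,))
    row_ids = []
    for key, row_id in entries[start:]:
        if key != keyword:
            break
        row_ids.append(row_id)
    return row_ids


# Hypothetical sample table.
papers = {
    1: {"title": "Implementation of Extended Indexes in POSTGRES"},
    2: {"title": "Extended Access Methods"},
}
keyword_index = build_keyword_index(papers, "title")
print(lookup(keyword_index, "extended"))  # both rows
print(lookup(keyword_index, "postgres"))  # row 1 only
```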

  22. Section 4: The Implementation

  23. Section 4: The Implementation • Modifications of Postgres internals • System catalog modifications

  24. Section 5: Other modifications • Changes to the query optimizer were minimal • No changes to the query processor

  25. Conclusions

  26. Any questions? • lord.ataucuri@ucsp.edu.pe

  27. Identifying Algebraic Properties to Support Optimization of Unary Similarity Queries lord.ataucuri@ucsp.edu.pe

  28. Introduction • In 1970, Codd introduced the relational model, which is the foundation for most current commercial Database Management Systems (DBMS). • It is based on mathematical relation theory: the database is represented as a set of relations, where each relation is a table with tuples (or rows) and attributes (or columns). • Initially, the relational model supported only traditional data, i.e., numerical and string data types. • Elements of these types can be compared using exact matching and order operators: =, <, <=, >, >=

  29. Introduction • Now, with the advent of multimedia and spatial applications, the Relational DBMS (RDBMS) must be able to support new data types, operators and kinds of queries. • Thus, similarity emerges as the natural way to compare elements in complex domains, such as images, audio, video, genomic sequences, and time series, and consequently handling operations based on similarity (or distance) between data becomes a must.

  30. Introduction • To illustrate this: • Q1: In a health-care information system: “Given a mammography exam with images of left and right breast from cranio-caudal (RCC) and medio-lateral oblique (RMLO) views of a patient, show the exams whose texture does not differ by more than 10 units from those in the exam”.

  31. Example • Q2: In a health-care information system: “Given a head tomography exam of a patient showing a pathology, retrieve the 5 most similar exams not presenting pathology, and whose texture does not differ by more than 5 units from those in the exam”. • Q3: In Geographic Information Systems (GIS): “Find the 15 districts nearest to 'Arequipa' that are not farther than 15 miles, and where the population aged between 21 and 64 is greater than the population aged 65 and over”.
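
The queries above combine two kinds of similarity selection: a range query ("does not differ by more than r units") and a k-nearest-neighbor query ("the k most similar"). The Python sketch below shows both over small feature vectors. It is an assumption-laden illustration: Euclidean distance is chosen arbitrarily as the similarity measure, the sample `exams` data is invented, and the functions scan all items rather than using an index. It requires Python 3.8+ for `math.dist`.

```python
import heapq
import math


def distance(a, b):
    """Euclidean distance between two feature vectors (an assumed measure)."""
    return math.dist(a, b)


def range_query(items, center, radius):
    """Ids of items whose distance to center is at most radius."""
    return [
        item_id for item_id, vec in items.items()
        if distance(vec, center) <= radius
    ]


def knn_query(items, center, k):
    """Ids of the k items nearest to center, closest first."""
    return heapq.nsmallest(
        k, items, key=lambda item_id: distance(items[item_id], center)
    )


def knn_within(items, center, k, radius):
    """The k nearest items among those no farther than radius."""
    candidates = {i: items[i] for i in range_query(items, center, radius)}
    return knn_query(candidates, center, k)


# Hypothetical two-dimensional texture features per exam.
exams = {"e1": (0, 0), "e2": (3, 4), "e3": (6, 8), "e4": (1, 1)}
query = (0, 0)
print(range_query(exams, query, 5))    # exams within 5 units
print(knn_query(exams, query, 2))      # the 2 most similar exams
print(knn_within(exams, query, 3, 5))  # up to 3 nearest, none beyond 5 units
```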

  32. Partial Solution • Multi-Similarity Algebra (MSA) • It has been designed to integrate different interpretations of similarity values • It has a higher abstraction level and thus does not address the problem of an “operational” algebra.

  33. Introduction • None of these previous works has addressed optimizations based on query rewriting for the similarity-based select operators in complex expressions.

  34. Similarity Algebra
