190 likes | 354 Views
Native Multidimensional Indexing in Relational Databases. David Hoksza , Tom áš Skopal Charles University in Prague Department of Software Engineering Czech Republic. Presentation Outline. Multidimensional querying attribute number growth indexing methods contemporary DB systems
E N D
Native Multidimensional Indexingin Relational Databases David Hoksza, Tomáš Skopal Charles University in PragueDepartment of Software Engineering Czech Republic
Presentation Outline • Multidimensional querying • attribute number growth • indexing methods • contemporary DB systems • Indexing in PostgreSQL • user-defined access methods • external indexing • framework • Experiments COMAD 2008
Indexing • Single-attribute based indexing • to avoid sequential scan • B-tree • Multi-attribute based indexing • window query • straightforward solution – multiple B-trees • SELECT * FROM Products • WHERE • BrandID BETWEEN (3 AND 11) • SELECT * FROM Products • WHERE • BrandId BETWEEN (13 AND 14) • AND • ProductTypeID BETWEEN (13 AND 24) • AND • PeriodID BETWEEN (3 AND 11) COMAD 2008
Multi-attribute Based Indexing • Multiple B-trees • attribute number growth • → exponential growth of partial result-sets • → sequential scan • dimensionality curse COMAD 2008
B+-tree with Compound Keys most often employed solution multiple keys – single chained value key components compared in lexicograhpical order assymetry in the order of the keys UB-tree transformation of n-dim points into 1-dim Z-address→ Z-curve Z-curves divided into Z-regions indexed by a B+-tree Multi-dimensional access methods R-tree COMAD 2008
Native Multi-dimensional Indexing • SELECT * FROM Products • WHERE • BrandId BETWEEN (13 AND 14) • AND • ProductTypeID BETWEEN (13 AND 24) • AND • PeriodID BETWEEN (3 AND 11) • Table rows • points in n-dimensional space • R-tree • Queries • cubes in n-dimensional space (n-dimensionalwindows) COMAD 2008
Contemporary DB systems COMAD 2008
PostgreSQL • Object-relational DBMS • Open-source • Since 2005 (v. 8.0) runs on Windows • Emphasis on extensibility • data types • operators • procedural languages • access methods • … COMAD 2008
Relation Types in PostgreSQL • Heap relations (HRs) • user relations • system catalog • undefined order • Index relations (IRs) • <key,value> pairs • external • fast access to heap relations • internal • access methods’ structures COMAD 2008
User-defined Access Methods (AM) in PostgreSQL • Implement a set of functions communicating with PostgreSQL’s core (AM implementation). • Register the functions (located in libraries). • Define an AM (index type) by connecting the functions with a newly created AM. • Establish a class of operators and types for the AM. • Use the index. COMAD 2008
required functions index_build creating a structure index_insert inserts a record index_beginscan starts a new scan index_gettuple gets a record fulfilling search conditions index_getmulti gets a set of records index_endscan finishes a search index_markpos marks actual position in a scan index_restrpos returns to a marked position index_rescan repeats scan with the same structure of search keys index_bulkdelete removes a set of records index_costestimate estimates cost of a search User-defined Access Methods in PostgreSQL – cont. COMAD 2008
External Indexing • AM in PostgreSQL store data in IRs returning IRs’ TIDs to the core • External indexing framework • Storing IRs’ TIDs in external index storage • PostgreSQL’s interface to access methods still requires high level of mastering of PostgreSQL’s inner mechanisms • Framework for external indexing COMAD 2008
Framework interface • void FW_CreateStructure(Relation index_relation); • void*FW_PrepareInsert(Relation index_relation); • void FW_InsertTuple(void *fw_data, Relation index_relation, IndexTuple index_tuple, BlockNumberblock_number,OffsetNumber offset); • void FW_FinishInsert(void *fw_data); • void FW_InitSearch(IndexScanDesc scan, ScanDirectiondir); • bool FW_GetNextTID(IndexScanDesc scan, ScanDirectiondir, BlockNumber *block_number, OffsetNumber *offset); • void FW_DeleteTuple(BlockNumber block number,OffsetNumber offset); COMAD 2008
Experimental evaluation • The testbed • Uniform • clusters of uniformly distributed (up to 15-dimensional) objects • Gauss • (up to 3-dimensional) objects following Gaussian distribution • DBLP • 435,373 DBLP database records • author, type of publication, year of publication, number of pages • Studied costs • index access count • real-time COMAD 2008
Experiments – Database Growth COMAD 2008
Experiments – Query Selectivity COMAD 2008
Experiments – Dimension Growth COMAD 2008
Experiments – Real-time COMAD 2008
Conclusion • We have proposed and implemented • Native multidimensional indexing by R-tree • Indexing framework for PostgreSQL • Implementation of native external R-tree index • Results show • big speed-up on the real-world data according to index-access metric • poor physical implementation of access methods in PostgreSQL in comparison to Oracle and MSSQL Server COMAD 2008