Explore the intelligent handling of query processing techniques and data mining in data warehouses for optimized data storage and retrieval.
ISDA'2003
Data Mining Techniques in Index Techniques
Ying Wah Teh and Abu Bakar Zaitun
tehyw@um.edu.my, zab@um.edu.my
University of Malaya, Faculty of Computer Science and Information Technology
January 5, 2020

Contents
- Introduction
- Query Processing Techniques
- Evaluation of Data Mining Prototypes
- Conclusion

Introduction
- What data to gather, how to model the data conceptually, and how to manage its storage
- Logical database design
- Physical database design
- Very large data storage nowadays
- Redundant data structures: the intelligent way of managing storage
- Fast access to data
- Selecting the right elements to build redundant data structures
- Only a few data warehouse administrators can do justice to the task of picking the right redundant data structures.

Query Processing Techniques
- Historical perspectives: file processing (full scan / sequential scan), simple index, B-tree index
- Present scenarios of query processing techniques: bitmap index, single-column indexes

File Processing
- A programmer needs to know at least one third-generation language to write a data retrieval program that accesses the relevant information in a file system.
- Query processing technique: sequential scan (full scan).
- It is best suited to environments with small data volumes.

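A sequential scan can be sketched in a few lines. The example below is only an illustration, not code from the presentation: the file contents, the column names, and the `full_scan` helper are assumptions.

```python
import csv
import io

# Hypothetical flat file; the column names and rows are assumptions.
FILE_CONTENTS = """id,name,gender
1,Aminah,Female
2,Badrul,Male
3,Chong,Male
4,Devi,Female
"""

def full_scan(file_obj, column, value):
    """Read every record in order and keep those where column == value."""
    reader = csv.DictReader(file_obj)
    return [row for row in reader if row[column] == value]

matches = full_scan(io.StringIO(FILE_CONTENTS), "gender", "Female")
names = [row["name"] for row in matches]
print(names)
```

Every record is read regardless of how few match, so the cost grows with the size of the file. That is why the technique fits small data volumes.
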
Simple Indexes / Hashed Key
- DBMSs were developed that included simple indexes.
- A simple index lets users access information very quickly by a unique value.
- It creates a list of record identifiers (RIDs) that act as pointers to records.
- An exact key value is required to access the data.

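A minimal sketch of a hashed-key index, assuming records held in a Python list whose positions stand in for record identifiers. The sample records and the helper names `build_hash_index` and `lookup` are hypothetical.

```python
# Hypothetical records; the list position plays the role of the RID.
records = [
    {"id": 101, "name": "Aminah"},
    {"id": 205, "name": "Badrul"},
    {"id": 309, "name": "Chong"},
]

def build_hash_index(rows, key):
    """Map each unique key value to the RID of the record that holds it."""
    index = {}
    for rid, row in enumerate(rows):
        if row[key] in index:
            raise ValueError(f"duplicate key value: {row[key]!r}")
        index[row[key]] = rid
    return index

def lookup(rows, index, key_value):
    """Return the record for an exact key value, or None if it is absent."""
    rid = index.get(key_value)
    return None if rid is None else rows[rid]

id_index = build_hash_index(records, "id")
found = lookup(records, id_index, 205)
missing = lookup(records, id_index, 999)
print(found, missing)
```

The lookup only works when the caller supplies the complete key value. A partial key gives the hash nothing to work with.
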
B-tree Indexes
- Support both partial-key lookups and exact-key lookups.
- It is very costly to create one for every query.
- The intelligent way of handling the B-tree index is the vital issue.

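The difference between exact-key and partial-key lookups can be illustrated with any ordered structure. The sketch below uses a sorted list with binary search as a stand-in for a B-tree (it is not a B-tree implementation), and the sample names and helper functions are assumptions.

```python
import bisect

# Hypothetical key column; the list position plays the role of the RID.
names = ["Tan", "Teh", "Wong", "Teoh", "Ahmad"]

def build_sorted_index(values):
    """Return parallel lists of sorted keys and their RIDs."""
    pairs = sorted((value, rid) for rid, value in enumerate(values))
    keys = [value for value, _ in pairs]
    rids = [rid for _, rid in pairs]
    return keys, rids

def exact_lookup(keys, rids, value):
    """Exact-key lookup: binary search for the run of equal keys."""
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return rids[lo:hi]

def prefix_lookup(keys, rids, prefix):
    """Partial-key lookup: seek to the prefix, then walk while it matches."""
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key, rid in zip(keys[start:], rids[start:]):
        if not key.startswith(prefix):
            break
        matches.append(rid)
    return matches

keys, rids = build_sorted_index(names)
exact = exact_lookup(keys, rids, "Tan")
partial = prefix_lookup(keys, rids, "Te")
print(exact, partial)
```

Because the keys are kept in order, all entries sharing a prefix sit next to each other. This is what makes partial-key lookups possible, and a hashed index cannot offer it.
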
Present Scenario
- A user issues a query that requires only a small portion of a relation, and the predicate is on a non-primary-key column.
- Only one RID index can be used at a time.

Bitmap Index
- Bit-vector approach.
- A RID occupies at least 8 bits, while a bitmap index uses only a 1-bit pointer to each tuple of the relation.
- Works well only with low-cardinality data (e.g. Female, Male).
- The intelligent way of handling the bitmap index is the vital issue.

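The bit-vector approach can be sketched by keeping one bitmap per distinct column value, with bit i set when tuple i holds that value. In the illustration below, Python integers serve as bit vectors; the sample rows and helper names are assumptions.

```python
# Hypothetical relation with one low-cardinality column.
rows = [
    {"name": "Aminah", "gender": "Female"},
    {"name": "Badrul", "gender": "Male"},
    {"name": "Chong", "gender": "Male"},
    {"name": "Devi", "gender": "Female"},
    {"name": "Eliza", "gender": "Female"},
]

def build_bitmap_index(table, column):
    """Build one bit vector per distinct value; bit i marks tuple i."""
    bitmaps = {}
    for position, row in enumerate(table):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps

def positions(bitmap):
    """Decode a bit vector back into the list of tuple positions."""
    result = []
    position = 0
    while bitmap:
        if bitmap & 1:
            result.append(position)
        bitmap >>= 1
        position += 1
    return result

gender_index = build_bitmap_index(rows, "gender")
female_positions = positions(gender_index["Female"])
print(female_positions)
```

Each distinct value needs its own bit vector as long as the relation, so the total space grows with the number of distinct values. This is why the approach suits low-cardinality columns.
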
Single-column Indexes
- Index intersection offers greater flexibility.
- A good strategy is to define single-column indexes on all columns that will be frequently queried and let index intersection handle the situation.
- The intelligent way of handling the single-column indexes is the vital issue.

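Index intersection can be sketched as follows: each single-column index maps a value to the set of RIDs holding it, and a query with several equality predicates intersects those sets. The sample rows, column names, and helper functions are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical relation; the list position plays the role of the RID.
rows = [
    {"gender": "Female", "state": "Selangor"},
    {"gender": "Male", "state": "Selangor"},
    {"gender": "Female", "state": "Johor"},
    {"gender": "Female", "state": "Selangor"},
    {"gender": "Male", "state": "Johor"},
]

def build_column_index(table, column):
    """Single-column index: value -> set of RIDs holding that value."""
    index = defaultdict(set)
    for rid, row in enumerate(table):
        index[row[column]].add(rid)
    return index

def intersect(indexes, predicates):
    """Answer a conjunction of equality predicates by intersecting RID sets."""
    rid_sets = [indexes[col].get(val, set()) for col, val in predicates.items()]
    if not rid_sets:
        return []
    return sorted(set.intersection(*rid_sets))

indexes = {col: build_column_index(rows, col) for col in ("gender", "state")}
result = intersect(indexes, {"gender": "Female", "state": "Selangor"})
print(result)
```

The same two indexes answer queries on gender alone, on state alone, or on both together, which is the flexibility a multi-column index on a fixed column order does not give.
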
Our Research Perspective
- Most researchers apply data mining at the application level of the data warehouse.
- We applied data mining in the physical design of data warehouses to optimise the base relation.

Architecture of One-column Index Selection

Evaluation of Data Mining Prototypes

Conclusion
- It is necessary to have an intelligent way of handling the various query processing techniques (such as indexes).
- Data mining techniques can be used in the physical design of a data warehouse to generate single-column indexes.
- The positive results from the study should motivate further efforts to develop the prototype into a fully functional SQL engine.

Thank You
Questions?
tehyw@um.edu.my