www.infobright.org www.infobright.com slezak@infobright.com RSCTC 2008. Rough Sets in Data Warehousing: Infobright Community Edition (ICE). Data Warehousing. Technology Layout. Two-Level Computing. Large Data (10TB) and Mixed Workloads. Rough Sets.
Rough Sets in Data Warehousing: Infobright Community Edition (ICE)
Two-Level Computing
Large Data (10TB) and Mixed Workloads
Rough Sets
Classes of records that share the same values on a subset of the attributes. (Figure label: Sport? = Yes)
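The classes mentioned on this slide can be illustrated with a small sketch. It groups records by their values on a chosen attribute subset and then derives the standard rough-set lower and upper approximations of the concept "Sport? = Yes". The approximations come from general rough set theory rather than from the slide text, and the toy table, the attribute names `Outlook` and `Wind`, and the function names are all hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(records, attributes):
    """Group record indices by their values on the chosen attributes."""
    classes = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rec[a] for a in attributes)
        classes[key].append(idx)
    return list(classes.values())


def approximations(records, attributes, in_concept):
    """Return (lower, upper) approximations of a concept as sets of indices.

    lower: classes lying entirely inside the concept
    upper: classes that overlap the concept at all
    """
    lower, upper = set(), set()
    for cls in indiscernibility_classes(records, attributes):
        hits = [in_concept(records[i]) for i in cls]
        if all(hits):
            lower.update(cls)
        if any(hits):
            upper.update(cls)
    return lower, upper


# Hypothetical toy table; only the "Sport?" column name comes from the slide.
records = [
    {"Outlook": "Sunny", "Wind": "Weak", "Sport?": "Yes"},
    {"Outlook": "Sunny", "Wind": "Weak", "Sport?": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "Sport?": "No"},
    {"Outlook": "Rain", "Wind": "Weak", "Sport?": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "Sport?": "No"},
]

lower, upper = approximations(
    records, ["Outlook", "Wind"], lambda r: r["Sport?"] == "Yes"
)
print(sorted(lower), sorted(upper))
```

Records 3 and 4 share the same attribute values but disagree on "Sport?", so their class falls in the upper approximation only; that boundary region is what makes the concept rough.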
Information Systems
Data-based knowledge models, classifiers...
Database indices, data partitioning, data sorting...
Difficulty with fast updates of structures...
Rough Sets in Infobright
Imagine the set of all records relevant to a given query, that is, the records satisfying its SQL filter: SELECT COUNT(*) FROM Employees WHERE Salary > $
Using the Knowledge Grid, we determine which packs are irrelevant (disjoint from the set), relevant (fully inside the set), or suspect (overlapping the set). We do not need the irrelevant packs at all. We do not need to decompress the relevant ones either, because their local COUNT(*) is stored in the corresponding Data Pack Nodes. Only the suspect packs have to be decompressed. (Figure: packs storing the values of records for column Salary.)
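The pack classification described here can be sketched as follows. This is a minimal illustration, not Infobright's implementation: it assumes each Data Pack Node keeps the minimum, maximum, and row count of its pack (the slide only confirms the stored COUNT(*)), and the `Pack` class, the function names, and the threshold value 50 are hypothetical, since the slide leaves the salary amount unspecified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pack:
    values: List[int]  # stands in for the compressed pack contents

    def __post_init__(self):
        # Assumed Data Pack Node statistics, computed once at load time.
        self.min = min(self.values)
        self.max = max(self.values)
        self.count = len(self.values)


def classify(pack, threshold):
    """Classify a pack against the filter `value > threshold`."""
    if pack.max <= threshold:
        return "irrelevant"  # no row can satisfy the filter
    if pack.min > threshold:
        return "relevant"    # every row satisfies the filter
    return "suspect"         # some rows may satisfy it


def count_above(packs, threshold):
    """Return (COUNT(*), number of packs that had to be opened)."""
    total, opened = 0, 0
    for pack in packs:
        status = classify(pack, threshold)
        if status == "relevant":
            total += pack.count  # answered from statistics alone
        elif status == "suspect":
            opened += 1          # only here do we touch the actual values
            total += sum(1 for v in pack.values if v > threshold)
    return total, opened


packs = [Pack([30, 40, 45]), Pack([48, 55, 70]), Pack([80, 90, 95])]
total, opened = count_above(packs, 50)
print(total, opened)
```

In this toy run only the middle pack is suspect, so one pack out of three is read; the other two are resolved from their statistics.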
Example query: SELECT MAX(A) FROM T WHERE B>15; (Figure panels: DATA, STEP 1, STEP 2, STEP 3.)
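The step panels for this query did not survive in the text, so the following is only one plausible reading of a three-step evaluation, assuming per-pack minimum and maximum statistics are available for both columns. Step 1 classifies packs by the filter on B, step 2 derives a lower bound for MAX(A) from the relevant packs, and step 3 opens only those suspect packs whose maximum of A could still beat that bound. The `PackPair` class and function names are hypothetical, and `min`/`max` over the raw lists stand in for stored statistics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PackPair:
    a: List[int]  # values of column A for the rows in this pack
    b: List[int]  # values of column B for the same rows


def status_b(pack, threshold):
    """Classify a pack against the filter `B > threshold` using min/max of B."""
    if max(pack.b) <= threshold:
        return "irrelevant"
    if min(pack.b) > threshold:
        return "relevant"
    return "suspect"


def max_a_where_b_gt(packs, threshold):
    """Return (MAX(A) over rows with B > threshold, number of packs opened)."""
    # Step 1: rough classification by the filter on B.
    statuses = [status_b(p, threshold) for p in packs]

    # Step 2: every row of a relevant pack qualifies, so the largest max(A)
    # among relevant packs is a lower bound on the answer.
    best = max(
        (max(p.a) for p, s in zip(packs, statuses) if s == "relevant"),
        default=None,
    )

    # Step 3: open a suspect pack only if its max(A) could exceed the bound.
    opened = 0
    for p, s in zip(packs, statuses):
        if s != "suspect":
            continue
        if best is not None and max(p.a) <= best:
            continue  # cannot improve the answer, skip without opening
        opened += 1
        for a, b in zip(p.a, p.b):
            if b > threshold and (best is None or a > best):
                best = a
    return best, opened


packs = [
    PackPair(a=[5, 9, 3], b=[20, 30, 16]),   # relevant
    PackPair(a=[7, 8, 2], b=[10, 20, 5]),    # suspect, but max(A) is too small
    PackPair(a=[12, 4, 6], b=[3, 18, 40]),   # suspect, must be opened
    PackPair(a=[50, 60], b=[1, 2]),          # irrelevant
]
print(max_a_where_b_gt(packs, 15))
```

The third pack has to be opened because its maximum of A (12) exceeds the bound, yet the row holding 12 fails the filter on B, so the final answer still comes from the relevant pack.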
Advanced Knowledge Nodes
(Figure: Order Detail Table and Supplier/Part Table; assume many more rows in each.)
Count Distinct; Count(*) on Self-Joins; Decision Trees; Contingencies; New Objectives; New Schemas; New Volumes; New Queries; New KNs; New Data Types; SQL Extensions; Feature Extraction; Data Compression; Community Inspirations
Conclusion
• The technology is based on interaction between rough and precise operations and is open to adding new structures.
• Full product, simple framework, ad-hoc analytics, good load speed, 10:1 "all inclusive" compression.
• The core technology is based on more data mining, rough sets, computing with rough values, et cetera.
• Infobright Community Edition (ICE) is available for free use and study, and is open to contributions.
THANK YOU!!!