
A New OLAP Aggregation Based on the AHC Technique

DOLAP 2004. A New OLAP Aggregation Based on the AHC Technique. R. Ben Messaoud, O. Boussaid, S. Rabaséda. Laboratoire ERIC – Université de Lyon 2, 5, avenue Pierre-Mendès–France, 69676 Bron Cedex – France. http://eric.univ-lyon2.fr


Presentation Transcript


  1. DOLAP 2004
     A New OLAP Aggregation Based on the AHC Technique
     R. Ben Messaoud, O. Boussaid, S. Rabaséda
     Laboratoire ERIC – Université de Lyon 2
     5, avenue Pierre-Mendès–France, 69676 Bron Cedex – France
     http://eric.univ-lyon2.fr

  2. Complex data
     • Definition: data are considered complex if they are …
       • Multi-format: information can be carried by different kinds of data (numeric, symbolic, texts, images, sounds, videos …)
       • Multi-structure: structured, unstructured or semi-structured (relational databases, XML documents …)
       • Multi-source: data come from different sources (distributed databases, web …)
       • Multi-modal: the same information can be described differently (data in different languages …)
       • Multi-version: data are updated over time (temporal databases, periodical inventory …)
     Ben Messaoud et al.

  3. General context
     (Diagram labels: Complex data, OLAP, Data mining, MDBMS, OpAC)
     • Complex data
       • Huge volumes of complex data
       • Warehousing complex data …
       • OLAP facts as complex objects
     • Analyze complex data
       • Current OLAP tools aren’t suited to process complex data
       • Data mining is able to process complex data like images, texts, videos …
     • Coupling OLAP and data mining
       • Analyze complex data on-line
       • New operator OpAC: Operator of Aggregation by Clustering (AHC)

  4. Outline
     0. Complex data and general context
     1. Related work: Coupling OLAP and data mining
     2. Objectives of the proposed operator
     3. Formalization of the operator
     4. Implementation and demonstration
     5. Conclusion and future works

  5. Related work
     (Diagram labels: First approach, Second approach, Third approach, OLAP, Data mining, DBMS)
     • Three approaches for coupling OLAP and data mining
       • First approach: Extending the query languages of decision support systems
       • Second approach: Adapting the multidimensional environment to classical data mining techniques
       • Third approach: Adapting data mining methods for multidimensional data

  6. Related work
     (Diagram labels: OLAP, Data mining, OpAC)
     • These works showed that:
       • Associating data mining with OLAP is a promising way to support rich analysis tasks
       • Data mining is able to extend the analysis power of OLAP
     • Use data mining to enhance OLAP tools in order to process complex data
     • OpAC: A new OLAP operator based on a data mining technique

  7. Objectives: classic OLAP aggregation vs. OpAC aggregation
     • Classic OLAP:
       • Summarizes numerical data in a fewer number of values
       • Computes additive measures (Sum, Average, Max, Min …)
     Example: Sales cube

       State / City         Sales    Count
       Washington           $2520    120
         Bellingham         $700     32
         Bremerton          $400     20
         Olympia            $850     44
         Redmond            $250     9
         Seattle            $320     15
       California           $2410    129
         Berkeley           $820     41
         Beverly Hills      $910     50
         Los Angeles        $680     38
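The classic roll-up on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the figures are copied from the Sales cube example, while the names `FACTS`, `roll_up` and `STATE_TOTALS` are hypothetical and do not come from the presentation.

```python
from collections import defaultdict

# (state, city, sales in dollars, count) rows copied from the Sales cube slide.
FACTS = [
    ("Washington", "Bellingham", 700, 32),
    ("Washington", "Bremerton", 400, 20),
    ("Washington", "Olympia", 850, 44),
    ("Washington", "Redmond", 250, 9),
    ("Washington", "Seattle", 320, 15),
    ("California", "Berkeley", 820, 41),
    ("California", "Beverly Hills", 910, 50),
    ("California", "Los Angeles", 680, 38),
]


def roll_up(facts):
    """Sum the additive measures from the city level up to the state level."""
    totals = defaultdict(lambda: [0, 0])
    for state, _city, sales, count in facts:
        totals[state][0] += sales
        totals[state][1] += count
    return {state: tuple(values) for state, values in totals.items()}


STATE_TOTALS = roll_up(FACTS)
print(STATE_TOTALS)
```

The roll-up works because sales and counts can simply be added; the next slide asks what to do when the facts are images or texts, for which no such sum exists.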

  8. Objectives: classic OLAP aggregation vs. OpAC aggregation
     • OpAC aggregation:
       • What about aggregating complex objects?
       • How to aggregate images, texts or videos with classic OLAP tools?
       • Complex objects are not additive OLAP measures …
     Example: Images cube

       Images            Size      ASM
       Orange coral      3560px    0.016
       Nebraska, USA     2340px    0.021
       Toco toucan       4434px    0.014
       Maldives          3260px    0.012
       Aggregate         ?         ?

  9. Objectives
     • How to aggregate complex objects?
     • Using a data mining technique: AHC (Agglomerative Hierarchical Clustering)
       • The AHC aggregates data
       • The hierarchical aspect of the AHC

  10. Objectives
      (Figure: images cube with an Entropy dimension and a Homogeneity dimension, each with the modalities Very high, High, Medium, Low, Very low; highlighted cells are labeled "L1Normalized for high homogeneity" and "L1Normalized for low entropy".)

  11. Formalization
      • $D_i$: the $i$-th dimension of a data cube $C$
      • $h_{ij}$: the $j$-th hierarchical level of the dimension $D_i$
      • $g_{ijt}$: the $t$-th modality of $h_{ij}$
      • The set of individuals: $\Omega \subset \{\, g_{ijt} \mid g_{ijt} \in h_{ij} \,\}$
      • The set of variables: $\Sigma \subset \{\, X \mid X(g_{ijt}) = \text{measure of } g_{srv} \text{ crossed with } g_{ijt} \,\}$, where $g_{srv} \in h_{sr}$, $s \neq i$, and $r$ is unique for each $s$
        • The dimension retained for individuals can’t generate variables
        • Only one hierarchical level of a dimension is allowed to generate variables
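One way to read this formalization is as the construction of an individuals-by-variables table from the cube's facts. The sketch below is a rough illustration under several assumptions that are not in the presentation: the fact rows and their values are invented, measures falling in the same cell are combined by a sum, and empty cells are filled with 0.0. The names `FACTS`, `DIMENSIONS` and `build_table` are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts: (image, entropy level, homogeneity level, measure).
# All values are invented for illustration only.
FACTS = [
    ("img_a", "High", "Low", 0.20),
    ("img_a", "Low", "High", 0.05),
    ("img_b", "High", "Low", 0.18),
    ("img_c", "Low", "High", 0.40),
]
DIMENSIONS = ("image", "entropy", "homogeneity")


def build_table(facts, individual_dim, variable_dims):
    """Build one row per individual and one column per (dimension, modality)."""
    # The dimension retained for individuals cannot generate variables.
    if individual_dim in variable_dims:
        raise ValueError("the individuals' dimension cannot generate variables")
    i = DIMENSIONS.index(individual_dim)
    cells = defaultdict(float)
    for fact in facts:
        for dim in variable_dims:
            s = DIMENSIONS.index(dim)
            # Assumption: measures crossing the same pair are summed.
            cells[(fact[i], (dim, fact[s]))] += fact[-1]
    individuals = sorted({fact[i] for fact in facts})
    variables = sorted({key[1] for key in cells})
    rows = {g: [cells.get((g, v), 0.0) for v in variables] for g in individuals}
    return individuals, variables, rows


INDIVIDUALS, VARIABLES, ROWS = build_table(FACTS, "image", ("entropy", "homogeneity"))
for name in INDIVIDUALS:
    print(name, ROWS[name])
```

Each facts column here stands for a single hierarchical level of its dimension, which mirrors the constraint that only one level per dimension may generate variables.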

  12. Formalization
      • Evaluation tools
        • Minimize the intra-cluster distances
        • Maximize the inter-cluster distances
      • Inter- and intra-cluster inertia
        $$I_{intra}(k) = \sum_{i=1}^{k} I(A_i)$$
        $$I_{inter}(k) = \sum_{i=1}^{k} P(A_i)\, d(G(A_i), G(\Omega))$$
        • $A_1, A_2, \ldots, A_k$ is a partition of $\Omega$
        • $P(A_i)$ is the weight of $A_i$
        • $G(A_i)$ is the gravity center of $A_i$
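The two inertia indicators on this slide can be sketched numerically as follows. The slide does not define the inertia of a single cluster or the distance d, so this example assumes that every individual has weight 1/n, that d is the squared Euclidean distance, and that the inertia of a cluster is the weighted sum of squared distances from its members to its gravity center. The function names and the sample points are hypothetical.

```python
def gravity_center(points):
    """Unweighted mean of a list of equally weighted points."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance (an assumption; the slide leaves d open)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertias(clusters):
    """Return (intra-cluster inertia, inter-cluster inertia) of a partition."""
    n = sum(len(cluster) for cluster in clusters)
    g_all = gravity_center([p for cluster in clusters for p in cluster])
    intra = 0.0
    inter = 0.0
    for cluster in clusters:
        g = gravity_center(cluster)
        # Inertia of the cluster: each member carries weight 1/n.
        intra += sum(sq_dist(p, g) for p in cluster) / n
        # Cluster weight times the distance between the two gravity centers.
        inter += (len(cluster) / n) * sq_dist(g, g_all)
    return intra, inter


PARTITION = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
INTRA, INTER = inertias(PARTITION)
print(INTRA, INTER)
```

Under these assumptions the two quantities add up to the total inertia of the whole set, so lowering the intra-cluster inertia and raising the inter-cluster inertia are two views of the same goal.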

  13. Formalization
      (Figure: inter-cluster and intra-cluster inertia curves, shown alongside the images cube with its Entropy and Homogeneity dimensions, each with the modalities Very high, High, Medium, Low, Very low.)
      • Individuals:
        • Modalities from the dimension of images
      • Variables:
        • L1Normalized values of images for all possible modalities of the entropy dimension
        • L1Normalized values of images for all possible modalities of the homogeneity dimension

  14. Formalization
      Results:
      • Exploits the cube’s facts describing images to construct groups of similar complex objects
      • Highlights significant groups of objects by a clustering technique
      • Clusters (aggregates) are defined from both the dimensions and the measures of a data cube
      • Implementation of a prototype

  15. Implementation
      Prototype:
      • Data loading module:
        • Connects to a data cube on Analysis Services of MS SQL Server
        • Uses MDX queries to import information about the cube’s structure
        • Extracts data selected by the user
      • Parameter setting interface:
        • Assists the user in extracting individuals and variables from the cube
        • Selects modalities and measures
        • Defines the clustering problem
      • Clustering module:
        • Allows the definition of the clustering parameters, such as the dissimilarity metric and the aggregation criterion
        • Constructs the AHC
        • Plots the results of the AHC on a dendrogram
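The role of the clustering module's two parameters can be illustrated with a small, naive AHC written in plain Python. This is not the prototype's code: the function `ahc`, its parameters and the sample points are hypothetical, and the aggregation criterion is limited here to single linkage (`min`) and complete linkage (`max`) for brevity.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Dissimilarity metric used by default in this sketch."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ahc(points, metric=euclidean, criterion=min):
    """Naive agglomerative clustering.

    criterion=min gives single linkage, criterion=max gives complete linkage.
    Returns the merges in order as (cluster_a, cluster_b, dissimilarity),
    where clusters are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = criterion(metric(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        # Replace the two closest clusters by their union.
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


POINTS = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
SINGLE = ahc(POINTS)
COMPLETE = ahc(POINTS, criterion=max)
for step in SINGLE:
    print(step)
```

The ordered list of merges, with the dissimilarity at which each one happens, is the information a dendrogram displays; cutting that list at a given step yields a partition whose clusters play the role of aggregates.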

  16. Implementation
      Images dataset:
      • 3000 images collected from the web:
        • Semantic annotation: Description, subject and theme
        • Descriptors of texture like:
          • ENT: Entropy
          • CON: Contrast
          • L1Normalized: Medium Color Characteristic
          • …
        • Three color channels: RGB

  17. Implementation
      Demonstration

  18. Conclusion
      • OpAC is a possible way to carry out on-line analysis over complex data
      • OpAC aggregates complex objects
      • Aggregates (clusters) are defined from both the dimensions and the measures of a data cube
      • Prototype available at: http://bdd.univ-lyon2.fr/?page=logiciel&id=5

  19. Future works
      • The current evaluation tool may have some limitations
        • Use other evaluation indicators to evaluate the quality of partitions
        • Assist the user in finding the best number of clusters
      • Exploit the aggregates generated by OpAC in order to reorganize the cube’s dimensions
        • Get a new cube with remarkable regions
      • Use other data mining techniques to enhance the OLAP power with explanation and prediction capabilities

  20. The End
