
A Wordification Approach to Relational Data Mining: Early Results

Matic Perovšek, Anže Vavpetič, Nada Lavrač, Jožef Stefan Institute, Slovenia.


Presentation Transcript


  1. A Wordification Approach to Relational Data Mining: Early Results • Matic Perovšek, Anže Vavpetič, Nada Lavrač • Jožef Stefan Institute, Slovenia

  2. Overview • Introduction • Methodology • Experimental results • Conclusion

  3. Introduction • Relational data mining algorithms aim to induce models and/or relational patterns from multiple tables • Individual-centered relational databases can be transformed into a single-table form; this transformation is called propositionalization
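To make the idea of propositionalization concrete, here is a minimal Python sketch on a hypothetical two-table database. The `individuals` and `atoms` tables and the `propositionalize` helper are illustrative names, not taken from the slides. Each individual becomes one row, and each value seen in the secondary table becomes a boolean "has value" feature; this is a simple existential-style flattening, not the wordification method itself.

```python
from typing import Dict, List

# Hypothetical toy database: a main table of individuals and a secondary
# table linked to it one-to-many through the foreign key "mol_id".
individuals = [
    {"id": 1, "class": "pos"},
    {"id": 2, "class": "neg"},
]
atoms = [
    {"mol_id": 1, "element": "c"},
    {"mol_id": 1, "element": "o"},
    {"mol_id": 2, "element": "c"},
]


def propositionalize(main: List[dict], secondary: List[dict],
                     fk: str, attr: str) -> List[Dict[str, object]]:
    """Flatten to one row per individual with boolean 'has attr=value' features."""
    values = sorted({row[attr] for row in secondary})
    table = []
    for ind in main:
        related = {row[attr] for row in secondary if row[fk] == ind["id"]}
        flat: Dict[str, object] = {"id": ind["id"], "class": ind["class"]}
        for v in values:
            flat[f"has_{attr}_{v}"] = v in related  # existential feature
        table.append(flat)
    return table


for row in propositionalize(individuals, atoms, "mol_id", "element"):
    print(row)
```

The result is a single table with one row per individual, which any propositional learner can consume directly.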

  4. Motivation • Wordification, inspired by text mining techniques • A large number of simple, easy-to-understand features • Greater scalability, handling large datasets • Can be used as a preprocessing step for propositional learners, as well as for declarative modeling / constraint solving (De Raedt et al., today’s invited talk)

  5. Methodology • Transformation from relational database to a textual corpus • TF-IDF weight calculation

  6. Transformation from relational database to a textual corpus • One individual of the initial relational database -> one text document • Features -> the words of this document • Words constructed as a combination:
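A minimal Python sketch of turning one database row into words. It assumes, purely for illustration, that each word joins the table name, the attribute name and a discrete attribute value with underscores, lowercased; the helpers `make_word` and `row_to_words` and the sample movie row are hypothetical.

```python
from typing import Dict, List


def make_word(table: str, attribute: str, value: object) -> str:
    """Join table name, attribute name and a discrete value into one token.

    The table_attribute_value layout is an illustrative assumption.
    """
    return f"{table}_{attribute}_{value}".lower().replace(" ", "_")


def row_to_words(table: str, row: Dict[str, object], skip: List[str]) -> List[str]:
    """Turn every attribute of one row, except those in `skip` (e.g. keys), into a word."""
    return [make_word(table, a, v) for a, v in row.items() if a not in skip]


# Hypothetical main-table row for one individual (a movie).
movie = {"movie_id": 7, "decade": "1990s", "genre": "Drama"}
print(row_to_words("movie", movie, skip=["movie_id"]))
# ['movie_decade_1990s', 'movie_genre_drama']
```

Key columns are skipped because an identifier value would produce a word unique to one individual, which carries no generalizable information.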

  7. Transformation from relational database to a textual corpus • For each individual, the words generated for the main table are concatenated with words generated from the secondary (BK) tables
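The concatenation step can be sketched as follows. This is a simplified illustration, assuming every secondary (BK) table references the main table directly through the same foreign-key column; real databases often link through intermediate tables, which would need extra joins. The `wordify` helper, the table names and the data are hypothetical.

```python
from typing import Dict, List


def make_word(table: str, attribute: str, value: object) -> str:
    # Hypothetical table_attribute_value token layout.
    return f"{table}_{attribute}_{value}".lower().replace(" ", "_")


def wordify(main_table: str, main_rows: List[dict], key: str,
            secondary: Dict[str, List[dict]], fk: str) -> Dict[object, List[str]]:
    """Build one document (list of words) per individual of the main table."""
    docs: Dict[object, List[str]] = {}
    for row in main_rows:
        # Words from the main table start each individual's document.
        docs[row[key]] = [make_word(main_table, a, v) for a, v in row.items() if a != key]
    for table, rows in secondary.items():
        for row in rows:
            words = [make_word(table, a, v) for a, v in row.items() if a != fk]
            docs[row[fk]].extend(words)  # append to the owning individual's document
    return docs


movies = [{"id": 1, "decade": "1990s"}, {"id": 2, "decade": "2000s"}]
bk = {
    "genre": [{"movie": 1, "name": "drama"}, {"movie": 2, "name": "comedy"}],
    "actor": [{"movie": 1, "gender": "f"}, {"movie": 1, "gender": "m"}],
}
docs = wordify("movie", movies, "id", bk, fk="movie")
for movie_id, words in docs.items():
    print(movie_id, " ".join(words))
```

Repeated words are kept (movie 1 could list several actors of the same gender), so word counts within a document remain available for weighting.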

  8. Example

  9. TF-IDF weights • No explicit use of existential variables in our features; TF-IDF weights are used instead • The weight of a word gives a strong indication of how relevant the feature is for the given individual • The TF-IDF weights can then be used either to filter out words of low importance or directly by a propositional learner
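The slides do not give the exact weighting formula, so the sketch below uses one common TF-IDF variant, tf(w, d) · log(N / df(w)), where tf is the relative frequency of word w in document d, N is the number of documents and df(w) is the number of documents containing w. The threshold filter illustrates the first use mentioned above; the function names and the documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: Dict[int, List[str]]) -> Dict[int, Dict[str, float]]:
    """Weight each word w in document d by tf(w, d) * log(N / df(w))."""
    n_docs = len(docs)
    df: Counter = Counter()
    for words in docs.values():
        df.update(set(words))  # count each word at most once per document
    weights: Dict[int, Dict[str, float]] = {}
    for doc_id, words in docs.items():
        counts = Counter(words)
        total = len(words)
        weights[doc_id] = {
            w: (c / total) * math.log(n_docs / df[w]) for w, c in counts.items()
        }
    return weights


def filter_low(weights: Dict[int, Dict[str, float]],
               threshold: float) -> Dict[int, Dict[str, float]]:
    """Keep only the words whose weight is strictly above the threshold."""
    return {d: {w: x for w, x in ws.items() if x > threshold} for d, ws in weights.items()}


# Hypothetical wordified documents, one per movie.
docs = {
    1: ["movie_decade_1990s", "genre_name_drama", "actor_gender_f",
        "actor_gender_m", "actor_gender_m"],
    2: ["movie_decade_2000s", "genre_name_drama", "actor_gender_f"],
}
weights = tfidf(docs)
for doc_id, kept in filter_low(weights, 0.0).items():
    print(doc_id, {w: round(x, 3) for w, x in kept.items()})
```

Under this variant, a word that occurs in every document gets weight 0, so a threshold of 0 removes exactly the words that do not distinguish one individual from another.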

  10. Experimental results • Slovenian traffic accidents database • IMDB database • Top 250 and bottom 100 movies • Movies, actors, movie genres, directors, director genres • Applied the wordification methodology • Performed association rule learning
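The slides do not name the association rule learner that was used. As a rough illustration of what association rules over wordified data look like, this Python sketch enumerates single-word rules a -> b by brute force and keeps those meeting minimum support and confidence; a full learner such as Apriori would prune candidate itemsets instead of enumerating all pairs. The `pair_rules` helper and the word sets (with a hypothetical `movie_rank_*` class word standing in for top/bottom movies) are illustrative only.

```python
from itertools import permutations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_conf: float) -> List[Rule]:
    """Enumerate rules a -> b between single words by brute force."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    count = {i: sum(i in t for t in transactions) for i in items}
    rules: List[Rule] = []
    for a, b in permutations(items, 2):
        both = sum(a in t and b in t for t in transactions)
        support = both / n  # fraction of documents containing both words
        if support < min_support:
            continue
        confidence = both / count[a]  # P(b | a) estimated from the documents
        if confidence >= min_conf:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical word sets, one per movie.
movies = [
    {"movie_rank_top", "genre_name_drama", "director_genre_drama"},
    {"movie_rank_top", "genre_name_drama"},
    {"movie_rank_bottom", "genre_name_comedy"},
    {"movie_rank_top", "genre_name_drama", "director_genre_drama"},
]
for a, b, s, c in pair_rules(movies, min_support=0.5, min_conf=0.8):
    print(f"{a} -> {b}  (support={s:.2f}, confidence={c:.2f})")
```

Because every feature is a readable word, each rule can be interpreted directly, which is the kind of easy-to-understand output the wordification features aim for.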

  11. Experimental results

  12. Conclusion • Novel propositionalization technique called Wordification • Greater scalability • Easy-to-understand features • Further work: • Test on larger databases • Experimental comparison with other propositionalization techniques • Combine with a propositionalization-like approach to mining heterogeneous information networks (Grčar et al. 2012), applicable to CLP in data preprocessing • Grčar, Trdin, Lavrač: A Methodology for Mining Document-Enriched Heterogeneous Information Networks, Computer Journal, 2012
