Bettina Berendt – thanks for joint work with and support from Ilija Subasić, Mathias Verbeke

Where does this new information belong? From developing mining algorithms to supporting knowledge discovery
Bettina Berendt – thanks for joint work with and support from Ilija Subasić, Mathias Verbeke, Siegfried Nijssen, Luc De Raedt
K.U. Leuven

Yes we can! The problem

The solution? Automatic topic detection
Health 0.017
Care 0.015
Insurance 0.013
American 0.013
Uninsured 0.009
Families 0.008
Working 0.005

Same event/document; different interpretations & categorisations
• Visionary president
• Damp-rag president
• Rhetorics
• Party-politics (right and left)
• Obama's overall agenda

Text mining Stream mining Media studies ! Conference programme Similar problems in science and learning Topic detection in time-indexed corpora of news texts

Similar problems in other areas
Music collections, multimedia collections: see Andreas Nürnberger's talk at SML 2010

The solution? Context-aware systems / personalisation
• Political activist
• Female
• Has problems with anger management
You probably do / should think about it this way: ...

What users want
• ... to structure the world how they see it (left / right) → interactivity
• ... to re-use their categories (that they worked so hard to find) → semantics
• ... to acknowledge that others see the world differently (squares / circles; green / not green) → social similarity / diversity
• ... to be able to see through their eyes ("is (nearly) green") → perspective-taking
• ... to provide data mining methods to do all that!

Research agenda
The problem: automatic topic detection → support sense-making = provide methods / tools for Knowledge Discovery (in the full sense)
• interactivity
• semantics
• social similarity / diversity
• perspective-taking
... to provide data mining methods to do all that!

Research agenda: our solution approach
The problem: automatic topic detection → support sense-making = provide methods / tools for Knowledge Discovery (in the full sense)
• interactivity
• semantics
• social similarity / diversity
• perspective-taking
... to provide data mining methods to do all that!

STORIES: functionality basics

STORIES: mining basics (1) Graphical summarisation of multiple text documents
Document / text pre-processing
• Template recognition
• Multi-document named entities
• Stopword removal, lemmatization
Document summarization strategy
• no topics, but salient concepts & relations
• time window; word-span window
Selection approach for concepts
• concepts = words or named entities
• salient concept = high TF & involved in a salient relation, time-indexed
Similarity measure to determine salient relations
• "fact (assertion) recognition"
• bursty co-occurrence
Burstiness measure
• time relevance, a "temporal co-occurrence lift"
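The slide names the burstiness measure but does not give its formula. As a minimal sketch of one plausible reading of a "temporal co-occurrence lift", the code below divides the share of documents in a time period that contain a concept pair by the share of documents in the whole corpus that contain it. The function names, the document-level co-occurrence window (the slide also mentions a word-span window) and the exact ratio are assumptions for illustration, not the STORIES definition.

```python
from collections import Counter
from itertools import combinations


def pair_counts(docs):
    """Count, per unordered concept pair, the documents that contain both."""
    counts = Counter()
    for concepts in docs:
        for pair in combinations(sorted(set(concepts)), 2):
            counts[pair] += 1
    return counts


def temporal_lift(period_docs, all_docs):
    """Hypothetical lift: pair share in the period / pair share overall.

    Assumes period_docs is a subset of all_docs, so every pair seen in
    the period also has a non-zero count in the whole corpus.
    """
    period = pair_counts(period_docs)
    overall = pair_counts(all_docs)
    return {
        pair: (n / len(period_docs)) / (overall[pair] / len(all_docs))
        for pair, n in period.items()
    }


# Toy corpus: each document is already reduced to its concepts.
corpus = [
    ["obama", "speech"],
    ["obama", "health"],
    ["health", "obama"],
    ["health", "care"],
]
# The pair occurs in 2 of 2 period documents but only 2 of 4 overall.
print(temporal_lift(corpus[1:3], corpus))  # {('health', 'obama'): 2.0}
```

A value above 1 marks a pair that co-occurs more often in the period than in the corpus as a whole, which is the kind of bursty relation a story graph would show as an edge.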

STORIES: mining basics (2) Graph analysis for query recommendation
Aim: highlight subgraphs that represent an event
• Topological properties
• Change: subgraph new in this period
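The "change" criterion can be sketched as a set difference between the story graphs of two consecutive periods, followed by grouping the new edges into connected components. This is only an illustration under stated assumptions: the undirected edge-list representation and the function name `new_subgraphs` are hypothetical, and the slide does not say how STORIES detects new subgraphs.

```python
def new_subgraphs(prev_edges, curr_edges):
    """Return the connected components formed by edges that are new in this period.

    Edges are undirected pairs of concept names; self-loops are ignored.
    """
    new = {frozenset(e) for e in curr_edges} - {frozenset(e) for e in prev_edges}

    # Build an adjacency map from the new edges only.
    adjacency = {}
    for edge in new:
        if len(edge) != 2:
            continue  # skip self-loops
        a, b = tuple(edge)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Depth-first search to collect each connected component.
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


previous = [("health", "care")]
current = [("care", "health"), ("senate", "vote"), ("vote", "bill")]
print(new_subgraphs(previous, current))
```

Each returned component is a candidate "event" subgraph whose node labels could be offered to the user as a query.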

STORIES: evaluation
• Information retrieval quality
  • Edges – events: up to 80% recall, ca. 30% precision
• Search quality
  • Subgraphs index coherent document clusters
• Learning effectiveness
  • Document search with story graphs leads to averages of
    • 67-75% accuracy on judgments of story fact truth
    • 1.3-4.7 queries with 3.4-5.2 nodes/words per query
• Comparison with other temporal text mining methods
  • New (and only) framework for cross-method comparison
  • Recall- & precision-style metrics → different method rankings

Damilicious: functionality basics
• Apply my grouping rfid (Security/privacy, Group 2, ...) to the following new search result:
• Show users and how similarly they group
• Apply U4's grouping to my new search result:

Damilicious: mining basics (1) Methods and process
• Query
• Automatic clustering
• Manual regrouping
• Re-use
• Learn classifier & present way(s) of grouping
• Transfer the constructed concepts
Features/methods for the conceptual/predictive clustering:
• Lingo phrases, Lingo clustering, Ripper
• co-citation, bibliometric coupling, word or LSA similarity, combinations; k-means, hierarchical

Damilicious: mining basics (2) Measures of grouping and user diversity
Diversity = 1 – similarity = 1 – Normalized mutual information (entropy-based measure)
• "How similarly do two users group documents?"
• For each query q, consider their groupings gr:
• For several queries: aggregate
NMI = 0
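The formulas did not survive in this transcript, so the following is only a sketch of how "diversity = 1 – NMI" could be computed for two users' groupings of the same search result. Two choices here are assumptions, not taken from the slides: mutual information is normalised by the arithmetic mean of the two entropies, and the aggregate over several queries is a plain mean. The names `nmi` and `user_diversity` are hypothetical.

```python
from collections import Counter
from math import log


def nmi(grouping_a, grouping_b):
    """Normalised mutual information of two groupings (dicts: document -> group).

    Only documents grouped by both users are compared.
    """
    docs = set(grouping_a) & set(grouping_b)
    n = len(docs)
    if n == 0:
        raise ValueError("the two groupings share no documents")

    count_a = Counter(grouping_a[d] for d in docs)
    count_b = Counter(grouping_b[d] for d in docs)
    joint = Counter((grouping_a[d], grouping_b[d]) for d in docs)

    entropy_a = -sum(c / n * log(c / n) for c in count_a.values())
    entropy_b = -sum(c / n * log(c / n) for c in count_b.values())
    mutual_info = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )

    if entropy_a + entropy_b == 0:
        return 1.0  # both users put everything into one group: identical
    return 2 * mutual_info / (entropy_a + entropy_b)


def user_diversity(groupings_u, groupings_v):
    """Mean of (1 - NMI) over the queries both users have grouped.

    Each argument maps a query to that user's grouping for the query.
    """
    shared = set(groupings_u) & set(groupings_v)
    if not shared:
        raise ValueError("the two users share no queries")
    return sum(1 - nmi(groupings_u[q], groupings_v[q]) for q in shared) / len(shared)


# Two users who split the same four documents along unrelated lines.
u1 = {"d1": "Security", "d2": "Security", "d3": "Group 2", "d4": "Group 2"}
u4 = {"d1": "X", "d2": "Y", "d3": "X", "d4": "Y"}
print(nmi(u1, u4))                                   # 0.0
print(user_diversity({"rfid": u1}, {"rfid": u4}))    # 1.0
```

The toy case reproduces the "NMI = 0" situation: knowing one user's group for a document says nothing about the other user's group, so diversity is maximal. Group labels themselves do not matter, only which documents end up together.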

Damilicious: evaluation
• Clustering: Does it generate meaningful document groups?
  • yes (tradition in bibliometrics) – but: data?
  • Small expert evaluation of CiteseerCluster
• Choosing the clustering and classification methods for conceptual clustering
  • Experiments: different features, clustering methods, classification methods → quality of reconstruction and extension-over-time (NMI)
• Technology acceptance
  • End-user experiment (clustering & regrouping)
  • 5-person formative user study (transfer of own results)

Conclusions and (some) questions
• Sense-making involves
  • Extracting information from texts
  • Extracting structural information between entities
  • Creating, using and modifying categories
  • Interacting with external representations
  • Acknowledging diversity and perspective-taking
  • ...
• Appropriate mining methods, measures, ...?
• More/better evaluation methods and frameworks?
• Use cases?
KD approach: Text mining; Graph mining; Semantics; Interactivity; Usage mining and "model-processing" (conceptual / predictive clustering)

Thank you! Questions?

To Read
• Subašić, I. & Berendt, B. (2009). Discovery of interactive graphs for understanding and searching time-indexed corpora. Knowledge and Information Systems. DOI 10.1007/s10115-009-0227-x (PDF)
• Berendt, B. & Subašić, I. (2009). STORIES in time: a graph-based interface for news tracking and discovery. In N. Cristianini & M. Turchi (Eds.), Proceedings of Intelligent Analysis and Processing of Web News Content (IAPWNC) at The 2009 IEEE/WIC/ACM International Conferences Web Intelligence (WI'09) / Intelligent Agent Technology (IAT'09). 15 September 2009, Milan, Italy. (Proceedings of WI-IAT.2009, DOI 10.1109/WI-IAT.2009.342, pp. 531-534) (PDF)
• Verbeke, M., Berendt, B., & Nijssen, S. (2009). Data mining, interactive semantic structuring, and collaboration: A diversity-aware method for sense-making in search. In G. Boato & C. Niederee (Eds.), Proceedings of First International Workshop on Living Web, collocated with the 8th International Semantic Web Conference (ISWC-2009), Washington D.C., USA, October 26, 2009. CEUR Workshop Proceedings Vol-515. (PDF)
• Berendt, B. (2010). Diversity in search: what, how, and what for? Talk at Barcelona Media / Yahoo! Research and UPF, 4 March 2010. (PPT)
• Berendt, B., Krause, B., & Kolbe-Nusser, S. (2010). Intelligent scientific authoring tools: Interactive data mining for constructive uses of citation networks. Information Processing & Management, 46(1), 1-10. (PDF)

