Data Mining

Data Mining. David Eichmann, School of Library and Information Science, The University of Iowa. Why? Given enough data represented through enough dimensions, we lose the ability to see the patterns. How? Decision Trees, Nearest Neighbor Clustering, Neural Networks, Rule Induction.

Presentation Transcript


  1. Data Mining David Eichmann School of Library and Information Science The University of Iowa

  2. Why? • Given enough data represented through enough dimensions, we lose the ability to see the patterns

  3. How? • Decision Trees • Nearest Neighbor Clustering • Neural Networks • Rule Induction • K-Means Clustering
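
As an illustration of one technique from the list above, here is a minimal sketch of K-Means clustering in plain Python. It is not taken from the presentation: the function names (`sq_dist`, `nearest`, `kmeans`), the fixed random seed, and the sample points are all assumptions made for the example. It uses the standard Lloyd iteration: assign every point to its nearest centroid, recompute each centroid as the mean of its members, and stop when nothing moves.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, label for each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # New centroid is the per-dimension mean of the members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep an empty cluster's centroid where it was.
                updated.append(centroids[i])
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


if __name__ == "__main__":
    # Hypothetical data: two well-separated groups of three points each.
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = kmeans(sample, k=2)
    print(centers, labels)
```

The result depends on the initial centroids, so real use typically runs several seeds and keeps the clustering with the lowest total within-cluster distance.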

  4. What is it? • The automated extraction of hidden predictive information from databases. • Key points • Automated • Hidden • Predictive

  5. The Typical Process

  6. Evaluation Criteria • Receiver Operating Characteristic Curves
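
The slide names ROC curves without showing how one is built, so the following is a hedged sketch rather than the presenter's method. It sweeps a decision threshold down through the classifier scores, records the false positive rate and true positive rate at each step, and integrates the curve with the trapezoid rule. The names `roc_points` and `auc`, and the example scores and labels, are assumptions for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs.

    scores: classifier confidence that each item is positive.
    labels: 1 for a true positive item, 0 for a true negative item.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Hypothetical scores and labels for six test items.
    example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
    example_labels = [1, 1, 0, 1, 0, 0]
    curve = roc_points(example_scores, example_labels)
    print(curve)
    print(auc(curve))
```

An area of 1.0 means every positive item is ranked above every negative one; 0.5 is what random ranking would give on average.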

  7. But Nobody Said We Had To Do MATH….

  8. Forms of Data • Structured • Databases • Forms • Semi-Structured • Tables on the Web • Bibliographic citations • Graphs & charts • Unstructured • Full text (e.g., journal articles, physician chart notes) • Images

  9. Text Mining • The corpus is now a collection of text artifacts • Full text when you’ve got it (e.g. newswire) • Metadata when you don’t (e.g. MEDLINE) • The trick then becomes extracting ‘interesting’ relationships between ‘interesting’ entities • Who killed whom • Who works for whom • Who makes what

  10. The Classic Entities • Persons • Organizations • Places (Geography) • Events

  11. A Newswire Example • APW19981001.0262 [Israel(0.271), Jonathan Pollard (0.153), Benjamin Netanyahu(0.102), Bill Clinton(0.102), United States(0.055), ...] • Persons • Bill Clinton (3) • Jonathan Pollard (8) • Moshe Fogel (2) • Benjamin Netanyahu (2) • Israeli Embassy (1) • Organizations • Cabinet (1) • Places • Israel (16) • United States (5) • Washington (2)

  12. In the Medical/Health Realm • UMLS (the Unified Medical Language System) is an excellent framework • Organism • Chemical • Activity • Disease

  13. A MEDLINE Example • Document: 89316090 - Reconstructive surgery in Nicaragua • Provided MeSH Keywords • Human • Nicaragua • Z01.107.169.690 • Surgery, Plastic/* • G02.403.810.788 • Phrases • [Reconstructive, surgery] • [Nicaragua] • [letter] • MeSH Terms • Surgery (1) • G02.403.810.762 • Letter [Publication Type] (1) • Other Phrases • Reconstructive surgery (1)

  14. Concept Extraction Example • “Roman forces under Julius Caesar invade Britain.” (S (NP (NP Roman forces) (PP under (NP Julius Caesar))) (VP invade (NP Britain)) .) • Entity Attributes: • <organization Roman forces> • <person Julius Caesar> • <placename Britain> • Concepts: • <Roman forces - under - Julius Caesar> • <Roman forces - invade - Britain>
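
The step from the bracketed parse to the two concepts on this slide can be sketched as follows. This is an assumed reconstruction, not the system demonstrated in the talk: the functions `parse`, `subtrees`, `words`, `head_np`, and `concepts` are hypothetical, and the two extraction rules (noun phrase plus prepositional phrase, and subject plus verb plus object) are simply the minimum needed to reproduce the slide's output. Assigning entity types such as person or placename would need a separate lookup and is not covered.

```python
import re

SENTENCE_TREE = (
    "(S (NP (NP Roman forces) (PP under (NP Julius Caesar))) "
    "(VP invade (NP Britain)) .)"
)


def parse(text):
    """Parse a bracketed tree into (label, children) tuples.

    Children are either nested tuples or plain word strings.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()


def subtrees(tree, label):
    """Direct child subtrees carrying the given label."""
    return [c for c in tree[1] if isinstance(c, tuple) and c[0] == label]


def words(tree):
    """Words attached directly to this node, joined by spaces."""
    return " ".join(c for c in tree[1] if isinstance(c, str))


def head_np(np):
    """Follow nested NPs down to the innermost leading noun phrase."""
    inner = subtrees(np, "NP")
    return head_np(inner[0]) if inner else words(np)


def concepts(tree):
    """Collect (left, relation, right) triples, children first."""
    found = []
    for child in tree[1]:
        if isinstance(child, tuple):
            found.extend(concepts(child))
    if tree[0] == "NP":
        # Rule 1 (assumed): NP with a PP yields <NP - preposition - NP>.
        for pp in subtrees(tree, "PP"):
            for obj in subtrees(pp, "NP"):
                found.append((head_np(tree), words(pp), head_np(obj)))
    if tree[0] == "S":
        # Rule 2 (assumed): subject NP and VP yield <NP - verb - object NP>.
        for subj in subtrees(tree, "NP"):
            for vp in subtrees(tree, "VP"):
                for obj in subtrees(vp, "NP"):
                    found.append((head_np(subj), words(vp), head_np(obj)))
    return found


if __name__ == "__main__":
    for left, relation, right in concepts(parse(SENTENCE_TREE)):
        print(f"<{left} - {relation} - {right}>")
```

Run on the slide's parse, the two rules yield the same two concepts listed above. Sentences with passives, conjunctions, or multi-word verbs would need additional rules.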

  15. And a Small Demo…
