INLS890 Evidence-Based Discovery Spring, 2009. Catherine Blake, Ph.D. Today. Introductions Administration Course Structure Learning Objectives Assessment Motivation. Introductions. Dr Catherine Blake Email - cablake@email.unc.edu Office - 214A Manning Hall Lecture Time
INLS890 Evidence-Based Discovery • Spring, 2009 • Catherine Blake, Ph.D.
Today • Introductions • Administration • Course Structure • Learning Objectives • Assessment • Motivation
Introductions • Dr Catherine Blake • Email - cablake@email.unc.edu • Office - 214A Manning Hall • Lecture Time • 214 Manning Hall • Thurs 5:00-7:30pm • Office Time • Email – anytime • By Appointment – Tues and Wed
Operational Details • Web Page • http://www.ils.unc.edu/~cablake/INLS890-EBD • Username = ebd Password = spr2009 • Email • Fastest response time • Please email from your UNC account • Start the subject with INLS890 • University Honor Code is in effect
Course Objectives • This course combines theoretical models from discovery science with a survey of informatics tools that support discovery. • The seminar will showcase the discovery process through a lecture series featuring both discipline and policy champions, revealing the synergy between synthesis and discovery and the need for interdisciplinary collaboration.
Theory Theme • Kuhn • Normal versus Revolutionary Science • Abnormalities • Chalmers • Observation • Falsification • Information Quality • Meta-analysis • Information quality
Informatics Theme • Language tools • Information Extraction - Text Mining • Document Summarization - Entailment • Social Networking • Bibliometrics - Visualizations • Workflow • Myexperiment • Domain specific software • Crystallography - BLAST
Practice Theme • Synthesis • Timothy S. Carey, MD, MPH Sarah Graham Kenan Professor of Medicine Director, Cecil G Sheps Center for Health Services Research • Ila Cote, PhD, DABT Acting Division Director US Environmental Protection Agency National Center for Environmental Assessment • Discovery • Paul Jones Clinical Associate Professor School of Information and Library Science Director of ibiblio.org • Michael T Crimmins PhD. Mary Ann Smith Distinguished Professor of Chemistry UNC and Department Chair Department of Chemistry • Rudy L Juliano PhD. Boshamer Distinguished Professor of Pharmacology Principal Investigator, Carolina Center of Cancer Nanotechnology Excellence
Practice Theme • Discovery • Robert C Millikan DVM PhD Barbara Sorenson Hulka Distinguished Professor Department of Epidemiology School of Public Health • Jan F. Prins PhD. Professor of Computer Science and Chairman, Department of Computer Science • Alexander Tropsha, Ph.D. Professor and Chair Director, Laboratory for Molecular Modeling • Suzanne West, PhD Research Associate Professor Department of Epidemiology Acting Director, UNC-GSK Center of Excellence in Pharmacoepidemiology and Public Health • To be confirmed • Humanities Scholar • Steven W. Matson Ph.D. Professor and Chair Department of Biology
Typical Class Structure • Before class (All): Post expert questions • First Hour • Presentation by domain expert • Anointed domain expert – engage the presenter! • Second Hour • Anointed informatics expert – present technologies • Discuss the intersection between theory, practice and informatics • Last 30 mins • Anointed domain expert – introduce next expert
Assignments • Informatics Review • What domain specific tools are used in your discipline? • What generic tools exist for your discipline? • Information extraction • Text mining tool kits • Post results to the wiki
Assignments • Engage the presenter • Introduce the presenter the week before • Read their materials ahead of time • Find out what else they do • Give us any context you can about the person • What are the key journals in the field
Assignments • Gap analysis • What informatics tools work in your discipline? • What gaps exist between the academic work being done by these researchers and the informatics tools that are currently available?
Assignments • Scientific practice in your domain • Conduct interviews • Transcribe the interviews • Summarize your findings • Group activities • Create wiki • Review questions • Submit IRB • Keep track of references
Dissemination • Dissemination • How are we going to get this to people in the field? • Health Science Library • Paper in their conference • Face to face visits • … what other mechanisms?
Assignments • Class participation • Read the assigned readings • Participate in class discussion • Contribute to the wiki
Assessment • Class Participation 20% • Attendance and contributions to discussion • Informatics Review 20% • Introducing and Engaging your speaker 20% • Gap Analysis • Data collection activities 10% • Final report 20% • Class contributions 10%
Motivation • Massive increase in electronic text • MEDLINE • Abstracts from more than 5,000 journals • Current: more than 17 million citations • Growth: ~12,000 new citations every week • Chemistry – more than 110,000 articles in 2002 alone • Consequences • Hundreds of thousands of relevant articles • Implicit connections between literature go unnoticed • Shift focus from Retrieval to Synthesis • Source: MEDLINE factsheet http://www.nlm.nih.gov/pubs/factsheets/medline.html • Source: Calculated from ISI’s 418 highest ranked chemistry articles
Information Overload “One of the diseases of this age is the multiplicity of books; they doth so overcharge the world that it is not able to digest the abundance of idle matter that is every day hatched and brought forth into the world” - Barnaby Rich, 1613
Existing Text Mining • Clustering • Categorization • Association Rules • IBM Intelligent Miner for Text (Clustering) • SAS Text Miner (Association Rules)
Example Pattern: Decision Tree • ∀ person P, P.degree = masters and P.income > 75,000 ⇒ P.credit = excellent
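The pattern on this slide can be read as a single if-then rule of the kind a decision tree encodes. The sketch below is a minimal illustration under that reading; the `Person` class, the `predict_credit` function, and the "unknown" fallback label are assumptions introduced here, not part of the slide.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record type; field names mirror the slide's P.degree and P.income.
    degree: str
    income: float


def predict_credit(p: Person) -> str:
    """Apply the slide's rule: masters degree AND income > 75,000 implies excellent credit."""
    if p.degree == "masters" and p.income > 75_000:
        return "excellent"
    # The slide gives no rule for other cases, so this fallback label is an assumption.
    return "unknown"


example = Person(degree="masters", income=80_000)
print(predict_credit(example))
```

A full decision tree would chain several such tests, one per branch; this sketch covers only the one path shown on the slide.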
Kohonen Maps • Articles represented as vectors • Assign n random articles • Assign remaining articles to closest cluster • Snowy peaks indicate highly funded research • NCI-funded research 1995-present • Blake, C. and Tengs, T. (2001) “The Nation’s Breast Cancer Research Portfolio: A view from 30,000 feet”, Avon Symposium, UC Irvine.
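The three steps listed on this slide (represent articles as vectors, pick n random articles, assign the rest to the closest cluster) can be sketched as below. This is a simplified single-pass assignment, assuming the n random articles act as cluster seeds and that distance is Euclidean. It is not a full Kohonen self-organizing map, which would also train a grid of neighbouring units. The function name `assign_to_seeds` and the toy vectors are hypothetical.

```python
import math
import random


def assign_to_seeds(vectors, n, seed=0):
    """Pick n random articles as seeds, then attach every other article to its nearest seed."""
    rng = random.Random(seed)
    seed_ids = rng.sample(range(len(vectors)), n)
    # Each cluster is keyed by its seed article and starts out containing that seed.
    clusters = {s: [s] for s in seed_ids}
    for i, vec in enumerate(vectors):
        if i in clusters:
            continue  # seed articles are already placed
        closest = min(seed_ids, key=lambda s: math.dist(vec, vectors[s]))
        clusters[closest].append(i)
    return clusters


# Toy article vectors (illustrative values only, e.g. term weights).
vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
clusters = assign_to_seeds(vectors, n=2)
print(clusters)
```

Which articles end up together depends on the random seeds chosen, so a practical system would typically iterate or refine the assignment rather than stop after one pass.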
Knowledge Discovery in Literature • Source Literature C: Migraine • Target Literature A: Magnesium • B-Platelet Activity • B-Calcium Channel Blockers • B-Serotonin • ... • Swanson, DR (1988) “Migraine and magnesium: eleven neglected connections”, Perspect. Biol. Med., 31: 526-57. • Blake, C. & Pratt, W. (2002). A Semantic Approach to Identify Candidate Treatments from Existing Medical Literature. In AAAI Symposium on Knowledge-based Approaches, Stanford, CA.
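The source, intermediate, and target structure on this slide (C linked to A only through shared B terms) can be sketched as a small co-occurrence search. The code below is a toy illustration, not the method of either cited paper: the `candidate_targets` function and the `toy` co-occurrence table are assumptions, and the table is hand-built from the slide's labels rather than derived from real literature.

```python
def candidate_targets(source, cooccurs):
    """Return {A: set of linking B terms} for terms A reachable from source only via a B term."""
    b_terms = cooccurs.get(source, set())
    candidates = {}
    for b in b_terms:
        for a in cooccurs.get(b, set()):
            # Skip the source itself and anything already directly linked to it.
            if a != source and a not in b_terms:
                candidates.setdefault(a, set()).add(b)
    return candidates


# Hand-built toy table using the slide's terms (illustrative only).
toy = {
    "migraine": {"platelet activity", "calcium channel blockers", "serotonin"},
    "platelet activity": {"migraine", "magnesium"},
    "calcium channel blockers": {"migraine", "magnesium"},
    "serotonin": {"migraine", "magnesium"},
}

links = candidate_targets("migraine", toy)
for target, via in links.items():
    print(target, "via", sorted(via))
```

Ranking candidates by the number of distinct B terms that connect them is one simple way to prioritise which implicit connections to examine first.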