Information Retrieval and its Application in Biomedicine. Sept 4: Introduction. Hong Yu (1, 2), PhD; Susan McRoy (1), PhD. (1) Department of Computer Science; (2) Department of Health Sciences; University of Wisconsin-Milwaukee
What is Information Retrieval? • The field concerned with the acquisition, organization, and searching of knowledge-based information. (Hersh, 2003)
Information • World Wide Web • Company Documentation • Drug Descriptions • Medical Records • Books • Any text, image, video, or sound that can be digitized
Information in Biomedicine • Literature (over 17 million publications) • WWW • Electronic medical records • Genomics data • DNA sequences, etc. • Knowledge representation • Gene Ontology • Company databases • Micromedex drug database
IR in Biomedicine • Index Medicus (Billings 1879) • MEDLARS (NLM 1966) • SAPHIRE (Hersh 1990) • PubMed (NLM 1996) • Arrowsmith (Smalheiser 1998) • BioText (Hearst 2003) • BioMedQA (Yu 2006)
Electronic and Open Publishing • The Internet and the Web have had a profound impact on the publishing of knowledge-based information • Most of the literature can be made available electronically • Open access • The Bethesda Statement on Open Access Publishing (http://www.earlham.edu/~peters/fos/bethesda.htm) (April 11, 2003) • The Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html) (2003) • PubMed Central (NLM 2004)
Quality of Information • A lack of quality control • Anyone can publish online • A wealth of studies have concluded that healthcare information on the Web is of poor quality • Readability • Hard to read
Information Needs and Seeking • Unrecognized needs • Clinicians are unaware of their information needs or knowledge deficits • Recognized needs • Clinicians are aware of their needs but may or may not pursue them • Pursued needs • Information seeking occurs but may or may not be successful • Satisfied needs • Information seeking is successful
What You Will Learn • IR algorithms • Indexing • Query and Retrieval • Evaluation • Text Classification • XML retrieval • Web retrieval
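To make the first three topics concrete, here is a minimal sketch of indexing, Boolean query retrieval, and evaluation. It assumes a naive whitespace tokenizer and a hypothetical three-document toy collection; the function names (`build_index`, `boolean_and`, `precision_recall`) are illustrative only and are not taken from the course textbook or from any particular IR tool.

```python
from collections import defaultdict


def build_index(docs):
    """Indexing: map each lowercase term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term].add(doc_id)
    return index


def boolean_and(index, query):
    """Query and retrieval: return IDs of documents containing every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


def precision_recall(retrieved, relevant):
    """Evaluation: precision = hits / |retrieved|, recall = hits / |relevant|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection (not from the course materials)
docs = {
    1: "aspirin reduces risk of heart attack",
    2: "gene expression in heart tissue",
    3: "aspirin and gene expression",
}
index = build_index(docs)
retrieved = boolean_and(index, "aspirin heart")
print(sorted(retrieved))                                # [1]
print(precision_recall(retrieved, relevant={1, 3}))     # (1.0, 0.5)
```

Here the query "aspirin heart" retrieves only document 1, so precision is perfect but recall is 0.5 if documents 1 and 3 are judged relevant. Real systems replace the exact Boolean match with ranked retrieval and use proper tokenization, which the course covers in more depth.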
What You Will Learn (Cont.) • Open-source IR tools • What open-source IR tools are available • Indexing/retrieval • Part-of-speech and syntactic parsing • Semantic parsing • Discourse relations • Machine-learning classifiers • How to use the tools
What You Will Learn (Cont.) • State-of-the-art IR systems • Baruch 1965 [BLIMP http://blimp.cs.queensu.ca/index.html] • SAPHIRE (Hersh 1990) • Retrieval • MedLEE (Friedman 1994) • Extraction • PubMed (NLM 1997) • Arrowsmith (Smalheiser 1998) • Hidden Relation Discovery Tool • GENIES (Friedman 2001) • Extraction
BioNLP Systems • BioText (Hearst 2003, http://biotext.berkeley.edu/) • Retrieval+Categorization • GeneWays (Rzhetsky 2004, http://geneways.genomecenter.columbia.edu/) • Extraction+Visualization • TextPresso (Muller 2004, http://www.textpresso.org/) • Retrieval+Extraction • iHOP (Hoffman and Valencia 2005, http://www.ihop-net.org/UniPub/iHOP/) • Retrieval • BioMedQA (Yu 2006, http://monkey.ims.uwm.edu/MedQA) • Question Answering
Beyond text: Image and Video • Image classification • Finding concepts in captions and annotations • Machine learning on textual & visual features • Determining salient features in text and image separately and merging the results • Extracting text from image • Understanding and correcting OCR (handwriting, equations) • Finding text in images • Finding document text related to illustrations • Video retrieval
Resources • Annotated collections (GENIA, Medstract, Yapex …) • Ontologies, tools, knowledge bases … • Publications, Conferences, Evaluations … • Centres and web portals
What We Provide • Textbook • Christopher D. Manning, Prabhakar Raghavan and Hinrich Schutze. Introduction to Information Retrieval. Cambridge University Press, 2007 • http://www-csli.stanford.edu/~schuetze/information-retrieval-book.html • Office hours: • Tuesdays, 3-4 pm, EMS 710, and by appointment • Hong Yu, 414-229-3344 • Susan McRoy, 414-229-6695
What We Expect • Undergraduate: • 30% Homework, 35% Midterm exam, 35% Final exam or project • Graduate: • 20% Midterm exam, 40% Homework, 40% Project: The project may be done individually or in a team of 2-3 people. The final project will include a software system, a 2-3 page written project report, and an oral presentation. The report should describe the problem, the approach, and the evaluation, and should cite related work where appropriate.