Biomarker Automated Retrieval Tool (B.A.R.T.) • Ronny Chan, Kim Ngo • Earth Science Data Systems Dept.
Bioinformatics Relationship • Science produces massive amounts of data • Data needs to be analyzed, stored, & retrieved: this is data mining • We want to apply computer science to improve this process
Motivation • Problems with conventional data mining: time consuming; accuracy not defined (subjective) • No objective scientific information retrieval tool • Where are the biomarkers?
Cancer Biomarkers • An indicator of cancerous growth
Proposed Solution • Create a program (B.A.R.T.) that allows people to quickly scan literature for the most relevant keywords/biomarkers, e.g. BAG-1, ERBB2, HER-2, EP-CAM, HPEBP4
Significance • What is the need for the project? • More efficient research • Save time • Conventional search, enhanced with B.A.R.T.
Goals • Make biomarker/keyword searches more efficient • Learn Java • Learn SQL
Approach • Write a program • Read in articles • Use part of the Vector Space Model algorithm to rank terms • Output relevant terms in statistical rankings • Example: "BRCA1" vs. "they"
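The approach above can be sketched in a few lines. This is a minimal Python illustration, not the authors' Java implementation: the function names, the tokenizer, and the toy article text are all assumptions, and the weighting assumed here is raw term frequency times log10(N/df), summed over articles.

```python
import math
import re
from collections import Counter

# Toy stand-ins for article text (hypothetical, for illustration only).
ARTICLES = [
    "BRCA1 mutations were found and BRCA1 expression was measured; they were elevated",
    "they reported that the tumor samples were small",
    "they measured ERBB2 in the samples",
]

def tokenize(text):
    """Lowercase and keep letters, digits and hyphens so names like HER-2 survive."""
    return re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())

def rank_terms(articles):
    """Return (term, score) pairs sorted by summed tf * idf, highest first."""
    n_docs = len(articles)
    tokenized = [tokenize(article) for article in articles]

    # Document frequency: in how many articles does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Sum tf * log10(N / df) over all articles (assumed weighting).
    scores = Counter()
    for tokens in tokenized:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log10(n_docs / df[term])
    return scores.most_common()

if __name__ == "__main__":
    for term, score in rank_terms(ARTICLES)[:5]:
        print(f"{term:12s} {score:.3f}")
```

With this toy input, "brca1" ranks first, while "they" scores zero because it occurs in every article.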
Vector Space Model • Information retrieval system • Introduced by Gerard Salton in the 1960s • Used widely in different search engines
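For reference, one common textbook formulation of the Vector Space Model is given below. The slides do not say which variant B.A.R.T. uses, so the base-10 logarithm and raw term frequency are assumptions. Here tf_{t,d} is the count of term t in document d, df_t the number of documents containing t, and N the number of documents.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \cdot \log_{10}\frac{N}{\mathrm{df}_t}
\qquad
\mathrm{sim}(q, d) =
  \frac{\sum_{t} w_{t,q}\, w_{t,d}}
       {\sqrt{\sum_{t} w_{t,q}^{2}}\;\sqrt{\sum_{t} w_{t,d}^{2}}}
```

The first expression weights a term within a document; the second is the cosine similarity between a query vector and a document vector.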
Algorithm for B.A.R.T. • Keywords Input • PubMed Query Agent • Keyword Parser • Content Analyzer • Content Ranker • Data Store • Data Retrieval and Output
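One way the components listed above could be wired together is sketched below in Python. Everything here is hypothetical: the function names, the table schema, and the order of the stages are assumptions, the PubMed Query Agent is replaced by canned text so that no network call is made, and an in-memory SQLite database stands in for the data store.

```python
import math
import sqlite3
from collections import Counter

# Canned results standing in for the PubMed Query Agent (hypothetical data).
FAKE_PUBMED = {
    "breast cancer": [
        "erbb2 amplification in breast cancer",
        "erbb2 expression in breast cancer",
        "screening in breast cancer",
    ],
}

def query_agent(keywords):
    """PubMed Query Agent stand-in: look up canned article text."""
    return FAKE_PUBMED.get(keywords, [])

def parse_keywords(article):
    """Keyword Parser: split an article into lowercase terms."""
    return article.lower().split()

def analyze(articles):
    """Content Analyzer: per-article term counts and document frequencies."""
    counts = [Counter(parse_keywords(article)) for article in articles]
    df = Counter()
    for count in counts:
        df.update(count.keys())
    return counts, df

def rank(counts, df):
    """Content Ranker: sum tf * log10(N / df) per term (assumed weighting)."""
    n_docs = len(counts)
    scores = Counter()
    for count in counts:
        for term, tf in count.items():
            scores[term] += tf * math.log10(n_docs / df[term])
    return scores

def store(conn, scores):
    """Data Store: persist term scores in a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_score (term TEXT PRIMARY KEY, score REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO term_score VALUES (?, ?)", scores.items())
    conn.commit()

def retrieve(conn, limit=5):
    """Data Retrieval and Output: highest-scoring terms first."""
    return conn.execute(
        "SELECT term, score FROM term_score ORDER BY score DESC, term LIMIT ?",
        (limit,),
    ).fetchall()

def run(keywords):
    """Keywords Input: drive the whole pipeline for one query string."""
    conn = sqlite3.connect(":memory:")
    counts, df = analyze(query_agent(keywords))
    store(conn, rank(counts, df))
    return retrieve(conn)

if __name__ == "__main__":
    for term, score in run("breast cancer"):
        print(f"{term:15s} {score:.3f}")
```

Terms that occur in every retrieved article (here "in", "breast", "cancer") receive a score of zero under this weighting, which is the effect that pushes generic words down the ranking.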
Results • DCIS • CU-TP3982 • ERBB2 • HER-2 • HPEBP4 • BAG-1 • EP-CAM • 99M
Lessons & Difficulties • Deciding on algorithm choice: ease of implementation and effectiveness • Limited knowledge & experience: Java, SQL • Initial implementation is slow: 5 articles = 160 sec; 20 articles = 1904 sec; 100 articles = 8^38 years • Update (August 18, 2004): 100 articles = 8^19 years
Future work • Apply different term weight functions to make results more robust • Optimize the program for speed
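As an illustration of what "different term weight functions" could mean, the Python sketch below swaps the term-frequency component while keeping the same idf factor. The slides do not name specific functions; the three shown (raw, logarithmic, and augmented term frequency) are well-known variants from the information retrieval literature, chosen here as assumptions.

```python
import math

def raw_tf(tf, max_tf):
    """Raw count: a term seen 10 times weighs 10x a term seen once."""
    return float(tf)

def log_tf(tf, max_tf):
    """Logarithmic damping: 1 + log10(tf), so repeats add less and less."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0

def augmented_tf(tf, max_tf):
    """Augmented tf: scaled into [0.5, 1] by the most frequent term's count."""
    return 0.5 + 0.5 * tf / max_tf if tf > 0 else 0.0

def weight(tf, max_tf, df, n_docs, tf_fn=raw_tf):
    """Term weight = chosen tf function * log10(N / df)."""
    return tf_fn(tf, max_tf) * math.log10(n_docs / df)

if __name__ == "__main__":
    # A term occurring 10 times in one of 10 documents, under each scheme.
    for fn in (raw_tf, log_tf, augmented_tf):
        print(fn.__name__, weight(tf=10, max_tf=10, df=1, n_docs=10, tf_fn=fn))
```

Damped variants reduce the influence of a single article that repeats a term many times, which is one plausible route to more robust rankings.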
Citations • http://ir.iit.edu/~dagr/cs529/files/handouts/03VectorSpaceImplementation-6per.PDF • http://classes.engr.oregonstate.edu/eecs/spring2004/cs419/10 • http://www.cs.ust.hk/~dlee/Papers/ir/ieee-sw-rank.pdf • http://hartford.lti.cs.cmu.edu/classes/95-778/Lectures/04-BooleanVectorSpaceB.pdf • Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin. Pharmacol. Ther. 69(3), 89-95 (2001).
Acknowledgements • National Science Foundation (NSF) • National Institutes of Health (NIH) • Earth Science Data System, JPL • Tina Xiao • Paul Ramirez • Chris Mattmann • Roshanak Roshandel • Sean Hardman • Southern California Bioinformatics Summer Institute (SoCalBSI) • SoCalBSI professors • Jacqueline Heras • All SoCalBSI colleagues
VSM Example • Q: malignant breast cancer • D1: detection of malignant level in the cell • D2: sighting of breast stage in the breast cancer • D3: detection of malignant stage in the cancer
Example Continued… Keyword tf * idf
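The table on this slide did not survive extraction. The Python sketch below recomputes keyword weights for the query and three documents of the VSM example, under the assumption that tf is the raw count and idf is log10(N/df). It also scores each document against the query with cosine similarity, a standard Vector Space Model step that the slides do not explicitly show. All function and variable names are illustrative.

```python
import math
from collections import Counter

QUERY = "malignant breast cancer"
DOCS = {
    "D1": "detection of malignant level in the cell",
    "D2": "sighting of breast stage in the breast cancer",
    "D3": "detection of malignant stage in the cancer",
}

def tokens(text):
    return text.lower().split()

def build_idf(docs):
    """idf = log10(N / df) for every term in the collection (assumed variant)."""
    n_docs = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokens(text)))
    return {term: math.log10(n_docs / count) for term, count in df.items()}

def weights(text, idf):
    """tf * idf per term; terms unseen in the collection get weight 0."""
    return {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokens(text)).items()}

def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

IDF = build_idf(DOCS)
DOC_WEIGHTS = {name: weights(text, IDF) for name, text in DOCS.items()}
QUERY_WEIGHTS = weights(QUERY, IDF)
RANKING = sorted(
    DOCS, key=lambda name: cosine(QUERY_WEIGHTS, DOC_WEIGHTS[name]), reverse=True
)

if __name__ == "__main__":
    print(f"{'keyword':10s} {'idf':>6s} " + " ".join(f"{n:>6s}" for n in DOCS))
    for term in sorted(IDF):
        row = " ".join(f"{DOC_WEIGHTS[n].get(term, 0.0):6.3f}" for n in DOCS)
        print(f"{term:10s} {IDF[term]:6.3f} {row}")
    print("ranking:", RANKING)
```

Under these assumptions, words that occur in all three documents ("of", "in", "the") get zero weight, "breast" carries the largest weight because it is rare and appears twice in D2, and the documents rank D2, D3, D1 against the query.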