The BioText Project: Recent Work
Marti Hearst
SIMS, UC Berkeley
http://biotext.berkeley.edu
Supported by NSF DBI-0317510 and a gift from Genentech
Project Team
• Project Leaders:
  • PI: Marti Hearst
  • Co-PI: Adam Arkin
• Computational Linguistics
  • Preslav Nakov
  • Emilia Stoica
  • Sarah Poon
• IR/Databases/Software
  • Ariel Schwartz
  • Itai Brickner
  • Brian Wolf
• Bioscience
  • Janice Hamer
• Alumni
  • Dr. Barbara Rosario
  • Dr. TingTing Zhang
  • Gaurav Bhalotia
BioText Project Goals
• Provide flexible, intelligent access to information for use in biosciences applications.
• Focus on
  • Textual Information from Journal Articles
  • Tightly integrated with other resources
    • Ontologies
    • Record-based databases
BioText Architecture
• Sophisticated Text Analysis
• Annotations in Database
• Improved Search Interface
Today’s Talks
• Intro (Marti)
• Design and Implementation of the Layered Query Language (Ariel & Brian)
• Adding Fulltext to LQL (Itai)
• Determining Gene Function from Text (Emilia)
• Using the Web as an Implicit Training Corpus (Preslav)
• Identifying Protein-Protein Interactions (Marti, covering Barbara’s work)
• Citances (Marti)
• Discussion: what should our user interface do?
Recent Papers
• Predicting Gene Functions from Text Using a Cross-Species Approach, Emilia Stoica and Marti Hearst, to appear in PSB 2006.
• Multi-way Relation Classification: Application to Protein-Protein Interaction, Barbara Rosario and Marti Hearst, in HLT/EMNLP 2005.
• Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution, Preslav Nakov and Marti Hearst, in HLT/EMNLP 2005.
Recent Papers
• Scaling Up BioNLP: Application of a Text Annotation Architecture to Noun Compound Bracketing, Preslav Nakov, Ariel Schwartz, Brian Wolf, and Marti Hearst, in ACL/ISMB SIGLINK 2005.
• Search Engine Statistics Beyond the n-gram: Application to Noun Compound Bracketing, Preslav Nakov and Marti Hearst, in CoNLL 2005.
• Citances: Citation Sentences for Semantic Analysis of Bioscience Text, Preslav Nakov, Ariel Schwartz, and Marti Hearst, in the SIGIR'04 workshop on Search and Discovery in Bioinformatics.