PSB 2006 (Pacific Symposium on Biocomputing), Maui, Jan 2-7 • PSB 2006 post-meeting website: http://psb.stanford.edu/psb06/ • PSB Proceedings online: http://helix-web.stanford.edu/psb06/ • Current & future focus: applications of computational tools to clinically important problems; also, integration of computation & experiment
Keynote: Michael Ashburner (Cambridge), famous Drosophila geneticist and a founder of GO: "Ontologies for Biologists: A community model for annotation of genomic data" http://www.geneontology.org/ • GO began at ISMB 1998 as a consortium of 3 founding databases: FlyBase, Saccharomyces (SGD), and Mouse (MGI) • Co-founders of GO & OBO: S Lewis, J Blake, M Cherry
GO today? • Includes all the major model organism databases & significant multi-organism databases, e.g. GeneDB, UniProt, TIGR • STILL lacking: a human analog of the model organism databases; the most serious annotation of human genes is done by UniProt at EBI • Goal: to provide structured controlled vocabularies for the representation of biological knowledge in biological databases
Content of GO today? 19,461 terms total • In future: plan to add specified relationships between concepts in different ontologies (when?); this will be a fairly substantial change in the architecture of GO • Current architecture? Tree vs. directed acyclic graph (DAG): a tree is not rich enough for GO, so GO is currently a DAG (any concept can have more than one parent) • Parent-child relationships can only be: is_a (hypernymy/hyponymy) or part_of (meronymy/holonymy); these are not the same across the hierarchy
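The DAG point is what makes GO queries different from walking a tree: a term can reach the same ancestor along more than one path. A minimal Python sketch of collecting all ancestors of a term, assuming a hypothetical in-memory edge table (the term names are simplified and illustrative, not real GO IDs) and treating both is_a and part_of as propagating upward:

```python
from collections import deque

# Hypothetical mini-ontology: child term -> list of (relation, parent) edges.
# Names are simplified for illustration; they are not real GO IDs.
EDGES = {
    "mitochondrial membrane": [
        ("is_a", "organelle membrane"),
        ("part_of", "mitochondrion"),
    ],
    "organelle membrane": [("is_a", "membrane")],
    "mitochondrion": [("is_a", "organelle")],
    "membrane": [],
    "organelle": [],
}


def ancestors(term, edges, relations=("is_a", "part_of")):
    """Return every ancestor of `term` reachable through the given relations.

    A DAG lets a term have several parents, so this walks breadth-first and
    keeps a visited set; an ancestor reached by two paths is recorded once.
    """
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in edges.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("mitochondrial membrane", EDGES)))
# ['membrane', 'mitochondrion', 'organelle', 'organelle membrane']
print(sorted(ancestors("mitochondrial membrane", EDGES, relations=("is_a",))))
# ['membrane', 'organelle membrane']
```

Restricting `relations` to is_a walks only the subsumption hierarchy, which is one simple way to see that the two relation types lead to different sets of ancestors.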
Sanity checks? Use annotations of orthologous genes to verify GO annotations • Updates: monthly, or so; a table of triples (actually ~11 attributes) • GO Gene Association Tables: now available for many organisms (but not E. coli; they hope to change this soon) • GO database now at Stanford with Mike Cherry • Curated GO annotations as of 2006(?), from a core of about 25 organisms: about 0.5 million gene products curated by human experts, plus 1.5 million products annotated automatically by UniProt • GO provides the database for other browsers, e.g. the GO browser AmiGO • The entire GO can be downloaded from the FTP site
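To make the curated vs. automatic split concrete, here is a sketch that reads tab-separated gene association lines and counts annotations by evidence code. It assumes the column layout of the GO gene association files (object ID in column 2, GO ID in column 5, evidence code in column 7, counting from 1) and treats the IEA (Inferred from Electronic Annotation) code as "automatic" and every other code as "curated", which is a simplification; the sample rows and IDs are made up.

```python
from collections import Counter

# Assumed 0-based column positions in a tab-separated gene association line:
# 1 = DB object ID, 4 = GO ID, 6 = evidence code.
OBJECT_ID, GO_ID, EVIDENCE = 1, 4, 6


def parse_associations(text):
    """Yield (gene, go_id, evidence) triples, skipping '!' comment lines."""
    for line in text.splitlines():
        if not line or line.startswith("!"):
            continue
        row = line.split("\t")
        if len(row) <= EVIDENCE:
            continue  # too few columns to hold an evidence code
        yield row[OBJECT_ID], row[GO_ID], row[EVIDENCE]


def curated_vs_automatic(triples):
    """Count annotations as 'automatic' (IEA evidence) or 'curated' (any other code)."""
    counts = Counter()
    for _gene, _go_id, evidence in triples:
        counts["automatic" if evidence == "IEA" else "curated"] += 1
    return counts


# Made-up rows; IDs are placeholders, not real genes or GO terms.
SAMPLE = "\n".join([
    "!hypothetical header comment",
    "\t".join(["DB", "gene1", "sym1", "", "GO:term_a", "REF:1", "IDA"]),
    "\t".join(["DB", "gene1", "sym1", "", "GO:term_b", "REF:2", "IEA"]),
    "\t".join(["DB", "gene2", "sym2", "", "GO:term_a", "REF:3", "IEA"]),
])

print(curated_vs_automatic(parse_associations(SAMPLE)))
```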
GO as a community project: anyone can suggest changes to GO & its content (Geneontology.sourceforge.net) • OBO (formerly EGO = extended GO), Open Biology Ontologies: http://obo.sourceforge.net/; CBS site at SourceForge, set up as an umbrella for collecting data • OBO will change radically in the next few months: it will be taken over by cBio, the National Center for Biomedical Ontology, funded by NIH as a National Center (Berkeley & Stanford) http://bioontology.org/ • cBio will have: OBO, OBD (Open Biomedical Data), BioPortal
Aims of Sequence Ontology (SO) • Develop a shared set of terms & concepts to annotate biological sequence • Apply this to separate projects to provide consistent query capabilities between them • Provide a software environment resource to assist in the application & distribution of SO • Wanted to enrich GenBank with something that would allow computation, e.g.: • What fraction of genes in Drosophila have alternatively spliced products? • What fraction of genes in worm are ...? These can't be answered from the GenBank feature table • SO has two parts: (1) features that can be located on a sequence with coordinates; (2) properties of these features: sequence attributes, consequences of mutations, chromosome variation
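A sketch of why typed features make the Drosophila question computable: if every feature carries an SO type (gene, mRNA) and a link to its parent, loosely like GFF3's Parent attribute, the fraction of alternatively spliced genes becomes a short query. The record layout and feature IDs are hypothetical, and counting transcript records is a simplification.

```python
from collections import defaultdict

# Hypothetical feature records: (feature_id, SO type, parent_id, start, end).
FEATURES = [
    ("g1", "gene", None, 100, 900),
    ("t1", "mRNA", "g1", 100, 900),
    ("t2", "mRNA", "g1", 100, 700),
    ("g2", "gene", None, 1500, 2000),
    ("t3", "mRNA", "g2", 1500, 2000),
]


def fraction_alternatively_spliced(features):
    """Fraction of genes with more than one mRNA child.

    Simplification: counts distinct transcript records; a real analysis would
    compare exon structures to confirm that the transcripts actually differ.
    """
    genes = {fid for fid, so_type, _parent, _s, _e in features if so_type == "gene"}
    if not genes:
        return 0.0
    transcripts = defaultdict(set)
    for fid, so_type, parent, _s, _e in features:
        if so_type == "mRNA" and parent in genes:
            transcripts[parent].add(fid)
    spliced = sum(1 for g in genes if len(transcripts[g]) > 1)
    return spliced / len(genes)


print(fraction_alternatively_spliced(FEATURES))  # 0.5
```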
Summary of recent developments: • Make future maintenance of GO more manageable (& scalable) • Make GO more computable • Integrate ontologies • Explore new paradigms: OBO-Edit allows one to edit & instantiate cross-products by hand or computationally, and includes visualization of hierarchical relationships
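One way to picture "instantiating cross-products computationally": define a term as a genus from one ontology plus a differentia, i.e. a relation to a term from another ontology, and generate the combinations. This is a toy sketch under that assumption; the class, the `results_in` relation label, and the naming rule are illustrative and are not OBO-Edit's actual data model.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CrossProduct:
    """A term defined as a genus plus a differentia from another ontology.

    Hypothetical structure for illustration; not OBO-Edit's data model.
    """
    genus: str        # e.g. a process term from one ontology
    relation: str     # hypothetical relation label linking genus and differentia
    differentia: str  # e.g. a cell-type term from another ontology

    def label(self):
        # Naive naming rule, assumed for this sketch only.
        return f"{self.differentia} {self.genus}"


def instantiate(genera, relation, differentiae):
    """Generate every genus x differentia combination as a candidate term."""
    return [CrossProduct(g, relation, d) for g, d in product(genera, differentiae)]


for term in instantiate(["differentiation"], "results_in", ["T cell", "B cell"]):
    print(term.label(), "=", term.genus, term.relation, term.differentia)
```

Generating every combination is the easy part; a curator would still need to prune combinations that make no biological sense.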
Important for future: SO & phenotype annotation • Very hard; classically done in free text • Subproject in cBio: an attempt to annotate a rich set of genes in fly, worm, and human using entity-attribute-value triples (by M Westerfield & Ashburner), i.e. entity, attribute, value (showed an example comparing human vs. zebrafish); this has great potential for cross-species learning • Ashburner: "must view much of current 'comparative genomics' with suspicion until SO and phenotype annotation improved - must be done before truly meaningful comparative genomics can be done!"
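A toy illustration of the entity-attribute-value idea for cross-species comparison: represent each species' phenotype annotations as (entity, attribute, value) triples and compare the sets. The triples are invented, and the Jaccard score is just one simple similarity choice, not the method shown in the talk; real annotations would use ontology terms, and matching entities across species would itself need an anatomy mapping.

```python
# Hypothetical entity-attribute-value phenotype annotations; terms are illustrative.
HUMAN = {
    ("eye", "size", "decreased"),
    ("lens", "opacity", "increased"),
}
ZEBRAFISH = {
    ("eye", "size", "decreased"),
    ("fin", "length", "decreased"),
}


def shared_phenotypes(a, b):
    """Return the (entity, attribute, value) triples annotated in both profiles."""
    return a & b


def jaccard(a, b):
    """Jaccard similarity of two phenotype profiles (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(shared_phenotypes(HUMAN, ZEBRAFISH))  # {('eye', 'size', 'decreased')}
print(jaccard(HUMAN, ZEBRAFISH))
```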
Linking Biomedical Information Through Text Mining. K. Bretonnel Cohen, Olivier Bodenreider, and Lynette Hirschman; Pacific Symposium on Biocomputing 11:1-3(2006)
Extraction of Gene-Disease Relations from Medline Using Domain Dictionaries and Machine Learning. Hong-Woo Chun, Yoshimasa Tsuruoka, Jin-Dong Kim, Rie Shiba, Naoki Nagata, Teruyoshi Hishiki, and Jun'ichi Tsujii; Pacific Symposium on Biocomputing 11:4-15(2006)
Significantly Improved Prediction of Subcellular Localization by Integrating Text and Protein Sequence Data. Annette Hoglund, Torsten Blum, Scott Brady, Pierre Donnes, John San Miguel, Matthew Rocheford, Oliver Kohlbacher, and Hagit Shatkay; Pacific Symposium on Biocomputing 11:16-27(2006)
Evaluation of Lexical Methods for Detecting Relationships Between Concepts from Multiple Ontologies. Helen L. Johnson, K. Bretonnel Cohen, William A. Baumgartner Jr., Zhiyong Lu, Michael Bada, Todd Kester, Hyunmin Kim, and Lawrence Hunter; Pacific Symposium on Biocomputing 11:28-39(2006)
Automatically Generating Gene Summaries from Biomedical Literature. Xu Ling, Jing Jiang, Xin He, Qiaozhu Mei, Chengxiang Zhai, and Bruce Schatz; Pacific Symposium on Biocomputing 11:40-51(2006)
Finding GeneRIFs via Gene Ontology Annotations. Zhiyong Lu, K. Bretonnel Cohen, and Lawrence Hunter; Pacific Symposium on Biocomputing 11:52-63(2006)
PhenoGO: Assigning Phenotypic Context to Gene Ontology Annotations with Natural Language Processing. Yves Lussier, Tara Borlawsky, Daniel Rappaport, Yang Liu, and Carol Friedman; Pacific Symposium on Biocomputing 11:64-75(2006)
Large-Scale Testing of Bibliome Informatics Using Pfam Protein Families. Ana G. Maguitman, Andreas Rechtsteiner, Karin Verspoor, Charlie E. Strauss, and Luis M. Rocha; Pacific Symposium on Biocomputing 11:76-87(2006)
Predicting Gene Functions from Text Using a Cross-Species Approach. Emilia Stoica and Marti Hearst; Pacific Symposium on Biocomputing 11:88-99(2006)
Bootstrapping the Recognition and Anaphoric Linking of Named Entities in Drosophila Articles. Andreas Vlachos, Caroline Gasperin, Ian Lewin, and Ted Briscoe; Pacific Symposium on Biocomputing 11:100-111(2006)
Significantly Improved Prediction of Subcellular Localization by Integrating Text and Protein Sequence Data. Annette Hoglund, Torsten Blum, Scott Brady, Pierre Donnes, John San Miguel, Matthew Rocheford, Oliver Kohlbacher, & Hagit Shatkay; PSB 11:16-27(2006) • Nice talk • Combines 5 separate classifiers: 4 sequence-based (3 SVMs & 1 motif search) and one text-based (represents each protein as a vector of weighted text features) • Text-based: assigns each protein a set of PubMed abstracts based on Swiss-Prot (so this requires previous annotation in Swiss-Prot) • Table should be useful for comparison with Carson's results • Report Acc, Sens, Spec & MCC for animal & plant datasets • Cf. Acc? TargetP 85%, MultiLoc 75%, PLOC 78% • Web server: MultiLoc/TargetLoc • http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc
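For reading the comparison tables: the usual per-class definitions of the four reported measures, computed from one class's confusion counts (that class vs. all others). This is a sketch with made-up counts, not the authors' evaluation code; note that some localization papers define "specificity" as TP/(TP+FP) rather than TN/(TN+FP), so check which definition a table uses before comparing numbers across papers.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and Matthews correlation coefficient
    for one localization class treated as positive vs. all other classes."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "sens": sens, "spec": spec, "mcc": mcc}


# Hypothetical counts for one class.
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC is included alongside accuracy because it stays informative when classes are unbalanced, which is common for rarer compartments.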
Predicting Gene Functions from Text Using a Cross-Species Approach. Emilia Stoica and Marti Hearst, UC Berkeley; PSB 11:88-99(2006) • Use orthologous gene information in 2 ways: • CSM, cross-species match algorithm: uses GO codes of orthologous genes to generate a functional annotation • CSC, cross-species correlation algorithm: uses all GO codes & then eliminates "illogical" ones • Final annotation is computed as the union (?) of the two sets, CSM & CSC • Tested the algorithm on the dataset of Task 2.2 of the BioCreAtIvE competition, on EBI human & MGI; they claim it performs better than other solutions • Report F-measure: harmonic mean of precision & recall • http://biotext.berkeley.edu/
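A small sketch of the evaluation arithmetic, assuming the final annotation is the union of the CSM and CSC sets (the notes mark this as uncertain) and scoring it with the F-measure over GO codes. The GO codes and gold set are placeholders, not data from the paper.

```python
def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over sets of GO codes."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical GO-code sets for one gene; codes are placeholders.
csm = {"GO:a", "GO:b"}
csc = {"GO:b", "GO:c"}
gold = {"GO:a", "GO:b", "GO:d"}
final = csm | csc  # union of the two predicted sets

print(f_measure(csm, gold))    # 0.8
print(f_measure(final, gold))
```

Here the union adds GO:c, which is not in the gold set, so precision drops and F falls from 0.8 (CSM alone) to about 0.67; a union only pays off when the second set contributes more correct codes than wrong ones.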
Semantic Webs for Life Sciences. Robert Stevens, Olivier Bodenreider, and Yves A. Lussier; Pacific Symposium on Biocomputing 11:112-115(2006)
Selecting Biological Data Sources and Tools with XPR, a Path Language for RDF. Sarah Cohen-Boulakia, Christine Froidevaux, and Emmanuel Pietriga; Pacific Symposium on Biocomputing 11:116-127(2006)
Fast, Cheap and Out of Control: A Zero Curation Model for Ontology Development. Benjamin M. Good, Erin M. Tranfield, Poh C. Tan, Marlene Shehata, Gurpreet K. Singhera, John Gosselink, Elena B. Okon, and Mark D. Wilkinson; Pacific Symposium on Biocomputing 11:128-139(2006)
Putting Semantics into the Semantic Web: How Well Can It Capture Biology? Toni Kazic; Pacific Symposium on Biocomputing 11:140-151(2006)
Event Ontology: A Pathway-Centric Ontology for Biological Processes. Tatsuya Kushida, Toshihisa Takagi, and Ken Ichiro Fukuda; Pacific Symposium on Biocomputing 11:152-163(2006)
Discovering Biomedical Relations Utilizing the World-Wide Web. Sougata Mukherjea and Saurav Sahay; Pacific Symposium on Biocomputing 11:164-175(2006)
Biodash: A Semantic Web Dashboard for Drug Development. Eric K. Neumann and Dennis Quan; Pacific Symposium on Biocomputing 11:176-187(2006)
SemBiosphere: A Semantic Web Approach to Recommending Microarray Clustering Services. Kevin Y. Yip, Peishen Qi, Martin Schultz, David W. Cheung, and Kei-Hoi Cheung; Pacific Symposium on Biocomputing 11:188-199(2006)
Experience in Reasoning with the Foundational Model of Anatomy in OWL DL. Songmao Zhang, Olivier Bodenreider, and Christine Golbreich; Pacific Symposium on Biocomputing 11:200-211(2006)
Computational Proteomics Session Introduction. Bobbie-Jo Webb-Robertson, William Cannon, Joshua Adkins, and Deborah Gracio; Pacific Symposium on Biocomputing 11:212-218(2006)
A Machine Learning Approach to Predicting Peptide Fragmentation Spectra. Randy J. Arnold, Narmada Jayasankar, Divya Aggarwal, Haixu Tang, and Predrag Radivojac; Pacific Symposium on Biocomputing 11:219-230(2006)
Identifying Protein Complexes in High-Throughput Protein Interaction Screens Using an Infinite Latent Feature Model. Wei Chu, Zoubin Ghahramani, Roland Krause, and David L. Wild; Pacific Symposium on Biocomputing 11:231-242(2006)
High-Accuracy Peak Picking of Proteomics Data Using Wavelet Techniques. Eva Lange, Clemens Gröpl, Knut Reinert, Oliver Kohlbacher, and Andreas Hildebrandt; Pacific Symposium on Biocomputing 11:243-254(2006)
Fast De novo Peptide Sequencing and Spectral Alignment via Tree Decomposition. Chunmei Liu, Yinglei Song, Bo Yan, Ying Xu, and Liming Cai; Pacific Symposium on Biocomputing 11:255-266(2006)
Experimental Design of Time Series Data for Learning from Dynamic Bayesian Networks. David Page and Irene M. Ong; Pacific Symposium on Biocomputing 11:267-278(2006)
Finding Diagnostic Biomarkers in Proteomic Spectra. Pallavi N. Pratapa, Edward F. Patz, Jr., and Alexander J. Hartemink; Pacific Symposium on Biocomputing 11:279-290(2006)
Gaussian Mixture Modeling of Helix Subclasses: Structure and Sequence Variations. Ashish V. Tendulkar, Babatunde Ogunnaike, and Pramod P. Wangikar; Pacific Symposium on Biocomputing 11:291-302(2006)
An SVM Scorer for More Sensitive and Reliable Peptide Identification via Tandem Mass Spectrometry. Haipeng Wang, Yan Fu, Ruixiang Sun, Simin He, Rong Zeng, and Wen Gao; Pacific Symposium on Biocomputing 11:303-314(2006)
Normalization Regarding Non-Random Missing Values in High-Throughput Mass Spectrometry Data. Pei Wang, Hua Tang, Heidi Zhang, Jeffrey Whiteaker, Amanda G. Paulovich, and Martin McIntosh; Pacific Symposium on Biocomputing 11:315-326(2006)
A Point-Process Model for Rapid Identification of Post-Translational Modification. Bo Yan, Tong Zhou, Peng Wang, Zhijie Liu, Vincent A. Emanuele II, Victor Olman, and Ying Xu; Pacific Symposium on Biocomputing 11:327-338(2006)
A New Approach for Alignment of Multiple Proteins. Xu Zhang and Tamer Kahveci; Pacific Symposium on Biocomputing 11:339-350(2006)
A Point-Process Model for Rapid Identification of Post-Translational Modifications. Bo Yan, Tong Zhou, Peng Wang, Zhijie Liu, Vincent A. Emanuele II, Victor Olman, and Ying Xu, UGA & Georgia Tech; PSB 11:327-338(2006) • Seems to be a good approach for estimating & identifying PTMs • Usual approach is exhaustive search • Point-process model that finds the optimal mass shifts maximizing the alignment between the experimental MS/MS spectrum & a candidate theoretical spectrum, through a cross-correlation calculation • Gives rapid search in "blind" mode, i.e. without specifying the types of PTMs in advance • Comparable to other blind approaches but more efficient & simpler • http://csbl.bmb.uga.edu • http://csbl.bmb.uga.edu/resources.html
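To make the "usual approach is exhaustive search" point concrete, here is a brute-force sketch of blind mass-shift scoring: try every modification site and integer mass shift, and score each shifted theoretical spectrum by summing matching experimental intensities (a zero-lag cross-correlation on binned spectra). It is the kind of baseline the point-process model is designed to speed up, not the authors' model; integer-Da bins, b-ion-only fragments, and the example peaks are simplifying assumptions.

```python
def score(experimental, fragments, site, delta):
    """Sum experimental intensity at each theoretical fragment bin, shifting the
    fragments at or after `site` (assumed to carry the modification) by `delta`."""
    total = 0.0
    for i, mass in enumerate(fragments):
        shifted = mass + delta if i >= site else mass
        total += experimental.get(shifted, 0.0)
    return total


def best_mass_shift(experimental, fragments, max_shift):
    """Exhaustively try every site and integer shift; return (site, delta, score).

    Starts from the unmodified score (site=None), so a shift is reported only if
    it strictly improves the match.
    """
    best = (None, 0, score(experimental, fragments, len(fragments), 0))
    for site in range(len(fragments)):
        for delta in range(-max_shift, max_shift + 1):
            s = score(experimental, fragments, site, delta)
            if s > best[2]:
                best = (site, delta, s)
    return best


# Hypothetical binned spectra: b-ion ladder, with a +80 shift from the third fragment on.
FRAGMENTS = [100, 200, 300, 400]
EXPERIMENTAL = {100: 1.0, 200: 1.0, 380: 1.0, 480: 1.0}

print(best_mass_shift(EXPERIMENTAL, FRAGMENTS, max_shift=100))  # (2, 80, 4.0)
```

With these peaks the search recovers a +80 shift starting at the third fragment. A plain sum can count one experimental peak twice, though: widening `max_shift` to 200 lets a -200 shift at the same site also score 4.0, and it is found first. That is one reason real scorers constrain peak matching, and the nested loop over sites and shifts is exactly the cost a smarter model tries to avoid.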