Ariadne Genomics technology: Extraction from the literature and network analysis Dr. Anton Yuryev


Presentation Transcript


  1. Ariadne Genomics technology: Extraction from the literature and network analysis. Dr. Anton Yuryev, Ariadne Genomics Inc.

  2. Pathway Studio product line
     • Pathway Studio desktop
     • Pathway Studio workgroup
     • Pathway Studio enterprise
     Main functionality:
     • Data mining and pathway building
     • Analysis of high-throughput data
     • Text-mining, fact extraction and database building

  3. Ariadne Corporate Offering: Software solution for knowledge management and pathway analysis of high-throughput data. Diagram labels: MedScan (1000 abstracts/min); Knowledge Databases; Pathway Building; Pathway collection; Proprietary data; ResNet; Biological Association Networks; Public interaction data; Analysis of High-Throughput data; Text-mining.

  4. Ariadne Database Construction
     • Automatic fact extraction by MedScan from organism-specific subsets of PubMed and full-text journals
     • Import of Ariadne proprietary curated data: curated physical interactions; 712 signaling line pathways
     • Import of publicly available curated interaction data: Entrez Gene, BIND, HPRD, KEGG, Gene Ontology
     • Import of publicly available high-throughput interaction data (Y2H, mass-spec, etc.)
     • Import of user proprietary data: proprietary or publicly available experimental data in PSI, BioPax or tab-delimited formats; data mined by the MedScan tool from literature sources not included with the database; user manual curation

  5. Additional Commercial datasets
     • >130 KEGG metabolic pathways
     • >70 STKE pathways (AAAS)
     • >10,000 ERGO pathways for 587 organisms (Integrated Genomics)
     • >100,000 protein interactions from Hynet (Prolexys)
     • >600 disease pathways from PathArt (Jubilant)

  6. Pathway Studio Enterprise distinctions
     • Web client for instant pathway publishing
     • Connection between multiple geographical sites
     • 3-tier architecture with Java API to connect third-party applications and algorithms
     • MedScan Enterprise license: open MedScan dictionaries and pattern rules files for customization; distribution of MedScan data across the entire company
     • GSEA, NEA and network clustering algorithms for analysis of high-throughput data

  7. Pathway Studio Enterprise Architecture. Diagram labels: Read-only users via web browser; Application server; Database; Data editors via web browser; Third-party tools, in-house applications, API; SQL interface, bulk data management; Bioinformaticians via Pathway Studio.

  8. “Everyone is an Expert” decentralized deployment schema: Hundreds or thousands of users, some with read-only and some with editor or publisher roles, access one central database via Pathway Studio and/or a Web browser to analyze experiments, browse the pathway collection, mine the literature, and share data and analysis results.

  9. “Bioinformatics service group” centralized deployment schema: A bioinformatics group serves scientists across the entire company by analyzing their experimental data and mining the literature. Analysis results are published via a Web browser interface for end users. End users have view-only access to pathways and analysis networks annotated with experimental data, via web browser and links to PathwayExpert Web Services. Diagram labels: • Experimental data • Search requests • Analysis of experimental data • Text-mining and Pathway Building • Bioinformatics group

  10. “Disease area” decentralized clusters deployment schema: Disease area groups have bioinformaticians, biologists and chemists working as a team focused on one disease. Groups shown: Cardiovascular group; Cancer group; Digestive disorders group; CNS group.

  11. Plan of the talk
     • Text-mining, fact extraction and database building: stay current with the literature; build focused literature networks; build focus databases
     • Data mining and pathway building: understand molecular mechanisms of disease and processes; maintain pathway collection; build focus databases
     • Analysis of high-throughput data: functional ontology analysis; network analysis

  12. Introduction to MedScan technology

  13. How does MedScan extract facts from text?
     • Sentence in PubMed: “Axin binds beta-catenin and inhibits GSK-3beta.”
     • Identify proteins in dictionary (in red): “Axin binds beta-catenin and inhibits GSK-3beta.”
     • Identify interaction type (in black): “Axin binds beta-catenin and inhibits GSK-3beta.”
     • Extracted facts: Axin - beta-catenin, relation: Binding; Axin -> GSK-3beta, relation: Regulation, effect: Negative
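
The extraction steps on slide 13 can be illustrated with a toy sketch. This is not MedScan's implementation: the two mini-dictionaries, the function names, and the rule that coordinated verbs share the first entity as their subject are all assumptions made for this example.

```python
import re

# Hypothetical mini-dictionaries; real MedScan dictionaries and rules are far larger.
PROTEINS = ["Axin", "beta-catenin", "GSK-3beta"]
VERBS = {
    "binds": ("Binding", None),
    "inhibits": ("Regulation", "Negative"),
}

def find_spans(sentence, terms):
    """Return sorted (start, term) pairs for each dictionary term found as a whole token."""
    spans = []
    for term in terms:
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        for match in re.finditer(pattern, sentence):
            spans.append((match.start(), term))
    return sorted(spans)

def extract_facts(sentence):
    """Return (source, relation, target, effect) tuples for one sentence."""
    entities = find_spans(sentence, PROTEINS)
    facts = []
    for verb_pos, verb in find_spans(sentence, VERBS):
        before = [name for start, name in entities if start < verb_pos]
        after = [name for start, name in entities if start > verb_pos]
        if not before or not after:
            continue
        # Assumption: coordinated verbs share the first entity as subject,
        # and the object is the nearest entity after the verb.
        relation, effect = VERBS[verb]
        facts.append((before[0], relation, after[0], effect))
    return facts

facts = extract_facts("Axin binds beta-catenin and inhibits GSK-3beta.")
for fact in facts:
    print(fact)
```

On the slide's sentence this yields one Binding fact and one negative Regulation fact, matching the extracted facts listed above.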

  14. Filtering by number of references controls the network confidence in Pathway Studio. Example relation: Binding (references: 77); Owner: public; Entities: E2F1-RB1. Supporting sentence: “This stabilization of the pRB-E2F-1 complex by AAV expression in adenoviral-infected cells should lead to a decrease in E2F-1-mediated expression of cell cycle-specific genes.”
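
Reference-count filtering as described on slide 14 amounts to a threshold on each relation's number of supporting references. A minimal sketch follows; the `Relation` class, the function name, and the second relation (ProteinA/ProteinB with one reference) are hypothetical and are not Pathway Studio's data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    kind: str
    references: int  # number of literature references supporting the relation

def filter_by_references(relations, min_references):
    """Keep only relations supported by at least min_references references."""
    return [r for r in relations if r.references >= min_references]

relations = [
    Relation("E2F1", "RB1", "Binding", 77),          # from the slide
    Relation("ProteinA", "ProteinB", "Binding", 1),  # hypothetical low-confidence relation
]
confident = filter_by_references(relations, min_references=5)
print(confident)
```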

  15. MedScan Architecture. Diagram labels: Customizable by user; Modules; Entity recognizer; Entity detection; Dictionaries; Toxicology; Drosophila; Mammals; C. elegans; Yeast; Plants; RNEF XML; Semantic processor; Rules; Relationship extraction; Pattern matcher; Patterns; Cartridges. • Future: • New modules: ConceptScan • New cartridges: Immunology, Clinical

  16. Describing MedScan
     • Manually curated: dictionaries and grammar rules
     • Fast: 14 million PubMed abstracts in 2 days on a modern PC
     • Comprehensive: fact recovery rate > 90% (90% = 70% sentence recovery rate + 20% literature redundancy)
     • Removes redundancy: 7,647,282 non-distinct relations => 1,000,000 distinct relations
     • Accurate: false positive rate of 10%
     • Customizable: dictionaries and patterns
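
The redundancy-removal figure on slide 16 (7,647,282 non-distinct relations collapsing to 1,000,000 distinct ones, on average roughly 7.6 mentions per distinct relation) suggests a collapse step that also yields per-relation reference counts. The sketch below shows one way to do this; the tuple layout, the function name, and the treatment of Binding as undirected are assumptions, and the repeated mentions are invented for illustration.

```python
from collections import Counter

def collapse_mentions(mentions):
    """Collapse repeated (source, relation, target) mentions into distinct relations with counts."""
    counts = Counter()
    for source, relation, target in mentions:
        if relation == "Binding":
            # Assumption: binding is undirected, so A-B and B-A are the same relation.
            source, target = sorted((source, target))
        counts[(source, relation, target)] += 1
    return dict(counts)

# Hypothetical repeated mentions of the facts from slide 13.
mentions = [
    ("Axin", "Binding", "beta-catenin"),
    ("beta-catenin", "Binding", "Axin"),
    ("Axin", "Regulation", "GSK-3beta"),
]
distinct = collapse_mentions(mentions)
print(distinct)
```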

  17. MedScan Applications
     • Indexing the scientific literature (diagram labels: PubMed; Entity-based index; Semantic index; Google; MedScan; Open access)
     • Extracting interactions to create databases for systems biology
     • Automatic reader’s digest: Document Summary

  18. Pathway Building in Pathway Studio
     • Manual
     • Automatic, using Graph navigation tools
     • Using text-mining with MedScan

  19. Viewing and editing pathways in Pathway Studio
     • Viewing entities in the List Pane
     • Entity and relation tables
     • Show all references
     • Pathway Reference summary
     • Export protein list
     • Display styles: by type, by effect, by reference count
     • UI options: magnifier; fit text to entities; simple and full graph view; fit to window; rotate; move; zoom by rectangle; advanced graph scaling; resizing nodes in pathway pane

  20. Pathway Building by text-mining. Non-melanoma skin cancer: >1,000,000 cases (<2,000 deaths) in the USA.

  21. MedScan Reader: PubMed search. Keep searching and adding relations. At the end, send the extracted relations to Pathway Studio.

  22. MedScan Reader: Import top 100 Hits from Google Scholar search: downloads found articles and processes them with MedScan

  23. MedScan Reader: Import top 30 Hits from Google search: downloads found web-pages and processes them with MedScan

  24. Full-text article found on Highwire press with “non-melanoma skin cancer” text search

  25. MedScan customization by focused literature source: “Nonmelanoma skin cancer” literature network, the result of targeted text-mining by MedScan Reader. Every entity in this network was mentioned in the context of non-melanoma skin cancer. • Find hubs • Compare with patient data
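
"Find hubs" on slide 25 can be read as ranking entities by how many distinct neighbors they have in the literature network. The sketch below takes that reading as an assumption; the slides do not define a hub. The function name and the placeholder entities P1 to P5 are hypothetical.

```python
from collections import defaultdict

def find_hubs(edges, top_n=3):
    """Rank nodes by number of distinct neighbors (degree), highest first."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    ranked = sorted(neighbors, key=lambda node: (-len(neighbors[node]), node))
    return [(node, len(neighbors[node])) for node in ranked[:top_n]]

# Placeholder network; P1..P5 stand in for entities found by text-mining.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
hubs = find_hubs(edges, top_n=2)
print(hubs)
```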

  26. MedScan customization by focused literature source: Protein network for non-melanoma skin cancer. Compare this pathway with your experimental patient data.

  27. Automatic Pathway Building using Graph navigation: Build pathway tool

  28. Mining regulatory relations in database. Basic principle: regulatory interactions are mediated by the physical interaction network. Applications: Regulomes; Biological process pathways; Disease networks.

  29. Regulome pathways: algorithm input

  30. Regulome pathways: Connecting IL10 targets with physical interaction relations
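
One way to picture the step on slide 30 is to take the targets of a regulator and connect them through the physical interaction network. The sketch below joins each pair of targets by a shortest path found with breadth-first search. Using shortest paths is an assumption (the slides do not state the algorithm), and the function names and placeholder nodes T1, T2, T3, X, Y, Z, Q are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of nodes from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def connect_targets(interactions, targets):
    """Return the set of physical-interaction edges lying on shortest paths between targets."""
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    edges = set()
    for a, b in combinations(sorted(targets), 2):
        path = shortest_path(graph, a, b)
        if path:
            edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return edges

# Placeholder physical interactions; T1..T3 stand in for a regulator's targets.
interactions = [("T1", "X"), ("X", "T2"), ("T2", "Y"), ("Y", "Z"), ("T3", "Q")]
pathway_edges = connect_targets(interactions, {"T1", "T2", "T3"})
print(pathway_edges)
```

In this toy network T3 cannot be reached from the other targets, so it contributes no edges; a real tool would need a policy for such disconnected targets.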

  31. Building pathways by data mining: converting regulatory network to protein physical interaction network for Cell Processes, Diseases, Regulomes

  32. Disease networks: 2300 diseases, 230 cancers in ResNet 5.0 database. Converting regulatory network to protein physical interaction network for Diseases. Example: Endothelial cells cancer.

  33. Endothelial cells cancer network

  34. Applied information retrieval and multidisciplinary research: new mechanistic hypotheses in Complex Regional Pain Syndrome. J Biomed Discov Collab. 2007; 2: 2. Kristina M Hettne, Marissa de Mos, Anke GJ de Bruijn, Marc Weeber, Scott Boyer, Erik M van Mulligen, Montserrat Cases, Jordi Mestres, and Johan van der Lei. Figure: Resulting network of CRPS concepts.

  35. High-throughput data analysis in Pathway Studio • Identification of responsive genes • Functional ontology analysis • Network analysis

  36. Supports analysis of all types of experiment data
     • Gene expression
     • Metabolomics
     • Proteomics
     • SNP and CNV analysis
     • Methylation arrays
     • Phosphorylation arrays
     Support for all microarray platforms: • Affymetrix • Agilent • Illumina • Nimblegen • Superarray • Custom design chips

  37. Analysis of gene expression microarray data: STEP 1: Identification of responsive genes
     • Expression data import (tab, xls, cel)
     • Selection of responsive genes: find differentially expressed genes (significance analysis via t-test); gene clustering via correlation networks
     • Alternatively, find responsive genes in third-party software for statistical analysis of microarray data and import them as a protein list (Tools->Import protein list)

  38. Calculation of differentially expressed genes in Pathway Studio (significance analysis using paired and unpaired t-tests)
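
For reference, the two t statistics named on slide 38 can be computed per gene as in the sketch below. It uses only the Python standard library and returns the statistic, not a p-value. The function names are hypothetical, and the choice of Welch's unequal-variance form for the unpaired test is an assumption, since the slides do not say which variant Pathway Studio uses.

```python
import math
import statistics

def unpaired_t(group_a, group_b):
    """Welch's t statistic for two independent groups (assumed variant)."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error

def paired_t(before, after):
    """Paired t statistic computed on per-sample differences (after - before)."""
    diffs = [b - a for a, b in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical expression values for one gene in three samples.
control = [1.0, 2.0, 3.0]
treated = [2.0, 4.0, 6.0]
t_unpaired = unpaired_t(control, treated)
t_paired = paired_t(control, treated)
print(t_unpaired, t_paired)
```

Both functions raise `ZeroDivisionError` when the relevant variance is zero, so constant profiles would need to be filtered out beforehand.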

  39. Gene clustering in Pathway Studio using Correlation network
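
A correlation network like the one on slide 39 can be sketched by linking genes whose expression profiles are strongly correlated. The example below uses Pearson correlation with a cutoff on its absolute value. The cutoff of 0.9, the use of the absolute value, the function names, and the gene profiles are all assumptions for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def correlation_network(profiles, threshold=0.9):
    """Return (gene_a, gene_b, r) edges for gene pairs with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, r))
    return edges

# Hypothetical expression profiles across four samples.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 4],
}
network = correlation_network(profiles)
print(network)
```

Connected groups of genes in the resulting edge list can then be treated as clusters.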

  40. Analysis of gene expression microarray data: STEP 2: Pathway Analysis of responsive genes
     Network analysis:
     • Identification of differentially expressed (DE) protein complexes and physical networks
     • Identification of major regulators and targets in the expression network: via network querying (Build pathway tool); via Network enrichment analysis (in PS Enterprise only)
     Functional analysis:
     • Comparison of responsive genes with ontologies and pathway collection: via Fisher exact test; via Gene Set Enrichment analysis (GSEA, in PS Enterprise only)
     • Gene ontology analysis (via Fisher’s test or GSEA)
     • Comparative gene ontology analysis: via network querying (Build pathway tool)
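
The Fisher exact test mentioned on slide 40 asks whether the responsive genes overlap a pathway or ontology group more than expected by chance. A minimal one-sided version, written as a hypergeometric tail sum with the standard library, is sketched below. The function name, argument names, and the example counts are hypothetical, and the slides do not say whether Pathway Studio uses a one-sided or two-sided test.

```python
import math

def fisher_enrichment_p(overlap, responsive, in_set, universe):
    """One-sided Fisher exact test: P(overlap or more) under the hypergeometric model.

    overlap    - responsive genes that belong to the gene set
    responsive - total number of responsive genes
    in_set     - number of genes in the pathway or ontology group
    universe   - total number of genes measured
    """
    total = math.comb(universe, responsive)
    tail = 0
    for k in range(overlap, min(responsive, in_set) + 1):
        tail += math.comb(in_set, k) * math.comb(universe - in_set, responsive - k)
    return tail / total

# Hypothetical counts: 20 genes measured, 5 in the pathway, 4 responsive, 3 of those in the pathway.
p_value = fisher_enrichment_p(overlap=3, responsive=4, in_set=5, universe=20)
print(p_value)
```

When many gene sets are tested, the resulting p-values would normally be corrected for multiple testing.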

  41. Functional analysis: comparative GO groups analysis. Comparing cell responses in GO group space.

  42. Building protein network from interesting GO groups and identification of its major expression regulator

  43. Identification of drug-responsive genes

  44. Evaluation of drug efficacy and side-effects

  45. GSEA: Gene Set Enrichment analysis in PS Enterprise

  46. Visualizing expression data on GSEA pathway

  47. High-throughput data analysis in Pathway Studio • Functional ontology analysis • Network analysis

  48. Data model in ResNet database: formalized representation of biological regulatory and interaction network. Relation types shown: Expression; PromoterBinding; DirectRegulation; ProtModification; Binding; MolSynthesis; MolTransport; Regulation; …MORE…. Uses shown: interpretation of gene expression data; interpretation of proteomics data; interpretation of metabolomics data, biomarker prediction and validation.

  49. Network analysis: identification of major regulators and targets among DE genes via Build pathway

  50. Network analysis: identification of major regulators. Network enrichment analysis finds the regulators with the most differentially expressed targets. Figure labels: Better; Worse.
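
The idea stated on slide 50, finding regulators with the most differentially expressed targets, can be sketched as a simple ranking. The version below only counts targets. The actual Network enrichment analysis in Pathway Studio Enterprise may well apply a statistical test instead, so treat the scoring, the function name, and the placeholder regulators R1 to R3 as assumptions.

```python
def rank_regulators(regulator_targets, de_genes):
    """Rank regulators by number of differentially expressed (DE) targets.

    Returns (regulator, de_target_count, total_target_count) tuples, best first.
    """
    de = set(de_genes)
    scored = [
        (regulator, len(set(targets) & de), len(set(targets)))
        for regulator, targets in regulator_targets.items()
    ]
    return sorted(scored, key=lambda row: (-row[1], row[0]))

# Placeholder regulatory relations and DE gene list.
regulator_targets = {"R1": ["a", "b", "c"], "R2": ["c", "d"], "R3": ["e"]}
ranking = rank_regulators(regulator_targets, de_genes={"a", "b", "d"})
print(ranking)
```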
