Pathway Studio Workgroup/Enterprise training course
DAY 1: Technology overview and system architecture
Products
• Pathway Studio Desktop
• Pathway Studio Workgroup
• Pathway Studio Enterprise
Main functionality:
• Data mining and pathway building
• Analysis of high-throughput data
• Text-mining and fact extraction
Ariadne Corporate Offering
Software solution for knowledge management and pathway analysis of high-throughput data
• MedScan: 1000 abstracts/min
• Knowledge Databases
• Pathway Building
• Pathway collection
• Proprietary data
• ResNet Biological Association Networks
• Public interaction data
• Analysis of high-throughput data
• Text-mining
Accomplishments (April, 2007)
188 publications using AGI software and the ResNet database:
• Gene expression microarray analysis (105)
• Pathway Analysis (80)
• Disease mechanism (64)
• Human genetics (7)
• Publication by Ariadne Authors (13)
• Text processing (9)
• Reviews (6)
• Databases (3)
• Drug discovery (16)
• Toxicogenomics (4)
Pathway Studio Workgroup client-server architecture
• Read-only users
• Database
• Data curators
• Third party tools, in-house applications, API
• SQL interface, bulk data management
• PSW administrator
PathwayExpert Architecture
• Read-only users via web browser
• Application server
• Database
• Data editors via web browser
• Third party tools, in-house applications, API
• SQL interface, bulk data management
• Bioinformaticians via Pathway Studio
“Everyone is an Expert” decentralized deployment schema
Hundreds or thousands of users, some with read-only roles and some with editor or publisher roles, access one central database via Pathway Studio and/or a web browser to analyze experiments, browse the pathway collection, mine the literature, and share data and analysis results.
“Bioinformatics service group” centralized deployment schema
A bioinformatics group serves scientists across the entire company by analyzing their experimental data and mining the literature. Analysis results are published via the web browser interface for end users.
End users: view-only access, via web browser and links to PathwayExpert Web Services, to pathways and analysis networks annotated with experimental data
Bioinformatics group:
• Experimental data
• Search requests
• Analysis of experimental data
• Text-mining and pathway building
“Disease area” decentralized clusters deployment schema
Each disease area group has bioinformaticians, biologists and chemists working as a team focused on one disease:
• Cardiovascular group
• Cancer group
• Digestive disorders group
• CNS group
Ariadne MedScan Text-To-Knowledge Technology
Extracting biological association networks from text
• MedScan: 1000 abstracts/min
• MedScan output: RNEF XML
• Pathway Studio to navigate the knowledge base
• Knowledge Databases: ResNet Biological Association Networks
• Pathway analysis in the ResNet database
How MedScan extracts facts from text
• Sentence in PubMed: “Axin binds beta-catenin and inhibits GSK-3beta.”
• Identify proteins in the dictionary (in red on the slide): Axin, beta-catenin, GSK-3beta
• Identify interaction types (in black on the slide): binds, inhibits
• Extracted facts:
  Axin - beta-catenin, relation: Binding
  Axin -> GSK-3beta, relation: Regulation, effect: Negative
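The three steps on this slide can be imitated with a toy script. This is only a sketch under strong assumptions: the dictionary, the verb lexicon and the function `extract_facts` are hypothetical, and a flat regular expression stands in for MedScan's dictionaries and grammar, so it handles only sentences shaped like the example.

```python
import re

# Hypothetical toy resources; MedScan's real dictionaries and rules are far larger.
PROTEINS = ["Axin", "beta-catenin", "GSK-3beta"]
INTERACTIONS = {
    "binds": {"relation": "Binding", "directed": False},
    "inhibits": {"relation": "Regulation", "effect": "Negative", "directed": True},
}

def extract_facts(sentence):
    """Extract facts from a 'Subject verb Object [and verb Object]' sentence."""
    # Longest names first so a short name never shadows a longer one.
    protein_re = "|".join(re.escape(p) for p in sorted(PROTEINS, key=len, reverse=True))
    verb_re = "|".join(INTERACTIONS)
    # Steps 1 and 2: tag dictionary proteins and interaction verbs in text order.
    tokens = re.findall(rf"({protein_re})|\b({verb_re})\b", sentence)
    facts, subject, verb = [], None, None
    for protein, interaction in tokens:
        if interaction:
            verb = interaction
        elif subject is None:
            subject = protein  # the first protein is the shared subject
        elif verb is not None:
            # Step 3: subject + verb + object becomes one extracted fact.
            facts.append({"source": subject, "target": protein, **INTERACTIONS[verb]})
            verb = None
    return facts

for fact in extract_facts("Axin binds beta-catenin and inhibits GSK-3beta."):
    print(fact)
```

The undirected Binding fact and the directed, negative Regulation fact correspond to the "-" and "->" notation used on the slide.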
Describing MedScan
• Manually curated: dictionaries and grammar rules
• Fast: processes 14 million PubMed abstracts in 2 days on a modern PC
• Comprehensive: fact recovery rate > 90%
• Removes redundancy: 7,647,282 non-distinct relations => 1,000,000 distinct relations
• Accurate: false positive rate of 10%
• Customizable: dictionaries and patterns
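Redundancy removal can be pictured as grouping repeated mentions of the same fact under one key. The sketch below is an illustration and not MedScan's actual procedure: the field names (`source`, `target`, `relation`, `effect`, `directed`, `ref`) and the function `collapse_relations` are hypothetical.

```python
from collections import defaultdict

def collapse_relations(mentions):
    """Group per-sentence mentions into distinct relations with their references."""
    distinct = defaultdict(list)
    for m in mentions:
        pair = (m["source"], m["target"])
        if not m.get("directed", True):
            pair = tuple(sorted(pair))  # undirected: A - B is the same fact as B - A
        key = (pair, m["relation"], m.get("effect"))
        distinct[key].append(m["ref"])
    # One entry per distinct relation, with a de-duplicated reference list.
    return {key: sorted(set(refs)) for key, refs in distinct.items()}

mentions = [
    {"source": "Axin", "target": "beta-catenin", "relation": "Binding",
     "directed": False, "ref": "ref-1"},
    {"source": "beta-catenin", "target": "Axin", "relation": "Binding",
     "directed": False, "ref": "ref-2"},
    {"source": "Axin", "target": "GSK-3beta", "relation": "Regulation",
     "effect": "Negative", "ref": "ref-2"},
]
for key, refs in collapse_relations(mentions).items():
    print(key, "supported by", len(refs), "reference(s)")
```

With the figures quoted on the slide, 7,647,282 mentions collapsing into 1,000,000 distinct relations works out to roughly 7.6 mentions per distinct relation on average.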
MedScan Architecture
• Modules: Entity recognizer (entity detection), Semantic processor and Pattern matcher (relationship extraction)
• Customizable by user: Dictionaries, Rules, Patterns
• Cartridges: Toxicology, Drosophila, Mammals, C-elegans, Yeast, Plants
• Output: RNEF XML
• Future:
• New modules: ConceptScan
• New cartridges: Immunology, Clinical
Overview of MedScan Architecture
• Input text -> Preprocessor (dictionary-based, uses the protein names dictionary, identifies proteins and small molecules) -> tagged sentences
• Tagged sentences -> Tokenizer (uses the lexicon) -> sequence of words
• Sequence of words -> Syntactic parser (context-free grammar) -> sentence structure
• Sentence structure -> Semantic interpreter -> semantic tree
• Semantic tree -> Ontological interpreter (rule-based, uses extraction rules; rules are equivalent to an ontology) -> extracted facts
• Pattern matcher: uses extraction patterns to produce extracted facts
• Extracted facts -> Converter -> database of relations
Grammar and lexicon are proprietary. They are domain-independent by design but focused on the biomedical field.
MedScan Applications
• Indexing the scientific literature (PubMed, Google, open access): entity-based index, semantic index
• Extracting interactions to create databases for systems biology
• Automatic reader’s digest: document summary
Text-mining tools in Pathway Studio
Tools -> Start MedScan Reader: web browser enhanced with MedScan technology
• Search PubMed and manually select abstracts for fact extraction
• Search Google Scholar and extract facts from the top 100 hits
• Search Google and extract facts from the top 30 hits
• Search Highwire and BioMed Central and extract facts from individual full-text articles
Tools -> MedScan: Extract pathways from text
• search PubMed
• from file
• from location
Tools -> Update pathway
Tools -> Pathway Reference summary
• Export to EndNote
MedScan Reader settings
1) Specifying MedScan cartridge
2) Tracking favorite entities via highlight
3) Filtering for favorite entities and relations
4) Filtering against entities and relations
ResNet Mammal Database
• Shipped with >1,000,000 unique relations derived by MedScan between proteins, metabolites, chemicals, cell processes and diseases
• ResNet physical interactions are manually curated
• 712 manually curated pathways
• Gene Ontology
• Optional pathway updates: >300 Regulome pathways, >2,500 Biological processes pathways, >200 Cellular component pathways
• High-throughput interaction data
• Automatic curation of ResNet is possible to remove redundancy and clean up false positives
Pathways collection in ResNet
• Canonical pathways (included, curated)
• Signaling line pathways (included, curated)
• Regulome pathways (optional, automatic)
• Biological processes pathways (optional, automatic)
• Cellular component pathways (optional, automatic)
• KEGG metabolic pathways (optional, imported)
• STKE (commercial)
• Metabolic vision (commercial)
• PathArt (commercial)
Ariadne databases for other organisms
All databases contain:
• Relations extracted by the organism-specific MedScan cartridge from organism-specific abstracts and full-text articles
• Entrez Gene protein annotation
• Protein interactions from Entrez Gene (includes the BIND, HPRD, BioGRID and EcoCyc datasets)
• Gene Ontology annotation
Model organism databases:
• ResNet Plant: >400,000 relations; supports 6 plant species; optional entity co-occurrence data; additional protein physical interactions predicted by TAIR
• ResNet Drosophila: additional interactions from published high-throughput datasets
• ResNet C-elegans: additional interactions from published high-throughput datasets
• ResNet Yeast: additional interactions from published high-throughput datasets
• ResNet Bacteria (beta version): additional interactions from published high-throughput datasets
Databases for non-model organisms, containing interactions predicted from the closest model organism, are available from: http://www.ariadnegenomics.com/support/downloads/databases/
Additional Commercial Datasets
• KEGG: >130 metabolic pathways from Kyoto University
• STKE: >70 pathways from AAAS
• Metabolic vision: >10,000 curated pathways for 587 organisms from Integrated Genomics Inc
• Hynet: adds over 100,000 new protein physical interactions to ResNet 5.0, from Prolexys Inc
• PathArt: >600 disease pathways from Jubilant Inc
DAY 1: Pathway Studio maintenance, administration and technical support
Hardware requirements for Pathway Studio
Pathway Studio desktop or workgroup client:
• CPU: 2 GHz or more
• RAM: 512 MB or more
• Disk space for application: 500 MB
• Disk space for one local database: 2 GB
Pathway Studio workgroup server:
• CPU: 1 CPU (>3.0 GHz) for 1-5 concurrent users; 2 CPUs (>3.0 GHz) for 6-10 concurrent users
• RAM: >2 GB for 1-5 concurrent users; >3 GB for 6-10 concurrent users
• Disk space: 20 GB for the database
• Optimal disk configuration: 4 hard drives in RAID 0 for 1-5 concurrent users; RAID 10 for 6-10 concurrent users
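For capacity planning, the workgroup server figures can be encoded as a small lookup. This is only a sketch: the function name and the returned field names are hypothetical, and the numbers are copied from the slide, which covers 1 to 10 concurrent users and nothing beyond.

```python
def server_requirements(concurrent_users):
    """Return the slide's minimum workgroup server spec for a given user load."""
    if not 1 <= concurrent_users <= 10:
        raise ValueError("the slide only covers 1-10 concurrent users")
    spec = {"min_cpu_ghz": 3.0, "database_disk_gb": 20}
    if concurrent_users <= 5:
        spec.update(cpus=1, min_ram_gb=2, disks="4 hard drives in RAID 0")
    else:
        spec.update(cpus=2, min_ram_gb=3, disks="RAID 10")
    return spec

print(server_requirements(4))
print(server_requirements(8))
```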
Pathway Studio software requirements
Pathway Studio desktop or workgroup client:
• Microsoft Windows Server (2000, 2003), Windows XP (Professional), Windows Vista (Professional, Ultimate, Corporate)
Pathway Studio workgroup server:
• MS SQL Server 2000 or 2005 (Developer, Workgroup, Standard or Enterprise Edition) on Windows 2000, Windows 2003 Server, Windows XP Professional
• Oracle 10g or later on any supported Oracle platform, including Windows 2003 Server, Linux, etc.
• Database statistics
• Viewing entities in the list pane
• Viewing pathways
• Viewing groups
• Expression experiments folder
• Simulation model folder
• Database Index folder
User roles in Workgroup environment
• Administrator
• Editor: can edit public objects
• Publisher: can publish private pathways
• Regular user: can work only in their private space
Ask your PSW administrator for an account and a role.
PS Workgroup Admin console
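The role descriptions can be read as a permission table. The sketch below is one possible reading, not the product's access-control model: the action names are hypothetical, and it assumes that every role can work in its own private space, that Editor and Publisher rights do not stack, and that the Administrator holds every permission.

```python
from enum import Enum

class Role(Enum):
    REGULAR = "regular user"
    PUBLISHER = "publisher"
    EDITOR = "editor"
    ADMINISTRATOR = "administrator"

# Hypothetical action names; only the Editor and Publisher rights come from the slide.
PERMISSIONS = {
    Role.REGULAR: {"edit_private"},
    Role.PUBLISHER: {"edit_private", "publish_private"},
    Role.EDITOR: {"edit_private", "edit_public"},
    # Assumption: the administrator can do everything, including managing accounts.
    Role.ADMINISTRATOR: {"edit_private", "publish_private", "edit_public", "manage_accounts"},
}

def can(role, action):
    """Return True if the role is allowed to perform the action."""
    return action in PERMISSIONS[role]

print(can(Role.EDITOR, "edit_public"))       # editor may edit public objects
print(can(Role.REGULAR, "publish_private"))  # regular user may not publish
```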
Ariadne Technical Support http://www.ariadnegenomics.com/products/support.html
Summary of the introduction slides
• MedScan technology
• Software architecture, hardware and software requirements
• User roles
• ResNet database overview
• Ariadne’s technical support
Summary for the rest of the day
• Working with objects in database
• Working with pathway diagram and layout algorithms
• Database search in PS
• Build pathway tool and strategy
• Data import/export
• Pathways in ResNet
• Pathway comparison and statistical algorithms
• Find groups/pathways
• Text-mining in PS
• Microarray analysis: data import options and algorithms
• Pathway kinetics simulation in PS
DAY 1: Pathway Building in Pathway Studio
• Manual
• Automatic, using Graph navigation tools
• Using text-mining with MedScan
Viewing and editing pathways in Pathway Studio
• Viewing entities in the List Pane
• Entity and relation tables
• Show all references
• Pathway Reference summary
• Export protein list
• Display styles: By type, By effect, By reference count
• UI options: magnifier, fit text to entities, simple and full graph view, fit to window, rotate, move, zoom by rectangle, advanced graph scaling, resizing nodes in pathway pane
Finding entities and relations in Pathway Studio database
• Quick search
• String search
• Search by attribute
• Build pathway tool
Viewing and editing entity/relation properties
• Edit Entity property dialog, URN identifier
• Links to external databases
• Adding new properties
• Declaring new properties in the database
Palette pane
• Making a figure legend for your publication
• Viewing group display styles
• Drag & drop entity icon into pathway pane
Images pane
• Drag & drop images into pathway pane
• Importing your own images
• Image properties
KEGG pathways layout: node cloning in pathway graph
• 131 metabolic pathways
• 20,972 connected proteins
Several methods for adding objects and relations to Pathway pane
Adding objects:
• Drag & drop from the palette
• Drag & drop from the list pane
Adding relations:
• Connect selected entities button
• Enter a fact box
• Drag & drop from the list pane
Building pathways by manual curation in Pathway Studio
• In GenMAPP
• In Pathway Studio
Building pathways by manual curation in Pathway Studio
• Complex Nodes
• Adding components to Complex Nodes
• In Pathway Studio
• In GenMAPP
Questionnaire about the previous slides
• How many chemical reactions are in the ResNet database?
• What is the default image for a transcription factor in PS?
• How many images for cell membrane can there be in PS?
• What is the quickest search in PS?
• What is the quickest way to add a relation to your pathway diagram?
DAY 1: Automatic Pathway Building using Graph navigation
Build pathway tool
Mining regulatory relations in database
Basic principle: regulatory interactions are mediated by the physical interaction network
• Regulomes
• Biological processes pathways
• Disease pathways
Build Pathway dialog
• Build pathway options
• Filtering by direction
• Number of steps
• Build pathway filter
The main application of the Build pathway tool is to quickly find connections between entities of interest; therefore, its button is available from all panes.
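The direction and number-of-steps options can be understood as a bounded breadth-first walk over the relation graph. The sketch below illustrates that idea only; it is not the Build pathway tool's implementation, and the function `expand`, its parameters and the entity names are hypothetical.

```python
from collections import deque

def expand(relations, seeds, steps=1, direction="both"):
    """Collect relations reachable from the seed entities within `steps` hops.

    relations: iterable of (source, target) tuples
    direction: "downstream" (follow source -> target), "upstream", or "both"
    """
    out_edges, in_edges = {}, {}
    for s, t in relations:
        out_edges.setdefault(s, []).append((s, t))
        in_edges.setdefault(t, []).append((s, t))
    seen = set(seeds)
    frontier = deque((entity, 0) for entity in seeds)
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == steps:
            continue  # step limit reached for this branch
        edges = []
        if direction in ("downstream", "both"):
            edges += out_edges.get(node, [])
        if direction in ("upstream", "both"):
            edges += in_edges.get(node, [])
        for s, t in edges:
            if (s, t) not in found:
                found.append((s, t))
            neighbor = t if s == node else s
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return found

relations = [("A", "B"), ("B", "C"), ("D", "A")]
print(expand(relations, ["A"], steps=1, direction="downstream"))
print(expand(relations, ["A"], steps=2, direction="downstream"))
print(expand(relations, ["A"], steps=1, direction="upstream"))
```

Raising the step count widens the neighborhood quickly in a dense network, which is one reason to combine it with direction and type filters.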
Build pathway filters
• Using entity filters to answer different biological questions
• Using relation filter to analyze different types of high-throughput data
• Filtering by properties
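Entity and relation filters amount to keeping only the relations whose types match the question being asked. A minimal sketch, assuming hypothetical field names (`type`, `source_type`, `target_type`) and type labels that are not taken from the product:

```python
def filter_relations(relations, entity_types=None, relation_types=None):
    """Keep relations that pass both the relation-type and entity-type filters."""
    kept = []
    for rel in relations:
        if relation_types and rel["type"] not in relation_types:
            continue
        if entity_types and not (
            rel["source_type"] in entity_types and rel["target_type"] in entity_types
        ):
            continue
        kept.append(rel)
    return kept

relations = [
    {"source": "P1", "target": "P2", "type": "Binding",
     "source_type": "Protein", "target_type": "Protein"},
    {"source": "P1", "target": "D1", "type": "Regulation",
     "source_type": "Protein", "target_type": "Disease"},
]
# Example question: which physical interactions connect proteins only?
print(filter_relations(relations, entity_types={"Protein"}, relation_types={"Binding"}))
```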
Build pathway: Edit Results
• Display filtering
• Selecting results based on local connectivity
• IsNew column
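One way to picture selection by local connectivity is to count how many relations in the result set touch each entity and keep the well-connected ones. This is an assumption about what "local connectivity" means here, not a description of the tool; `select_by_connectivity` and `min_links` are hypothetical names.

```python
from collections import Counter

def select_by_connectivity(relations, min_links=2):
    """Return entities that take part in at least `min_links` relations."""
    degree = Counter()
    for source, target in relations:
        degree[source] += 1
        if target != source:  # count a self-relation only once
            degree[target] += 1
    return sorted(entity for entity, links in degree.items() if links >= min_links)

results = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(select_by_connectivity(results, min_links=2))
```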