Creating a … • Community Database • Organism-Specific Database • Model-Organism Database
Why Create a PGDB (Pathway/Genome Database)? • Perform pathway analyses as part of a genome project • Analyze omics data • Create a central information resource for the organism • Create an FBA model • Perform comparative analyses
Model Organism Databases • DBs that describe the genome and other information about an organism • Curated by experts for that organism • No one group can curate all the world’s genomes • Distribute workload across a community of experts to create a community resource • Every sequenced organism with an active experimental community requires a MOD • Integrate genome data with information about the biochemical and genetic network of the organism • Integrate literature-based information with computational predictions
Rationale for MODs • Each “complete” genome is incomplete in several respects: • 40%-60% of genes have no assigned function • Roughly 7% of those assigned functions are incorrect • Many assigned functions are non-specific • MODs are platforms for global analyses of an organism • Interpret omics data in a pathway context • In silico prediction of essential genes • Characterize systems properties of metabolic and genetic networks
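The percentages on this slide compound, which is easy to miss when reading the bullets. A minimal sketch of the arithmetic, assuming a hypothetical genome of 4,000 genes (the gene count and the function name are illustrative, not from the slides):

```python
def annotation_breakdown(total_genes, unassigned_fraction, error_rate=0.07):
    """Split a gene count using the slide's rough percentages.

    unassigned_fraction: share of genes with no assigned function (0.40 to 0.60).
    error_rate: share of the *assigned* functions that are incorrect (roughly 7%).
    """
    unassigned = round(total_genes * unassigned_fraction)
    assigned = total_genes - unassigned
    incorrect = round(assigned * error_rate)
    return {
        "unassigned": unassigned,
        "assigned": assigned,
        "incorrect": incorrect,
        "remaining": assigned - incorrect,
    }


# Hypothetical 4,000-gene genome at both ends of the 40%-60% range.
for fraction in (0.40, 0.60):
    print(fraction, annotation_breakdown(4000, fraction))
```

Even at the optimistic end of the range, only a little over half of the genes end up with an assigned function that is not known to be wrong, and some of those functions are still non-specific.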
What is Curation? • Ongoing updating and refinement of a PGDB • Correct false-positive and false-negative predictions • Incorporate information from experimental literature • Update genome sequence • Update gene functions, gene positions, gene names • Author comments and citations • Add new pathways, modify existing pathways • Enter information about regulatory networks
Issues in Creating Public MODs • Obtain funding • Scope the project • Identify the user community • Obtain buy-in and help from the scientific community • IT: Set up database server, Web server • Hire and train curators
Questions • Do you intend to make your PGDB public and to update it on an ongoing basis? • Do you intend to create a Model Organism Database?
Obtaining Pathway Tools • Free to non-commercial organizations • To obtain the license agreement, go to BioCyc.org and click on Software/Database Download • Follow the Installation Guide • ptools-local directory • Locate it in a common directory • Holds PGDBs created by all users who use this ptools installation • Holds PGDBs downloaded via the registry • Holds ptools-init.dat for this ptools installation
New Pathway Tools Releases • Major releases = External software releases • Twice per year • Announced on the ptools-users mailing list (ptools-users@ai.sri.com) • Minor releases, also twice per year, affect only our BioCyc.org Web site and flatfile distributions • We support one prior release only • Read release notes at http://brg.ai.sri.com/ptools/release-notes.html • Install process: • Upgrade schema of your DB (software assisted)
PGDB Storage: File or Relational Database • File storage: • Advantages: • No RDBMS installation and configuration • Disadvantages: • Must be loaded and saved in its entirety • No transaction history • No concurrent access for multiple users • Oracle/MySQL storage: • Advantages: • Faster read access, faster saves • Concurrent update access for multiple users • Stores history of all PGDB updates • Disadvantages: • RDBMS must be installed and configured
Multiuser Access to PGDBs • PGDB stored within one Oracle or MySQL server • Each curator installs PTools on their workstation • Different curators can use different software platforms • Workstations query the RDBMS server via the Internet • Local disk cache speeds access • For each frame access, PTools queries, in turn: in-memory cache, disk cache, RDBMS server • After a curator saves changes, all changes made by other users are loaded into the curator’s session
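The three-level lookup described on this slide can be sketched generically. This is not Pathway Tools code: the class name, the dictionaries standing in for the disk cache and the RDBMS server, and the frame ID are all hypothetical, and the real implementation may differ in how it fills and invalidates each level.

```python
class TieredFrameStore:
    """Toy model of a memory -> disk -> server lookup chain."""

    def __init__(self, server_frames):
        self.memory = {}             # in-memory cache
        self.disk = {}               # stand-in for the local disk cache
        self.server = server_frames  # stand-in for the RDBMS server
        self.server_queries = 0      # counts round trips to the server

    def get(self, frame_id):
        # 1. Cheapest level first: the in-memory cache.
        if frame_id in self.memory:
            return self.memory[frame_id]
        # 2. Next, the local disk cache.
        if frame_id in self.disk:
            frame = self.disk[frame_id]
        else:
            # 3. Last resort: ask the server, then remember the answer on disk.
            self.server_queries += 1
            frame = self.server[frame_id]
            self.disk[frame_id] = frame
        self.memory[frame_id] = frame
        return frame


store = TieredFrameStore({"FRAME-1": {"common-name": "example frame"}})
store.get("FRAME-1")   # goes to the server
store.get("FRAME-1")   # served from memory
store.memory.clear()   # simulate restarting the session
store.get("FRAME-1")   # served from the disk cache
print(store.server_queries)
```

The point of the ordering is that only the first access to a frame pays for a network round trip; later accesses, even after a restart, are answered locally.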
How to Release a PGDB? • Decide on release frequency and schedule • Don’t wait until it’s perfect to release it! • Freeze curation for 1 week • Quality assurance • Run consistency checker • Tools -> Consistency Checker • Also updates organism-summary statistics • Update publications, authors in organism frame • Update via Organism editor • Create new version of PGDB • ptools-local/pgdbs/yeastcyc/1.0/kb/yeastbase.ocelot • Edit against the new version, release the old version • Author release notes • Register PGDB in SRI PGDB registry • Will allow SRI to include it in BioCyc
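The example path on this slide shows that the version number is a directory level of its own. A small sketch of that layout, assuming a simple major.minor numbering scheme (the helper names and the "1.1" successor are assumptions for illustration; use Pathway Tools itself to create the new version rather than building directories by hand):

```python
from pathlib import PurePosixPath


def kb_path(root, org_dir, version, kb_file):
    """Compose a path shaped like the slide's example."""
    return PurePosixPath(root) / "pgdbs" / org_dir / version / "kb" / kb_file


def next_minor_version(version):
    """Assumed numbering scheme: bump the minor part of 'major.minor'."""
    major, minor = version.split(".")
    return f"{major}.{int(minor) + 1}"


released = "1.0"                        # frozen and released
editing = next_minor_version(released)  # curators keep editing this one

print("release:", kb_path("ptools-local", "yeastcyc", released, "yeastbase.ocelot"))
print("edit:   ", kb_path("ptools-local", "yeastcyc", editing, "yeastbase.ocelot"))
```

Keeping the released version in its own directory means ongoing curation never changes what users have already downloaded.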
Pathway Tools Data Import/Export • File->Export • File->Import • Export/import to/from tab-delimited files • Export to GenBank, SBML, BioPAX • Export to attribute-value files • Attribute-value files can be imported into BioWarehouse • BioWarehouse: relational database system for bioinformatics database integration
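Tab-delimited exports are easy to process outside Pathway Tools. A minimal sketch using Python's standard csv module; the column names and the two rows are made-up sample data, since the slides do not specify the layout of the exported files:

```python
import csv
import io

# Made-up sample standing in for an exported file; real exports will have
# their own column names and many more rows.
sample = (
    "GENE\tPRODUCT\n"
    "trpA\ttryptophan synthase alpha subunit\n"
    "trpB\ttryptophan synthase beta subunit\n"
)


def read_tab_delimited(handle):
    """Return one dict per data row, keyed by the header line."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_tab_delimited(io.StringIO(sample))
for row in rows:
    print(row["GENE"], "->", row["PRODUCT"])
```

For a real file, pass an open file handle (opened with newline="") in place of the io.StringIO object.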
Napster Comes to Bioinformatics • Public sharing of Pathway/Genome Databases • PGDB registry maintained by SRI at URL http://biocyc.org/registry.html • Registry operations • List contents of registry • Download PGDBs listed in the registry • Register PGDBs you have created
Registry Details • Why register your PGDB? • Declare existence of your PGDB in a central location • Facilitate its download by other scientists • Facilitate its inclusion in BioCyc.org • Why download a PGDB? • Desktop Navigator provides more functionality than Web • Comparative operations • Programmatic querying and processing of PGDB • Registration process • Registered PGDBs have open availability by default • Authors can provide their own license agreements • Registered PGDBs reside on the authors’ FTP site or HTTP server
Desktop versus Web Mode • Pathway Tools runs in two different modes: • Desktop mode • Web mode (e.g., BioCyc.org) • Desktop vs Web functionality in Pathway Tools: http://biocyc.org/desktop-vs-web-mode.shtml • You can run both desktop and web modes at your site • Your PTools web server need not be open to the public