
Biopackages

Biopackages.net: Operating System Packages for Bioinformatics. Allen Day, 2005.05.17.


Presentation Transcript


  1. Biopackages.net Operating System Packages for Bioinformatics Allen Day 2005.05.17

  2. What is a package? • Software, config files, documentation, and/or data encapsulated in a single file • Metadata describing: • Version, license, package “category” • Dependencies • What the package provides
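To make the metadata list on this slide concrete, here is a minimal sketch of the fields a package might carry. The `Package` class, its field names, and the `satisfies` helper are hypothetical and do not reproduce RPM's actual header format. The `ucsc-blat` name is from these slides, but the version, license, and "blat" capability are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Package:
    """Hypothetical model of the metadata shipped alongside a package's files."""
    name: str
    version: str
    license: str
    category: str
    requires: List[str] = field(default_factory=list)  # dependencies
    provides: List[str] = field(default_factory=list)  # capabilities offered to others

    def satisfies(self, capability: str) -> bool:
        # A requirement is met by the package's own name or by anything it provides.
        return capability == self.name or capability in self.provides


# Placeholder values: the version, license, and "blat" capability are assumptions.
blat = Package(name="ucsc-blat", version="0.0", license="unknown",
               category="Applications", provides=["blat"])
print(blat.satisfies("blat"))
```

Keeping "provides" separate from the package name lets a dependency be written against a capability rather than a specific package.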

  3. GMOD target audience • Small MODs (model organism databases)

  4. Package Dependency Graph • [Figure] Legend: dependencies; what the package provides • Nodes: chado-Hsa, genome-Hsa-annotation-gene, postgresql-AffxSeq, genome-Hsa-annotation-affymetrix, chado, perl-bioperl, perl-go-perl, postgresql-server, genome-Hsa-nib, ucsc-blat, obo-core
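The transcript keeps only the node names from this figure, not its arrows, so the edges below are assumed purely for illustration. The sketch shows how a package manager can recurse through a dependency graph to get an install order. A depth-first post-order traversal places every dependency before the packages that need it, and a "visiting" set detects cycles. `DEPENDS_ON` and `install_order` are hypothetical names, not part of any Biopackages tool.

```python
from typing import Dict, List, Set

# Hypothetical edges for illustration only; the real graph is not in the transcript.
DEPENDS_ON: Dict[str, List[str]] = {
    "chado-Hsa": ["chado", "genome-Hsa-annotation-gene"],
    "chado": ["postgresql-server", "perl-bioperl", "perl-go-perl"],
    "genome-Hsa-annotation-gene": ["genome-Hsa-nib"],
    "perl-go-perl": ["obo-core"],
    "perl-bioperl": [],
    "postgresql-server": [],
    "genome-Hsa-nib": [],
    "obo-core": [],
}


def install_order(target: str, graph: Dict[str, List[str]]) -> List[str]:
    """Return target plus its transitive dependencies, dependencies first."""
    order: List[str] = []
    visiting: Set[str] = set()  # nodes on the current DFS path (cycle detection)
    done: Set[str] = set()      # nodes already placed in `order`

    def visit(pkg: str) -> None:
        if pkg in done:
            return
        if pkg in visiting:
            raise ValueError(f"dependency cycle through {pkg}")
        visiting.add(pkg)
        for dep in graph.get(pkg, []):
            visit(dep)
        visiting.remove(pkg)
        done.add(pkg)
        order.append(pkg)  # post-order: all of pkg's dependencies are already listed

    visit(target)
    return order


print(install_order("chado-Hsa", DEPENDS_ON))
```

Real resolvers such as yum also handle version constraints and virtual "provides", which this sketch ignores.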

  5. Dependencies • Build Dependency • Installation Dependency
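In RPM spec files these two kinds correspond to the `BuildRequires` and `Requires` tags. The minimal sketch below, using entirely hypothetical package names, shows why the distinction matters: a compiler may be needed to build a package from source, but not to install the resulting binary package. `missing_deps` is an illustrative helper, not an RPM or yum API.

```python
from typing import Dict, List, Set

# Illustrative metadata; every package and dependency name here is hypothetical.
BUILD_REQUIRES: Dict[str, List[str]] = {
    "example-tool": ["gcc", "example-lib-devel"],  # needed only to compile it
}
REQUIRES: Dict[str, List[str]] = {
    "example-tool": ["example-lib"],  # needed on any machine that installs it
}


def missing_deps(pkg: str, installed: Set[str], building: bool) -> List[str]:
    """Dependencies of `pkg` not yet installed, for a build or for an install."""
    table = BUILD_REQUIRES if building else REQUIRES
    return [dep for dep in table.get(pkg, []) if dep not in installed]


print(missing_deps("example-tool", {"example-lib"}, building=False))  # []
```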

  6. What is a Package Manager? • Tools to manage installation, upgrade, uninstallation of packages • Verify package integrity (checksums) • Maintain system integrity • Transactional • Allow rollbacks • Dependency checking • Dependency graph recursion • Allow software customization (patches)
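Two of the properties listed here, checksum-based integrity checks and transactional installs with rollback, can be illustrated together. The toy below treats the "system" as an in-memory dict. Each payload's SHA-256 is verified before anything changes, and a failure part-way through restores the prior state. `Transaction` and `install_all` are hypothetical names, and RPM's real verification and transaction machinery is considerably more involved.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Transaction:
    """Toy all-or-nothing installer over an in-memory {name: payload} "system"."""

    def __init__(self, system: Dict[str, bytes]) -> None:
        self.system = system
        self.undo_log: List[Tuple[str, Optional[bytes]]] = []

    def install(self, name: str, payload: bytes, expected_sha256: str) -> None:
        # Integrity check first: a corrupted payload never touches system state.
        if sha256_of(payload) != expected_sha256:
            raise ValueError(f"checksum mismatch for {name}")
        # Record what was there before (None = not installed) so it can be undone.
        self.undo_log.append((name, self.system.get(name)))
        self.system[name] = payload

    def rollback(self) -> None:
        # Undo in reverse order, restoring earlier versions or removing new installs.
        while self.undo_log:
            name, previous = self.undo_log.pop()
            if previous is None:
                del self.system[name]
            else:
                self.system[name] = previous


def install_all(system: Dict[str, bytes],
                packages: List[Tuple[str, bytes, str]]) -> None:
    """Install every (name, payload, sha256) triple, or none of them."""
    tx = Transaction(system)
    try:
        for name, payload, digest in packages:
            tx.install(name, payload, digest)
    except ValueError:
        tx.rollback()
        raise
```

For example, if the second of two packages fails its checksum, the first package's upgrade is reverted and the error is re-raised to the caller.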

  7. Why bioinformatics packages? • Consistency of installation process • Bioinfo. package installs vary wildly, and commonly lack documentation • Automatic dependency installation • Perl modules especially bad – bioperl has 60+ modules in its dependency tree • Integrity/Auditing of system state • Know an installed package works, which version, how to replicate system setup • Tighter integration with operating system • Daemons, config & log file locations, etc.

  8. What’s available? • RPM packages only right now • Primary focus on Fedora Core 2 • Some RPMs also available for • Fedora Core 3 • RedHat 9 • Cygwin

  9. What’s available? • Three primary foci • Applications • Libraries • Data sets

  10. Applications • Gbrowse • Textpresso • BLAT daemon • NCBI Toolkit (BLAST, etc.) • HMMER

  11. What’s available? • Libraries • Bioperl • R & Bioconductor • Squid • EMBOSS

  12. What’s available? • Data sets • Genome & protein sequence • Sequence features • Ontologies • All installed using a common directory structure

  13. What’s available? • UCSC tools (utilities, BLAT system service, CGI scripts) • Bioperl • R / Bioconductor • GMOD apps (Gbrowse, Textpresso, …) • Data packages • Genome sequence (fa, nib, blastdb) • Genome features (Affy probeset alignments, mRNA, etc)

  14. GMOD Components Available • [Figure] Components: das2-Hsa, apollo-Hsa, cmap-Hsa, genome-Hsa-nib, ucsc-BLAT, gmod-web-Hsa, chado-Hsa, gbrowse, textpresso, turnkey, chado • ‘Hsa’ can be substituted for your organism • Currently built for ‘Cel’ (C. elegans), ‘Hsa’ (H. sapiens), ‘Sce’ (S. cerevisiae)
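Substituting the organism code into the package names on this slide is a plain string rewrite. The sketch assumes, which the slide does not guarantee, that every organism-specific component exists for each of the three organisms listed. `packages_for` and `TEMPLATES` are hypothetical names.

```python
from typing import List

ORGANISMS = ["Cel", "Hsa", "Sce"]  # organisms the slide says packages are built for

# Organism-specific component names from the slide, with "Hsa" as the slot.
TEMPLATES = ["das2-Hsa", "apollo-Hsa", "cmap-Hsa", "genome-Hsa-nib",
             "gmod-web-Hsa", "chado-Hsa"]


def packages_for(org: str, templates: List[str] = TEMPLATES,
                 slot: str = "Hsa") -> List[str]:
    """Rewrite each template's organism slot to `org` (availability assumed)."""
    if org not in ORGANISMS:
        raise ValueError(f"no packages built for organism {org!r}")
    return [t.replace(slot, org) for t in templates]


print(packages_for("Sce"))
```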

  15. More details… • [Figure] Expanded dependency graph; nodes: chado-Hsa, genome-Hsa-annotation-gene, genome-Hsa-annotation-affymetrix, postgresql-AffxSeq, chado, perl-go-perl, perl-bioperl, postgresql-server, genome-Hsa-nib, ucsc-blat, …

  16. Gene Expression Components • DAS/2 for Genotyping, GeneChip Quant/Norm Pipeline • [Figure] Components: chado-GEC, chado-Hsa, R, Bioconductor

  17. Resources • http://www.biopackages.net • ~1000 RPMs for Fedora Core 2, 3 • Available via yum • See site for a configuration example.

  18. TODO • Support more architectures • Build for Cygwin & OS X. RPM has been ported to both • Automate package build process • Build farm of multiple architectures, controllable via scheduler (GridEngine) • Automate (if possible) inclusion of new software / data releases

  19. TODO • Build community interest and involvement • Keep adding more packages! • Keep existing packages current!

  20. Acknowledgements • Patrick Alger • Jared Fox • Brian O’Connor • Todd Harris • Lincoln Stein • Stanley Nelson
