Genomic Data Analysis Services Available for PL-Grid Users. Tomasz Waller, Tomasz Gubała, Kazimierz Murzyn. Academic Computer Centre Cyfronet AGH, cyfro.net; Klaster LifeScience Kraków, lifescience.pl. Recent Advances in Omics Research, Kraków, October 2014.
ACC Cyfronet AGH and PL-Grid Infrastructure

Academic Computer Centre Cyfronet AGH
• Established in 1973 (40 years of experience)
• Provides network, computational power and data storage capabilities for Polish science
• ~374 TFlops (Zeus, ranked 175 on the TOP500 list), 2.5 PB (disks) and 3.5 PB (tapes)
• 1.7 PFlops (Prometheus) with 10 PB of disks, expected in the first half of 2015
• Regular and bigmem nodes, vSMP, GPGPU, FPGA, MPI over InfiniBand
• Details: http://kdm.cyfronet.pl/

PL-Grid Infrastructure for Polish science
• Five computing centers, with Cyfronet as the consortium leader
• Total: ~588 TFlops and ~5.6 PB (disks), but soon to grow considerably (see above)
• Available free of charge to all Polish scientists and their foreign collaborators
• Details: http://www.plgrid.pl
Using PL-Grid Infrastructure
• Register at https://portal.plgrid.pl
  • User verification is based on the Polish OPI number
  • Assistants and foreigners are confirmed by Polish PIs
• A variety of basic and higher-level services is available after login
  • Local SSH access, cloud computing, middleware
• Considerable library of installed applications
  • GATK, MACS, SAMtools, Picard, TopHat, Bowtie, (p)BWA, R/Bioconductor, AutoDock/AutoGrid, BLAST, Clustal, CPMD, Gromacs, NAMD, Matlab, Mathematica …
• Users are free to compile and install their own applications using the shell login
• Users may bring their own commercial licenses to the HPC resources
• Specific services dedicated to the Life Science domain
DNA Microarray Integromics Analysis Platform (1/2)
https://lifescience.plgrid.pl/
• For people who perform biological investigations using DNA microarrays
• Goal: help to analyze gene expression information and correlate it with other clinical data
• Analyses available now: normalization, clustering, SAM, t-test, GO-based enrichment, ANNs, PCA, panel filtering
• 'Integromics' analyses in 'beta' (testing) stage
  • CCA, PLS (gene expression and lipidomics)
  • Roleswitch, TargetScore (gene expression and miRNA)
• Still in continuous development (Pathways, EBI export etc.)
• Supported models: some Affymetrix, Agilent SurePrint (support for others can be added on demand)
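To give a feel for what a per-gene t-test on expression data computes, here is a minimal sketch in plain Python. It is not the platform's implementation: the choice of Welch's variant, the gene names and the log2 expression values are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 expression values: gene -> (control samples, treated samples)
expression = {
    "GENE_A": ([8.1, 8.3, 7.9, 8.2], [9.6, 9.9, 9.4, 9.7]),
    "GENE_B": ([5.0, 5.2, 4.9, 5.1], [5.1, 5.0, 5.2, 4.9]),
}


def rank_genes(data):
    """Return (gene, log2 fold change, t, df) rows sorted by decreasing |t|."""
    rows = []
    for gene, (control, treated) in data.items():
        t, df = welch_t(treated, control)
        rows.append((gene, mean(treated) - mean(control), t, df))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


ranked = rank_genes(expression)
for gene, lfc, t, df in ranked:
    print(f"{gene}: log2FC={lfc:+.2f} t={t:+.2f} df={df:.1f}")
```

On real arrays the values would first be normalized, and the resulting p-values corrected for multiple testing; both steps are omitted here to keep the sketch short.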
DNA Microarray Integromics Analysis Platform (2/2)
• Notable features
  • Integration with EBI ArrayExpress (import, MIAME)
  • Sharing experiments with others
  • Importing own data for further analysis
  • Supported languages: PL, EN
• Manual: https://docs.cyfronet.pl/x/JpaZ
• Cooperation
  • Jagiellonian University Medical College, Kraków
  • Medical University of Silesia, Katowice
  • Institute of Oncology, Gliwice
Agilent GeneSpring GX
• RDP: genespring.plgrid.pl
• Used with Windows Remote Desktop
• Integrated with the DNA Integromics Platform for uniform management of microarray files
• 5-year, single-seat license for all registered Polish scientists
• Manual: https://docs.cyfronet.pl/x/JIq1
Galaxy NGS Server (2/4)
https://galaxy.plgrid.pl/
"Galaxy is an open, web-based platform for data intensive biomedical research."
• Goal: deploy a high-performance, high-throughput NGS data analysis solution on top of HPC resources for PL-Grid users
• Needs a lot of adjustments and in-house add-on development
• Work started in December 2013 and is still at a beta stage, but the server is accessible to anyone willing to test and to help
• Planned integrated tools (list not closed): GATK, SAMtools, Bowtie, TopHat, BWA, bedtools, Cufflinks, Picard, SnpEff/SnpSift, Flexbar, FastQC, MACS
• Targeted platforms: Illumina *Seq, Ion Proton, Roche 454
Galaxy NGS Server (3/4)
• Notable features
  • Full integration with the Zeus cluster and disk arrays
  • PBS and MQ system for effective job queuing
  • Secured environment (open to all PL-Grid users, not "public")
  • All major Galaxy features (history, sharing, viewers)
• Well-documented workflows designed by NGS experts
  • Basics (alignment and quality control, trimming, filtering)
  • DNA-Seq, RNA-Seq, variant calling, SNP calling, methylation, exome analysis with annotations
• Manual: https://docs.cyfronet.pl/x/voas
• Cooperation
  • Institute of Pharmacology, Polish Academy of Sciences, Kraków
  • OMICRON, Jagiellonian University Medical College, Kraków
  • National Research Institute of Animal Production, Kraków-Balice
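As a rough illustration of what a PBS-backed queue consumes, the sketch below renders a batch job script as text. It is not the Galaxy integration code: the helper `render_pbs_script`, the job name, the module name `example/aligner` and the placeholder command are hypothetical, and the Torque-style resource syntax (`nodes=...:ppn=...`) is an assumption that should be checked against the cluster's own documentation.

```python
def render_pbs_script(job_name, command, modules=(), nodes=1,
                      cores_per_node=1, walltime="01:00:00", queue=None):
    """Return the text of a PBS batch script (Torque-style directives assumed)."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                         # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={cores_per_node}",  # requested nodes and cores
        f"#PBS -l walltime={walltime}",                 # wall-clock limit, HH:MM:SS
    ]
    if queue:
        lines.append(f"#PBS -q {queue}")                # optional target queue
    lines.append('cd "$PBS_O_WORKDIR"')                 # run from the submission directory
    lines.extend(f"module load {name}" for name in modules)
    lines.append(command)
    return "\n".join(lines) + "\n"


# Hypothetical job: the module name and command are placeholders, not real site settings.
command = "echo 'replace with the actual tool invocation'"
script = render_pbs_script(
    "align_sample1",
    command,
    modules=["example/aligner"],
    cores_per_node=4,
    walltime="02:00:00",
)
print(script)
```

A script like this would normally be saved to a file and handed to the scheduler's submit command from a shell session; a Galaxy job runner generates the equivalent behind the scenes.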
Galaxy NGS Server (4/4)
• Current challenges
  • Some security issues in the Galaxy code prevent production deployment
  • Cluster integration is there, yet rather unstable and prone to fail (quite an intricate contraption, it is)
  • The broad variety of integrated tools and wrappers does not help
• Call to action: who is needed
  • Users: the bigger the community, the easier it is to make us visible
  • Early adopters: tell us what you need, help us test and integrate the tools and workflows you use
  • Programmers: if you'd like to help us bring a dedicated HPC-powered Galaxy to Polish scientists, any assistance is greatly appreciated
• Contact: t.gubala@cyfronet.pl
Links, Contact, Partners
• These resources, services and tools (and much more) are available after registering with PL-Grid: https://portal.plgrid.pl/
• PL-Grid User Manual
  • https://docs.plgrid.pl/podrecznik_uzytkownika (PL)
  • https://docs.plgrid.pl/display/PLGDoc/User+manual (EN)
• Questions, problems, requests about PL-Grid
  • https://helpdesk.plgrid.pl or helpdesk@plgrid.pl
• Contact for LifeScience domain services
  • plgrid@lifescience.pl