Scalable graph analytics for metagenomics and metaproteomics. Ananth Kalyanaraman @ HPCBio lab (ananth@eecs.wsu.edu), Associate Professor, School of EECS, Washington State University, Pullman, WA.
Research areas: Parallel algorithms, Computational biology/bioinformatics, Graph algorithms, String algorithms, Parallel architectures

Environmental microbial community analytics
• Applications:
  • bioenergy alternatives
  • human health
  • environmental monitoring
  • soil and forest ecology
  • ocean microbiology
  • …
NGS Funding relevance: DNA, RNA, protein, mass spec/peptide
• Data scale:
  • #studies: >350
  • #samples: >2,500
  • #genic/ORF reads: >100M+
  • …
Image courtesy: www.genomesonline.org
Workshop on Future Computing Platforms to Accelerate Next-Gen Sequencing (NGS) Applications, May 19, 2013, held in conjunction with IPDPS’13, Boston, MA
Some graph-theoretic problems in environmental microbial community analytics
• Problems:
  • Network construction
  • Clustering
  • Community annotation
  • Network comparison
  • Heterogeneity
  • …
• Parallelism:
  • mostly rudimentary/ad hoc in standard workflows
  • Distributed memory: MPI, MapReduce
  • Intra-node: multicore, GPUs
• Some challenges:
  • inherits graph-related challenges and choice of architectures
  • availability of networks/inference
  • data integration
  • low sampling, species diversity
  • qualitative metrics
  • automated workflows
  • …
• Source data:
  • Protein/ORF sequence homology
  • Mass spectral library construction
  • Interaction networks (gene, protein)
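As a rough illustration of the first two problems listed (network construction and clustering) on protein/ORF homology data, the following minimal Python sketch builds an undirected similarity graph from pairwise hits and reports its connected components as clusters. This is not the method used in the talk: the function names, the `(seq_a, seq_b, score)` input format, and the `min_score` cutoff are all assumptions made for illustration, and connected components stand in for whatever clustering method a real workflow would use.

```python
def build_homology_graph(hits, min_score):
    """Build an undirected graph from pairwise homology hits.

    hits: iterable of (seq_a, seq_b, score) tuples (hypothetical format).
    min_score: hypothetical cutoff; pairs scoring below it get no edge.
    """
    graph = {}
    for a, b, score in hits:
        # Every sequence becomes a node, even if it ends up isolated.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b and score >= min_score:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the connected components of the graph as a list of sets."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    # Toy hits with made-up identifiers and scores.
    hits = [
        ("orf1", "orf2", 90.0),
        ("orf2", "orf3", 85.0),
        ("orf4", "orf5", 40.0),
    ]
    graph = build_homology_graph(hits, min_score=50.0)
    for cluster in connected_components(graph):
        print(sorted(cluster))
```

This serial version only shows the shape of the computation. At the data scales quoted in the talk, both the pairwise comparison and the component detection would need the distributed-memory or intra-node parallelism the slide mentions.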
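Graphs are pervasive in Computational Biology
[Figure: diagram labels only; the layout linking each label to the others is not recoverable from the text.]
• Biological data and tasks: reads, genome, gene, mRNA, protein, motifs, search, database, comparative genomics, phylogenetic tree, population genomics, protein families, …
• Graph formulations: string graphs; clique; pattern matching; probabilistic graph models; trees, DAGs, TSP, ML; classical network analysis; comparative network analysis
SIAM CSE'13, Boston, MA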