
Challenges and Solutions for Managing the Complexities of a Genomic Core Facility

Learn about the challenges and solutions for managing a genomic core facility, including big data, complicated data, and sensitive data. Discover how GNomEx can help you streamline your workflow and deliver clean, beautiful data to researchers quickly.

schultzm

Presentation Transcript


  1. GNomEx: Challenges and Solutions for Managing the Complexities of a Genomic Core Facility

  2. Tony Di Sera • Passionate about Software, fascinated by Molecular Biology. • Over 20 years in the software field

  3. University of Utah and the Huntsman Cancer Institute

  4. Our Job is… to deliver clean, beautiful data to the Researcher as quickly as possible.

  5. GNomEx at a Glance
      Data Repository • Analysis Project Center • Configurable Annotations • Private to Public Visibility
      LIMS • Order Tracking • Workflow • Email Notification • Results Delivery
      Diagram: Submit Experiment, Results Delivery, Automated Billing, Workflow, Analysis, Visualization

  6. GNomEx Overview: Data Flow (Experiments, Analysis, Visualization)

  7. GNomEx Experiments

  8. Experiments cont…

  9. GNomEx Analysis

  10. Visualizing your Data

  11. Visualizing your Data

  12. Three Challenges

  13. Challenge #1: BIG Data, Complicated Data, Sensitive Data

  14. Big Data: If you don’t have slack in the system, your throughput drops to a crawl.
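
A minimal sketch of why slack matters, assuming a server behaves like a textbook M/M/1 queue (a model swapped in here for illustration; the talk does not specify one). Mean time in system is W = 1/(mu - lambda), so turnaround grows without bound as utilization approaches 100%. The service rate is a hypothetical number.

```python
def mm1_time_in_system(arrival_rate: float, service_rate: float) -> float:
    """Mean time a job spends in an M/M/1 queue (waiting + service): W = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 10.0  # hypothetical: the server finishes 10 jobs per hour on average

for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    arrival_rate = utilization * SERVICE_RATE
    hours = mm1_time_in_system(arrival_rate, SERVICE_RATE)
    print(f"utilization {utilization:.0%}: mean time in system {hours:.2f} h")
```

With these assumed numbers, mean turnaround at 99% utilization is roughly 50 times what it is at 50% utilization, even though the server is only about twice as busy.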

  15. Big Data

  16. If you store your Data In-house… Hire a talented, fearless, focused Sys Admin (xkcd)

  17. Transferring BIG Data: FDT (Fast Data Transfer) by Caltech. Diagram labels: Connection & Control Management • Pool of directly mapped buffers (on each side) • Data Transfer Socket • Independent Threads per Device • Restore Multiple Files Concurrently
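
FDT itself is a Java application; this Python sketch only illustrates two ideas named on the slide: a shared pool of preallocated buffers, and independent worker threads copying several files concurrently. The buffer size, pool size, and function names are assumptions, and this is not FDT's implementation or API.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Tuple

CHUNK_SIZE = 1 << 20  # 1 MiB per buffer (assumed value, not taken from FDT)
POOL_SIZE = 4         # reusable buffers == concurrent workers (assumed)

# Preallocated buffers shared by all workers, loosely mirroring the
# "pool of directly mapped buffers" on the slide. Reusing them avoids
# allocating a fresh buffer for every chunk of every file.
buffer_pool: "queue.Queue[bytearray]" = queue.Queue()
for _ in range(POOL_SIZE):
    buffer_pool.put(bytearray(CHUNK_SIZE))


def copy_file(src: str, dst: str) -> int:
    """Copy one file chunk by chunk through a borrowed buffer; return bytes copied."""
    buf = buffer_pool.get()  # blocks until a buffer is free
    try:
        total = 0
        with memoryview(buf) as view, open(src, "rb") as fin, open(dst, "wb") as fout:
            while True:
                n = fin.readinto(buf)
                if not n:  # 0 bytes read means end of file
                    break
                fout.write(view[:n])  # write only the filled part of the buffer
                total += n
        return total
    finally:
        buffer_pool.put(buf)  # hand the buffer back for the next file


def copy_many(pairs: Iterable[Tuple[str, str]]) -> List[int]:
    """Copy several (src, dst) files concurrently using a small pool of worker threads."""
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        return list(pool.map(lambda p: copy_file(*p), pairs))
```

Bounding the workers by the number of buffers means memory use stays fixed no matter how many files are queued.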

  18. Big Data, Big Processing

  19. Illumina Data Pipeline. Diagram labels: GNomEx (• Barcode Tags • Experiment Info • Run Info), Images, Experiment Folders
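
A hypothetical sketch of the routing step the diagram implies: reads carrying a barcode tag are sorted into the folder of the experiment that registered that barcode, and unmatched reads are set aside. The repository root, folder layout, and experiment IDs are made up for illustration; GNomEx's actual pipeline may organize files differently.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Tuple

UNDETERMINED = "undetermined"  # bucket for reads whose barcode matches no experiment


def route_reads(
    reads: Iterable[Tuple[str, str]],
    barcode_to_experiment: Dict[str, str],
    run_id: str,
    root: str = "/repository/experiments",  # hypothetical repository root
) -> Dict[str, List[str]]:
    """Group (barcode, sequence) pairs into per-experiment folders for one run.

    barcode_to_experiment stands in for the barcode tags and experiment info
    that GNomEx supplies; run_id stands in for the run info.
    """
    routed: Dict[str, List[str]] = defaultdict(list)
    for barcode, seq in reads:
        experiment = barcode_to_experiment.get(barcode, UNDETERMINED)
        folder = PurePosixPath(root) / experiment / run_id
        routed[str(folder)].append(seq)
    return dict(routed)
```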

  20. Sequencing Analysis

  21. Automated Analysis Pipeline
      # run novoalign with default parameters
      #e david.nix@hci.utah.edu
      #a A1325
      @align -g hg19 -i *.txt.gz
      #map, recalibrate and call SNP/INDEL w/ GATK
      @snpindel -g hg19 -i A*.txt.gz
      #map, recalibrate, call SNP/INDEL, annotate
      @annot -g hg19 -i control_A*.gz case_B*.gz -vaast -annovar
      • Simplifies running analyses on the cluster • Fully versioned • Customizable
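
The command file on this slide appears to use `#e` for a notification email, `#a` for an analysis number, other `#` lines as comments, and `@app` lines to launch an application with flags. The parser below is a minimal sketch under that reading; the directive semantics and the `Job` structure are assumptions, not the real pipeline's grammar.

```python
import shlex
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Job:
    """Parsed contents of one command file (structure assumed for this sketch)."""
    email: Optional[str] = None     # from a '#e' line
    analysis: Optional[str] = None  # from a '#a' line
    commands: List[Tuple[str, Dict[str, List[str]]]] = field(default_factory=list)


def parse_command_file(text: str) -> Job:
    job = Job()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#e "):
            job.email = line[3:].strip()
        elif line.startswith("#a "):
            job.analysis = line[3:].strip()
        elif line.startswith("#"):
            continue  # any other '#' line is a free-text comment
        elif line.startswith("@"):
            tokens = shlex.split(line)  # keeps globs like *.txt.gz unexpanded
            app = tokens[0][1:]         # '@align' -> 'align'
            opts: Dict[str, List[str]] = {}
            current = None
            for tok in tokens[1:]:
                if tok.startswith("-"):
                    current = tok.lstrip("-")  # '-g' -> 'g'; bare flags keep []
                    opts.setdefault(current, [])
                elif current is not None:
                    opts[current].append(tok)  # value for the most recent flag
            job.commands.append((app, opts))
    return job


example = """#e david.nix@hci.utah.edu
#a A1325
@align -g hg19 -i *.txt.gz"""
print(parse_command_file(example).commands)  # [('align', {'g': ['hg19'], 'i': ['*.txt.gz']})]
```

A comment that happens to begin with `#e ` or `#a ` would be misread by this sketch; a production format would need a stricter rule.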

  22. Complicated Data: The Data Model, The File System

  23. Sensitive Data

  24. Who can Access the Data? Visibility levels: Owner, Lab Members, Collaborators, Institution, Public
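
One way to model the nested visibility levels, assuming each wider level also admits everyone the narrower levels admit. The ordering, the `Dataset`/`User` fields, and the `can_view` rule are this sketch's assumptions, not GNomEx's code.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Set


class Visibility(IntEnum):
    """Visibility levels from narrowest to widest (ordering assumed for this sketch)."""
    OWNER = 0
    LAB_MEMBERS = 1
    COLLABORATORS = 2
    INSTITUTION = 3
    PUBLIC = 4


@dataclass
class Dataset:
    owner: str
    lab: str
    institution: str
    visibility: Visibility
    collaborators: Set[str] = field(default_factory=set)


@dataclass
class User:
    name: str
    lab: str
    institution: str


def can_view(user: User, ds: Dataset) -> bool:
    """Return True if the user falls inside the dataset's visibility circle."""
    if user.name == ds.owner:
        return True  # the owner always sees their own data
    v = ds.visibility
    if v >= Visibility.LAB_MEMBERS and user.lab == ds.lab:
        return True
    if v >= Visibility.COLLABORATORS and user.name in ds.collaborators:
        return True
    if v >= Visibility.INSTITUTION and user.institution == ds.institution:
        return True
    return v == Visibility.PUBLIC
```

Using an ordered enum lets "make it more public" be a single comparison rather than a separate rule per level.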

  25. Three Challenges

  26. Challenge #2: The Demand • More Researchers • More Experiments • More Samples per Lane • Push for Faster Results. Result: Slower Response Times

  27. It is a shame to ANNOY the user… in the first 20 seconds

  28. Addressing the Bottlenecks

  29. How many servers are we talking about? Diagram labels: Fast Disk, Analysis, Fast Disk, Tomcat, FDT, High Performance Clusters, Data Pipeline, Fast Disk, Fast Disk, Database Server, File Server, Slow Disk, The Repository

  30. Biggest Bottleneck is…. Getting the features implemented and bugs fixed in GNomEx.

  31. Three Challenges

  32. Different Users, Different Perspectives • 3 Core Facilities • Bioinformatics • Researchers at your Institution • Outside Researchers • Accounting

  33. Three Kinds of Users
      Diagram: Submit Experiment, Results Delivery, Automated Billing, Workflow, Analysis, Visualization
      Researcher • Submit • Annotate • Preapprove • Download • Pay • Track • Download
      Core • Review • Split Invoice • Record • Authorize • Register • Browse
      Bioinformatics • Analysis Pipeline • Upload • Annotate • Organize • Data Pipeline • Link • Organize • Browse
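
The Core's "Split Invoice" step suggests charging one experiment to several billing accounts. A minimal sketch, assuming integer percentage shares and amounts in cents: it uses the largest remainder method so the split amounts always add back up to the original total. The function and account names are hypothetical.

```python
from typing import Dict


def split_invoice(total_cents: int, shares: Dict[str, int]) -> Dict[str, int]:
    """Split total_cents across accounts by integer percentage shares (must sum to 100).

    Each account first gets floor(total * pct / 100). The few leftover cents are
    then handed out one at a time to the accounts with the largest remainders,
    so the parts always sum exactly to total_cents.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    amounts: Dict[str, int] = {}
    remainders = []
    for account, pct in shares.items():
        whole, rem = divmod(total_cents * pct, 100)
        amounts[account] = whole
        remainders.append((rem, account))
    leftover = total_cents - sum(amounts.values())  # always fewer than len(shares)
    for _, account in sorted(remainders, reverse=True)[:leftover]:
        amounts[account] += 1
    return amounts
```

Working in integer cents avoids the floating-point drift that would otherwise leave invoices off by a cent.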

  34. We Don’t Always Speak the Same Language: Adapters, Molarity, Cluster density, Optical Error, 5’ vs 3’, Spike in, NICs, Image Copy, NFS, Case/Control, CpG Islands, REFS, Linux Kernel, P-Value, Cluster Nodes, FDR, Interface, Eclipse, Hibernate, JDK, Inheritance, Ant, SQL

  35. But We Share the Same Goal: Deliver clean, beautiful data to the Researcher as quickly as possible.

  36. Agile Development: Reducing Risk by Shortening the Delivery Window

  37. Agile Manifesto

  38. Iteration: Incrementing, Iterating

  39. Our Scrum Board

  40. In Summary • Housing Big Data requires $ and expertise. • System performance is multi-faceted. • Work towards shared understanding. • Build a team and process that embraces change.

  41. Plans

  42. Special Thanks

  43. Parting Thoughts: Privileged to work in this field, working with bright, interesting, fun, and nice people, in an area exploding with new advancements that will ultimately lead to important scientific discoveries. http://www.sourceforge.net/projects/gnomex • http://hci-scrum.hci.utah.edu/gnomexdoc • tony.disera@hci.utah.edu
