
The Cancer Imaging Archive (TCIA): Creating a Large Public Image Collection


Presentation Transcript


1. The Cancer Imaging Archive (TCIA): Creating a Large Public Image Collection
Lawrence R. Tarbox, PhD; Paul Koppel, PhD; Steve Moore, MS; Michael Pringle; Fred Prior, PhD (Washington University in St. Louis, School of Medicine)
Justin Kirby (Cancer Imaging Program, National Cancer Institute)

2. What is TCIA?
• Repository for collections of cancer-related images and associated data
• Both public and restricted-access collections
• Images, annotations, markup, clinical and research data
• Curated, de-identified, indexed, linked
• Managed and supported
• Focal point for collaborative research
See the TCIA operations poster for more details.

3. Current Statistics
• Over 1,000 registered users
• Over 3 million objects in 15 public collections (~2 TB)
• Over 12 million objects in 9 restricted-access collections (~6 TB)
• In the last year, for public collections:
  • Almost 6,000 searches
  • 4.5 TB of data downloaded (13.6 million images)

4. Original Technical Goals
• High availability
  • 99.5% uptime
  • Minimal maintenance windows: no more than 8 hours per month
• Scalable, with the following initial loads:
  • 20 users accessing 4 TB of data
  • 5 sites uploading data
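As a quick sanity check on those targets, the short Python sketch below converts the 99.5% uptime goal into a monthly unplanned-downtime budget. The 730-hour average month and the assumption that planned maintenance is excluded from the uptime figure are mine, not the slide's:

```python
# Back-of-the-envelope downtime budget implied by the slide's targets.
HOURS_PER_MONTH = 8760 / 12      # average month, ~730 h (assumption)

uptime_target = 0.995            # 99.5% uptime goal from the slide
unplanned_budget = (1 - uptime_target) * HOURS_PER_MONTH
planned_budget = 8               # maintenance-window cap from the slide

print(f"Unplanned downtime allowed: {unplanned_budget:.1f} h/month")  # ~3.7
print(f"Planned maintenance allowed: {planned_budget} h/month")
# Assumes planned maintenance windows are excluded from the uptime
# figure, which the slide does not state explicitly.
```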

5. Methods to Achieve Goals
• Parallelism
  • "Divide and conquer": servers dedicated to specific functions
  • "Many hands make light work": spread the load among multiple servers
• Redundant hardware
• Geographically dispersed sites
• Virtualization and live migration

6. NBIA Software Functions
• Receive
• Curate
• Final Prep
• Search
• Download
The default NBIA installation puts all functions in one server.

7. Divide and Conquer
[Diagram: multiple identical NBIA instances, each carrying the full Receive / Curate / Final Prep / Search / Download stack but dedicated to a specific function, grouped into Intake and Public groups.]
• Identical instances of NBIA dedicated to specific functions
• Functions grouped for now, but could be split later
• Logical separation of incoming and public data to protect repository integrity
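NBIA does not actually expose a per-node role switch like the one below; this is only a hypothetical Python sketch of the deployment pattern the slide describes: every node runs the same software image, and per-node configuration decides which functions it serves.

```python
# Hypothetical sketch of "identical instances, dedicated functions":
# every node runs the same NBIA build, and a per-node role set decides
# which functions are enabled. Node names and the role mechanism are
# illustrative assumptions, not part of NBIA.

ALL_FUNCTIONS = {"receive", "curate", "final_prep", "search", "download"}

NODES = {
    # Intake group: handles unvetted uploads, isolated from the public side.
    "intake-1": {"receive", "curate", "final_prep"},
    "intake-2": {"receive", "curate", "final_prep"},
    # Public group: serves end-user search and download only.
    "public-1": {"search", "download"},
    "public-2": {"search", "download"},
}

def enabled(node: str, function: str) -> bool:
    """True if the given node should handle the given NBIA function."""
    assert function in ALL_FUNCTIONS, f"unknown function: {function}"
    return function in NODES[node]

print(enabled("intake-1", "receive"))   # True
print(enabled("public-1", "receive"))   # False
```

Because every instance is identical, a group can later be split further (say, a dedicated download cluster) by changing configuration rather than the software image.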

8. Many Hands, Light Work
[Diagram: a load-balancing switch in front of the Intake and Public clusters, with database replication within each cluster and shared image storage per group.]
Each group of functions runs as a cluster, with synchronized databases and shared image storage within the group.

9. Additional Functions
[Diagram: the load-balancing switch also fronts mirrored web servers, a wiki, an issue tracker, and mirrored LDAP servers alongside the Intake and Public clusters and their shared storage.]
• Mirrored web servers for static info pages, the dashboard, and user registration
• Mirrored LDAP servers provide a common user ID directory
• Development/staging servers and metadata repositories not shown

10. Availability and Scaling
• Critical functions in clusters
  • Databases mirrored within the clusters
  • Load balancer directs traffic to the least loaded node, skipping nodes that are down
  • Add nodes as loading demands; additional clusters as needed
• Redundant, mirrored, shared storage
  • Mirrors geographically distributed
• Redundant load-balancing switches
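The slide's routing rule (send each request to the least loaded healthy node) fits in a few lines of Python. The node names and the connection-count load metric below are illustrative assumptions; the slide does not say what metric the switch actually uses.

```python
# Minimal sketch of the balancing rule on the slide: route traffic to
# the least loaded node, skipping nodes that fail their health check.
from typing import Optional

nodes = {
    "public-1": {"healthy": True,  "active_connections": 12},
    "public-2": {"healthy": True,  "active_connections": 4},
    "public-3": {"healthy": False, "active_connections": 0},   # down
}

def pick_node() -> Optional[str]:
    """Return the healthy node with the fewest active connections."""
    healthy = {n: s for n, s in nodes.items() if s["healthy"]}
    if not healthy:
        return None   # whole cluster down; another cluster or site takes over
    return min(healthy, key=lambda n: healthy[n]["active_connections"])

print(pick_node())   # -> "public-2"
```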

11. Virtualization
• Scalable, with minimized startup costs
• Grouped functions each run in a virtual machine
  • Server-locked for clustered function groups: no need for live migration, since other nodes in the cluster fill the void when a node is down
  • Floating for non-clustered functions: live-migrate for server maintenance, with snapshots in case migration fails
• Identical servers ease maintenance

12. Current Hardware
• Three 12-core servers, 2.66 GHz, mirrored 600 GB 15K RPM SAS drives, 48 GB memory (two additional 24 GB servers joining the "mini private cloud")
• Three-head clustered storage system with multiple RAID 6 FC-connected storage arrays with near-line SAS disks, maximum capacity 24 PB
• Redundant L7 (application-level) load-balancing switches with SSL offload

13. Other Enhancements
• TCIA branding of the NBIA software
• Self-service account registration
• DICOM "TagSniffer" to assist in curation
• Intake improvements, including a newer version of the Clinical Trial Processor (CTP)
• Statistics dashboard and analytics
• Download Manager improvements (automatic retries, bug fixes)
• Metadata Query Tool (in development)
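The snippet below is not the actual TagSniffer implementation, just a minimal pydicom sketch of the underlying curation idea: walk the headers of incoming DICOM files and flag tags that commonly carry identifying text. The tag list, directory path, and output format are illustrative assumptions.

```python
# Minimal sketch of a curation "tag sniffer" (not the actual CTP
# TagSniffer): scan DICOM headers and report values in tags that
# de-identification reviews commonly inspect.
from pathlib import Path
import pydicom

SUSPECT_KEYWORDS = {            # illustrative, not an official list
    "PatientName", "PatientID", "InstitutionName",
    "ReferringPhysicianName", "StudyDescription",
}

def sniff(directory: str) -> None:
    """Print any suspect header values found under the given directory."""
    for path in Path(directory).rglob("*.dcm"):
        ds = pydicom.dcmread(path, stop_before_pixels=True)  # skip pixel data
        for elem in ds:
            if elem.keyword in SUSPECT_KEYWORDS and elem.value:
                print(f"{path.name}: {elem.keyword} = {elem.value!r}")

# Hypothetical usage: sniff("/data/intake/incoming")
```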

14. Summary
• Parallelism provides reliability and scalability
• Virtualization makes parallelism economical and simplifies maintenance
• Dividing servers between sites further improves reliability
• All of this can be done with free/open-source software, if desired

  15. Questions?
