Distributed Data Management for Biomedical Research. UC Cloud Summit 2011. Dingying Wei and David B. Keator, UCI.
FBIRN Consortium
[Figure: map of consortium sites. Legend: PostgreSQL HID database; firewall; GridFTP server. Sites: UMN, BWH, MGH, UI, Yale, UCSF, VA, UCLA, Duke, UCI, VA, UCSD, MIND/]
Data Management Requirements
• Each site controls its own data
  • Collected, processed, and metadata
• Access control
  • Site = RW, Project Group = R, User = Site
• Replication
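The access rule on this slide can be sketched in a few lines. This is a minimal illustration, not FBIRN's implementation: it reads "User = Site" as "a user inherits the rights of their home site", and the class names, group name, and user names are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical permission model; names are illustrative only.
@dataclass(frozen=True)
class User:
    name: str
    site: str  # home site; the user inherits this site's rights
    project_groups: frozenset = frozenset()


@dataclass(frozen=True)
class DataItem:
    owner_site: str
    project_group: str


def permissions(user: User, item: DataItem) -> str:
    """Return "rw", "r", or "" for the user's access to the item."""
    if user.site == item.owner_site:
        return "rw"  # Site = RW, and User = Site
    if item.project_group in user.project_groups:
        return "r"   # Project Group = R
    return ""        # no relationship to the data, no access


alice = User("alice", "UCI", frozenset({"example-project"}))
bob = User("bob", "UCSD", frozenset({"example-project"}))
carol = User("carol", "Duke")
scan = DataItem(owner_site="UCI", project_group="example-project")
```

Under these assumptions, `permissions(alice, scan)` is `"rw"`, `permissions(bob, scan)` is `"r"`, and `permissions(carol, scan)` is the empty string.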
Data Flow
[Diagram labels: Result Images and XML wrapper in Data Grid; Image / Behavior Data; RLS; GridFTP; Processing Pipeline; Analysis Results; Genetic Data; Database; Multi-Site Query; Clinical Data]
Data Storage and Replication
• Data files are stored in the BIRN hierarchy on GridFTP servers; metadata and clinical data are stored in PostgreSQL databases
• BIRN Portal manages users and groups
• MyProxy is used for single sign-on authentication
• Data access is checked at both the file system and application levels
• Replica Location Service (RLS) is used for replication
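The idea behind a replica catalog, mapping one logical file name to the physical copies held at different sites, can be sketched as below. This is a toy in-memory stand-in, not the Globus RLS API; the class, the host names, and the file paths are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy in-memory replica catalog (illustrative, not the RLS client API)."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record that a copy of logical_name lives at physical_url."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return every known physical location of logical_name."""
        return list(self._replicas.get(logical_name, []))

    def pick(self, logical_name, preferred_host=None):
        """Return one replica, favouring preferred_host; None if unregistered."""
        replicas = self.lookup(logical_name)
        for url in replicas:
            if urlparse(url).hostname == preferred_host:
                return url
        return replicas[0] if replicas else None


catalog = ReplicaCatalog()
catalog.register("scan-0001.nii", "gsiftp://gridftp.site-a.example.org/data/scan-0001.nii")
catalog.register("scan-0001.nii", "gsiftp://gridftp.site-b.example.org/data/scan-0001.nii")
nearest = catalog.pick("scan-0001.nii", preferred_host="gridftp.site-b.example.org")
```

A client would then fetch `nearest` from the chosen server; the transfer itself is outside this sketch.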
Data Input from Portable Device
• Data are submitted in XML files
• Web service loads data into the database
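A rough sketch of the loading step: parse a submitted XML file and insert its rows into a table in one transaction. The XML layout and table schema here are invented for illustration (they are not XCEDE or the HID schema), and SQLite stands in for the PostgreSQL database so the example is self-contained.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical submission format, made up for this example.
SAMPLE_XML = """<submission site="UCI">
  <assessment subject="S001" name="handedness" value="right"/>
  <assessment subject="S002" name="handedness" value="left"/>
</submission>"""


def load_submission(xml_text, conn):
    """Parse one submission and insert its assessments; return the row count."""
    root = ET.fromstring(xml_text)
    site = root.get("site")
    rows = [
        (site, a.get("subject"), a.get("name"), a.get("value"))
        for a in root.iter("assessment")
    ]
    with conn:  # commit on success, roll back if any insert fails
        conn.executemany(
            "INSERT INTO assessment (site, subject, name, value) VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assessment (site TEXT, subject TEXT, name TEXT, value TEXT)")
loaded = load_submission(SAMPLE_XML, conn)
```

Wrapping the inserts in a single transaction means a malformed submission leaves the table unchanged.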
Data Input from Excel File
• Metadata Worksheet
• Data Codes Worksheet
• Subject Data Worksheet
Data Input from Command Line
• Anonymizing Data Files
• Validating Data
• Extracting Metadata
• Performing Quality Assurance
• Uploading Data
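The five steps above can be read as a pipeline in which each stage must succeed before data leaves the site. The sketch below shows only that ordering; the field names, the QA rule, and the list standing in for the upload target are assumptions, not the actual command-line tools.

```python
# Illustrative field names; the real tools and schemas are not shown in the slides.
IDENTIFYING_FIELDS = {"patient_name", "birth_date"}
REQUIRED_FIELDS = {"subject_id", "site", "modality", "n_volumes"}


def anonymize(record):
    """Drop fields that could identify the subject."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def validate(record):
    """Fail early if a required field is missing."""
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


def extract_metadata(record):
    """Keep only the descriptive fields destined for the database."""
    return {k: record[k] for k in ("subject_id", "site", "modality")}


def quality_check(record):
    """Toy QA rule: a scan must contain at least one volume."""
    return record["n_volumes"] > 0


def submit(record, uploaded):
    """Run the steps in order; 'uploaded' stands in for the remote store."""
    clean = validate(anonymize(record))
    metadata = extract_metadata(clean)
    if not quality_check(clean):
        return None  # hold back data that fails QA
    uploaded.append(clean)
    return metadata


store = []
raw = {"patient_name": "Jane Doe", "birth_date": "1970-01-01",
       "subject_id": "S001", "site": "UCI", "modality": "fMRI", "n_volumes": 120}
meta = submit(raw, store)
```

Anonymizing first means no later stage, including the upload, ever sees identifying fields.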
Data Export
• Shopping cart in the web application
• Add scans and assessments from multiple sites for download (via job scheduler)
• CSV file for assessment data
• Excel spreadsheet
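As a small illustration of the CSV export, the sketch below writes the assessment rows collected in a cart to a CSV stream. The column names and sample rows are hypothetical; the real export format is not shown in the slides.

```python
import csv
import io

# Hypothetical column layout for exported assessment data.
FIELDNAMES = ["site", "subject", "assessment", "score"]


def export_assessments(cart, out):
    """Write the assessment rows in the cart to 'out' as CSV with a header."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in cart:
        writer.writerow(row)


# Rows from two sites, as a multi-site cart might contain.
cart = [
    {"site": "UCI", "subject": "S001", "assessment": "handedness", "score": "right"},
    {"site": "Duke", "subject": "S107", "assessment": "handedness", "score": "left"},
]
buffer = io.StringIO()
export_assessments(cart, buffer)
csv_text = buffer.getvalue()
```

Writing to a file-like object keeps the function usable for both a download response and a file on disk.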
XCEDE XML for Data Exchange
• Data provenance
Project Status Tracking
• Data from Database
• Data from Grid
Open-Source Software
• XCEDE XML schema (www.xcede.org)
  • XML schema for describing/documenting research and clinical studies
• Database (www.nitrc.org/projects/hid)
  • Query interface, workflow pipeline documentation, image download
• Clinical Assessment Layout Manager (www.nitrc.org/projects/hid)
  • Graphical web-enabled form builder for data entry
Image Processing Example
• FSL package for the comprehensive management of large-scale multi-site fMRI projects, including data storage, retrieval, calibration, analysis, multi-modal integration, and quality control