Center for Subsurface Sensing & Imaging Systems. Overview of Research Thrust R3. R3 Fundamental Research Topics R3A Parallel Processing Middleware/Parallelization Tools FPGA Acceleration R3B Solutionware Development Subsurface Toolboxes Image and Sensor Data Databases.
Center for Subsurface Sensing & Imaging Systems Overview of Research Thrust R3 • R3 Fundamental Research Topics • R3A Parallel Processing • Middleware/Parallelization Tools • FPGA Acceleration • R3B Solutionware Development • Subsurface Toolboxes • Image and Sensor Data Databases David Kaeli - NU, Miriam Leeser - NU, Wilson Rivera - UPRM
Overview of the Strategic Research Plan [Figure: three-level strategic plan. L1 Fundamental Science: R1, R2, R3 (Image and Data Information Management). L2 Validating TestBEDs. L3 Bio-Med and Enviro-Civil system-level projects: S1, S2, S3, S4, S5]
CenSSIS Barriers Addressed by R3 Projects • Barrier 4: Lack of Computationally Efficient, Realistic Models • Barrier 6: Lack of Rapid Processing and Management of Large Image Databases • Barrier 7: Lack of Validated, Integrated Processing and Computational Tools
Center for Subsurface Sensing & Imaging Systems Middleware and Parallelization Tools David Kaeli – NU, Wilson Rivera – UPRM, Carmen Carvajal – UPRM, Magda El-Shenawee – U Arkansas, Geoff Krapf – NU (undergrad), Waleed Meleis – NU, Craig Shaffer – NU (undergrad), Karen Tompko – U Cincinnati, Yijian Wang – NU, Juemin Zhang – NU
CenSSIS Middleware Tools • Parallelization of MATLAB, C/C++ and Fortran codes using Message Passing Interface (MPI) – a software pathway to exploiting GRID-level resources • Presently utilizing local Beowulf clusters, NCSA resources (Mariner Center at BU) and Internet-2 • Profile-guided program instrumentation/optimization • Utilizing MPI-2 to address barriers in I/O performance • Building on existing Grid Middleware such as Globus Toolkit, MPICH-G2 and GridPort [Figure: MATLAB, C/C++, Fortran and UPC codes feeding into parallelization through MPI and MPICH-G2]
Impact on CenSSIS Applications • Reduced the runtime of a single-body Steepest Descent Fast Multipole Method (SDFMM) application by 74% on a 32-node Beowulf cluster • Hot-path parallelization • Data restructuring • Reduced the runtime of a Monte Carlo scattered light simulation by 98% on a 16-node Silicon Graphics Origin 2000 • Matlab-to-C compilation • Hot-path parallelization • Obtained superlinear speedup of the Ellipsoid Algorithm run on a 16-node IBM Scalable POWERparallel (SP2) system • Matlab-to-C compilation • Hot-path parallelization [Figure labels: Air, Mine, Soil]
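The runtime reductions quoted on this slide can be restated as speedup factors, which are easier to compare with node counts. The short sketch below does only that arithmetic; the function name is illustrative and the baselines are assumed to be the original, unoptimized runs, which the slide does not state explicitly.

```python
def speedup_from_reduction(reduction_pct):
    """Convert a percentage runtime reduction into a speedup factor.

    A reduction of r percent leaves (1 - r/100) of the original runtime,
    so the speedup is the reciprocal of that remaining fraction.
    """
    remaining = 1.0 - reduction_pct / 100.0
    return 1.0 / remaining


# Figures taken from the slide: (runtime reduction in percent, node count).
reported = {
    "SDFMM on 32-node Beowulf": (74.0, 32),
    "Monte Carlo on 16-node Origin 2000": (98.0, 16),
}

for name, (reduction, nodes) in reported.items():
    factor = speedup_from_reduction(reduction)
    print(f"{name}: about {factor:.1f}x faster on {nodes} nodes")
```

A 74% reduction corresponds to roughly a 3.8x speedup and a 98% reduction to roughly 50x. The second figure exceeds the node count, which is consistent with the slide listing Matlab-to-C compilation alongside parallelization as a contributor.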
Techniques for Parallelizing MATLAB • Manage completely independent MATLAB processes distributed over different processors • Message passing within MATLAB (e.g., MultiMATLAB) • MATLAB calls to parallel libraries (multi-threaded LAPACK, PLAPACK) • Backend compilers can convert MATLAB to C and automatically insert MPI calls (e.g., RTExpress) [Figure: techniques grouped as "Multiple MATLAB sessions" versus "A Single MATLAB session". Our approach: MATLAB code → MATLAB C compiler → C code → use MPI → parallel code]
Our Approach for Parallelizing MATLAB
• Convert MATLAB to C using the MATLAB mcc compiler
• Convert array structs (generated by mcc) to pointer-based structs where needed
• Profile the C program to capture both data flow and control flow
• Parallelize the “hot” regions of the application using MPI
[Figure: profiled call graph. Nodes are labeled function name and self/children runtime; edges are labeled with the number of calls. Main 0/2502; Tfqmr 1/1604; Mult1 1/1602; Mult 4/1599; Multdifl 706/0; Multfaflone4m 523/6; Multfaflone4 359/5. Remaining node and edge labels (18/0, 3/35, 16/0, 0/0; call counts 1, 273, 54K, 0.6M, 2.4M) cannot be matched to nodes from the extracted text]
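The profiling step can be illustrated by ranking functions by self time to find the "hot" regions worth parallelizing. The sketch below uses the self/children pairs that are legible in the call graph; the pairing of names to numbers is read from the flattened figure, the time units are not given on the slide, and the 10% threshold and the function name `hot_functions` are assumptions made for illustration.

```python
# (self, children) runtimes as read from the slide's call graph.
profile = {
    "Main": (0, 2502),
    "Tfqmr": (1, 1604),
    "Mult1": (1, 1602),
    "Mult": (4, 1599),
    "Multdifl": (706, 0),
    "Multfaflone4m": (523, 6),
    "Multfaflone4": (359, 5),
}


def hot_functions(profile, root="Main", threshold=0.10):
    """Return (name, share) pairs whose self time is at least `threshold`
    of the root's total (self + children) time, hottest first."""
    total = sum(profile[root])
    ranked = sorted(profile.items(), key=lambda item: item[1][0], reverse=True)
    return [
        (name, self_time / total)
        for name, (self_time, _children) in ranked
        if self_time / total >= threshold
    ]


for name, share in hot_functions(profile):
    print(f"{name}: {share:.1%} of total runtime spent in the function itself")
```

With these numbers the three leaf routines Multdifl, Multfaflone4m and Multfaflone4 account for most of the self time, so they would be the natural candidates for MPI parallelization under this reading of the figure.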
Eliminating I/O Barriers in Parallel Subsurface Applications • Many SSI applications tend to be file bound or memory bound (or both) • While we can use MPI to parallelize processing and MPI collective I/O to accelerate I/O, we are still limited to accessing a file on a single disk • Our present work looks at parallelizing I/O by partitioning files associated with MPI processes • We attempt to utilize slower, commodity (IDE) local secondary storage
Eliminating I/O Barriers in Parallel SSI Applications • Parallelize computation using MPI • Profile chunk access frequencies and temporal access patterns on a per-process basis • Use the profile to guide partitioning, reducing overall execution time by 27%-82% • Presently targeting both file-bound and memory-bound applications
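The slide does not spell out the partitioning algorithm, so the following is only a minimal sketch of one way a per-process access profile could guide placement: each file chunk is assigned to the local disk of the process that accesses it most often. The data layout, the function name `partition_chunks` and the tie-breaking rule are all assumptions, and temporal access patterns are ignored here.

```python
from collections import defaultdict


def partition_chunks(access_counts):
    """Greedy profile-guided placement.

    access_counts[rank][chunk] is how often MPI process `rank` touched
    `chunk` during a profiling run. Each chunk is placed with the rank
    that accessed it most; ties go to the lowest rank.
    Returns {rank: [chunks placed on that rank's local disk]}.
    """
    per_chunk = defaultdict(dict)
    for rank, counts in access_counts.items():
        for chunk, count in counts.items():
            per_chunk[chunk][rank] = count

    placement = defaultdict(list)
    for chunk in sorted(per_chunk):
        by_rank = per_chunk[chunk]
        # max() returns the first maximal element, so sorting the ranks
        # first makes ties resolve to the lowest rank.
        owner = max(sorted(by_rank), key=lambda rank: by_rank[rank])
        placement[owner].append(chunk)
    return dict(placement)


# Hypothetical profile for three processes and three chunks.
profile = {
    0: {"chunk0": 90, "chunk1": 5},
    1: {"chunk1": 70, "chunk2": 40},
    2: {"chunk2": 40, "chunk0": 3},
}
print(partition_chunks(profile))
```

In this toy profile chunk0 lands with process 0 and chunk1 and chunk2 with process 1, so most accesses become local reads rather than requests to a single shared disk.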
Some Recent Publications
• “Profile-based Characterization and Tuning Subsurface Sensing Applications”, M. Ashouei, D. Jiang, W. Meleis, D. Kaeli, M. El-Shenawee, E. Mizan, M. and C. Rappaport, Special issue of the SCS Journal, November 2002.
• “Parallel Implementation of the Steepest Descent Fast Multipole Method (SDFMM) on a Beowulf Cluster for Subsurface Sensing Application”, D. Jiang, W. Meleis, M. El-Shenawee, E. Mizan, M. Ashouei, and C. Rappaport, IEEE Microwave and Wireless Components Letters, January 2002.
• “Electromagnetics Computations Using MPI Parallel Implementation of the Steepest Descent Fast Multipole Method (SDFMM)”, M. El-Shenawee, C. Rappaport, D. Jiang, W. Meleis, and D. Kaeli, Applied Computational Electromagnetics Society Journal, August 2002.
• “An efficient parallel algorithm for solving unsteady nonlinear equations”, W. Rivera, J. Zhu, and D. Huddleston, Proc. International Conference on Parallel Processing, IEEE Computer Society, 2002.
• “Mapping and characterization of applications in heterogeneous distributed systems”, J. Yeckle and W. Rivera, to appear in Proc. of the 7th World Multiconference on Systemics, Cybernetics and Informatics (SCI2003).
• “Profile-Guided I/O Partitioning”, Y. Wang and D. Kaeli, submitted to ICS’03.
Grid Computing • Solving Subsurface Barriers using Grid Computing • Deployment of distributed CenSSIS applications • Development of adaptive middleware • Profile-guided parallelization/optimization • Multi-language support • Profile-guided I/O partitioning • Interaction with distributed image database resources • CenSSIS/HP Industrial Relations • Strong links with Latin American universities interested in Grid Computing • Student and/or faculty interchange program with CenSSIS schools • Leadership roles in IEEE/ACM Grid Computing workshops
GRID Resources are needed for key CenSSIS modeling applications
FEATURE | FDFD (Difference Eq) | SDFMM (Integral Eq) | SAMM (Modal Expansion)
Any shape target | yes | yes | smooth boundary
Points per wavelength | 20-50 | 8 | N.A.
Operation | A\b, A = (10^6)^2, sparse | A\b, A = 1000^2, dense | SVD, A = 100 x 20, special functions
Computational speed | 250 hours | 150 minutes | 10 minutes
Computational storage | high | medium | medium
Coding complexity | low | high | medium
[Figure annotations: "computable on parallel systems"; "targeted for GRID systems"]
Grid Computing: Experimental Grid @ UPRM [Figure: grid community model with Application, Middleware, Common Infrastructure and Resource layers; resources shown include storage, sensors and an IA64 cluster, linked through the campus backbone and Internet 2]
Grid Computing: Pattern Categorization Hyperspectral Images • Computational methods for ensembles of nonparametric supervised classifiers • Feedback algorithm Parallelization (Matlab to MPI/C++) • Intrusion Detection & Countermeasure design problems
Grid Computing: LATAM Task Force • Create a LATAM Task Force on Grid Computing. • Universidad de Chile, Chile: Ricardo Baeza, PhD in Computer Science, University of Waterloo • Universidad de los Andes, Venezuela: Herbert Hoeger, PhD in Computer Science, University of Iowa • Universidade de São Paulo, Brasil: Marcio Lobo, PhD in Computer Science, TUD, Germany • Instituto Tecnologico de Monterrey, Mexico: Cesar Vargas, PhD in Electrical Engineering, Louisiana State University • Universidad del Valle, Colombia: Angel Garcia, PhD in Telecommunications, UPV-Spain • Hold a Grid Workshop for these researchers/educators at UPRM, and invite both CenSSIS and HP people to serve as reviewers and panelists (slated for Nov. 2003). • Provide tutorials and short courses on Grid-level computing. • We will use CenSSIS problems as the motivating examples to be parallelized. Implementations will be prototyped at UPRM, NU and BU.
Center for Subsurface Sensing & Imaging Systems Field Programmable Gate Arrays For Subsurface Imaging Miriam Leeser – NU, Wang Chen – NU, Srdjan Coric – NU, Shawn Miller – NU, Seth Molloy – NU (undergrad), Josh Noseworthy – NU (undergrad), Haiqian Yu – NU
Field Programmable Gate Arrays for Subsurface Imaging • Backprojection for Computed Tomography image reconstruction • Sponsored by Mercury Computer • Accelerating Finite Difference Time Domain (FDTD) in hardware • Collaboration with Carey Rappaport, NU • Retinal Vascular Tracing in real time • Collaboration with Badri Roysam and Chuck Stewart, RPI • Diverse problems, similar solutions: FPGAs are particularly well suited for accelerating image processing algorithms
Backprojection • Backprojection algorithm used in medical imaging • Traditionally performed by custom hardware • Application specific integrated circuits and/or custom board designs • New systems require greater flexibility • Algorithms under development for 3D reconstruction • Application specific integrated circuits viewed as costly both in time and NRE • FPGA implementation offers significant advantages • Algorithm flexibility and re-use • Fixed point and quantization effects matter • Difference between fixed and floating point must be small
Projection Parallelism for Performance • Parallelism implemented in FireBird (max 16-way parallel) • 1024 projections x 1024 samples/projection, each used to reconstruct a 512 x 512 image [Figure: data dependency for backprojection processing, relating projections to image rows and image columns]
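Projection parallelism works because each projection contributes to the image independently, so disjoint groups of projections can be backprojected separately and their partial images summed. The pure-Python sketch below checks that property on toy data; it is an unfiltered, pixel-driven backprojection with linear interpolation, not the FireBird design, and the sizes and sinogram values are hypothetical.

```python
import math


def backproject(sinogram, angles, size):
    """Pixel-driven backprojection with linear interpolation (no filtering)."""
    image = [[0.0] * size for _ in range(size)]
    if not sinogram:
        return image
    n_samples = len(sinogram[0])
    img_center = (size - 1) / 2.0
    det_center = (n_samples - 1) / 2.0
    for samples, theta in zip(sinogram, angles):
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            for col in range(size):
                # Detector coordinate hit by the ray through this pixel.
                t = (col - img_center) * c + (row - img_center) * s + det_center
                i = math.floor(t)
                if 0 <= i < n_samples - 1:
                    frac = t - i
                    image[row][col] += (1.0 - frac) * samples[i] + frac * samples[i + 1]
    return image


def backproject_n_way(sinogram, angles, size, ways):
    """Split projections into `ways` interleaved groups, backproject each
    group independently, then sum the partial images."""
    partials = [backproject(sinogram[k::ways], angles[k::ways], size) for k in range(ways)]
    return [[sum(p[r][c] for p in partials) for c in range(size)] for r in range(size)]


# Hypothetical toy data: 16 projections of 12 samples, 8 x 8 image.
n_proj, n_samples, size = 16, 12, 8
angles = [math.pi * p / n_proj for p in range(n_proj)]
sinogram = [[float((7 * p + 3 * i) % 11) for i in range(n_samples)] for p in range(n_proj)]

sequential = backproject(sinogram, angles, size)
four_way = backproject_n_way(sinogram, angles, size, ways=4)
max_diff = max(abs(a - b) for ra, rb in zip(sequential, four_way) for a, b in zip(ra, rb))
print(f"max difference between sequential and 4-way result: {max_diff:.2e}")
```

The grouped and sequential results agree up to floating-point rounding, which is what allows the accumulation to be spread over several processing units and expanded to n-way parallelism.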
Backprojection Speedup Due to Parallelism - Expandable to n-way parallel
Quality of Results is High • Software reconstruction (floating point) versus hardware reconstruction (fixed point) • Sinogram quantization: 9 bits • Interpolation factor: 3 bits • Relative error: 0.001295% [Figure: software reconstruction, hardware reconstruction and relative error images]
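To make the fixed-point comparison concrete, the sketch below quantizes a few sample values to 9 bits and measures the resulting relative error. The slide does not define its relative error metric, so the L1-norm ratio used here is an assumption, as are the function names and the sample data; the output is not expected to match the reported image-level figure.

```python
def quantize(values, bits):
    """Uniformly quantize non-negative values to unsigned integer codes.

    The largest input maps to the largest code, 2**bits - 1.
    Returns the codes and the scale factor used.
    """
    levels = (1 << bits) - 1
    peak = max(values)
    scale = levels / peak if peak > 0 else 1.0
    codes = [round(v * scale) for v in values]
    return codes, scale


def relative_error_pct(reference, approx):
    """Sum of absolute differences over sum of absolute reference values, in percent."""
    num = sum(abs(r - a) for r, a in zip(reference, approx))
    den = sum(abs(r) for r in reference)
    return 100.0 * num / den


# Hypothetical floating-point sinogram samples.
samples = [0.0, 0.13, 0.5, 0.77, 1.0]
codes, scale = quantize(samples, bits=9)
restored = [code / scale for code in codes]
print("9-bit codes:", codes)
print(f"relative error: {relative_error_pct(samples, restored):.4f}%")
```

Each sample is off by at most half a quantization step, which is why a modest word length can already keep the fixed-point result close to the floating-point one.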
FPGA Hardware Provides 100x Speedup Over Software on 1 GHz Pentium
• A: Software, floating point, 450 MHz Pentium: ~240 s
• B: Software, floating point, 1 GHz Dual Pentium: ~94 s
• C: Software, fixed point, 450 MHz Pentium: ~50 s
• D: Software, fixed point, 1 GHz Dual Pentium: ~28 s
• E: Hardware (Wildstar, simple), 50 MHz: ~5.4 s
• F: Hardware (Wildstar, 4-way), 50 MHz: ~1.3 s
• G: Hardware (Firebird, 8-way), 65 MHz: ~0.5 s
• H: Hardware (Firebird, 16-way), 65 MHz: ~0.25 s
Parameters: 1024 projections, 1024 samples per projection, 512 x 512 pixel image, 9-bit sinogram data, 3-bit interpolation factor
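The headline speedup can be checked directly from the reported timings. The sketch below computes a few ratios; the dictionary keys follow the A to H labels on the slide, the times are the approximate values given there, and the helper name `speedup` is illustrative.

```python
# Approximate reconstruction times in seconds, as listed on the slide.
timings_s = {
    "A": 240.0,  # software, floating point, 450 MHz Pentium
    "B": 94.0,   # software, floating point, 1 GHz dual Pentium
    "C": 50.0,   # software, fixed point, 450 MHz Pentium
    "D": 28.0,   # software, fixed point, 1 GHz dual Pentium
    "E": 5.4,    # hardware, Wildstar, simple, 50 MHz
    "F": 1.3,    # hardware, Wildstar, 4-way, 50 MHz
    "G": 0.5,    # hardware, Firebird, 8-way, 65 MHz
    "H": 0.25,   # hardware, Firebird, 16-way, 65 MHz
}


def speedup(baseline, target):
    """How many times faster `target` is than `baseline`."""
    return timings_s[baseline] / timings_s[target]


print(f"H vs D (fixed-point software, 1 GHz): {speedup('D', 'H'):.0f}x")
print(f"H vs B (floating-point software, 1 GHz): {speedup('B', 'H'):.0f}x")
print(f"F vs E (4-way vs simple on Wildstar): {speedup('E', 'F'):.1f}x")
print(f"H vs G (16-way vs 8-way on Firebird): {speedup('G', 'H'):.1f}x")
```

The 16-way Firebird design comes out about 112x faster than the fixed-point software on the 1 GHz machine, which appears to be the comparison behind the slide title, and about 376x faster than the floating-point version. Doubling from 8-way to 16-way halves the time, in line with the parallelism scaling described on the earlier slide.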
FDTD Equations Discretize Maxwell’s Equations – GPR Modeling • Update each space cell's electric and magnetic fields using the previous values of that cell and its neighboring cells • Extremely computationally expensive • Benefits from hardware acceleration
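The update rule described above can be seen in its simplest form in a one-dimensional free-space FDTD loop, where each field value is advanced from its own previous value and those of its immediate neighbors. This is a textbook-style sketch in normalized units, not the 2-D or 3-D GPR model used in the project; the grid size, step count and Gaussian source parameters are arbitrary choices for illustration.

```python
import math


def fdtd_1d(n_cells=200, n_steps=150, source_cell=100):
    """1-D free-space FDTD in normalized units with a Courant number of 0.5."""
    ez = [0.0] * n_cells  # electric field at integer grid points
    hy = [0.0] * n_cells  # magnetic field at half-offset grid points
    for step in range(n_steps):
        # Electric field update from the two neighboring magnetic values.
        for k in range(1, n_cells):
            ez[k] += 0.5 * (hy[k - 1] - hy[k])
        # Additive (soft) Gaussian pulse source.
        ez[source_cell] += math.exp(-0.5 * ((step - 40) / 12.0) ** 2)
        # Magnetic field update from the two neighboring electric values.
        for k in range(n_cells - 1):
            hy[k] += 0.5 * (ez[k] - ez[k + 1])
    return ez, hy


ez, hy = fdtd_1d()
peak_cell = max(range(len(ez)), key=lambda k: abs(ez[k]))
print(f"largest |Ez| after 150 steps is at cell {peak_cell}")
```

Every cell performs the same small arithmetic update at every time step, and the 3-D case multiplies this work across three dimensions and six field components, which is why the method is expensive in software and maps well onto pipelined hardware.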
3-D Buried Object Detection Forward Model [Figure: simplified 3-D model showing receiver, antenna and buried mine, with X, Y and Z axes]
Detailed Architecture of 2-D FDTD Implementation (BlockRam interface and Pipeline updates for one time step)
Retinal Vascular Tracing: Register 2-D Image to 3-D in Real Time • Feature extraction • Registration: image pairs • Registration: montages • Registration: real-time / on-line • Software is too slow • Use FPGAs to accelerate to video frame rate • Image guided surgery
Retinal Vascular Tracing: Register 2-D Image to 3-D in Real Time [Figure: “SmartCamera” attached over the PCI bus; tracing follows the direction of the blood vessel]
Developing Embedded Solutionware • All three projects use the same reconfigurable hardware and the same design flow • Result is considerable processing speedup, moving processing closer to sensors [Figure: Firebird PCI board from Annapolis Micro Systems]
Center for Subsurface Sensing & Imaging Systems • Solutionware Development • Subsurface Toolboxes • Image and Sensor Data Databases David Kaeli – NU, Chuck Stewart – RPI, Emmanuel Arzuaga – UPRM, Jennifer Black – NU, Kyle Guilbert – NU (undergrad), Matthew Kowalski – NU (undergrad), Chakib Ouarraoui – NU, Amitha Perera – RPI, Becky Norum – NU, Derek Uluski – NU (undergrad)
CenSSIS Solutionware – UPRM/NU/RPI Toolbox Development • Support the development of CenSSIS Solutionware that demonstrates our “Diverse Problems – Similar Solutions” model • Delivered a software-engineered Multi-View Tomography Toolbox, developed in OO-MATLAB • Developing three new CenSSIS Toolboxes • Registration – RPI/WHOI • Hyperspectral Imaging – UPRM • 3-D Modeling – NU • Establish software development and testing standards for CenSSIS Image and Sensor Data Database • Develop a web-accessible image database for CenSSIS that enables efficient searching and querying of images, metadata and image content • Develop image feature tagging capabilities
Current/Future Toolbox Development • Development of multi-language toolboxes – C, Fortran, C++, Java, MATLAB and OO-MATLAB • Delivered the MVT Toolbox – open source • Presently working on three additional toolbox efforts • Developing a parallelized version of the MVT Toolbox • Adopted Software Engineering Institute Capability Maturity Model (CMM) Level 3 standards • Software library and bug tracking being developed (CVS and Bugzilla) • Software Engineers on staff at NU, UPRM and RPI [Figure: sensing configurations (wide-band probe, focused or gated detector, narrow-band detectors, focused or pulsed probe; sources, detectors, object) with toolbox labels MVT, MSD, LPM and Modeling]
CenSSIS Image Database System • Deliver a web-accessible database for CenSSIS that enables efficient searching and querying of images, sensor data, metadata and image content • More than 200 metadata-rich images/datasets presently available online (> 1000 by Year 5) • Database Characteristics: • Complex relational queries (Oracle8i) • Data security, reliability and layered user privileges • Efficient search and query of image content and metadata • Content-based image tagging using XML • Indexing algorithms (2D, 3D and 4D) • Explore object relational technology to handle collections [Figure: mouse embryo image with tagged features 1 to 4]
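A metadata query of the kind described above can be sketched with a small relational schema. The example uses Python's built-in sqlite3 module as a stand-in for Oracle8i; the table layout, column names and sample rows are hypothetical and are not taken from the CenSSIS database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        modality TEXT NOT NULL,
        dims INTEGER NOT NULL          -- 2, 3 or 4 dimensional dataset
    );
    CREATE TABLE tags (
        image_id INTEGER NOT NULL REFERENCES images(id),
        label TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO images (id, name, modality, dims) VALUES (?, ?, ?, ?)",
    [
        (1, "embryo_stack", "microscopy", 3),
        (2, "breast_cells", "microscopy", 2),
        (3, "reef_scan", "hyperspectral", 3),
    ],
)
conn.executemany(
    "INSERT INTO tags (image_id, label) VALUES (?, ?)",
    [(1, "embryo"), (1, "cell"), (2, "tumor"), (2, "cell"), (3, "coral")],
)


def find_images(conn, label, min_dims):
    """Names of images carrying `label` with at least `min_dims` dimensions."""
    rows = conn.execute(
        """
        SELECT i.name
        FROM images AS i
        JOIN tags AS t ON t.image_id = i.id
        WHERE t.label = ? AND i.dims >= ?
        ORDER BY i.name
        """,
        (label, min_dims),
    ).fetchall()
    return [row[0] for row in rows]


print(find_images(conn, "cell", 2))
print(find_images(conn, "cell", 3))
```

Keeping tags in their own table lets one image carry many content labels, so a search can combine content tags with structural metadata such as dimensionality in a single join.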
<Image>
  <!-- General Cell Information -->
  <CellInformation>
    <ID> 9 </ID>
    <ClinomicsID> 931175495 </ClinomicsID>
    <DOB> 2/7/30 </DOB>
    <SEX> F </SEX>
    <COLL_DATE> 11/2/1993 </COLL_DATE>
    <Primary_site> Breast </Primary_site>
    <INITIAL> II </INITIAL>
    <GRADE> POORLY DIFFERENTIATED </GRADE>
    <HISTOLOGY> UNKNOWN </HISTOLOGY>
    <PRIM_SITE2> NONE </PRIM_SITE2>
    <PRIM_DATE> 4/1/1992 </PRIM_DATE>
    <MET1_SITE> NONE </MET1_SITE>
    <MET1_DATE> NONE </MET1_DATE>
    <TUBE_TYPE> p </TUBE_TYPE>
  </CellInformation>
  <StillRegion id="IMG0001">
    <MediaProfile>
      <MediaFormat>
        <FileFormat>jpeg</FileFormat>
        <System>PAL</System>
        <Medium>CD</Medium>
        <Color>color</Color>
        <FileSize>332.228</FileSize>
      </MediaFormat>
      . . . . . . . .
      </MediaCoding>
      <MediaInstance>
        <Identifier IdOrganization='Clinomics' IdName='BreastCancerCell'>BreastCancerCell// image0001 </Identifier>
        <Locator>
          <MediaURL>file://D:/Breast/cells/imag0001.jpg</MediaURL>
        </Locator>
      </MediaInstance>
    </MediaProfile>
    <StructuredAnnotation>
      <Who>Patient239</Who>
      <whatObject>Human primary breast tumor cells</whatObject>
      <WhatAction> growing in a NASA Bioreactor </WhatAction>
      <where> St. Mary’s Hospital </where>
      <When> 09/25/2002 </When>
      <why> Investigate tumor cells behaviour on microcarrier beads </why>
      <TextAnnotation xml:lang='en-us'> Higher magnification of view illustrating breast cancer cells with intercellular boundaries on bead surface </TextAnnotation>
    </StructuredAnnotation>
  </StillRegion>
</Image>
[Slide annotations on the XML: General Image Info, Image Source Info, Image File Location Info, Image Donor Info, Image Feature Info, Image Format Info]
Impact to Date and Future Plans • Major Impact Items • Significant acceleration of many critical SSI applications using embedded (FPGA) acceleration and parallelization • Delivery of the Multi-View Tomography Toolbox • Delivery of a populated CenSSIS Image Database System • Development of DICOM interoperability • Near term deliverables (Years 3-5) • Development of real-time 3-D vascular tracing smart camera • Migration of compute-bound modeling problems to the GRID • Application of out-of-core acceleration to critical I/O-bound applications • Completion of 3 new Solutionware Toolboxes • Interfacing to visualization toolkits (SCIRUN-Utah, VTK) • 2000 images online by Year 5 • Longer term deliverables (Years 6-8) • Development of a reconfigurable hardware library of SSI applications • Demonstration of the power of the GRID • Delivery of 3 additional CenSSIS Toolboxes • 5000 images online by Year 8
“Quilt Chart”: Organization & Integration of Year Three CenSSIS Research Program
Important Outcomes: Advances in Solving Real World Problems (Engineered System Level)
• S1: Cellular structure studied with 3D Fusion Microscopy
• S2: 4D Image Guided Radiotherapy
• S3: Multi-modal, non-invasive screening for incipient breast cancers
• S4: Remote and in-situ monitoring of submerged coral reefs encompassing shallow and deep water habitats
• S5: Multi-sensor quantitative assessment of underground contaminants and civil infrastructure
CenSSIS Research Areas
• R1A Nonlinear and Dual Wave Probes
• R1B Effective Forward Models
• R2A MVT Methods
• R2B LPM Methods
• R2C MSD Methods
• R2D Image Understanding & Sensor Fusion Methods
• R3A Parallel Hardware Implementation for Fast Subsurface Detection
• R3B Solutionware Tools
• Initial TestBED Facilities
• I-PLUS Development (Real Problems)
[Figure: matrix of research areas against outcomes S1 to S5, showing relative contribution to outcomes and multi-institution collaboration]
Impact on System Level Projects • FPGA - real-time registration • S1 – 3D Fusion Microscope • Parallel/GRID Processing and Toolboxes • S1(3DFM) - Impacting the design of the 3DFM inversion algorithms by accelerating FDTD on a Mercury cluster • S3 (breast cancer) and S4 (coral reef) – Parallelization of the MVT and Hyperspectral toolboxes
Impact on System Level Projects • Sensor and Image Database • S1 (3DFM) - Testcase for advanced submission and tagging capabilities • S2 (Radiation oncology) – Facilitates sharing and indexing 4-D datasets, interfacing DICOM-based systems
<?xml version="1.0" encoding="UTF-8"?>
<embryo>
  <description>Embryo developmental stages</description>
  <feature label="1" xPos1="29" yPos1="33" xPos2="48" yPos2="50">1 cell embryo</feature>
  <feature label="2" xPos1="50" yPos1="28" xPos2="70" yPos2="40">2 cell embryo</feature>
  <feature label="3" xPos1="5" yPos1="5" xPos2="25" yPos2="20">4 cell embryo</feature>
</embryo>
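Feature tags of this form are straightforward to read back programmatically. The sketch below parses a cleaned-up copy of the embryo annotation with Python's standard xml.etree.ElementTree module; it assumes the four position attributes are the corners of a bounding box and that every feature uses the names xPos1, yPos1, xPos2 and yPos2, neither of which the slide states outright.

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the slide's annotation (straight quotes, consistent
# attribute names); the XML declaration is omitted for brevity.
ANNOTATION = """<embryo>
  <description>Embryo developmental stages</description>
  <feature label="1" xPos1="29" yPos1="33" xPos2="48" yPos2="50">1 cell embryo</feature>
  <feature label="2" xPos1="50" yPos1="28" xPos2="70" yPos2="40">2 cell embryo</feature>
  <feature label="3" xPos1="5" yPos1="5" xPos2="25" yPos2="20">4 cell embryo</feature>
</embryo>"""


def feature_boxes(xml_text):
    """Map each feature label to (description, (x1, y1, x2, y2))."""
    root = ET.fromstring(xml_text)
    boxes = {}
    for feature in root.findall("feature"):
        corners = tuple(
            int(feature.get(name)) for name in ("xPos1", "yPos1", "xPos2", "yPos2")
        )
        boxes[feature.get("label")] = (feature.text.strip(), corners)
    return boxes


for label, (text, corners) in feature_boxes(ANNOTATION).items():
    print(f"feature {label}: {text} at {corners}")
```

Storing regions as plain attributes keeps the tags easy to index, so a query such as "all images containing a 2 cell embryo" only needs the label text and not the pixel data.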
CenSSIS R3 Research Thrust Summary • Providing both SSI-related computing research expertise and supporting CenSSIS infrastructure needs • Addressing key research barriers in computational efficiency, embedded computing and image/sensor data management • Exploiting Grid resources to enable new discovery in SSI applications • Producing an image/data repository and software-engineered SSI Toolsets • Providing educational and research opportunities to undergraduates and Latin American faculty/students • Developing enabling tools targeting system-level projects • Real-time registration • Accelerated modeling of new inversion algorithms • Indexing and cataloging DICOM and multi-dimensional images