Computations with Big Image Data Phuong Nguyen Sponsor: NIST
Computations with Big Image Data
• Motivation:
• Live cell image processing application: a microscope generates a large number of spatial image tiles, with several measurements at each pixel, per time slice.
• Analyzing these images involves computations that calibrate, segment, and visualize image channels, as well as extract image features for further analyses.
• On a desktop this is slow, e.g. image segmentation on stitched images using Matlab: 954 files * 8 mins = 127 hours (stitched TIFF: ~0.578 TB per experiment), or 161 files * 8 mins = 21.5 hours (1 GB per file).
• Goals:
• Computational scalability of cell image processing
• Distributed data partitioning strategies and parallel algorithms
• Analysis and evaluation of different algorithms/approaches
• Generalize as libraries/benchmarks/tools for image processing
Computations with Big Image Data cont.
• Processing these images:
• Operates either on thousands of megapixel images (image tiles) or on hundreds of half- to giga-pixel images (stitched images)
• Ranges from computationally intensive to data intensive
• Approaches:
• Develop distributed data partitioning strategies and parallel processing algorithms
• Implement and run benchmarks on distributed/parallel frameworks and platforms
• Use the Hadoop MapReduce framework and compare with other frameworks and with parallel scripts (PBS) using network file system storage
Image segmentation using Java/Hadoop
• Segmentation method that consists of a linear workflow of steps:
• Sobel-based image gradient computation
• Connectivity analysis to group 4-connected pixels, followed by thresholding to remove small objects
• Morphological open (dilation of erosion) using a 3x3 convolution kernel to remove small holes and islands
• Connectivity analysis and thresholding to remove small objects again
• Connectivity analysis to assign the same label to each contiguous group of 4-connected pixels
• Sobel gradient equation: G(x,y) = sqrt(Gx(x,y)^2 + Gy(x,y)^2), where Gx and Gy are the horizontal and vertical derivative estimates obtained by convolving the image with the two 3x3 Sobel kernels
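As a concrete illustration of the first workflow step, here is a minimal sketch of the Sobel gradient computation in Java. The grayscale float-array image representation and all names are illustrative assumptions, not the NIST implementation:

// Sketch of the Sobel gradient step; assumes a grayscale image as a 2-D float array.
public final class SobelGradient {
    // 3x3 Sobel kernels for the x and y derivative estimates
    private static final int[][] KX = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    private static final int[][] KY = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    /** Returns the gradient magnitude G = sqrt(Gx^2 + Gy^2) at each interior pixel. */
    public static float[][] gradientMagnitude(float[][] img) {
        int h = img.length, w = img[0].length;
        float[][] g = new float[h][w];
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                float gx = 0f, gy = 0f;
                for (int j = -1; j <= 1; j++) {
                    for (int i = -1; i <= 1; i++) {
                        float p = img[y + j][x + i];
                        gx += KX[j + 1][i + 1] * p;
                        gy += KY[j + 1][i + 1] * p;
                    }
                }
                g[y][x] = (float) Math.sqrt(gx * gx + gy * gy);
            }
        }
        return g;
    }
}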
Flat Field Correction
• Corrects spatial shading of a tile image:
• IFFC(x,y) = (I(x,y) - DI(x,y)) / (WI(x,y) - DI(x,y))
• where IFFC(x,y) is the flat-field corrected image intensity,
• I(x,y) is the raw uncorrected image intensity,
• DI(x,y) is the dark image acquired by closing the camera shutter, and
• WI(x,y) is the flat field intensity acquired without any object
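A minimal sketch of the per-pixel correction following the formula above. The float-array image representation and names are illustrative assumptions:

// Flat-field correction: IFFC = (I - DI) / (WI - DI), applied pixel by pixel.
public final class FlatFieldCorrection {
    /** raw: uncorrected image I; dark: dark image DI; white: flat-field image WI. */
    public static float[][] correct(float[][] raw, float[][] dark, float[][] white) {
        int h = raw.length, w = raw[0].length;
        float[][] out = new float[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                float denom = white[y][x] - dark[y][x];
                // Guard against division by zero where the flat field equals the dark image
                out[y][x] = denom != 0f ? (raw[y][x] - dark[y][x]) / denom : 0f;
            }
        }
        return out;
    }
}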
Characteristics of selected cell image processing computations
• Summary of the computations and their input and output image data files
Hadoop MapReduce approach Source: http://developer.yahoo.com/hadoop/tutorial/module4.html
• Image files are uploaded to HDFS
• Input format changes (a custom image input format and serialization)
• Input splitting (currently no split: each mapper processes a whole stitched image … )
• Map-only jobs: mapper output is written directly to HDFS as files
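A minimal sketch of such a map-only job, assuming input arrives as (filename, image bytes) pairs, e.g. from a SequenceFile. WholeImageMapper and the process() stub are illustrative names, not the NIST code:

import java.io.IOException;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WholeImageMapper extends Mapper<Text, BytesWritable, Text, BytesWritable> {
    @Override
    protected void map(Text filename, BytesWritable imageBytes, Context context)
            throws IOException, InterruptedException {
        // Run the per-image algorithm (segmentation, correction, ...) in memory
        byte[] result = process(imageBytes.copyBytes());
        // Map-only job: emit the processed image directly; no reduce phase
        context.write(filename, new BytesWritable(result));
    }

    /** Placeholder for the actual per-image algorithm. */
    private static byte[] process(byte[] image) {
        return image; // identity stand-in; the real algorithm goes here
    }
    // In the driver, the reduce phase is disabled so mapper output goes straight to HDFS:
    //   job.setNumReduceTasks(0);
}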
Hadoop MapReduce approach cont.
• Advantages of using Hadoop:
• Data is local to the compute node -> avoids network file system bottlenecks when running at scale
• Manages task execution and automatically reruns failed tasks
• With big images, more work is lost when a task fails
• For small images (e.g. <128 MB), use Hadoop SequenceFiles, which consist of binary key/value pairs (key: image filename, value: image data); Apache Avro (a data serialization system) is an alternative
• Running on the NIST HPC cluster (Raritan cluster):
• HPC queue system
• Data must be moved in and out
• Not possible to share data in HDFS
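A minimal sketch of packing many small image files into a SequenceFile keyed by filename, as described above. The HDFS path is illustrative, and Hadoop 2.x client libraries are assumed on the classpath:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackImages {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path out = new Path("hdfs:///user/demo/images.seq"); // illustrative path
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(out),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            // Each local image file passed on the command line becomes one key/value record
            for (String name : args) {
                byte[] data = Files.readAllBytes(Paths.get(name));
                writer.append(new Text(name), new BytesWritable(data));
            }
        }
    }
}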
Image segmentation benchmark using Hadoop: results
• Single-node, single-threaded Java takes 10 hours
• Matlab on a desktop machine takes ~21.5 hours
• The workload is both I/O and computation intensive
• Image segmentation scales well using Hadoop
• Efficiency decreases as the number of nodes increases
Flat Field Correction benchmark using Hadoop: results
• An I/O-intensive task, primarily writing output data to the HDFS file system
Hadoop MapReduce approach cont.
• Future work considers the following techniques:
• Achieve pixel-level parallelism by breaking each image into smaller images, running the algorithms (segmentation, flat field correction, …) on each piece, and joining the results upon completion (before downloading files from HDFS to the network file system); a sketch of the block splitting follows this list
• This method can also be extended to overlapping blocks (by providing a split method that divides the input image along boundaries between an atomic number of rows/columns and defines the number of overlapping pixels along each side)
• Compare no split vs. split vs. split with overlapping pixels
• Reduce tasks in the MapReduce framework can be useful for some image processing algorithms, e.g. feature extraction
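As promised above, a minimal sketch of splitting an image into blocks with a fixed overlap (halo), so that windowed operations near block edges see the same neighborhood as in the full image. Block size and overlap values are illustrative assumptions:

import java.util.ArrayList;
import java.util.List;

public final class OverlappingBlocks {
    /** A block covering [x, x+w) x [y, y+h) in full-image coordinates. */
    public static final class Block {
        public final int x, y, w, h;
        Block(int x, int y, int w, int h) { this.x = x; this.y = y; this.w = w; this.h = h; }
    }

    /** Splits an imgW x imgH image into block-sized tiles expanded by `overlap` pixels. */
    public static List<Block> split(int imgW, int imgH, int block, int overlap) {
        List<Block> blocks = new ArrayList<>();
        for (int y = 0; y < imgH; y += block) {
            for (int x = 0; x < imgW; x += block) {
                // Expand each tile by `overlap` pixels on every side, clamped to the image
                int x0 = Math.max(0, x - overlap);
                int y0 = Math.max(0, y - overlap);
                int x1 = Math.min(imgW, x + block + overlap);
                int y1 = Math.min(imgH, y + block + overlap);
                blocks.add(new Block(x0, y0, x1 - x0, y1 - y0));
            }
        }
        return blocks;
    }
}

After processing, each result is cropped back to its non-overlapping core before the pieces are joined, so overlapping pixels are computed redundantly but never written twice.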
Summary
• We have developed image processing algorithms and characterized their computations as potential contributions to:
• scaling cell image analysis applications, and
• providing image processing benchmarks using Hadoop
• Future work:
• Optimize and tune these image processing computations using Hadoop
• Work towards generalizing them as libraries/benchmarks/tools for image processing