www.opencloudconsortium.org
Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief
Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine
Open Cloud Consortium
June 21, 2010
Project Matsu Goals
• Provide persistent data resources and elastic computing to assist in disasters:
  • Make imagery available for disaster relief workers
  • Elastic computing for large scale image processing
  • Change detection for temporally different and geospatially identical image sets
• Provide a resource for standards testing and interoperability studies of large data clouds
www.opencloudconsortium.org
• 501(c)(3) not-for-profit corporation
• Supports the development of standards, interoperability frameworks, and reference implementations.
• Manages testbeds: Open Cloud Testbed and Intercloud Testbed.
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
• Develops benchmarks.
OCC Members
• Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
• Universities: CalIT2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago
• Government agencies: NASA
• Open Source Projects: Sector Project
Operates Clouds
• 500 nodes
• 3,000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year
• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Project Matsu: Cloud-based Disaster Relief Services
Open Science Data Cloud
• Astronomical data
• Biological data (Bionimbus)
• Networking data
• Image processing for disaster relief
Focus of OCC Large Data Cloud Working Group
• Developing APIs for this framework.
[Layered architecture diagram: Cloud Storage Services at the base; Cloud Compute Services (MapReduce, UDF, and other programming frameworks) above; Table-based and Relational-like Data Services on top; applications run against each layer.]
Tools and Standards
• Apache Hadoop/MapReduce
• Sector/Sphere large data cloud
• Open Geospatial Consortium
  • Web Map Service (WMS)
• OCC tools are open source (matsu-project)
  • http://code.google.com/p/matsu-project/
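For context, a processed map layer is served to relief workers through a standard WMS GetMap request. A minimal sketch in Python of building such a request, assuming a hypothetical endpoint URL and layer name (the parameters shown are the standard WMS 1.1.1 GetMap parameters, not a Matsu-specific API):

```python
# Sketch of a WMS 1.1.1 GetMap request for a processed tile.
# The endpoint URL and layer name below are hypothetical placeholders.
from urllib.parse import urlencode

params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "matsu:delta",             # hypothetical layer name
    "SRS": "EPSG:4326",
    "BBOX": "-135.0,45.0,-112.5,67.5",   # minx,miny,maxx,maxy
    "WIDTH": "512",
    "HEIGHT": "512",
    "FORMAT": "image/png",
}
url = "http://example.org/wms?" + urlencode(params)
print(url)
```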
Part 2: Technical Approach
• Hadoop – Lead: Andrew Levine
• Hadoop with Python Streams – Lead: Collin Bennett
• Sector/Sphere – Lead: Yunhong Gu
Image Processing in the Cloud - Mapper
• Step 1: Input to Mapper
  • Input key: bounding box (e.g., minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5)
  • Input value: the original image
• Step 2: Processing in Mapper
  • The mapper resizes and/or cuts the original image into pieces, one per output bounding box
• Step 3: Mapper Output
  • Output key: bounding box of the piece
  • Output value: image piece + timestamp
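A minimal sketch of the tiling step described above, assuming the image is already loaded as a Pillow image with a known geographic bounding box and timestamp; the tile count and helper name are illustrative, not the project's actual code:

```python
# Sketch of the mapper's tiling step (illustrative, not the project's actual code).
# Assumes the input image and its geographic bounding box are already loaded.
from PIL import Image

def tile_image(img, bbox, timestamp, tiles_per_side=2):
    """Cut an image into tiles, yielding (tile bounding box, (tile, timestamp))."""
    minx, miny, maxx, maxy = bbox
    w, h = img.size
    dx = (maxx - minx) / tiles_per_side
    dy = (maxy - miny) / tiles_per_side
    px, py = w // tiles_per_side, h // tiles_per_side
    for i in range(tiles_per_side):
        for j in range(tiles_per_side):
            # pixel rows run top-down, latitude runs bottom-up, hence the inverted y
            tile = img.crop((i * px, j * py, (i + 1) * px, (j + 1) * py))
            tile_bbox = (minx + i * dx, maxy - (j + 1) * dy,
                         minx + (i + 1) * dx, maxy - j * dy)
            yield tile_bbox, (tile, timestamp)
```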
Image Processing in the Cloud - Reducer
• Step 1: Input to Reducer
  • Input key: bounding box (e.g., minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375)
  • Input values: the image pieces (with timestamps) for that bounding box
• Step 2: Processing in Reducer
  • Assemble images based on timestamps and compare
  • Result is a delta of the two images
• Step 3: Reducer Output
  • All images go to different map layers for display in WMS: Timestamp 1 set, Timestamp 2 set, Delta set
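A minimal sketch of the comparison step, assuming exactly two co-registered Pillow images of the same bounding box, one per timestamp; ImageChops.difference is Pillow's standard per-pixel difference helper, but the surrounding structure is illustrative:

```python
# Sketch of the reducer's change-detection step (illustrative).
# Assumes two co-registered images of the same bounding box, keyed by timestamp,
# have been collected for this reducer key.
from PIL import Image, ImageChops

def compute_delta(images_by_timestamp):
    """Return (earlier image, later image, per-pixel delta)."""
    (t1, img1), (t2, img2) = sorted(images_by_timestamp.items())
    delta = ImageChops.difference(img1, img2)  # per-pixel absolute difference
    return img1, img2, delta
```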
Preprocessing Step
• All images (in a batch to be processed) are combined into a single file.
• Each line contains the image's byte array transformed to pixels (raw bytes don't seem to work well with the one-line-at-a-time Hadoop streaming paradigm).
• Record format: geolocation \t timestamp | tuple size ; image width ; image height ; comma-separated list of pixels
• The timestamp, tuple size, width, and height fields are metadata needed to process the image in the reducer.
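A minimal sketch of writing one record per image in this format, assuming Pillow for decoding; how the geolocation and timestamp are obtained for each image is an assumption here, not part of the original slides:

```python
# Sketch of the preprocessing step (illustrative): write one line per image in the
# format  geolocation \t timestamp | tuple_size ; width ; height ; pixels
import sys
from PIL import Image

def write_record(out, image_path, geolocation, timestamp):
    img = Image.open(image_path).convert("RGB")
    width, height = img.size
    pixels = list(img.getdata())             # list of (R, G, B) tuples
    tuple_size = 3                            # RGB
    flat = ",".join(str(v) for px in pixels for v in px)
    out.write("%s\t%s|%d;%d;%d;%s\n" % (geolocation, timestamp,
                                        tuple_size, width, height, flat))

# Example (hypothetical file name and metadata):
# write_record(sys.stdout, "tile_001.tif", "-135.0,45.0,-112.5,67.5", "2010-01-13")
```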
Map and Shuffle
• We can use the identity mapper
• All of the work for mapping was done in the preprocessing step
• Map/Shuffle key is the geolocation
• In the reducer, the timestamp will be the first field of each record when splitting on '|'
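A minimal Hadoop-streaming-style reducer sketch, assuming the record format above; records arrive on stdin already sorted by the geolocation key, and the pixel comparison itself is left as a placeholder:

```python
#!/usr/bin/env python
# Sketch of a streaming reducer (illustrative): group records by geolocation key,
# parse the metadata after the '|', and compare the timestamped pixel arrays.
import sys
from itertools import groupby

def parse_line(line):
    key, value = line.rstrip("\n").split("\t", 1)
    timestamp, rest = value.split("|", 1)            # timestamp is the first field
    tuple_size, width, height, pixels = rest.split(";", 3)
    return key, timestamp, int(tuple_size), int(width), int(height), pixels

def main():
    records = (parse_line(line) for line in sys.stdin)
    for geolocation, group in groupby(records, key=lambda r: r[0]):
        by_time = sorted(group, key=lambda r: r[1])  # order this key's records by timestamp
        # ... compare consecutive pixel arrays for this bounding box
        # and emit a delta record for the WMS layers ...

if __name__ == "__main__":
    main()
```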
Sector Distributed File System
• Sector aggregates hard disk storage across commodity computers
• Single namespace, file-system-level reliability (using replication), high availability
• Sector does not split files
  • A single image will not be split, so when it is being processed the application does not need to read data from other nodes over the network
  • A directory can be kept together on a single node as well, as an option
Sphere UDF
• Sphere allows a User Defined Function to be applied to each file (whether it holds a single image or multiple images)
• Existing applications can be wrapped up in a Sphere UDF
• In many situations, the Sphere streaming utility accepts a data directory and an application binary as inputs
  • ./stream -i haiti -c ossim_foo -o results
For More Information
info@opencloudconsortium.org
www.opencloudconsortium.org