www.opencloudconsortium.org
Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief
Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine
Open Cloud Consortium
June 21, 2010
Project Matsu Goals
• Provide persistent data resources and elastic computing to assist in disasters:
  • Make imagery available for disaster relief workers
  • Elastic computing for large scale image processing
  • Change detection for temporally different and geospatially identical image sets
• Provide a resource for standards testing and interoperability studies of large data clouds
www.opencloudconsortium.org
• 501(c)(3) not-for-profit corporation
• Supports the development of standards, interoperability frameworks, and reference implementations.
• Manages testbeds: Open Cloud Testbed and Intercloud Testbed.
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
• Develops benchmarks.
OCC Members
• Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
• Universities: CalIT2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago
• Government agencies: NASA
• Open Source Projects: Sector Project
Operates Clouds
• 500 nodes
• 3,000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year
• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Project Matsu: Cloud-based Disaster Relief Services
Open Science Data Cloud
• Astronomical data
• Biological data (Bionimbus)
• Networking data
• Image processing for disaster relief
Focus of OCC Large Data Cloud Working Group
• Developing APIs for this framework.
[Layered architecture diagram: Cloud Storage Services at the base; Cloud Compute Services (MapReduce, UDF, and other programming frameworks) above; Table-based and Relational-like Data Services on top; applications run against each layer.]
Tools and Standards
• Apache Hadoop/MapReduce
• Sector/Sphere large data cloud
• Open Geospatial Consortium
  • Web Map Service (WMS)
• OCC tools are open source (matsu-project)
  • http://code.google.com/p/matsu-project/
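For context, a processed map layer is served to relief workers through a standard WMS GetMap request. A minimal sketch in Python of building such a request, assuming a hypothetical endpoint URL and layer name (the parameters shown are the standard WMS 1.1.1 GetMap parameters, not a Matsu-specific API):

```python
# Sketch of a WMS 1.1.1 GetMap request for a processed tile.
# The endpoint URL and layer name below are hypothetical placeholders.
from urllib.parse import urlencode

params = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "matsu:delta",             # hypothetical layer name
    "SRS": "EPSG:4326",
    "BBOX": "-135.0,45.0,-112.5,67.5",   # minx,miny,maxx,maxy
    "WIDTH": "512",
    "HEIGHT": "512",
    "FORMAT": "image/png",
}
url = "http://example.org/wms?" + urlencode(params)
print(url)
```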
Part 2: Technical Approach
• Hadoop – Lead: Andrew Levine
• Hadoop with Python Streams – Lead: Collin Bennett
• Sector/Sphere – Lead: Yunhong Gu
Image Processing in the Cloud - Mapper
• Step 1: Input to Mapper
  • Input key: bounding box (e.g., minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5)
  • Input value: the original image
• Step 2: Processing in Mapper
  • The mapper resizes and/or cuts the original image into pieces, one per output bounding box
• Step 3: Mapper Output
  • Output key: bounding box of the piece
  • Output value: image piece + timestamp
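A minimal sketch of the tiling step described above, assuming the image is already loaded as a Pillow image with a known geographic bounding box and timestamp; the tile count and helper name are illustrative, not the project's actual code:

```python
# Sketch of the mapper's tiling step (illustrative, not the project's actual code).
# Assumes the input image and its geographic bounding box are already loaded.
from PIL import Image

def tile_image(img, bbox, timestamp, tiles_per_side=2):
    """Cut an image into tiles, yielding (tile bounding box, (tile, timestamp))."""
    minx, miny, maxx, maxy = bbox
    w, h = img.size
    dx = (maxx - minx) / tiles_per_side
    dy = (maxy - miny) / tiles_per_side
    px, py = w // tiles_per_side, h // tiles_per_side
    for i in range(tiles_per_side):
        for j in range(tiles_per_side):
            # pixel rows run top-down, latitude runs bottom-up, hence the inverted y
            tile = img.crop((i * px, j * py, (i + 1) * px, (j + 1) * py))
            tile_bbox = (minx + i * dx, maxy - (j + 1) * dy,
                         minx + (i + 1) * dx, maxy - j * dy)
            yield tile_bbox, (tile, timestamp)
```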
Image Processing in the Cloud - Reducer
• Step 1: Input to Reducer
  • Input key: bounding box (e.g., minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375)
  • Input values: the image pieces (with timestamps) for that bounding box
• Step 2: Processing in Reducer
  • Assemble images based on timestamps and compare
  • Result is a delta of the two images
• Step 3: Reducer Output
  • All images go to different map layers for display in WMS: Timestamp 1 set, Timestamp 2 set, Delta set
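A minimal sketch of the comparison step, assuming exactly two co-registered Pillow images of the same bounding box, one per timestamp; ImageChops.difference is Pillow's standard per-pixel difference helper, but the surrounding structure is illustrative:

```python
# Sketch of the reducer's change-detection step (illustrative).
# Assumes two co-registered images of the same bounding box, keyed by timestamp,
# have been collected for this reducer key.
from PIL import Image, ImageChops

def compute_delta(images_by_timestamp):
    """Return (earlier image, later image, per-pixel delta)."""
    (t1, img1), (t2, img2) = sorted(images_by_timestamp.items())
    delta = ImageChops.difference(img1, img2)  # per-pixel absolute difference
    return img1, img2, delta
```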
Preprocessing Step
• All images (in a batch to be processed) are combined into a single file.
• Each line contains the image's byte array transformed to pixels (raw bytes don't seem to work well with the one-line-at-a-time Hadoop streaming paradigm).
• Record format: geolocation \t timestamp | tuple size ; image width ; image height ; comma-separated list of pixels
• The timestamp, tuple size, width, and height fields are metadata needed to process the image in the reducer.
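A minimal sketch of writing one record per image in this format, assuming Pillow for decoding; how the geolocation and timestamp are obtained for each image is an assumption here, not part of the original slides:

```python
# Sketch of the preprocessing step (illustrative): write one line per image in the
# format  geolocation \t timestamp | tuple_size ; width ; height ; pixels
import sys
from PIL import Image

def write_record(out, image_path, geolocation, timestamp):
    img = Image.open(image_path).convert("RGB")
    width, height = img.size
    pixels = list(img.getdata())             # list of (R, G, B) tuples
    tuple_size = 3                            # RGB
    flat = ",".join(str(v) for px in pixels for v in px)
    out.write("%s\t%s|%d;%d;%d;%s\n" % (geolocation, timestamp,
                                        tuple_size, width, height, flat))

# Example (hypothetical file name and metadata):
# write_record(sys.stdout, "tile_001.tif", "-135.0,45.0,-112.5,67.5", "2010-01-13")
```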
Map and Shuffle
• We can use the identity mapper
• All of the work for mapping was done in the preprocessing step
• Map/Shuffle key is the geolocation
• In the reducer, the timestamp will be the first field of each record when splitting on '|'
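A minimal Hadoop-streaming-style reducer sketch, assuming the record format above; records arrive on stdin already sorted by the geolocation key, and the pixel comparison itself is left as a placeholder:

```python
#!/usr/bin/env python
# Sketch of a streaming reducer (illustrative): group records by geolocation key,
# parse the metadata after the '|', and compare the timestamped pixel arrays.
import sys
from itertools import groupby

def parse_line(line):
    key, value = line.rstrip("\n").split("\t", 1)
    timestamp, rest = value.split("|", 1)            # timestamp is the first field
    tuple_size, width, height, pixels = rest.split(";", 3)
    return key, timestamp, int(tuple_size), int(width), int(height), pixels

def main():
    records = (parse_line(line) for line in sys.stdin)
    for geolocation, group in groupby(records, key=lambda r: r[0]):
        by_time = sorted(group, key=lambda r: r[1])  # order this key's records by timestamp
        # ... compare consecutive pixel arrays for this bounding box
        # and emit a delta record for the WMS layers ...

if __name__ == "__main__":
    main()
```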
Sector Distributed File System
• Sector aggregates hard disk storage across commodity computers
• Single namespace, file-system-level reliability (using replication), high availability
• Sector does not split files
  • A single image will not be split, so when it is being processed the application does not need to read data from other nodes over the network
  • A directory can be kept together on a single node as well, as an option
Sphere UDF
• Sphere allows a User Defined Function to be applied to each file (whether it holds a single image or multiple images)
• Existing applications can be wrapped up in a Sphere UDF
• In many situations, the Sphere streaming utility accepts a data directory and an application binary as inputs
  • ./stream -i haiti -c ossim_foo -o results
For More Information
info@opencloudconsortium.org
www.opencloudconsortium.org