
Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief



  1. www.opencloudconsortium.org Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief. Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine. Open Cloud Consortium, June 21, 2010

  2. Project Matsu Goals • Provide persistent data resources and elastic computing to assist in disasters: • Make imagery available for disaster relief workers • Elastic computing for large scale image processing • Change detection for temporally different and geospatially identical image sets • Provide a resource for testing standards and for interoperability studies of large data clouds

  3. Part 1: Open Cloud Consortium

  4. www.opencloudconsortium.org • 501(c)(3) not-for-profit corporation • Supports the development of standards, interoperability frameworks, and reference implementations. • Manages testbeds: Open Cloud Testbed and Intercloud Testbed. • Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud. • Develops benchmarks.

  5. OCC Members • Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo • Universities: CalIT2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago • Government agencies: NASA • Open Source Projects: Sector Project

  6. Operates Clouds • 500 nodes • 3000 cores • 1.5+ PB • Four data centers • 10 Gbps • Target is to refresh 1/3 of the hardware each year. • Open Cloud Testbed • Open Science Data Cloud • Intercloud Testbed • Project Matsu: Cloud-based Disaster Relief Services

  7. Open Science Data Cloud • Astronomical data • Biological data (Bionimbus) • Networking data • Image processing for disaster relief

  8. Focus of OCC Large Data Cloud Working Group • Developing APIs for this framework. • [Slide diagram: a layered stack with Cloud Storage Services at the bottom, Cloud Compute Services (MapReduce, UDF, & other programming frameworks) above it, and Table-based and Relational-like Data Services on top, with applications (App) running against every layer.]

  9. Tools and Standards • Apache Hadoop/MapReduce • Sector/Sphere large data cloud • Open Geospatial Consortium • Web Map Service (WMS) • OCC tools are open source (matsu-project) • http://code.google.com/p/matsu-project/

  10. Part 2: Technical Approach • Hadoop – Lead Andrew Levine • Hadoop with Python Streams – Lead Collin Bennett • Sector/Sphere – Lead Yunhong Gu

  11. Implementation 1: Hadoop & MapReduce (Andrew Levine)

  12. Image Processing in the Cloud - Mapper • Step 1: Input to Mapper • Input key: a bounding box, e.g. (minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5) • Input value: the image covering that box • Step 2: Processing in Mapper • The mapper resizes and/or cuts up the original image into pieces, one per output bounding box • Step 3: Mapper Output • For each piece, output key: bounding box; output value: image tile + timestamp (see the sketch below)
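The mapper logic above can be sketched in Python. This is a hypothetical illustration, not the project's code: it assumes the image arrives as a 2-D list of pixel values and uses a fixed 2 x 2 tile grid.

    def split_bbox(minx, miny, maxx, maxy, n=2):
        # Divide a bounding box into an n x n grid of sub-boxes.
        dx = (maxx - minx) / n
        dy = (maxy - miny) / n
        for row in range(n):
            for col in range(n):
                yield (minx + col * dx, miny + row * dy,
                       minx + (col + 1) * dx, miny + (row + 1) * dy)

    def crop(pixels, sub, bbox):
        # Slice out the part of the pixel grid that falls inside `sub`.
        minx, miny, maxx, maxy = bbox
        h, w = len(pixels), len(pixels[0])
        x0 = int((sub[0] - minx) / (maxx - minx) * w)
        x1 = int((sub[2] - minx) / (maxx - minx) * w)
        y0 = int((sub[1] - miny) / (maxy - miny) * h)
        y1 = int((sub[3] - miny) / (maxy - miny) * h)
        return [row[x0:x1] for row in pixels[y0:y1]]

    def mapper(bbox, pixels, timestamp):
        # Steps 2-3: cut the image into pieces and emit one
        # (bounding box, tile + timestamp) pair per piece.
        for sub in split_bbox(*bbox):
            yield sub, (crop(pixels, sub, bbox), timestamp)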

  13. Image Processing in the Cloud - Reducer • Step 1: Input to Reducer • Input key: a bounding box, e.g. (minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375) • Input value: the set of timestamped image tiles for that box • Step 2: Processing in Reducer • Assemble images based on timestamps and compare; the result is a delta of the two images (see the sketch below) • Step 3: Reducer Output • A Timestamp 1 set, a Timestamp 2 set, and a Delta set; each set of images goes to a different map layer for display in WMS
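A matching reducer sketch, again hypothetical: it assumes grayscale tiles as 2-D lists of ints, exactly one tile per timestamp for a given bounding box, and two timestamps to compare.

    def reducer(bbox, tiles_with_timestamps):
        # Group this bounding box's tiles by timestamp.
        by_time = {}
        for tile, ts in tiles_with_timestamps:
            by_time[ts] = tile
        t1, t2 = sorted(by_time)[:2]
        # The delta is a pixel-by-pixel difference of the two epochs.
        delta = [[abs(a - b) for a, b in zip(row1, row2)]
                 for row1, row2 in zip(by_time[t1], by_time[t2])]
        # Three outputs: the two timestamped sets and the delta set,
        # each destined for its own WMS map layer.
        return {t1: by_time[t1], t2: by_time[t2], 'delta': delta}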

  14. Implementation 2: Hadoop & Python Streams (Collin Bennett)

  15. Preprocessing Step • All images (in a batch to be processed) are combined into a single file. • Each line contains the image’s byte array transformed to pixels (raw bytes don’t seem to work well with the one-line-at-a-time Hadoop streaming paradigm). • Record format: geolocation \t timestamp | tuple size ; image width ; image height ; comma-separated list of pixels • Several of these fields (highlighted in red on the original slide) are metadata needed to process the image in the reducer. A sketch of this step follows below.
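A minimal sketch of producing one such line (hypothetical helper; the slides do not show the actual preprocessing code):

    def make_record(geolocation, timestamp, pixels):
        # pixels: a 2-D grid of fixed-size tuples, e.g. (r, g, b).
        height, width = len(pixels), len(pixels[0])
        tuple_size = len(pixels[0][0])
        flat = ','.join(str(c) for row in pixels for px in row for c in px)
        # geolocation \t timestamp | tuple size ; width ; height ; pixels
        return '%s\t%s|%d;%d;%d;%s' % (geolocation, timestamp,
                                       tuple_size, width, height, flat)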

  16. Map and Shuffle • We can use the identity mapper • All of the work for mapping was done in the preprocessing step • The map/shuffle key is the geolocation • In the reducer, the timestamp will be the 1st field of each record when splitting on ‘|’ (see the parsing sketch below)
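On the reduce side, splitting each record's value on ‘|’ recovers the timestamp and then the metadata fields; a hypothetical parse matching the record format above:

    def parse_value(value):
        # Hadoop streaming has already grouped records by the geolocation
        # key; the timestamp is the 1st field when splitting the value on '|'.
        timestamp, rest = value.split('|', 1)
        tuple_size, width, height, pixel_csv = rest.split(';', 3)
        return timestamp, int(tuple_size), int(width), int(height), pixel_csv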

  17. Implementation 3: Sector/Sphere (Yunhong Gu)

  18. Sector Distributed File System • Sector aggregates hard disk storage across commodity computers • It provides a single namespace, file-system-level reliability (using replication), and high availability • Sector does not split files • A single image will not be split, so when it is being processed the application does not need to read data from other nodes over the network • As an option, a directory can also be kept together on a single node

  19. Sphere UDF • Sphere allows a User Defined Function (UDF) to be applied to each file (whether it holds a single image or multiple images) • Existing applications can be wrapped up in a Sphere UDF (see the sketch below) • In many situations, the Sphere streaming utility accepts a data directory and an application binary as inputs • ./stream -i haiti -c ossim_foo -o results
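Sphere UDFs are written against the Sector/Sphere C++ API; as a language-neutral stand-in for the "wrap an existing application" idea, here is a hypothetical Python wrapper that runs the placeholder binary from the slide (ossim_foo) over each whole input file:

    import pathlib
    import subprocess
    import sys

    def process_file(path, outdir='results'):
        # Because Sector never splits a file, each image can be
        # processed locally by handing the whole file to the binary.
        pathlib.Path(outdir).mkdir(exist_ok=True)
        out = pathlib.Path(outdir) / (pathlib.Path(path).name + '.out')
        subprocess.run(['ossim_foo', path, str(out)], check=True)
        return out

    if __name__ == '__main__':
        for f in sys.argv[1:]:
            print(process_file(f))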

  20. For More Information info@opencloudconsortium.org www.opencloudconsortium.org
