Potential use of Cloud computing for streamlining the processing of MT data
Prof J Craig Mudge FTSE
Collaborative Cloud Computing Lab (C3L)
New eScience Lab enabled by cloud computing
Seed funding from
-- minerals and geothermal research at www.pir.sa.gov.au
-- Microsoft Research USA Jim Gray Seed Grant
Acknowledgements: David Giles, Richard Lane, Tim Baker, Tristan Wurst
JCM 30 Sept 2010
Magnetotelluric (MT) imaging
• Using the magnetic and electric fields of the earth, MT imaging determines the resistivity structure of a sub-surface area of interest.
• It goes deeper (a hundred or so km) than seismic (<2 km) but does not have the same resolution.
• Applications
  • mineral exploration
  • water management in mining
  • geothermal exploration
  • carbon storage
  • aquifer research and management
  • earthquake and volcano studies
(Heinson and Mudge, 2010)
Figure caption: CO2 in depleted gas field
craig.mudge@adelaide.edu.au 27 Sep 2010
Ahead for research in minerals and energy
1. Data deluge
  • Gene sequencers: 25 terabytes per day
  • Large Hadron Collider: 700 MB of data per second, 60 TB/day, 20 PB/year
  • Square Kilometre Array: petabytes per day
2. Computation, e.g., rapid inversion
3. Data and experiments: curation, provenance, sharing, reuse
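The three figures quoted for 700 MB per second, 60 TB/day, and 20 PB/year can be cross-checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and continuous running over a 365-day year; both are assumptions for illustration, not statements from the slide, and the function names are hypothetical.

```python
# Cross-check of the quoted data rates.
# Assumptions (not from the slide): decimal units, continuous operation.
SECONDS_PER_DAY = 24 * 60 * 60
MB, TB, PB = 10**6, 10**12, 10**15


def per_day_tb(mb_per_second: float) -> float:
    """Convert a sustained rate in MB/s to TB per day."""
    return mb_per_second * MB * SECONDS_PER_DAY / TB


def per_year_pb(mb_per_second: float, days: int = 365) -> float:
    """Convert a sustained rate in MB/s to PB per year of `days` days."""
    return per_day_tb(mb_per_second) * days * TB / PB


daily = per_day_tb(700)    # TB/day at a sustained 700 MB/s
yearly = per_year_pb(700)  # PB/year if that rate ran every day
print(f"{daily:.1f} TB/day, {yearly:.1f} PB/year")
```

Under these assumptions the daily figure comes out at about 60 TB, in line with the slide, and the yearly figure at about 22 PB, the same order as the quoted 20 PB/year.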
Approx 100,000 PCs in Google data centres
Figure labels: Google Goose Creek; Google Dalles, Oregon
From www.cloudinnovation.com.au
Essence of cloud
1. Software as a service: applications are delivered over the Internet with a common-or-garden browser
2. Significant cost savings, factors of 5x to 7x
3. Presented as a utility with a matching business model, namely pay-per-use
4. A new data-parallel programming framework
Cost savings in warehouse-sized data centres
• Resources in massive warehouse-sized data centres are pooled at scale
• Built from low-cost commodity chips and disks (the run-time environment of MapReduce or Dryad takes care of fault tolerance, scheduling, and load balancing)
• Share the overhead of cooling, refrigeration, physical security, and backup power
Execution of MapReduce
The Map step is shown as M in the following slides.
(Dean and Ghemawat, 2004)
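A minimal single-process sketch of the map, group-by-key, and reduce stages may help make the M and R boxes concrete. It is an illustration only, not Google's implementation: a real MapReduce runtime distributes these stages across many machines and handles fault tolerance and scheduling. The function names and the sample records are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run map, group-by-key (shuffle), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        # Map: each record may emit any number of (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce: one call per key, visited in sorted key order.
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Hypothetical example: count the readings recorded at each station.
def count_mapper(record):
    station, _reading = record
    yield station, 1


def count_reducer(_key, values):
    return sum(values)


readings = [("st1", 0.4), ("st2", 1.1), ("st1", 0.7)]
counts = map_reduce(readings, count_mapper, count_reducer)
print(counts)
```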
Decomposition
• Task decomposition
  • How can a problem be decomposed into tasks that can execute concurrently?
• Data decomposition
  • How can a problem's data be decomposed into units that can be operated on relatively independently?
then dependencies among the tasks
• Group tasks, order tasks, and data sharing
Parallel execution of MT data: one Map task per station
Figure: Map tasks (M), one per station from Station 1 to Station n, followed by a sort by key and then Reduce tasks (R).
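The one-task-per-station pattern can be sketched with Python's standard library. This is a hedged illustration: `process_station` is a hypothetical placeholder for the real per-station processing, the station names and samples are made up, and a thread pool stands in for what would be separate processes or cloud instances when the work is CPU-bound. Only the Map and sort-by-key stages are shown.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def process_station(item):
    """Map step: one independent task per station (placeholder computation)."""
    name, samples = item
    return name, mean(samples)


def run_stations(station_data, max_workers=4):
    """Run every station's task concurrently, then sort the results by key."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(process_station, station_data.items()))
    return dict(sorted(results))


# Made-up input: two stations with a few samples each.
station_data = {"station_2": [2.0, 4.0], "station_1": [1.0, 3.0, 5.0]}
summary = run_stations(station_data)
print(summary)
```

Because no station depends on another, the tasks need no coordination beyond collecting the results, which is what makes this level of parallelism easy to obtain.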
Parallel execution of gridded exploration data by using sub grids when the original is too big to do as one grid
Figure: Form sub grids, then Map tasks (M), then Reduce tasks (R), then Re-combine.
Concrete example: the Map step is an existing MATLAB program running on Amazon EC2
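The form-sub-grids and re-combine steps can be sketched in plain Python. This is a simplified illustration under stated assumptions: the grid is a list of rows, it is split into row blocks only, and `process_subgrid` is a hypothetical cell-wise stand-in for the real Map program. An operation that needs neighbouring cells would require overlapping sub grids, which this sketch does not attempt.

```python
def split_rows(grid, block_rows):
    """Form sub grids: consecutive blocks of at most `block_rows` rows."""
    return [grid[i:i + block_rows] for i in range(0, len(grid), block_rows)]


def process_subgrid(subgrid):
    """Map step placeholder: a cell-wise operation that needs no neighbours."""
    return [[2 * cell for cell in row] for row in subgrid]


def recombine(subgrids):
    """Re-combine: concatenate the processed blocks in their original order."""
    return [row for subgrid in subgrids for row in subgrid]


# Made-up 5 x 3 grid, processed two rows at a time.
grid = [[r * 3 + c for c in range(3)] for r in range(5)]
result = recombine(process_subgrid(s) for s in split_rows(grid, block_rows=2))
print(result)
```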
Water: data collection, management, and analysis in the cloud
• Data collection, aggregation: high volumes of complex heterogeneous data
• Data integration / data fusion
• Data use: data clean, data analysis, data repurpose, visualisation
• Metadata and databases of interest; gateway
• Organisations (water, government, regulators, market operators, and researchers) will mine this data.
Data sources: existing databases; weather; aquifer; river; irrigation; remote sensing (satellite); historical photos, etc.
River data: from sensors (both mobile and moored)
Wireless ad-hoc networks: mesh-networked motes with sensors; satellite
Sensors: 40 mm, 10 years on 2 AA batteries
www.pacific-challenge.com
Academy Working Group: Cloud computing at peta-scale
1. Alex Zelinsky, CSIRO Group Executive, 17 May 2010: “The Academy project has been a real catalyst for getting the cloud computing agenda moving forward in Australia.”
2. Summer internships in cloud computing: $1,000 prize won by Jinhui Yao for his security project in an internship hosted by CSIRO
3. Report to be launched October 14 in Canberra
NBN: fiber/wireless net connecting mobile and fixed clients to a cloud computing infrastructure for applications & content
Figure labels: Mobile Clients; NBN; Cloud Computing: Services & Content; Fixed Clients & Client Nets; Television Content
Computer person’s view of NBN: “Continuous Services i.e. apps & Client Connected Devices”
Figure labels: Mobile Clients; Connected Devices; NBN; Cloud Computing: Services & Content; Fixed Clients & Client Nets; Television Content
Outputs from BIRRP are
• (a) impedance Z, where E = ZB
• (b) coherence data
• (c) apparent resistivity and phase
Figure: processing flow for MT station data from logging in the field (Station 1, Station 2, ... Station n). Box labels: Time series; Clean; E field conversion to standard units; Broadband processing; BIRRP; inspect with GMT plots; Apparent resistivity; Convert to EDI (one per station); Forward Modelling and Inversion
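The link between the impedance Z (with E = ZB) and the apparent resistivity and phase can be illustrated with the textbook relation rho_a = (mu0 / omega) |Z|^2, which follows from rho_a = |E/H|^2 / (omega mu0) with B = mu0 H. The sketch below is not BIRRP's code and makes no claim about BIRRP's output units or sign conventions; it assumes SI units (E in V/m, B in T) and a time convention in which a uniform half-space has a phase of +45 degrees. The function names are hypothetical.

```python
import cmath
import math

MU0 = 4e-7 * math.pi  # magnetic permeability of free space, H/m


def apparent_resistivity(z_b: complex, period_s: float) -> float:
    """rho_a = (mu0 / omega) * |Z|^2 for Z defined by E = Z B, in SI units."""
    omega = 2 * math.pi / period_s
    return MU0 / omega * abs(z_b) ** 2


def phase_degrees(z_b: complex) -> float:
    """Phase of the impedance in degrees."""
    return math.degrees(cmath.phase(z_b))


# Sanity check on a uniform half-space of resistivity rho, for which
# Z = sqrt(i * omega * rho / mu0) under the assumed conventions.
rho, period = 100.0, 10.0
omega = 2 * math.pi / period
z = cmath.sqrt(1j * omega * rho / MU0)
print(apparent_resistivity(z, period), phase_degrees(z))
```

For a uniform half-space the apparent resistivity recovers the true resistivity at every period, which makes it a convenient check on units before real station data are processed.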
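Forward model and inversion
Figure: flowchart of the inversion loop.
• Start
• Compute MT response of new model
• Compare model response and MT observed data
• Required misfit? Y: finish. N: continue
• Exceeded max # of iterations? Y: finish. N: update model and compute the response again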
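The loop in the flowchart can be sketched generically. This is a toy under stated assumptions: the forward model is a hypothetical one-parameter linear response, the update is a damped least-squares step, and the misfit is a plain RMS. A real MT inversion uses a far more elaborate forward model and update rule; only the control flow (compute, compare, test misfit, test iteration count, update) mirrors the slide.

```python
import math


def rms_misfit(predicted, observed):
    """Root-mean-square difference between model response and data."""
    total = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(total / len(observed))


def invert(observed, forward, update, model, target_misfit, max_iterations):
    """Iterate: compute response, compare, then stop or update the model."""
    for iteration in range(1, max_iterations + 1):
        predicted = forward(model)
        misfit = rms_misfit(predicted, observed)
        if misfit <= target_misfit:          # required misfit reached
            return model, misfit, iteration
        model = update(model, predicted, observed)
    # Maximum number of iterations exceeded: report the last model.
    return model, rms_misfit(forward(model), observed), max_iterations


# Hypothetical toy problem: response d_i = m * g_i with a known kernel g.
KERNEL = [1.0, 2.0, 3.0]


def forward(m):
    return [m * g for g in KERNEL]


def update(m, predicted, observed, step=0.5):
    """Damped least-squares step toward the best-fitting scalar m."""
    num = sum(g * (o - p) for g, p, o in zip(KERNEL, predicted, observed))
    den = sum(g * g for g in KERNEL)
    return m + step * num / den


observed = forward(5.0)
model, misfit, iterations = invert(observed, forward, update, 1.0, 1e-3, 50)
print(model, misfit, iterations)
```

Both exits from the flowchart are present: the loop returns early once the required misfit is reached, and otherwise stops when the iteration limit is hit.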
MT Processing Currently
• Time series data from stations
• Remove outliers
• To frequency domain
• Apply BIRRP (Chave and Thomson, 1989), a robust method; produces resistivity and phase by frequency
• Inversion to produce subsurface image (Siripunvaraporn et al., 2005)
Timings shown: ~24 hours; ~3 to 4 weeks for 3D
References
Chave and Thomson. Bounded influence magnetotelluric response function estimation. Geophys. J. Int. 1989.
Siripunvaraporn, Egbert, Lenbury, and Uyeshima. Three-dimensional magnetotelluric inversion: data-space method. Physics of the Earth and Planetary Interiors 150. 2005.
Reflections: September 2010
Value of cloud for PIRSA, our MT processing, and CRC DET
• Access to cheap flexible computing
  • Amazon runs Fortran, Matlab, Python, etc. E.g., T Dhu’s gridded execution
  • On-demand purchase of a couple of hours of a more powerful computer (generally more powerful in memory: 8 GB, for example); pricing is growing in sophistication (spot pricing, micro-instances, etc.)
• Parallel execution
  • Easy to get concurrent execution of steps, e.g., 45 stations
  • Parallel within a step (Google’s MapReduce and Dryad/LINQ) is hard work, but we have made a little progress
• Our future work on integration in multi-layered databases has been strongly endorsed
Disappointments
• Honours student gave up on visualisation of sub-surface layers using Bing/Google Earth
eScience workflow was a major contribution (unexpected)
• Less human interaction, repeatable, provenance, sharing of workflows internationally
• Increasingly important, as volume of data grows
No-machines lab: “built first cloud based server, which is the SVN server for C3 Lab in the Amazon EC2 cloud.”
Craig Mudge 29.9.2010
Scientific Workflow Systems
• Value proposition: more time on science, less time on code and admin
• How: by providing a language emphasizing sharing, reuse, reproducibility, rapid prototyping, efficiency
• Provenance
• Visual programming
• Integration with domain-specific tools
• Scheduling
• Data curation
(Bill Howe, UW)
2010: Honours project in Geophysics, Tristan Wurst: steps in MT processing
Joint Inversion
• Invert for a single parameter, to which both techniques are sensitive
Figure label: Porosity
(Rachel Maier, 2010)
Figure: SW to NE sections, Renmark Trough. Panel labels: MT Inversion; Joint Inversion; Seismic constrained Gravity
(Rachel Maier, 2010)
Data logging with near real-time feedback
Figure labels: Data; Compute and geologist’s data integrations; Sub-surface
Future areas
• Seismic
• Inversion and forward modelling in general
• Rapid inversion, too
• Data integration or data fusion across multiple layers
• Data mining
Vision: A geologist steering a drill in real time, using real-time sensing of the sub-surface and updating geological models, while referring to her cloud-based data sets and collaborating with her team back home
Figure labels:
• Geologist in field
• Data; Compute and geologist’s data integrations: Seismic, Satellite, MT, Petrophysical Cores, Density etc
• Drilling machine control system; steering
• Collaboration
• Sub-surface sensing, a dozen or more sensors: Seismic, XRF, Resistivity etc
www.cloudinnovation.com.au
craig.mudge@adelaide.edu.au
0417 679 266
Searching the Deep Earth: sustaining your wealth for the next century
From draft report:
... nationally coordinated program to deploy new geophysical tools (magneto telluric, passive seismic) and methods (geochemical) integrated with a comprehensive drilling program.
... next, using petascale computing, Storage, and network resources these data will be integrated into multi-dimensional databases ...
High Flyers Think Tank, Canberra, 19–20 Aug 2010
The Power Wall www.pacific-challenge.com