Cyberinfrastructure
Geoffrey Fox, Indiana University, with Linda Hayden, Elizabeth City State University
April 5, 2011, virtual meeting
Cyberinfrastructure
• Supports the Expeditions with a lightweight field system – hardware and system support
• Then performs offline processing at Kansas, Indiana and ECSU
• Indiana and ECSU facilities and the initial field work were funded by the NSF PolarGrid MRI, which is now (essentially) completed
• Initial basic processing to Level 1B
• Extension to Level 3 with image processing and a data exploration environment
• Data are archived at NSIDC

Prasad Gogineni: "With the on-site processing capabilities provided by PolarGrid, we are able to quickly identify Radio Frequency Interference (RFI) related problems and develop appropriate mitigation techniques. Also, the on-site processing capability allows us to process and post data to our website within 24 hours after a flight is completed. This enables scientific and technical personnel in the continental United States to evaluate the results and provide the field team with near real-time feedback on the quality of the data. The review of results also allows us to re-plan and re-fly critical areas of interest in a timely manner."
IU Field Support Efforts 2010
• OIB Greenland 2010
  • RAID-based data backup solution
  • Second server to handle processing needs
  • Over 50 TB collected on-site
  • Copying to the Data Capacitor completed at IU in February 2011
• OIB Punta Arenas 2010
  • 20 TB using the same backup solution
IU Field Support, Spring 2011
• OIB and Twin Otter flights simultaneously, two engineers in the field
• The most equipment IU has sent to the field in any season
  • Processing and data transfer server at each site
  • Two arrays at each field site
• Largest set of data capture/backup jobs yet between CReSIS/IU
Field Equipment in detail
• OIB Thule: 3 2U 8-core servers, 3 24 TB SATA arrays, 12 cases of 40 1.5 TB drives
• Ilulissat: 2 2U 8-core servers, 2 24 TB SATA arrays, 6 cases of 40 1.5 TB drives
• 2010 Chile: 3 2U 8-core servers, 3 24 TB SATA arrays, 6 cases of drives
• 2010 Thule-to-Kanger: 1 2U 8-core server, 1 24 TB SATA array, 6 cases of drives in Thule, 5 in Kanger. The Thule-to-Kanger drives were re-used from earlier Antarctic work, and 3 cases failed in Thule.
• Note: 100 drives have failed in total so far (it's harsh out there)
IU Lower 48 support
• 2010 data now on the Data Capacitor
• Able to route around local issues if necessary by temporarily substituting other local hardware
• Turnaround/management of IU affiliate accounts for CReSIS researchers and students
• Some tuning of Crevasse (the major PolarGrid system at IU) nodes for better job execution/turnaround is complete
Summer 2010 Cyberinfrastructure REU
• Joyce Bevins, Data Point Visualization and Clustering Analysis. Mentors: Jong Youl Choi, Ruan Yang, and Seung-Hee Bae (IUB)
• Jean Bevins, Creating a Security Model for the SALSA HPC Portal. Mentors: Adam Hughes and Saliya Ekanayake (IUB)
• JerNettie Burney and Nadirah Cogbill, Evaluation of Cloud Storage for Preservation and Distribution of Polar Data. Mentors: Marlon Pierce, Yu (Marie) Ma, Xiaoming Gao, and Jun Wang (IUB)
• Constance Williams, Health Data Analysis. Mentor: Jong Youl Choi (IUB)
• Robyn Evans and Michael Austin, Visualization of Ice Sheet Elevation Data Using Google Earth & Python Plotting Libraries. Mentors: Marlon Pierce, Yu (Marie) Ma, Xiaoming Gao, and Jun Wang (IUB)
Academic Year Student Projects
• A Comparison of Job Duration Utilizing High Performance Computing on a Distributed Grid. Members: Michael Austin, JerNettie Burney, and Robyn Evans. Mentor: Je'aime Powell
• Research and Implementation of Data Submission Technologies in Support of CReSIS Polar and Cyberinfrastructure Research Projects at Elizabeth City State University. Team Members: Nadirah Cogbill and Matravia Seymore. Team Mentor: Jeff Wood, with mentors Xiaoming Gao, Yu "Marie" Ma, Marlon Pierce, and Jun Wang at IU
• A Study on the Viability of Hadoop Usage on the Umfort Cluster for the Processing and Storage of CReSIS Polar Data. Members: JerNettie Burney, Glenn Koch, Jean Bevins, and Cedric Hall. Mentor: Je'aime Powell
Other Education Activities
• Two ADMI faculty, one graduate student and one undergraduate student participated in the Cloud Computing Conference CloudCom 2010 in Indianapolis, December 2010
• Fox presented at the ADMI Cloud Computing workshop for faculty, December 16, 2010
• Jerome Mitchell (IU PhD student; ECSU undergraduate, Kansas Masters) will describe the A Cloudy View on Computing workshop at ECSU, June 2011
Supporting Higher Level Data Products
• Image Processing
• Data Browsing Portal from the Cloud
• Standalone Data Access in the field
• Visualization
Hidden Markov Model based Layer Finding
Reference: P. Felzenszwalb and O. Veksler, "Tiered Scene Labeling with Dynamic Programming," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
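To make the idea concrete, here is a minimal single-layer sketch in the spirit of this dynamic-programming approach: each echogram column contributes a data cost (strong echoes are cheap) and adjacent columns pay a smoothness penalty, so the minimum-cost path traces a layer boundary. The cost terms and penalty weight are illustrative assumptions, not the actual CReSIS tracker; the tiered-labeling formulation in the cited paper generalizes this to multiple ordered layers (e.g., ice surface above bedrock).

```python
import numpy as np

def find_layer(echogram, smooth=5.0):
    """Trace one layer boundary through an echogram (rows x columns)
    by dynamic programming: a per-column data cost (negative echo
    strength) plus a quadratic smoothness penalty between columns.
    Illustrative sketch only, not the production CReSIS tracker."""
    n_rows, n_cols = echogram.shape
    unary = -echogram                       # strong echoes = low cost
    rows = np.arange(n_rows)
    cost = unary[:, 0].copy()               # best cost ending at each row
    back = np.zeros((n_rows, n_cols), dtype=int)
    for c in range(1, n_cols):
        # trans[i, j]: cost of reaching row i from previous-column row j
        trans = cost[None, :] + smooth * (rows[:, None] - rows[None, :]) ** 2
        back[:, c] = np.argmin(trans, axis=1)
        cost = trans[rows, back[:, c]] + unary[:, c]
    boundary = np.empty(n_cols, dtype=int)  # backtrack the best path
    boundary[-1] = int(np.argmin(cost))
    for c in range(n_cols - 1, 0, -1):
        boundary[c - 1] = back[boundary[c], c]
    return boundary
```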
Current CReSIS Data Organization
• The data are organized by season. Seasons are broken into data segments, which are contiguous blocks of data where the radar parameters do not change.
• Data segments are broken into frames (typically 50 km in length). Associated data for each frame are stored in different file formats: CSV (flight path), MAT (depth sounder data), and PDF (image products).
• The CReSIS data products website lists direct download links for individual files.
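As a concrete illustration of this per-frame layout, the sketch below loads one frame's flight path (CSV) and depth sounder data (MAT). The file names are placeholders, not the official CReSIS naming scheme, and loadmat assumes pre-v7.3 MAT files (v7.3 files are HDF5 and would need h5py instead).

```python
import csv
from pathlib import Path
from scipy.io import loadmat  # pre-v7.3 MAT files; use h5py for v7.3

def load_frame(frame_dir, frame_id):
    """Load one frame's products (file names are illustrative)."""
    frame_dir = Path(frame_dir)
    # Flight path: one CSV record per along-track position
    with open(frame_dir / f"{frame_id}_path.csv", newline="") as f:
        path = list(csv.DictReader(f))
    # Depth sounder data: echogram and geometry in a MAT file
    radar = loadmat(frame_dir / f"{frame_id}.mat")
    return path, radar
```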
PolarGrid Data Browser Goals
• Organize the data files by their spatial attributes
• Support multiple protocols for different user groups, such as KML service and direct spatial database access
• Support efficient access methods in different computing and network environments
• Cloud and Field (standalone) versions
• Support high-level spatial analysis functions powered by the spatial database
PolarGrid Data Browser Architecture
• Two main components: a Cloud distribution service and a special service for the PolarGrid field crew.
• Data synchronization is supported among multiple spatial databases.
[Architecture diagram: the GIS Cloud Service – GeoServer, spatial database, data portal, and virtual storage service – serves multiple users via WMS (Matlab/GIS), KML (Google Earth), and the data portal; the Field Service packages a SpatiaLite/SQLite spatial database as a virtual appliance for a single user on a local network.]
PolarGrid Data Browser: Cloud GIS Distribution Service
• Google Earth example: 2009 Antarctica season
• Left image: overview of 2009 flight paths
• Right image: data access for a single frame
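One way such a season overview can be produced is by emitting a KML LineString per frame, grouped under a season folder. The simplekml package used here is an assumption for illustration, not necessarily what the Data Browser itself uses.

```python
import simplekml  # pip install simplekml; one convenient KML writer

def season_to_kml(frames, out_path="2009_Antarctica.kml"):
    """frames: {frame_id: [(lon, lat), ...]} flight-path coordinates.
    Writes one LineString per frame under a season folder."""
    kml = simplekml.Kml()
    folder = kml.newfolder(name="2009 Antarctica season")
    for frame_id, coords in frames.items():
        line = folder.newlinestring(name=frame_id, coords=coords)
        line.style.linestyle.width = 2  # make flight paths visible
    kml.save(out_path)
```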
Technologies in the Cloud GIS Distribution Service
• The geospatial server is based on GeoServer and PostgreSQL (spatial database), configured inside an Ubuntu virtual machine.
• The virtual storage service attaches terabyte-scale storage to the virtual machine.
• The Web Map Service (WMS) protocol enables users to access the original data set from Matlab and GIS software. KML distribution is aimed at general users. The data portal is built with Google Maps and can be embedded into any website.
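A WMS client needs nothing more than an HTTP GET with the standard GetMap parameters, which is what lets Matlab and GIS tools pull imagery directly. The endpoint and layer name below are hypothetical placeholders; the parameters themselves are standard WMS 1.1.1.

```python
import requests

# Hypothetical GeoServer endpoint and layer name
WMS_URL = "http://example.org/geoserver/polargrid/wms"
params = {
    "service": "WMS", "version": "1.1.1", "request": "GetMap",
    "layers": "polargrid:flightlines",    # assumed layer name
    "srs": "EPSG:4326",
    "bbox": "-180,-90,180,-60",           # lon/lat box over Antarctica
    "width": 1024, "height": 512,
    "format": "image/png",
}
resp = requests.get(WMS_URL, params=params, timeout=60)
resp.raise_for_status()
with open("flightlines.png", "wb") as f:
    f.write(resp.content)                 # rendered map image
```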
PolarGrid data distribution on Google Earth
• Processed in the cloud using MapReduce
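A plausible shape for such a job, under Hadoop Streaming, is a mapper that keys every flight-path record by a one-degree tile so a reducer can emit one Google Earth overlay per tile. The record format here is an assumption for illustration, not the project's actual pipeline.

```python
#!/usr/bin/env python
# mapper.py -- Hadoop Streaming sketch. Assumed input records:
# "lat,lon,thickness,..." per line. A reducer would then collect all
# records sharing a tile key and write one KML overlay per tile.
import sys

for line in sys.stdin:
    try:
        lat, lon, thickness = line.strip().split(",")[:3]
        tile = f"{int(float(lat))},{int(float(lon))}"  # 1-degree bin
        print(f"{tile}\t{lat},{lon},{thickness}")
    except ValueError:
        continue  # skip malformed records
```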
PolarGrid Field Access Service
• The field crew has limited computing resources and internet connectivity.
• Essential data sets are downloaded from the Cloud GIS distribution service and packed as a spatial database virtual appliance with SpatiaLite. The whole system can be carried around on a USB flash drive.
• The virtual appliance is built on Ubuntu JeOS (Just Enough Operating System); it has almost identical functions to the GIS Cloud service and works on a local network with VirtualBox. The virtual appliance runs in 256 MB of virtual memory.
• The SpatiaLite database is a lightweight spatial database based on SQLite, aimed at a single user; the data can be accessed through GIS software, and a native API for Matlab has also been developed.
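Because SpatiaLite rides on SQLite, the appliance's database can be queried from stock Python once the extension module is loaded. The file, table, and column names below are illustrative assumptions, not the actual PolarGrid schema.

```python
import sqlite3

conn = sqlite3.connect("polargrid_field.sqlite")  # assumed file name
conn.enable_load_extension(True)
conn.load_extension("mod_spatialite")             # SpatiaLite module

# Frames whose geometry lies inside a lon/lat bounding box
rows = conn.execute(
    """SELECT frame_id FROM frames
       WHERE MBRContains(BuildMBR(?, ?, ?, ?), geometry)""",
    (-110.0, -80.0, -100.0, -75.0),
).fetchall()
```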
PolarGrid Field Access Service
• SpatiaLite data access through the Quantum GIS interface
• Left image: 2009 Antarctica season vector data, originally stored in 828 separate files
• Right image: visual crossover analysis for quality control (work in progress)
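The geometric core of crossover analysis is finding where two flight lines intersect; at each such point the two independently measured ice thicknesses can be compared as a quality-control check. A minimal sketch with shapely (an assumed tool, not necessarily what the browser uses):

```python
from shapely.geometry import LineString

def crossovers(path_a, path_b):
    """Intersection points of two flight paths, each given as a
    [(lon, lat), ...] coordinate list."""
    inter = LineString(path_a).intersection(LineString(path_b))
    if inter.is_empty:
        return []
    geoms = getattr(inter, "geoms", [inter])  # Point or multi-part
    return [(p.x, p.y) for p in geoms if p.geom_type == "Point"]
```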
URL References
• CReSIS data products: https://www.cresis.ku.edu/data
• GeoServer: http://geoserver.org/
• PostgreSQL: http://www.postgresql.org/
• VirtualBox: http://www.virtualbox.org/
• SpatiaLite: http://www.gaia-gis.it/spatialite/
• Quantum GIS: http://www.qgis.org/