Update on Damasc • Joe Buck October 19th, 2010
A year later • Last year: we outlined our vision • Next year: Carlos and Alkis covered that • Today: Where we’re at
What’s in a name? • Last year I presented on DICE (Data Intensive Computation Environment) • We’ve changed the name to Damasc, which incorporates parts of DICE but is more focused on data management
Goal of Damasc • To allow applications to express their internal data structure to the storage system • Enable more intelligent storage layout, which leads to increased functionality in the storage system
Application data-element alignment in parallel FS • Created traces for common access patterns over scientific data • Mapped those traces onto a theoretical parallel file system configuration • Analyzed traces to quantify IO savings from aligning data to application data element boundaries
Application data-element alignment in parallel FS - cont. • We want to go from this
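The kind of IO savings the trace analysis quantifies can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the configuration used in the study: the 64-byte stripe unit, the 64-byte data element, the 40-byte header that shifts elements off stripe boundaries, and the helper names `stripe_units_touched` and `total_units`.

```python
def stripe_units_touched(offset, length, stripe_unit):
    """Count the stripe units overlapped by a read of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // stripe_unit
    last = (offset + length - 1) // stripe_unit
    return last - first + 1


def total_units(trace, stripe_unit):
    """Sum the stripe units touched by every (offset, length) read in a trace."""
    return sum(stripe_units_touched(off, ln, stripe_unit) for off, ln in trace)


# Hypothetical values: one read per data element, four elements in total.
STRIPE = 64   # assumed stripe unit size in bytes
ELEMENT = 64  # assumed application data element size in bytes
HEADER = 40   # assumed file header that pushes elements off stripe boundaries

# Elements stored right after the header straddle two stripe units each.
unaligned = [(HEADER + i * ELEMENT, ELEMENT) for i in range(4)]
# Elements placed on stripe boundaries touch exactly one stripe unit each.
aligned = [(i * ELEMENT, ELEMENT) for i in range(4)]

unaligned_cost = total_units(unaligned, STRIPE)
aligned_cost = total_units(aligned, STRIPE)
savings = 1 - aligned_cost / unaligned_cost
```

In this toy trace the unaligned layout touches twice as many stripe units as the aligned one. Real traces would carry many more reads and varying element sizes.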
MapReduce over scientific data • Goal was to implement NetCDF Operators (NCO) as MapReduce programs • Base NetCDF file decomposed via C++ application. Constituent parts stored in HDFS • Currently being worked on
MapReduce over scientific data - continued • We want to go from this • To this
MapReduce over scientific data - continued • We want to go from this • Or better yet
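As a rough illustration of how an NCO-style averaging operator could be phrased as map and reduce steps, here is a small in-memory sketch. It does not use Hadoop or the NetCDF library. The chunk layout `(variable_name, values)`, the function names, and the sample data are all hypothetical stand-ins for the decomposed file parts stored in HDFS.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map step: emit a partial (sum, count) for the variable in one chunk."""
    name, values = chunk
    yield name, (sum(values), len(values))


def reduce_partials(name, partials):
    """Reduce step: combine the partial sums and counts into one average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return name, total / count


def run(chunks):
    """Simulate shuffle and reduce in memory (a stand-in for the framework)."""
    grouped = defaultdict(list)
    for chunk in chunks:
        for key, value in map_chunk(chunk):
            grouped[key].append(value)
    return dict(reduce_partials(k, v) for k, v in grouped.items())


# Hypothetical chunks, as if one variable were split across several parts.
chunks = [
    ("temp", [1.0, 2.0, 3.0]),
    ("temp", [5.0]),
    ("pressure", [10.0, 20.0]),
]
averages = run(chunks)
```

Emitting (sum, count) pairs rather than per-chunk averages keeps the reduce step correct when chunks hold different numbers of values.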
Tracing of scientific application data access • Created a tracing layer for ParaView that logged data access from the application’s perspective • Noah will talk more about tracing
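A tracing layer of this kind can be pictured as a thin wrapper that records each read before passing it through. The sketch below is not the ParaView layer. The `TracingReader` class and its `read_at` method are hypothetical names, and an in-memory byte buffer stands in for a data file.

```python
import io


class TracingReader:
    """Wrap a seekable binary stream and log each (offset, bytes_read) access."""

    def __init__(self, raw):
        self._raw = raw
        self.trace = []  # one (offset, length) tuple per read, in call order

    def read_at(self, offset, length):
        """Read `length` bytes at `offset` and record what was actually read."""
        self._raw.seek(offset)
        data = self._raw.read(length)
        self.trace.append((offset, len(data)))
        return data


# Hypothetical 100-byte "file" standing in for a scientific data set.
reader = TracingReader(io.BytesIO(bytes(range(100))))
first = reader.read_at(10, 4)
tail = reader.read_at(96, 8)  # runs past the end, so only 4 bytes come back
```

Logging the length actually returned, rather than the length requested, keeps the trace faithful when a read is cut short at end of file.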
Scientific data in a key-value store • Project to enable NetCDF ingestion into HBase
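One way to picture ingestion into a key-value store is to give every array cell a row key built from the variable name and its indices. The sketch below is an assumption for illustration only. The key scheme, the `row_key` and `ingest` helpers, and the plain dict standing in for an HBase table are hypothetical and do not describe the project's actual schema.

```python
from itertools import product


def row_key(variable, index):
    """Build a sortable key from a variable name and zero-padded indices."""
    return variable + ":" + ",".join(f"{i:06d}" for i in index)


def ingest(variable, shape, values, store):
    """Write one array, given as a flat row-major list, into a key-value store."""
    indices = product(*(range(n) for n in shape))
    for index, value in zip(indices, values):
        store[row_key(variable, index)] = value


# A plain dict stands in for the key-value store (hypothetical).
store = {}
ingest("temp", (2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], store)
```

Zero-padding the indices makes lexicographic key order match array order, which matters in stores that keep rows sorted by key.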
Declarative queries over NetCDF • Integration of NetCDF format into Zorba query engine • Enabling XML queries over NetCDF • Incremental parsing to avoid loading entire file • Future work: NetCDF methods in XML Query
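The idea behind incremental parsing, reading only the bytes a query needs, can be sketched against the NetCDF classic header, which begins with the magic bytes `CDF`, a version byte, and a 4-byte big-endian record count. This is not the Zorba integration. The `peek_header` name and the synthetic input are assumptions, and the sketch covers only format versions 1 and 2.

```python
import io
import struct


def peek_header(stream):
    """Read only the first 8 bytes of a NetCDF classic file (versions 1 and 2).

    Returns (version, numrecs) without touching the rest of the stream.
    """
    head = stream.read(8)
    if len(head) < 8 or head[:3] != b"CDF":
        raise ValueError("not a NetCDF classic file")
    version = head[3]
    if version not in (1, 2):
        raise ValueError("unsupported NetCDF version byte")
    (numrecs,) = struct.unpack(">I", head[4:8])
    return version, numrecs


# Synthetic input: a valid 8-byte prefix followed by padding (hypothetical data).
fake = io.BytesIO(b"CDF\x01" + struct.pack(">I", 12) + b"\x00" * 1024)
version, numrecs = peek_header(fake)
bytes_consumed = fake.tell()
```

Only eight bytes of the stream are consumed, however large the file is.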
Conclusion • Last year was about exploring the problem space • Applying lessons learned, moving forward
Questions • Thank you for your time • buck@soe.ucsc.edu • srl.ucsc.edu/projects/damasc