MetaData (Management) and MPI-IO
PIs: Alok Choudhary, Wei-Keng Liao, Department of ECE, Northwestern University
With Bill Gropp and Rob Ross, ANL
SDM kickoff meeting, July 10-11, 2001
choudhar@ece.nwu.edu
Access Hints – MPI-IO Info
• The MPI Info object passes file access hints to the MPI-IO implementation to improve I/O performance and/or minimize the use of system resources
• File info is specified on a per-file basis
• MPI-IO file info can be classified as follows:
  • Access pattern: access frequency, sub-array access, sub-array size, number of accessing processes
  • Caching: server buffering on/off, buffering for data sieving and for collective buffering, buffer size, block size
  • File storage: file name and permissions at creation, I/O node list, striping factor, striping size, I/O device number from which striping starts
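As a rough illustration of the classification above, the Python sketch below groups hint keys into the three categories and flattens them into the string key/value pairs an MPI Info object holds. The key names are reserved MPI-IO hint names from the MPI standard (for example `access_style`, `cb_buffer_size`, `striping_factor`); the grouping, the helper names `build_file_info` and `categorize`, and all values are assumptions for illustration, and an MPI-IO implementation is free to ignore any hint.

```python
# Reserved MPI-IO hint keys, grouped by the slide's three categories
# (grouping is an illustrative assumption, not part of the MPI standard).
HINT_CATEGORIES = {
    "access_pattern": {"access_style", "chunked", "chunked_item",
                       "chunked_size", "nb_proc"},
    "caching": {"collective_buffering", "cb_buffer_size",
                "cb_block_size", "cb_nodes"},
    "file_storage": {"filename", "file_perm", "io_node_list",
                     "num_io_nodes", "striping_factor", "striping_unit"},
}


def build_file_info(**hints):
    """Return a flat dict of str -> str, the form MPI Info values take.

    Raises KeyError for keys outside the three categories, so a typo is
    caught here instead of being silently ignored by the MPI library.
    """
    known = set().union(*HINT_CATEGORIES.values())
    info = {}
    for key, value in hints.items():
        if key not in known:
            raise KeyError(f"unclassified hint: {key}")
        # Info values are strings; boolean hints use "true"/"false".
        if isinstance(value, bool):
            info[key] = "true" if value else "false"
        else:
            info[key] = str(value)
    return info


def categorize(info):
    """Split a flat info dict back into the three categories."""
    return {cat: {k: v for k, v in info.items() if k in keys}
            for cat, keys in HINT_CATEGORIES.items()}


# Example: a file read once, sequentially, striped over 16 I/O devices.
info = build_file_info(access_style="read_once,sequential",
                       collective_buffering=True,
                       cb_buffer_size=4 * 1024 * 1024,
                       striping_factor=16, striping_unit=1 << 20)
```

With mpi4py, each pair could then be applied with `info.Set(key, value)` on an object from `MPI.Info.Create()` and passed to `MPI.File.Open`.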
Metadata Used in MPI-IO
• High level (application oriented)
  • Parallel partition patterns: data sieving
  • Access frequency: once, multiple times
  • I/O modes: read only, overwrite, create
  • Request chunk size: small, medium, large
  • Request sequence: random, strided, sequential
• Low level (storage system oriented)
  • File striping: striping factor, striping size
  • File caching, buffer size
  • Storage pattern, storage device, I/O nodes
  • Migration and purging within a hierarchical storage system
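One way to make the two levels concrete is a pair of record types, one per level. The following Python sketch is illustrative only: the class and field names, the string encodings, and the 64 KiB / 4 MiB thresholds separating small, medium, and large requests are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical thresholds for the small/medium/large request buckets.
SMALL_MAX = 64 * 1024          # requests under 64 KiB count as "small"
MEDIUM_MAX = 4 * 1024 * 1024   # requests under 4 MiB count as "medium"


def chunk_class(nbytes: int) -> str:
    """Bucket a request size into small/medium/large."""
    if nbytes < SMALL_MAX:
        return "small"
    if nbytes < MEDIUM_MAX:
        return "medium"
    return "large"


@dataclass
class HighLevelMeta:
    """Application-oriented metadata: what the program asks for."""
    partition: str       # partition pattern, e.g. "BLOCK,*" (illustrative notation)
    frequency: str       # "once" or "multiple"
    mode: str            # "read_only", "overwrite" or "create"
    request_bytes: int   # typical request size in bytes
    sequence: str        # "random", "strided" or "sequential"

    @property
    def chunk_size(self) -> str:
        return chunk_class(self.request_bytes)


@dataclass
class LowLevelMeta:
    """Storage-system-oriented metadata: where and how the bytes live."""
    striping_factor: int
    striping_unit: int     # bytes per stripe
    buffer_size: int       # cache / I/O buffer size in bytes
    io_nodes: List[str] = field(default_factory=list)
    device: str = "disk"   # hypothetical storage tier label, e.g. "disk" or "tape"
```

Keeping the levels in separate types mirrors the slide's split: the high-level record can be filled in by the application, while the low-level record is the part a storage system or database would tune.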
Who Accesses and How?
Where Is the Metadata?
• Metadata is provided by users or by a database
• File info is used internally to optimize I/O
• Specified manually by users
  • Users provide both high- and low-level metadata
  • Users manually choose collective / non-collective MPI-IO calls
• Provided by a database
  • I/O optimization rules are applied to determine the proper MPI file info
  • The choice of collective / non-collective calls is made automatically
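The database-driven path above can be sketched as a function that maps high-level metadata to a choice of collective or independent calls plus MPI Info hints. The rules below, the `AccessMeta` type, the 1 MiB cutoff, and the 4 MiB buffer size are illustrative assumptions, not the rule set used by the authors' system; they encode the common heuristic that many small, noncontiguous requests from several processes benefit from collective (two-phase) I/O.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AccessMeta:
    """Hypothetical high-level access description for one dataset."""
    nprocs: int           # number of processes accessing the file
    sequence: str         # "random", "strided" or "sequential"
    request_bytes: int    # typical size of one request
    noncontiguous: bool   # does each process touch noncontiguous regions?


SMALL_REQUEST = 1 << 20   # assumed cutoff: requests under 1 MiB are "small"


def choose_strategy(meta: AccessMeta) -> Tuple[str, Dict[str, str]]:
    """Return ("collective" | "independent", MPI Info hints)."""
    hints: Dict[str, str] = {}
    if (meta.nprocs > 1 and meta.noncontiguous
            and meta.request_bytes < SMALL_REQUEST):
        # Many small noncontiguous pieces from several processes:
        # let the library aggregate them with collective buffering.
        hints["collective_buffering"] = "true"
        hints["cb_buffer_size"] = str(4 << 20)
        return "collective", hints
    # Otherwise use independent calls and only describe the access order.
    if meta.sequence in ("random", "sequential"):
        hints["access_style"] = meta.sequence
    return "independent", hints


# The MPI-IO read call each decision would select.
READ_CALL = {"collective": "MPI_File_read_all", "independent": "MPI_File_read"}
```

Because the rules live in one function, a database-backed system could swap in a rule table without changing the application code that calls it.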
Rules to Determine I/O Strategies
Application Programming Interface
• Initialization: register the application and record the arguments of each run
• Data association: build relationships among multiple datasets
• Load: locate data in the storage system from previous runs; determine the best I/O calls by comparing the access and storage patterns
• Save: choose file names, set file views, and provide hints for optimal I/O calls
• Finalization: close files and the connection to the database
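The five API steps can be sketched as a small client class over an in-memory SQLite database. Everything here is hypothetical: the class name `MetaDataStore`, the three-table schema, the file-naming scheme, and the rule in `load` (use a collective read when the requested layout differs from the stored one) stand in for the real interface, which the slides do not show. Setting file views is omitted, and no MPI files are opened.

```python
import json
import sqlite3


class MetaDataStore:
    """Hypothetical metadata client mirroring the five API steps."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript("""
            CREATE TABLE IF NOT EXISTS run(
                run_id INTEGER PRIMARY KEY, app TEXT, args TEXT);
            CREATE TABLE IF NOT EXISTS association(
                dataset_a TEXT, dataset_b TEXT);
            CREATE TABLE IF NOT EXISTS dataset(
                run_id INTEGER REFERENCES run(run_id),
                name TEXT, filename TEXT, layout TEXT, hints TEXT);
        """)
        self.run_id = None

    def initialize(self, app, args):
        """Register the application and record this run's arguments."""
        cur = self.db.execute("INSERT INTO run(app, args) VALUES (?, ?)",
                              (app, json.dumps(args)))
        self.run_id = cur.lastrowid
        return self.run_id

    def associate(self, a, b):
        """Record that datasets a and b are related."""
        self.db.execute("INSERT INTO association VALUES (?, ?)", (a, b))

    def related(self, name):
        """List datasets associated with `name`, in either direction."""
        rows = self.db.execute(
            "SELECT dataset_b FROM association WHERE dataset_a = ? "
            "UNION SELECT dataset_a FROM association WHERE dataset_b = ?",
            (name, name)).fetchall()
        return sorted(r[0] for r in rows)

    def save(self, name, layout, hints):
        """Choose a file name and store the layout and I/O hints."""
        filename = f"{name}.run{self.run_id}.dat"  # hypothetical naming scheme
        self.db.execute("INSERT INTO dataset VALUES (?, ?, ?, ?, ?)",
                        (self.run_id, name, filename, layout,
                         json.dumps(hints)))
        return filename

    def load(self, name, layout):
        """Find the newest stored copy and suggest a read call."""
        row = self.db.execute(
            "SELECT filename, layout, hints FROM dataset WHERE name = ? "
            "ORDER BY run_id DESC LIMIT 1", (name,)).fetchone()
        if row is None:
            return None
        filename, stored, hints = row
        # Assumed rule: a layout mismatch needs data redistribution between
        # processes, which a collective read can perform.
        call = "MPI_File_read_all" if stored != layout else "MPI_File_read"
        return filename, call, json.loads(hints)

    def finalize(self):
        """Commit and close the database connection."""
        self.db.commit()
        self.db.close()
```

A later run would call `initialize` again, and `load` would then return the newest copy of each dataset.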
Implementation of the MDMS API
Metadata Organized in a Relational Database
Metadata Management Challenges
• What metadata should be collected?
  • For reference only, or for performance improvement
• How should metadata be classified into levels?
  • Levels such as programming, performance, and file storage
• How should metadata be organized and managed?
  • Relational tables in a database, or XML files in an XML database
• Where should the metadata for each level be stored?
  • In a database, in files, or in the file system
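For the XML option, the same per-level metadata for one dataset can be serialized as an element tree. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names, the level labels, and the helpers `dataset_to_xml` and `xml_to_dataset` are hypothetical, and an actual XML database would add its own schema and query layer.

```python
import xml.etree.ElementTree as ET


def dataset_to_xml(name, levels):
    """Serialize one dataset's per-level metadata to an XML string.

    `levels` maps a level label (e.g. "programming", "performance",
    "file_storage") to a dict of attribute names and values. The
    attribute key "name" is reserved for the level label.
    """
    root = ET.Element("dataset", name=name)
    for level, attrs in levels.items():
        elem = ET.SubElement(root, "level", name=level)
        for key, value in attrs.items():
            elem.set(key, str(value))   # XML attribute values are strings
    return ET.tostring(root, encoding="unicode")


def xml_to_dataset(text):
    """Parse XML produced by dataset_to_xml back into (name, levels)."""
    root = ET.fromstring(text)
    levels = {}
    for elem in root.findall("level"):
        attrs = dict(elem.attrib)
        levels[attrs.pop("name")] = attrs
    return root.get("name"), levels
```

Compared with relational tables, this keeps each dataset's levels together in one document, at the cost of losing typed columns: every value comes back as a string.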