Component Infrastructure of CQoS and Its Application in Scientific Computations
Li Li1, Boyana Norris1, Lois Curfman McInnes1, Kevin Huck2, Joseph Kenny3, Meng-Shiou Wu4
1Argonne National Laboratory, Argonne, IL; 2University of Oregon; 3Sandia National Laboratories, California; 4Ames Laboratory
CCA meeting, Jan. 2009
Outline • Motivation • CQoS introduction • Database component design • Application examples • Ongoing and future work
Overall Goals
• Automate the configuration and runtime adaptation of high-performance component applications through the so-called Computational Quality of Service (CQoS) infrastructure
  • Instrumentation of component interfaces
  • Performance data gathering
  • Performance analysis
  • Adaptive algorithm support
• Motivating application examples
  • Quantum chemistry challenges: How, during runtime, can we make the best choices for the reliability, accuracy, and performance of interoperable QC components?
  • When several QC components provide the same functionality, what criteria should be employed to select one implementation for a particular application instance and computational environment?
  • How do we incorporate the most appropriate externally developed components (e.g., which algorithms to employ from numerical optimization components)?
Motivating Application Examples (cont.)
• Overall simulation times for nonlinear (time-dependent) PDE-based models often depend heavily on the robustness and efficiency of sparse linear solvers
  • Properties of the linear system change during runtime
  • No single method is best, because of the complexity of long-running applications
• Efficient parallel structured adaptive mesh refinement (SAMR) applications depend on load-balancing algorithms1
  • Computational resources are dynamically concentrated in areas that need high accuracy
  • Application and computer state change at runtime
  • Dynamic resource allocation requires that the workload-partitioning algorithm be selected at runtime according to these state changes
1J. Steensland and J. Ray, "A Partitioner-Centric Model for SAMR Partitioning Trade-Off Optimization: Part I," International Journal of High Performance Computing Applications, 2005, 19(4):409-422.
Outline • Motivation • CQoS introduction • Database component design • Application examples • Ongoing and future work
CQoS-Enabled Component Application
[Architecture diagram] An instrumented component application (components A, B, C) is served by two infrastructures:
• CQoS analysis infrastructure: performance monitoring, problem/solution characterization, and performance-model building, backed by historical and runtime performance databases and by interactive analysis and model building (the scientist can analyze data interactively)
• CQoS control infrastructure: interpretation and execution of control laws to modify an application's behavior through a control system (parameter changes and component substitution) that draws on a component substitution set and a substitution assertion database (the scientist can provide decisions on substitution and reparameterization)
Database Needs for Scientific Application Adaptation
• Performance analysis of candidate solvers/algorithms
  • Large number of performance runs
  • Store, manage, and search performance data
• Store and manage hardware, compiler, and application metadata
  • Information essential to algorithm selection, e.g., system configurations, problem properties, application states
• Optimal algorithm determination
  • Input data (or problem features)
  • Algorithmic parameters
  • Performance models (or hints)
Database Needs for Scientific Application Adaptation (cont.)
• Database use cases:
  • Store historical performance data and application metadata
  • Facilitate offline performance analysis
  • Match the current application state against historical data through DB queries during runtime (see the sketch after this list)
  • Search for the optimal algorithm w.r.t. the current application state
  • Retrieve the settings associated with the optimal algorithm so it can be applied immediately to the application during runtime
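As a minimal sketch of the runtime-matching use case, the fragment below combines the DB and parameter-set interfaces shown later in this deck; the db handle, property contents, tolerance values, and the firstTrialID() helper are illustrative assumptions, not part of the released components.

/* Sketch: match the current application state against historical trials. */
ParameterSet current;                  // properties of the running application
current.addAParameter(&nnzProperty);   // e.g., number of nonzeros (hypothetical)
vector<double> epsilons(1, 0.05);      // per-property matching tolerance (assumed type)

Outcome trialIDs;                      // IDs of matching historical trials
db.connect();
int nMatches = db.getMatchingTrials(current, epsilons, trialIDs);
if (nMatches > 0) {
  ParameterSet best;
  /* Retrieve the settings of a matching trial and apply them; ranking
     by recorded performance is handled by the comparator components. */
  db.getParameterSet(firstTrialID(trialIDs), 0, best);  // firstTrialID is hypothetical
}
db.disconnect();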
Outline • Motivation • CQoS introduction • Database component design • Application examples • Ongoing and future work
CQoS Database Component Design
• Designed C++ and SIDL interfaces for CQoS database management
• Implemented prototype database management components
  • Description and software: http://wiki.mcs.anl.gov/cqos/index.php/CQoS_database_components_version_0.0.0
  • Based on the PerfDMF performance data format and PERI metadata formats
• Comparator interface and corresponding component for searching and matching parameter sets
CQoS Database Component Design (cont.)
Fig. 1. Database and comparator components connected to an adaptive heuristics component. [Diagram: the Adaptive Heuristic component queries/stores performance data in the Perf. Database and metadata in the Meta-Database, and compares/matches performance data via the Perf. Comparator and metadata via the Meta-Comparator.] There can be multiple database and comparator components that deal with different data types.
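As a rough illustration of Fig. 1, the fragment below sketches how the adaptive heuristic component might declare uses-ports for the four CQoS components in a CCA-style setServices() method; the port names, class names, and member layout are assumptions, not the actual CQoS registration code.

/* Sketch (CCA-style, names illustrative): the adaptive heuristic
   declares the four connections shown in Fig. 1 as uses-ports. */
void AdaptiveHeuristic::setServices(gov::cca::Services services) {
  svc = services;  // assumed member of type gov::cca::Services
  svc.registerUsesPort("PerfDB",         "DB",         svc.createTypeMap());
  svc.registerUsesPort("MetaDB",         "DB",         svc.createTypeMap());
  svc.registerUsesPort("PerfComparator", "Comparator", svc.createTypeMap());
  svc.registerUsesPort("MetaComparator", "Comparator", svc.createTypeMap());
}
/* The framework later connects these uses-ports to the provides-ports
   of the database and comparator components. */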
Use of the DB interfaces in the 2D driven-cavity example:

/* instantiate parameter 1 */
ierr = ComputeQuantity(matrix,"icmk","splits",&res,&flg); CHKERRQ(ierr);
MatrixProperty param1("splits", "matrix_meta", res.i);

/* instantiate parameter 2 */
ierr = ComputeQuantity(matrix,"structure","nnzeros",&res,&flg); CHKERRQ(ierr);
MatrixProperty param2("nnzeros", "matrix_meta", res.i);

/**** Store the matrix property set into the database. ****/
int myRank;
ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &myRank); CHKERRQ(ierr);
if (myRank == 0) {
  int localID;
  int trialID;
  string conninfo("dbname = perfdb");

  /* Generate a runtime database manager; it connects to a
     PostgreSQL database through the DB interfaces. */
  RunTimeRecord *R = RunTimeRecord::instance();
  R->Connect2DB(conninfo);
  trialID = R->getTrialID();
  localID = R->getCurEvtID(cflStr);

  /* instantiate a parameter set */
  PropertySet aSet;

  /* add parameters 1 and 2 to the set */
  aSet.addAParameter(&param1);
  aSet.addAParameter(&param2);

  /* store the parameter set into the database */
  R->loadParameterSet(trialID, localID, aSet);
}
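The retrieval direction is not shown on this slide; a minimal counterpart, assuming the getParameterSet() call from the interfaces slide later in this deck and reusing trialID/localID from above, might look like:

/* Sketch: read the stored property set back (the db handle and the
   exact retrieval path are assumptions). */
PropertySet storedSet;
storedSet.addAParameter(&param1);   // declare which properties to fetch
storedSet.addAParameter(&param2);
db.getParameterSet(trialID, localID, storedSet);  // fills in the stored values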
CQoS Performance and Metadata
• Performance data (general)
  • Historical performance data from different instances of the same application or related applications
  • Obtained through source instrumentation, e.g., TAU (U. Oregon), or binary instrumentation, e.g., HPCToolkit (Rice U.)
• Metadata: ideally, for each application execution, the metadata should provide enough information to reproduce a particular application instance. Examples (see the sketch after this list):
  • Input data (reduced representations), e.g., molecule characteristics, matrix properties
  • Algorithmic parameters, e.g., convergence level, maximum number of iterations
  • System parameters, e.g., compilers, hardware
  • Domain-specific metadata, provided by the scientist/algorithm developer
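For concreteness, these metadata categories could be recorded with the same PropertySet pattern as the driven-cavity example; in this sketch the Parameter constructor arguments, table names, and values are all assumptions.

/* Sketch: one record per metadata category (names/values illustrative). */
MatrixProperty input("nnzeros",        "matrix_meta",  nnz);   // input data
Parameter      algo ("max_iterations", "solver_meta",  200);   // algorithmic parameter
Parameter      sys  ("num_nodes",      "machine_meta", 64);    // system parameter

PropertySet metadata;
metadata.addAParameter(&input);
metadata.addAParameter(&algo);
metadata.addAParameter(&sys);
R->loadParameterSet(trialID, localID, metadata);  // R as in the earlier example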
Outline • Motivation • CQoS introduction • Database component design • Application examples • Ongoing and future work
Example: CQoS in Quantum Chemistry
• Initial focus: parallel application configuration of QC applications so that they can run effectively on various high-performance machines
  • Eliminate guesswork and trial-and-error configuration
• Future work: more sophisticated analysis to configure algorithmic parameters for particular molecular targets, calculation approaches, and hardware environments
Interactions of the Quantum Chemistry Components With the Database and Comparator CQoS Components
CQoS Component Usage in Quantum Chemistry
• CQoS database usage
  • Application metadata
    • Molecule characteristics: atom types, topology, moments of inertia
    • Algorithm parameters: tunable parameters, convergence level
  • System parameters
    • Compilers
    • Machine info, e.g., number of nodes, threads per node, network
  • Historical performance data
    • Execution times, etc.
    • Obtained through source instrumentation, e.g., TAU
    • Can guide the configuration of related new simulations
• CQoS comparator components
  • Compare sets of parameters within the performance database
  • Quantum chemistry applications can match the current application state against historical data through database queries during runtime (see the sketch after this list)
  • Use metadata to guide parameter selection and application configuration
    • Match molecule similarity, basis set similarity, electronic correlation approach, etc.
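A hedged sketch of that matching step, using the Comparator interface shown on the interfaces slide; the comparator handle, property names, tolerance value, and the EQUAL relation constant are assumptions.

/* Sketch: decide whether a stored QC trial is "similar enough" to reuse. */
comparator.setLHS(currentMolecule);   // metadata of the running calculation
comparator.setRHS(storedMolecule);    // metadata retrieved from the meta-database
comparator.setToleranceAt("moment_of_inertia", 0.01);  // fuzzy numeric match
comparator.setRelationAt("atom_types", EQUAL);          // assumed relation constant
if (comparator.doCompare()) {
  /* Reuse the stored trial's parallel configuration as a starting point. */
}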
Ongoing and Future Work (Incomplete List)
• Integration of ongoing efforts in performance tools: common interfaces and data representation (leveraging PerfExplorer, TAU performance interfaces, PERI tools, and other efforts)
• Support training experiment design
  • Perform an empirical search for selecting the optimal solver components/parameters
• Incorporate more offline performance analysis capabilities (machine learning, statistical analysis, etc.)
• Apply to more problem domains, implementing extensions as necessary
Acknowledgements to Collaborators • TAU Performance Tools group, University of Oregon • Victor Eijkhout, the University of Texas at Austin • CCA Forum members • Funding: • Department of Energy (DOE) Mathematical, Information, and Computational Science (MICS) program • DOE Scientific Discovery through Advanced Computing (SciDAC) program • National Science Foundation
Main Database Component Interfaces

interface DB extends gov.cca.Port {
  bool connect();
  bool disconnect();
  bool isClosed();
  void setConnectionInfo(in string info);
  string getConnectionInfo();
  int executeQuery(in string commd, out Outcome res);
  /* store a parameter into the DB */
  void storeParameter(in int trialID, in int iterNo, in Parameter aParam);
  /* store a set of parameters into the DB */
  void storeParameterSet(in int trialID, in int iterNo, in ParameterSet aParamSet);
  /* retrieve a parameter value */
  void getParameter(in int trialID, in int iterNo, inout Parameter aParam);
  /* retrieve a parameter set value */
  void getParameterSet(in int trialID, in int iterNo, inout ParameterSet aParamSet);
  /* retrieve trials from the database whose parameter set values are within [lower, upper] */
  int getMatchingTrialsBetween(in ParameterSet lower, in ParameterSet upper, out Outcome trialIDs);
  /* retrieve trials from the database whose parameter set values are within [lower-epsilons, lower+epsilons] */
  int getMatchingTrials(in ParameterSet lower, in vector epsilons, out Outcome trialIDs);
}

interface Comparator extends gov.cca.Port {
  /* comparison operations between parameter sets */
  void setLHS(in ParameterSet lefthand);
  void setRHS(in ParameterSet righthand);
  ParameterSet getLHS();
  ParameterSet getRHS();
  int getDimension();
  Parameter getLHSParameterAt(in string paraName);
  Parameter getRHSParameterAt(in string paraName);
  void setToleranceAt(in string name, in double epsilon);
  double getToleranceAt(in string name);
  void setRelationAt(in string name, in int aRelation);
  int getRelationAt(in string name);
  bool doCompare();
}
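A short usage sketch for the range query above; the db handle and variable names are illustrative.

/* Find all historical trials whose stored parameter-set values fall
   within [lower, upper]. */
ParameterSet lower, upper;
/* ... populate each parameter's lower and upper bound ... */
Outcome trialIDs;
int n = db.getMatchingTrialsBetween(lower, upper, trialIDs);
if (n == 0) {
  /* No comparable runs yet: fall back to a default configuration. */
}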
Database Component Usage – Example 1: 2D Driven Cavity Flow1
• Linear solver: GMRES(30); vary only the fill level of the ILU preconditioner
• Adaptive heuristic based on matrix properties (which change during runtime), computed with Anamod (Eijkhout, http://sourceforge.net/projects/salsa/)
1T. S. Coffey, C. T. Kelley, and D. E. Keyes, "Pseudo-transient continuation and differential algebraic equations," SIAM J. Sci. Comput., 25:553-569, 2003.
How Are the Database Components Used?
• During runtime, the driver (e.g., a linear solver proxy component) evaluates important matrix properties and matches them against historical data in MetaDB through the PropertyComparator interfaces.
• Linear solver performance data are retrieved and compared for the current matrix properties; this is accomplished by the PerfComparator component.
• The linear solver parameters that yielded the best performance, in this case the fill level of the ILU preconditioner, are returned to the driver.
• The driver adapts accordingly and continues execution. (A sketch of the full loop follows.)
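Putting the four steps together, a hedged end-to-end sketch of the loop; the comparator handle, the pickBestFillLevel() helper, and the control flow are illustrative, while ComputeQuantity() and the PETSc call PCFactorSetLevels() appear as in the earlier example.

/* Sketch of the runtime adaptation loop (illustrative control flow). */
while (!converged) {
  /* 1. Evaluate current matrix properties with Anamod. */
  ierr = ComputeQuantity(matrix, "structure", "nnzeros", &res, &flg); CHKERRQ(ierr);

  /* 2. Match them against historical metadata in MetaDB. */
  comparator.setLHS(currentProperties);
  comparator.setRHS(storedProperties);   // fetched through MetaDB queries

  if (comparator.doCompare()) {
    /* 3. Rank the matching trials by recorded solve time and pick the
          best ILU fill level (pickBestFillLevel is hypothetical). */
    int fill = pickBestFillLevel(trialIDs);

    /* 4. Reconfigure the preconditioner and continue execution. */
    ierr = PCFactorSetLevels(pc, fill); CHKERRQ(ierr);
  }
  /* ... nonlinear solve step, which updates `converged` ... */
}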