VLab is a cyberinfrastructure that facilitates complex computations of materials at high pressures and temperatures, using parameter sampling workflows. It provides a consolidated web interface with tools for data analysis, visualization, and workflow management. VLab leverages distributed computing resources for improved performance and flexibility. This system is fault tolerant and supports collaboration through shared access to resources.
VLab: A Collaborative Cyberinfrastructure for Computations of Materials Properties at High Pressures and Temperatures
Cesar R. S. da Silva¹, Pedro R. C. da Silveira¹, Renata M. Wentzcovitch¹,²
¹Minnesota Supercomputing Institute, University of Minnesota
²Department of Chemical Engineering and Materials Science, University of Minnesota
Work sponsored by NSF grant ITR-0426757
“VLab is a cyberinfrastructure aimed at facilitating the execution of complex calculations (mostly parameter sampling workflows) of materials at high pressures and temperatures.”
Outline
• Parameter Sampling Workflows: High-P,T Cij as an Example
• Basic Problem: Job Deluge
• Proposed Solution: Features and Performance
• Overall Requirements
• Workflow-Support-Specific Requirements
• Service Oriented Architecture
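Thermodynamic Method
• Vibrational density of states (VDoS) and Helmholtz free energy F(T,V) computed within the quasiharmonic approximation (QHA)
• F(T,V) fitted at several temperatures by either:
 - a Vinet EoS, or
 - an N-th order (N = 3, 4, 5, …) isothermal Eulerian finite strain EoS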
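For reference, the quantities named above can be written in their standard textbook forms. The notation (static energy U_st, VDoS g(ω;V) normalized to 3N modes per cell, and zero-pressure parameters V_0, K_0, K_0' at each temperature) is assumed here rather than taken from the slides, and the finite strain form is shown only for N = 3 (the third-order Birch-Murnaghan EoS).

```latex
\begin{aligned}
F(V,T) &= U_{\mathrm{st}}(V) + \int_0^{\infty} g(\omega;V)\left[\frac{\hbar\omega}{2}
        + k_B T \ln\!\left(1 - e^{-\hbar\omega/k_B T}\right)\right] d\omega,
        \qquad P = -\left(\frac{\partial F}{\partial V}\right)_T \\
\text{Vinet:}\quad P(V) &= 3K_0\,\frac{1-\eta}{\eta^{2}}
        \exp\!\left[\tfrac{3}{2}\left(K_0'-1\right)(1-\eta)\right],
        \qquad \eta = \left(V/V_0\right)^{1/3} \\
\text{Eulerian, } N=3:\quad F(f) &= F_0 + \tfrac{9}{2}K_0 V_0 f^{2}\left[1 + \left(K_0'-4\right) f\right],
        \qquad f = \tfrac{1}{2}\left[\left(V_0/V\right)^{2/3} - 1\right]
\end{aligned}
```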
Thermoelastic Constant Tensor Cij^S(T,P) (adiabatic)
• Apply strains εkl to the equilibrium structure at each pressure Pn and re-optimize the strained structure
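A minimal sketch of why each strained structure becomes a separate job, assuming the static elastic constants are estimated from stress-strain finite differences (a common technique, not necessarily the exact procedure VLab uses for the high-T Cij^S). The callback `stress_of_strain` is hypothetical: it stands in for one re-optimization job, and the cubic constants in the usage part are made-up toy numbers, not MgO data.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def elastic_constants(stress_of_strain: Callable[[Vector], Vector],
                      delta: float = 1e-3) -> Matrix:
    """Estimate the 6x6 elastic tensor C (Voigt notation, engineering shear strains)
    by central differences: C[i][j] ~ (sigma_i(+delta e_j) - sigma_i(-delta e_j)) / (2 delta).

    `stress_of_strain` is a hypothetical callback standing in for one job: it takes a
    6-component strain, re-optimizes the strained structure, and returns its stress.
    """
    C = [[0.0] * 6 for _ in range(6)]
    for j in range(6):
        eps_plus = [0.0] * 6
        eps_plus[j] = delta
        eps_minus = [-e for e in eps_plus]
        s_plus = stress_of_strain(eps_plus)    # each call is an independent job
        s_minus = stress_of_strain(eps_minus)
        for i in range(6):
            C[i][j] = (s_plus[i] - s_minus[i]) / (2.0 * delta)
    return C


# Usage with a toy linear-elastic cubic crystal (made-up constants, not MgO data).
def cubic_tensor(c11: float, c12: float, c44: float) -> Matrix:
    C = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            C[i][j] = c11 if i == j else c12
    for k in range(3, 6):
        C[k][k] = c44
    return C


TRUE_C = cubic_tensor(300.0, 100.0, 150.0)


def toy_stress(eps: Vector) -> Vector:
    # Hooke's law sigma = C eps; a real job would run a first-principles code here.
    return [sum(TRUE_C[i][j] * eps[j] for j in range(6)) for i in range(6)]


C = elastic_constants(toy_stress)
```

Twelve strained structures per pressure (6 components, ± strain) already multiply the job count before any phonon sampling is added.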
Basic Problem: Demand for Extensive Parameter Sampling
• Typical high-(P,T) study (e.g., thermal properties): {Pn} x {qi} => ~10^2 jobs
• Huge high-(P,T) study (Cij(P,T)): {Pn} x {εi} x {qj} => ~10^3-10^4 jobs
• 10^2-10^4 jobs to prepare, submit, and monitor
• Manual work is prone to human error
• First principles:
 => sheer number of operations: 10^15-10^20 today
 => well over 10^22 in 3-5 years
• How can high-(P,T) materials computations be improved?
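The job counts above are Cartesian products of the sampled parameter sets. A minimal sketch, assuming hypothetical grid sizes chosen only to reproduce the ~10^2 and ~10^3 orders of magnitude (10 pressures, 10 q-points, and 6 Voigt strain components applied with both signs); none of these grids come from the slides.

```python
from itertools import product
from typing import List, Sequence, Tuple


def enumerate_jobs(*axes: Sequence) -> List[Tuple]:
    """Cartesian product of the sampled parameter axes: one tuple per independent job."""
    return list(product(*axes))


# Hypothetical grid sizes, chosen only to land in the ranges quoted above.
pressures = [10.0 * n for n in range(10)]                    # 10 pressures (GPa)
qpoints = [(i / 10.0, 0.0, 0.0) for i in range(10)]          # 10 phonon q-points
strains = [(j, s) for j in range(6) for s in (0.01, -0.01)]  # 6 Voigt components, +/-

thermal_jobs = enumerate_jobs(pressures, qpoints)       # {Pn} x {qi}
cij_jobs = enumerate_jobs(pressures, strains, qpoints)  # {Pn} x {eps_i} x {qj}
print(len(thermal_jobs), len(cij_jobs))  # 100 1200
```

Each tuple is one input to prepare, submit, and monitor, which is why automating this step matters.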
The VLab
• Consolidated web interface (portal) to a set of tools:
 - Quantum ESPRESSO package tools
 - Input preparation for pwscf, phonon, workflows, etc.
 - Data analysis tools
 - Visualization tools (VTK/OpenGL), etc.
 - Workflow management
 - Task distribution and data collection
• Leverages the computing capabilities of distributed resources (TeraGrid, compute farms, scattered resources, other grids)
• Collaboration through shared access to resources
The Big Challenge of Performance
• The scale-up approach is difficult:
 - Limited number of processors in a single system
 - Even the fastest vector processors are not enough
 - The trend is toward denser processing, not faster single-thread execution
• MPP systems are not cost-effective for this class of problems:
 - FFT and matrix transposition give either limited scalability or low performance per processor
Proposed Solution: leverage concurrent computing for features and performance
• High-performance parallel computing
• High-throughput distributed processing
VLab - Not Just a Client/Server
The client/server approach:
- The portal and the supporting modules have access to one large central multiprocessor system.
- It can work as a facilitator but lacks other important features found in VLab:
 - No scheduling flexibility
 - No redundancy => poor availability
 - No choice of cost (usually high)
VLab - Not Just a Client/Server
The VLab distributed system approach:
- No central system to fail and bring everything down!
- Distributed resources are replicated for: redundancy, performance, flexibility
- More flexible scheduling for: cost, turnaround time, job throughput, workload balance, system throughput
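A minimal sketch of how replicated resources enable both redundancy and flexible scheduling, assuming a simple weighted score of cost and turnaround time. The `Resource` fields, the scoring formula, and the resource names are hypothetical illustrations, not VLab's actual scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    available: bool          # unavailable replicas are simply skipped (redundancy)
    cost_per_hour: float     # hypothetical accounting units
    expected_wait_h: float   # estimated queue wait
    run_h: float             # estimated run time on this resource


def pick_resource(resources: List[Resource], cost_weight: float = 1.0,
                  time_weight: float = 1.0) -> Optional[Resource]:
    """Return the available resource with the lowest weighted cost + turnaround score."""
    candidates = [r for r in resources if r.available]
    if not candidates:
        return None

    def score(r: Resource) -> float:
        turnaround = r.expected_wait_h + r.run_h
        return cost_weight * r.cost_per_hour * r.run_h + time_weight * turnaround

    return min(candidates, key=score)


resources = [
    Resource("cluster-a", True, 1.0, 0.5, 1.0),   # fast but more expensive
    Resource("farm-b", True, 0.2, 4.0, 3.0),      # cheap but slower
    Resource("grid-c", False, 0.0, 0.0, 1.0),     # down: never chosen
]
fastest = pick_resource(resources)                    # balanced weights
cheapest = pick_resource(resources, time_weight=0.0)  # cost only
```

Changing the weights shifts the choice between resources, which a single central system cannot offer.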
VLab Requirements
• Workflow management => facilitator
• Support for distributed computations
• Ease of use
• Support for collaboration
• Flexibility (update/add tools, new features)
• Fault tolerance
• Diversity of tools: analysis, visualization, data reduction, storage, etc.
VLab Workflows
Typical VLab workflows, like the high-T Cij calculation, iterate through the following steps:
1) Prepare the inputs for the tasks and generate execution packages containing the required files.
2) Dispatch the execution packages to compute nodes for execution.
3) Gather the results for analysis and, if necessary, iterate steps 1-3.
• Results always return to the input sources
• => Tree-like service architecture
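The three-step loop can be sketched with a thread pool standing in for distributed compute nodes. This assumes hypothetical `prepare`, `execute`, and `analyze` callables (placeholders for input generation, remote execution, and analysis); it illustrates the iteration pattern only and is not VLab's Project Executor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def run_workflow(params: List[Tuple], prepare: Callable, execute: Callable,
                 analyze: Callable, max_iterations: int = 5,
                 workers: int = 4) -> Dict[Tuple, object]:
    """Iterate prepare -> dispatch -> gather until `analyze` asks for no new tasks."""
    results: Dict[Tuple, object] = {}
    pending = list(params)
    for _ in range(max_iterations):
        if not pending:
            break
        packages = [prepare(p) for p in pending]           # 1) execution packages
        with ThreadPoolExecutor(max_workers=workers) as pool:
            outputs = list(pool.map(execute, packages))   # 2) dispatch in parallel
        results.update(zip(pending, outputs))             # 3) gather at the source
        pending = analyze(results)                        # new parameters, or [] if done
    return results


# Hypothetical usage: the "job" squares a pressure; analysis adds one midpoint once.
def analyze(results: Dict[Tuple, object]) -> List[Tuple]:
    return [] if (5.0,) in results else [(5.0,)]


results = run_workflow([(0.0,), (10.0,)],
                       prepare=lambda p: {"pressure": p[0]},
                       execute=lambda pkg: pkg["pressure"] ** 2,
                       analyze=analyze)
```

Results are keyed by their input parameters, so every output returns to the node that generated the task.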
VLab Service Oriented Architecture
On the Web: http://dasilveira.msi.umn.edu:8080/vlab/
Usage-oriented view of the VLab SOA => tree-like structure in 4 layers:
1) User interface (portal)
2) Workflow control and monitoring (Project Executor / Interaction)
3) Task dispatching / interaction, task data retrieval, auxiliary services
4) Heavy computation and visualization resources
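A minimal sketch of the tree-like layering, assuming each service node splits its tasks among its children and collects their results before returning them to its own caller. The `ServiceNode` class, the round-robin split, and the squaring "computation" are hypothetical stand-ins for the portal, Project Executor, dispatcher, and compute services.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ServiceNode:
    name: str
    layer: int                                          # 1 = portal ... 4 = compute
    children: List["ServiceNode"] = field(default_factory=list)
    compute: Optional[Callable[[int], int]] = None      # set only on layer-4 leaves

    def submit(self, tasks: List[int]) -> List[int]:
        if not self.children:
            return [self.compute(t) for t in tasks]
        # Round-robin split among children; every result returns to this node.
        results: List[Optional[int]] = [None] * len(tasks)
        n = len(self.children)
        for k, child in enumerate(self.children):
            idx = list(range(k, len(tasks), n))
            for i, r in zip(idx, child.submit([tasks[i] for i in idx])):
                results[i] = r
        return results


leaves = [ServiceNode(f"compute-{k}", 4, compute=lambda t: t * t) for k in range(2)]
dispatcher = ServiceNode("dispatcher", 3, children=leaves)
executor = ServiceNode("project-executor", 2, children=[dispatcher])
portal = ServiceNode("portal", 1, children=[executor])
out = portal.submit([1, 2, 3, 4, 5])
```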
Fault Tolerance
• Only Project Executor sessions and a few user and project interaction sessions need to be persistent. Therefore, a simple approach to fault tolerance (FT) is possible:
• Reactive: we have not identified any need for proactive FT.
• Registry based: persistent sessions are registered and must periodically report their "alive" state to the registry.
• Redundant registry and metadata DB for data persistence.
• Full journaling (data and metadata) of critical transactions for data and metadata integrity. This guarantees that the state of any persistent session can be restored in case of failure.
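A minimal sketch of registry-based reactive FT with journal replay, assuming heartbeats with a fixed timeout and an append-only journal of JSON records (an in-memory list stands in for the redundant registry and metadata DB). The class, function names, and record layout are hypothetical.

```python
import json
import time
from typing import Dict, Iterable, List, Optional


class Registry:
    """Sessions send heartbeats; a session silent for longer than timeout_s is failed."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, session_id: str, now: Optional[float] = None) -> None:
        self.last_seen[session_id] = time.monotonic() if now is None else now

    def failed_sessions(self, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        return [s for s, t in self.last_seen.items() if now - t > self.timeout_s]


def journal_append(journal: List[str], session_id: str, key: str, value) -> None:
    """Record a critical transaction as one JSON line before it is applied."""
    journal.append(json.dumps({"session": session_id, "key": key, "value": value}))


def restore_state(journal: Iterable[str], session_id: str) -> Dict[str, object]:
    """Rebuild a session's state by replaying its journal entries in order."""
    state: Dict[str, object] = {}
    for line in journal:
        entry = json.loads(line)
        if entry["session"] == session_id:
            state[entry["key"]] = entry["value"]
    return state


reg = Registry(timeout_s=30.0)
reg.heartbeat("exec-1", now=0.0)
reg.heartbeat("exec-2", now=0.0)
reg.heartbeat("exec-1", now=40.0)          # exec-2 stops reporting
journal: List[str] = []
journal_append(journal, "exec-2", "step", 1)
journal_append(journal, "exec-1", "step", 7)
journal_append(journal, "exec-2", "step", 2)
journal_append(journal, "exec-2", "status", "dispatched")
failed = reg.failed_sessions(now=45.0)
recovered = restore_state(journal, "exec-2")  # state used to restart the failed session
```

Detection is purely reactive: nothing happens until a heartbeat is missed, and recovery only needs the journal.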
Scheduling
The usual approach:
- Use agents that interact with the broker
Problem: agents are not stateless!
- More complicated to develop
- Persistence must be guaranteed
The VLab approach:
- Use an independent web service (WS) to monitor the workload
- Data persistence is provided by a local DB
- The compute WS and the Workload Monitor are stateless!
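A minimal sketch of a stateless workload monitor, assuming every load sample is written straight to a local SQLite database so that the handlers keep no state between calls. The table layout, function names, and the "running + queued" load metric are hypothetical choices for illustration.

```python
import sqlite3
from typing import Optional


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS load "
                 "(resource TEXT, t REAL, running INTEGER, queued INTEGER)")


def record_load(conn: sqlite3.Connection, resource: str, t: float,
                running: int, queued: int) -> None:
    """Stateless handler: each sample goes straight to the local DB."""
    with conn:  # commits the transaction on success
        conn.execute("INSERT INTO load VALUES (?, ?, ?, ?)",
                     (resource, t, running, queued))


def least_loaded(conn: sqlite3.Connection) -> Optional[str]:
    """Stateless query: latest sample per resource, ranked by running + queued jobs."""
    row = conn.execute("""
        SELECT l.resource FROM load AS l
        JOIN (SELECT resource, MAX(t) AS t FROM load GROUP BY resource) AS latest
          ON l.resource = latest.resource AND l.t = latest.t
        ORDER BY l.running + l.queued, l.resource
        LIMIT 1""").fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
init_db(conn)
record_load(conn, "cluster-a", 0.0, 10, 5)
record_load(conn, "farm-b", 0.0, 2, 1)
record_load(conn, "farm-b", 60.0, 20, 10)   # newer sample supersedes the old one
best = least_loaded(conn)
```

Because all state lives in the DB, a crashed monitor can be restarted without any recovery protocol.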
VLab in Action
Watch a demonstration movie at vlab.msi.umn.edu (follow the links “portal” -> “movie”)
• Calculation of high-P,T thermodynamic properties
• Cubic MgO
• 2-atom cell
• Static + lattice dynamics calculation, {Pn}x{i} sampling
• Shows distributed computing capabilities
• Ability to integrate visualization and data analysis tools
[Figure: VLab Workflows. Left: extensive high-T Cij workflow. Right: detailed view of the Cij and phonon workflows.]