Collaborative Problem Solving Environments
Deborah K. Gracio
Computational Sciences & Mathematics
Pacific Northwest National Laboratory
Collaborative Problem Solving Environments
To provide research, development and integration of data, information, scientific tools and technologies that enable researchers to transcend current methods for solving complex problems.
Research areas include:
• Problem Solving Environments
• Collaborative Technologies
• Dynamic Data and Metadata Management
• Data Discovery and Mining
• Information Services
• Work Analysis and Participatory Design
• Scientific Workflow
• Distributed and Grid Computing
Objective of a Collaborative Problem Solving Environment
• A problem solving environment (PSE) is an integrated software environment used to solve a particular computational problem or class of related problems
• A PSE integrates methods, tools, information, and computing resources to support the modeling and simulation of complex scientific problems
• PSEs capture and automate the routine parts of research procedures while facilitating the unique aspects of individual computations or experiments
• Collaborative PSEs (CPSEs) provide an environment in which researchers can work regardless of geographic location: interacting with colleagues, accessing instrumentation, sharing data and computational resources, and accessing information
Challenges Facing Science
• Data Overload: assimilating massive amounts of data and information from heterogeneous sources into customized views for users
• Integrating System Architectures: developing software architectures that let users take efficient advantage of masses of data and of technology advances, spanning hand-held devices, desktops, and supercomputers
• Multi-disciplinary Knowledge Transformation: integrating data, computational results, information, and analytical tools across disciplines
• Distributed System Technologies: discovery, management, and administration of distributed resources (computers, data, computational results, people, web sites, etc.)
• Collaborative Technologies: interconnecting people, facilities, and information into a virtual presence
Getting the Right Information, in the Right Form, to the Right People, at the Right Time!
Concepts for Building CPSEs
• Component-based architectures that support the construction of CPSEs in a variety of domains
• End-user extensibility, which is key to the long-term success of domain-specific CPSEs
• Scientific workflow, which provides an environment in which the domain specialist can carry out a discipline-specific research process in pursuit of an objective or goal
• An extension of the traditional definition of resources (computers and people) to include:
  • Collaborative working sessions
  • Software applications
  • Arbitrary services providing global lookup
  • User tasks
  • Collaborators
  • Scientific records
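The extended notion of a resource can be sketched as a single registry. In it, computers, sessions, applications, services, tasks, collaborators, and records are all registered and looked up the same way. This is a minimal Python sketch under that assumption. The `ResourceKind`, `Resource`, and `ResourceRegistry` names are hypothetical and are not taken from any CPSE implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResourceKind(Enum):
    """Traditional resource kinds plus the extended kinds listed on the slide."""
    COMPUTER = "computer"
    PERSON = "person"
    SESSION = "collaborative working session"
    APPLICATION = "software application"
    SERVICE = "service"
    TASK = "user task"
    COLLABORATOR = "collaborator"
    RECORD = "scientific record"


@dataclass
class Resource:
    name: str
    kind: ResourceKind
    attributes: dict = field(default_factory=dict)


class ResourceRegistry:
    """Hypothetical global lookup service: one namespace for every resource kind."""

    def __init__(self):
        self._by_name = {}

    def register(self, resource: Resource) -> None:
        # Names are global, so a second resource with the same name is rejected.
        if resource.name in self._by_name:
            raise ValueError(f"duplicate resource name: {resource.name}")
        self._by_name[resource.name] = resource

    def lookup(self, name: str) -> Resource:
        return self._by_name[name]

    def find(self, kind: ResourceKind, **attrs) -> list:
        """Return resources of one kind whose attributes match every given value."""
        return [
            r for r in self._by_name.values()
            if r.kind is kind
            and all(r.attributes.get(k) == v for k, v in attrs.items())
        ]
```

Treating a scientific record or a user task as a first-class resource means the same discovery call serves people, machines, and data. A real CPSE would back this with a distributed service rather than an in-memory dictionary.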
Architecture of a CPSE
[Figure: CPSE architecture diagram. Scientific domains: Climate, Manufacturing, Chemistry, Biology, Engineering. Layers labeled Problem Solving, Distributed OS Support, and Computational Grid, with components Computation Management, Records Management, Decision Support, Workflow, Distributed Data Management, Distributed Messaging, Collaborative Technologies, Resource Management, Registry Service, Security, Model Execution, and Remote Access.]
Goal: Leverage Research Across Programs
[Figure: stages Acquisition, Transformation, Assimilation, Knowledge, and Scientific Discovery, with capability labels Data Acquisition, Data Management, Data Warehousing, Data Mining, Visualization, Electronic Notebooks, Decision Support, Remote Communications, Collaboration, Resource Discovery, Security, Distributed Messaging, Information Services, Knowledge Engineering, and Workflow.]
Benefits of PSEs to Scientists
• Integrate the key activities of scientific research: problem definition, research design, experiment execution, and analysis
• Allow scientists to execute their computational models efficiently over a distributed network
• Integrate the scientist’s processes, data, and resources into a common working environment
• Guide scientists through the research and experimentation process
• Allow scientists to share their knowledge and expertise in their specific domains
• Reduce barriers to collaboration among geographically dispersed scientists
Key PSE Components
• An intelligent graphical user interface to support scientific activities
• Integrated modeling, visualization, and analysis capabilities
• Tools for launching and monitoring applications across a distributed computing network in real time
• Integrated data management capabilities
• Tools to discover data and information across distributed systems
• Tools for ensuring efficient use of computing resources
• Integrated security for distributed data, communications, and computing
• Integrated tools to share work with collaborators
The Extensible Computational Chemistry Environment (Ecce): A Problem Solving Environment for High-Performance Theoretical Chemistry
Why Ecce Was Developed
• Developed as part of the construction of the Environmental Molecular Sciences Laboratory (EMSL)
• Envisioned as an integrated component in solving DOE’s grand challenge environmental restoration problems
• An extensible framework supporting the development and use of new computational methods
• Able to make efficient use of the computational resources that are available
Ecce is…
• 12 interconnected tools for setting up, running, and analyzing results from electronic structure computational chemistry codes
• Developed primarily in C++ using object-oriented design methodology
• Version 3.1 released Spring 2003
• Organization: Gateway, Calculation Manager
• Setup: Molecule Builder, Basis Set Tool, Calculation Editor, Periodic Table
• Launching: Job Launcher, Machine Browser, Machine Registration
• Monitoring/Analyzing: Calculation Viewer, eccejobstore, eccejobmonitor
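The split between launching and monitoring tools implies a job lifecycle that a monitor tracks from submission to completion. This is a minimal sketch assuming a simple set of job states and allowed transitions. The state names, `TRANSITIONS`, and the `Job` class are hypothetical and do not describe Ecce's actual job model.

```python
from enum import Enum


class JobState(Enum):
    CREATED = "created"      # set up, not yet submitted
    SUBMITTED = "submitted"  # handed to a compute resource
    RUNNING = "running"
    COMPLETE = "complete"
    FAILED = "failed"
    KILLED = "killed"


# Allowed transitions; terminal states have no outgoing edges.
TRANSITIONS = {
    JobState.CREATED: {JobState.SUBMITTED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED, JobState.KILLED},
    JobState.RUNNING: {JobState.COMPLETE, JobState.FAILED, JobState.KILLED},
    JobState.COMPLETE: set(),
    JobState.FAILED: set(),
    JobState.KILLED: set(),
}


class Job:
    def __init__(self, name: str, machine: str):
        self.name = name
        self.machine = machine
        self.state = JobState.CREATED
        self.history = [JobState.CREATED]

    def advance(self, new_state: JobState) -> None:
        """Apply a status update from a monitor, rejecting transitions not in the table."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state
        self.history.append(new_state)

    @property
    def finished(self) -> bool:
        # A job is finished once it reaches a state with no outgoing transitions.
        return not TRANSITIONS[self.state]
```

An explicit transition table makes an out-of-order or corrupted status report raise an error instead of being recorded silently. The recorded history can also be stored alongside the calculation's results.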
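Current MS3 Capabilities
[Figure: computational chemistry methods ordered from MM, MD, QM/MM, DFT, HF, MP(2-4), MCSCF, and CCSD(T) to MRSDCI, with cost scaling from O(N) to O(N^7), accuracy from 100 kcal/mol to 0.1 kcal/mol, and system size from 10^7 atoms down to 10 atoms; time scales of 10^-15 to 10^-8 s. Other size labels: 1 000 000, 1 000, 300, 100, and 10-20 atoms; 10 000, 3 000, 1 000, and 500 basis functions (bf); 5 000 000 CSFs.]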
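The scaling range on this slide, from O(N) to O(N^7), explains why the most accurate methods are limited to the smallest systems. Assume the cost behaves as T(N) ≈ c N^k for large N, ignoring prefactors and lower-order terms. Then doubling the system size multiplies the cost by 2^k:

```latex
\frac{T(2N)}{T(N)} \approx \frac{c\,(2N)^{k}}{c\,N^{k}} = 2^{k},
\qquad k = 1:\; 2^{1} = 2,
\qquad k = 7:\; 2^{7} = 128
```

So a linear-scaling method handles a system twice as large at twice the cost. An O(N^7) method needs roughly 128 times the compute for the same step.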
Ecce Design Objectives
• Provide a bundled set of chemistry tools and applications integrated in a windowing environment with a common look and feel
• Construct a software architecture that supports the incorporation of new commercial or public technologies and applications
• Provide domain-specific data models for EMSL-generated data to improve data sharing among scientists and applications
• Support existing and newly developed “legacy” style codes (i.e., codes that interface via specialized input and output files)
• Improve the efficiency of performing chemistry experiments by integrating the steps of the scientific process
• Support a distributed computing environment that includes a varied set of local and remote computers
• Provide non-resident researchers with access to EMSL resources and applications
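“Legacy” style codes are driven entirely through files. A PSE can wrap them by generating an input deck from its own data model and parsing results back out of the output text. This is a minimal Python sketch of that pattern. The input keywords and the `Total energy =` output line are a hypothetical file format, not that of any real chemistry code.

```python
import re
from dataclasses import dataclass


@dataclass
class Calculation:
    """Hypothetical PSE-side data model for one calculation."""
    title: str
    theory: str
    basis: str
    atoms: list  # (symbol, x, y, z) tuples in angstroms


def write_input(calc: Calculation) -> str:
    """Render the data model as a keyword-style input deck (format is assumed)."""
    lines = [f"title {calc.title}", f"basis {calc.basis}", "geometry"]
    lines += [f"  {sym} {x:.6f} {y:.6f} {z:.6f}" for sym, x, y, z in calc.atoms]
    lines += ["end", f"task {calc.theory} energy"]
    return "\n".join(lines) + "\n"


# Matches lines such as "Total energy =   -76.0107" in the assumed output format.
ENERGY_RE = re.compile(r"Total energy\s*=\s*(-?\d+\.\d+)")


def parse_energies(output_text: str) -> list:
    """Extract every reported total energy from the code's output text, in order."""
    return [float(m.group(1)) for m in ENERGY_RE.finditer(output_text)]
```

The step that actually runs the executable is omitted here; it could use, for example, `subprocess.run`. Keeping the input writer and the output parser as separate functions means supporting a new code only requires a new pair of them.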
Ecce Simplified Architecture
[Figure: components Ecce application software (workstations), chemistry codes (compute resources), and Ecce data and message servers.]
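A message server decouples desktop tools from the compute side. A job monitor can publish status changes, and any interested tool can subscribe to them. This is a minimal in-process publish/subscribe sketch. The `MessageBus` class and the topic names are hypothetical stand-ins for a real networked message server.

```python
from collections import defaultdict
from typing import Callable


class MessageBus:
    """Topic-based publish/subscribe: publishers never reference subscribers directly."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every handler on the topic; return the delivery count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)
```

Because the publisher only names a topic, a new viewer or manager tool can be attached without changing the code that reports job status.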
Ecce Development Environment
• Non-proprietary technologies used for external development:
  • C++ compiler: GNU g++
  • X Window System Motif toolkit
  • OpenGL/Open Inventor visualization
  • WebDAV/XML (Xerces parser) data management
  • Perl scripting language
  • Amulet code registration GUI toolkit
  • Expect C/C++ library for wrapping remote communications
• External users wishing to add core applications would develop with these non-proprietary technologies, linking with the existing Ecce “middleware” libraries
• Core application development in Java, Python, Tcl/Tk, C, etc. is also possible because of WebDAV/XML and the new IPC backplane
• Existing core GUI applications cannot be modified externally without purchasing the GUI development tool we use (XRT)
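Because data access goes through WebDAV and XML, a client in any language needs only HTTP and an XML parser. This minimal Python sketch builds a WebDAV `PROPFIND` request body and reads a `207 Multi-Status` response as defined in RFC 4918. The `http://example.org/ecce-meta` property namespace, the `theory` property, and the resource path are hypothetical. The response is a canned string so the example runs offline.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
META = "http://example.org/ecce-meta"  # hypothetical metadata namespace

ET.register_namespace("D", DAV)
ET.register_namespace("m", META)


def propfind_body(props: list) -> bytes:
    """Build a PROPFIND body requesting the given (namespace, name) properties."""
    root = ET.Element(f"{{{DAV}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV}}}prop")
    for ns, name in props:
        ET.SubElement(prop, f"{{{ns}}}{name}")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def parse_multistatus(xml_text: str) -> dict:
    """Map each href to {property name: text} for properties returned with 200 OK."""
    result = {}
    root = ET.fromstring(xml_text)
    for response in root.findall(f"{{{DAV}}}response"):
        href = response.findtext(f"{{{DAV}}}href")
        props = {}
        for propstat in response.findall(f"{{{DAV}}}propstat"):
            status = propstat.findtext(f"{{{DAV}}}status", default="")
            if " 200 " not in status:
                continue  # skip properties the server could not return
            for prop in propstat.findall(f"{{{DAV}}}prop/*"):
                props[prop.tag.split("}")[-1]] = prop.text
        result[href] = props
    return result
```

Sending the body is ordinary HTTP: a `PROPFIND` request with a `Depth` header to the collection URL. That is why a client in Java, Python, or Tcl can work with the same data without linking against the C++ libraries.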
Ecce Software Engineering Process
• Research, including work in:
  • New visualization techniques
  • Workflow and usability
  • Resource management
  • Data management
[Figure: development cycle with stages Requirements Definition and Analysis Phase; Design and Prototype; Build and Unit Test; Usability Testing; Acceptance Test and Validate; Incremental Delivery of Software; plus user feedback on enhancements and bugs.]
User Support Model
[Figure: support paths linking MSCF Ecce users and off-site Ecce installations (off-site 1st line support; access via www or Java) with MSCF Scientific Consulting and the Ecce Group; problem reporting via ecce-support@emsl.pnl.gov or a www problem-reporting page.]
More Information
• http://www.emsl.pnl.gov/pub/docs/ecce
• Register to use Ecce (a site agreement is required for external use)
• Installation and administration manual
• Compute server registration documentation
• Code registration documentation
• Online help access
• ecce-support@emsl.pnl.gov
Collaboratory for Multi-Scale Chemistry: A Problem Solving Environment Developed Using Portal Technologies
Collaboratory for Multi-scale Chemical Science (CMCS)
• A collaboration of eight national labs and universities:
  • Chemical scientists spanning the scales from the electronic structure of molecules to simulations of reacting flow
  • Computer and information scientists expert in emerging web-based technologies
• Funded by the DOE/SC MICS office
• Part of the National Collaboratory Program
• Pilot project within the DOE combustion research community
• Targets the chemical science community and BES SciDAC projects, with much broader goals in the longer term
The Multi-scale Challenge for Chemical Science
• The impact of chemical science relies on the flow of information across physical scales
  • Data from smaller scales support models at larger scales
  • Critical science lies at scale interfaces: molecular properties and transport; mechanism validation and reduction; chemistry-fluid interactions
• The pedigree of information matters
  • Propagating data pedigree across scales is difficult
  • Validation and data reliability are often established only after publication
• Multi-scale science faces barriers
  • The normal publication route is slow
  • Numerous sub-disciplines employ different applications, formats, and models
  • Centers of excellence are geographically distributed
CMCS Objectives
• Architect and build an adaptive informatics infrastructure enabling multi-scale science
  • XML data/metadata management services
  • MCS Portal enabling data-centric project and community collaboration
  • Middleware and tools for security, notification, and collaboration
• Pilot project within the combustion research community
  • Enable rapid exchange of multi-scale data/metadata
  • Integrate scientific tools that generate, use, and archive metadata
• Demonstrate the power of adaptive infrastructure in existing and new areas as CMCS evolves
  • Development environment for an evolving set of collaborative cross-scale science tools
  • Develop collaborative data pedigree/annotation tools
• Gain adoption and continued support from the science community
  • Publicly accessible data
  • Enable external research collaborations
  • Document success and a continuation path
  • Broaden capabilities and extend to new scientific communities
CMCS: Technologies Used
• Portal
  • Jetspeed http://jakarta.apache.org/jetspeed/
  • Chef http://www.chefproject.org/
• Data middleware
  • SAM http://www.scidac.org/SAM/
  • WebDAV http://www.webdav.org/
  • Slide http://jakarta.apache.org/slide
  • XML, XSLT
  • Binary Format Description (BFD) and Extensible Scientific Interchange Language (XSIL)
• Pedigree
  • Dublin Core standards http://dublincore.org
  • CMCS standard and experimental schemas
• Notification
  • Java Message Service (JMS), currently using OpenJMS http://openjms.sourceforge.net/
• Security
  • Targeting Java Authentication and Authorization Service (JAAS); currently using Chef security for the portal and Slide security for SAM, wrapped in JAAS objects
• Miscellaneous
  • Java 1.4, Ant 1.5, CVS
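Dublin Core supplies generic elements such as `creator`, `date`, and `source` that can carry basic pedigree. Dublin Core defines `source` as a related resource from which the described resource is derived. This minimal Python sketch serializes one dataset's metadata using the Dublin Core element namespace. The wrapping `record` element, its namespace, and the function name are hypothetical and are not the CMCS schemas.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"     # Dublin Core element set namespace
REC = "http://example.org/pedigree-record"  # hypothetical wrapper namespace

ET.register_namespace("dc", DC)
ET.register_namespace("rec", REC)


def dublin_core_record(identifier: str, title: str, creator: str,
                       date: str, sources: list) -> str:
    """Serialize one dataset's metadata; each dc:source names an input it was derived from."""
    root = ET.Element(f"{{{REC}}}record")
    for name, value in (("identifier", identifier), ("title", title),
                        ("creator", creator), ("date", date)):
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    for src in sources:
        ET.SubElement(root, f"{{{DC}}}source").text = src
    return ET.tostring(root, encoding="unicode")
```

Using a standard vocabulary for the common fields lets generic tools read the basic pedigree. Domain-specific detail can still live in a project's own schema alongside it.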
Collaboration Technologies: Electronic Laboratory Notebooks, Virtual Experiments
Today’s Collaborative Technologies
[Figure: Email, Newsgroups, Calendars, File Systems, Electronic Notebook, Real-Time Collaboration, Multi White Board, Audio/Video Conferencing, Shared Window, Chat Box, Remote Instrument, Shared Browsers, Group Authoring, Voting Tools, Remote Camera and Analysis.]
Electronic Laboratory Notebook
• Same concept as a paper laboratory notebook
• Each notebook page provides a data file, an image, or a live Java-based graphical summary
• Notebook contents are updated by users or automatically by online instruments
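An electronic notebook can keep the append-only, attributed character of a paper notebook while letting pages hold different kinds of content. This is a minimal sketch of such a data model, assuming four page kinds. The `Page` and `Notebook` classes and the kind names are hypothetical and do not describe the EMSL notebook's implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Page:
    kind: str          # one of Notebook.KINDS (assumed set of page kinds)
    content: str       # file path, URL, or text, depending on kind
    author: str        # a user name, or an instrument identifier for automatic entries
    created: datetime  # timezone-aware creation time


@dataclass
class Notebook:
    title: str
    pages: list = field(default_factory=list)

    KINDS = frozenset({"data", "image", "summary", "note"})

    def add_page(self, kind: str, content: str, author: str,
                 when: Optional[datetime] = None) -> Page:
        """Append a page; the API offers no way to edit or delete existing pages."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown page kind: {kind}")
        if when is None:
            when = datetime.now(timezone.utc)
        page = Page(kind, content, author, when)
        self.pages.append(page)
        return page

    def by_author(self, author: str) -> list:
        """All pages written by one user or one instrument, in entry order."""
        return [p for p in self.pages if p.author == author]
```

Recording instruments as authors lets the same notebook accept entries typed by users and entries pushed automatically by online instruments, while keeping track of which is which.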
The Computational Cell Environment: A Problem Solving Environment to Enable Computational Biology Research
Goals of the Computational Cell Environment
• Provide a single location from which to access the wide range of data needed for biological research
• Manage subsets of data, allowing researchers to define a set of interest that can be sent to analysis tools, subsetted, extended, or refined with research annotations or conclusions
• Support federated searches across multiple databases
• Capture the analysis thought process and conclusions
• Develop machine-computable representations of data pedigree
• Provide tools for manual and automated traversal of pedigree to support group and community data validation efforts and the exploration of sensitivity analyses
• Allow arbitrary analysis notes and comments to be attached to data values
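A machine-computable pedigree can be represented as a graph in which each datum points to the data it was derived from. Automated traversal is then a graph search. This minimal Python sketch assumes pedigree is stored as a mapping from a datum's identifier to the identifiers of its direct inputs. The function names and that storage layout are hypothetical.

```python
from collections import deque


def ancestry(pedigree: dict, start: str) -> list:
    """Breadth-first walk from `start` through its inputs, their inputs, and so on.

    Returns each ancestor once, nearest (fewest derivation steps) first.
    Cycles caused by bad metadata are tolerated rather than looping forever.
    """
    seen = {start}
    order = []
    queue = deque(pedigree.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(pedigree.get(node, []))
    return order


def dependents(pedigree: dict, target: str) -> set:
    """All data whose ancestry includes `target`, e.g. results affected by a revised value."""
    return {node for node in pedigree if target in ancestry(pedigree, node)}
```

The two directions serve different goals. `ancestry` supports validating a result by inspecting everything it rests on. `dependents` supports sensitivity analysis by finding everything that would change if one input were revised.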
Workbench
Problem Solving Environments to Support Climate Modeling Research
Strategy for Success
• Assemble a multi-disciplinary team of scientists to work together on complex scientific problems
• Provide a framework and set of high-throughput tools for tackling complex problems
• Design a component-based architecture that is extensible and supports scalability and growth
• Build a program that feeds the continued evolution of the software components, leveraging work across multiple projects
Acknowledgements
Funding for this work is provided by the U.S. Department of Energy Office of Biological and Environmental Research (OBER) and the Office of Advanced Scientific Computing Research (OASCR). Parts of this research were performed using the Molecular Science Computing Facility (MSCF) in the William R. Wiley Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). The MSCF is funded by OBER. PNNL is operated by Battelle for the U.S. Department of Energy under contract DE-AC06-76RLO 1830.
Contact Information
For more information:
Debbie Gracio
Computational Sciences and Mathematics
Pacific Northwest National Laboratory
Email: debbie.gracio@pnl.gov
Phone: 509.375.6362
Fax: 509.375.6631