AstroGrid-D is a project building a grid infrastructure for astronomy research: it integrates compute and data resources, manages metadata, and provides distributed query capabilities. Its architecture allows applications to be grid-enabled without any awareness of the underlying machines, and infrastructure thorns can be enhanced to make the best use of the grid. Selected use cases include Dynamo, NBody6++, GAT, LOFAR/GlowGrid, and the Robotic Telescopes Grid. The work packages cover resource integration, metadata management, distributed data and data-stream management, supervision of grid jobs, and user interfaces; the consortium, led by AIP, includes several associated partners. Metadata follows well-defined schemes (the GLUE schema, RTML, and scientific/application-specific metadata), is represented in RDF, and is accessed via the SPARQL query language.
Overview • AstroGrid-D • History • Project Structure • Goals • Current status • Architecture • Resource Integration • Services: • Information Service (MetaData Management) • Database and DataStreams Management • Selected UseCases • Dynamo • NBody6++ • GAT • Challenges • LOFAR / GlowGrid • Robotic Telescopes Grid • AstroGrid-D and GAVO
History: About Cactus • Cactus and its ancestor codes have been using Grid infrastructure since 1993 • Support for Grid computing was part of the design requirements for Cactus 4.0 (based on experiences with Cactus 3) • Cactus compiles “out-of-the-box” with Globus [using the globus device of MPICH-G(2)] • The design of Cactus means that applications are unaware of the underlying machine(s) that the simulation is running on … applications become trivially Grid-enabled • Infrastructure thorns (I/O, driver layers) can be enhanced to make the most effective use of the underlying Grid architecture T.Radke
History: Simulation • SC99: Colliding black holes using the Garching and ZIB T3Es, with remote collaborative interaction and visualisation at the ANL and NCSA booths • 2000: Single simulation spanning LANL, NCSA, NERSC, SDSC, ZIB, Garching, … (numerical relativity: AEI tangential collision of two black holes)
AstroGrid-D Consortium (Potsdam/Berlin, Heidelberg, Garching) • Consortium: AIP (Lead), AEI, ZIB, ZAH, MPA, MPE, TUM • Associated Partners: UP, MPIA, MPIfR, LRZ, RZG, USM, FZK
AstroGrid-D Work packages • Integration of compute and data resources • Managing and providing metadata • Distributed management of data (files) • Distributed query of data bases and management of data streams • Supervision and interactive management of grid jobs • User and application programmer interfaces
Resource Integration • Building Grid Infrastructure: • Unified configurations based on Globus Toolkit (GTK) 4.x • Compiled installation rather than binary, as required for MPI (MPICH-G2) • Each resource acknowledges the GermanGrid and Grid-Germany root CAs (D-Grid StA recommendation) • VOMRS, registration of access to resources • Each institute has a Registration Authority • Development of a translation from MDS information to the AstroGrid-D Information Service for job submission, brokering and VO management • Globus MDS is enhanced by Ganglia • Firewall configuration according to the D-Grid StA recommendation
Certification / RA / User Management • Certification / RA: • MPA/MPE: RA via RZG (DFN) • TUM: -> (DFN) • AIP: RA established (GridKA) • AEI: RA established (GridKA) • ZIB: RA established (DFN) • ZAH: RA established (GridKA) • User management: • Adopted the grid-pool approach (gLite): grid users are created in pools, allowing an easy mapping of VOs to (Unix) groups, for accounting and for tuning local resource access policies
VO Management • Virtual Organisations • The D-Grid management tool (VOMS) is only available with a nearly fully fledged gLite installation • Most communities base their Grid environment on Globus • VOMRS is derived from VOMS, but is easily compatible with GTK • Benefits of VOMRS: • Frontend: Java web app with https, uses certificates • Enhanced facilities for VO group and user management, extensible to registering resources and their policies • SOAP clients built in • Backend for customized queries of the database • AstroGrid-D has a grid-map merge procedure that allows fine-tuning of the grid-maps for local resource providers
AstroGrid-D Metadata • Resources (computational, storage, robotic telescopes, instruments, network, software) • Well-defined schemes • GLUE schema (computational, storage, software) • RTML (robotic telescopes) • State of grid services (jobs, files, data stream, ...)
Metadata (cont.) • Scientific metadata (domain-specific description of data sets, provenance) • Application-specific metadata (job history, job progress, ...) • Schemes will be decided later by the community
Requirements • Extensible/flexible schemes • Easy to extract and export metadata • Protect from unauthorized access • Support exact match and range queries • Handle different metadata characteristics
Approach • Metadata representation with RDF • An RDF entry is a (subject, predicate, object) tuple • A set of triples/statements forms a graph • Metadata access via SPARQL • Query language for RDF • Graph pattern matching • Simple interface including add, update, remove and query
RDF Example • “A picture of the Eiffel tower has a photographer with value Maria” → triple: (Picture of Eiffel tower, Photographer, “Maria”) • “Maria has a phone number with value 555-444” → triple: (“Maria”, Phone number, 555-444) • “A picture of the Eiffel tower has creation-date with value 2003.06.05” → triple: (Picture of Eiffel tower, Creation-date, 2003.06.05) • Together these statements form a small RDF graph
SPARQL Example • “Get the name and phone number of the photographer who took the picture of the Eiffel tower” • Query over the RDF graph above: SELECT ?name ?number WHERE { “Picture of Eiffel tower” “Photographer” ?name . ?name “Phone number” ?number } • Output result: Name = “Maria”, Number = 555-444
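The query above can be reproduced with any RDF toolkit. Below is a minimal sketch using Apache Jena; the slides do not name a particular RDF library, so the library choice, the example namespace and the property names are assumptions made only for illustration.

import org.apache.jena.rdf.model.*;
import org.apache.jena.query.*;

public class SparqlExample {
    public static void main(String[] args) {
        // Build the example graph: picture -> photographer -> Maria -> phone number
        Model model = ModelFactory.createDefaultModel();
        String ns = "http://example.org/";                  // illustrative namespace (assumption)
        Property photographer = model.createProperty(ns, "photographer");
        Property phoneNumber  = model.createProperty(ns, "phoneNumber");
        Resource picture = model.createResource(ns + "PictureOfEiffelTower");
        Resource maria   = model.createResource(ns + "Maria");
        picture.addProperty(photographer, maria);
        maria.addProperty(phoneNumber, "555-444");

        // Graph pattern matching with SPARQL, as on the slide
        String q = "PREFIX ex: <" + ns + "> "
                 + "SELECT ?name ?number WHERE { "
                 + "  ?pic ex:photographer ?name . "
                 + "  ?name ex:phoneNumber ?number }";
        try (QueryExecution qe = QueryExecutionFactory.create(q, model)) {
            ResultSetFormatter.out(qe.execSelect());        // prints the photographer resource and "555-444"
        }
    }
}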
Storage interface • Add(String RDF, String context) • Update(String RDF, String context) • Overwrite matching statements • Remove([statements], String context) • Delete existing metadata from the information service storage • Query(String sparql_query) • Extract metadata from the information service or from RDF information producers
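The slide lists the storage interface only by its operation names. The Java interface below is a hypothetical rendering of those four operations; the method signatures, parameter types and the string result of Query are assumptions for illustration, not the project's actual API.

/** Hypothetical rendering of the information-service storage interface. */
public interface MetadataStorage {
    /** Add new RDF statements to the given context. */
    void add(String rdf, String context);
    /** Overwrite matching statements in the given context. */
    void update(String rdf, String context);
    /** Delete existing statements from the information service storage. */
    void remove(String[] statements, String context);
    /** Extract metadata via a SPARQL query; the result serialization is assumed here. */
    String query(String sparqlQuery);
}

A client would then, for example, call query("SELECT ?ce WHERE { ... }") against the service endpoint and parse the returned result set.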
Demo: Overview • Idea: use Google's map API to present grid resources using RDF metadata provided by the information services • Tools • MDS4 WebMDS produces an XML representation of resource information • Template language or XSLT for translating to RDF • RDF store • Web service interface to add and query the RDF store
Demo: SPARQL queries • “Get all computing elements from site S”: SELECT ?ce WHERE { S “ComputeElement” ?ce } • “Get all sites and their longitude and latitude if available”: SELECT ?site ?lat ?long WHERE { ?site rdf:type “Site” . OPTIONAL { ?site geo:lat ?lat . ?site geo:long ?long } }
Demo: RDF graph (example) • <SiteID> Lat 51.30 ; Long 10.25 • <SiteID> Cluster <ClusterID> • <ClusterID> ComputeElement <ComputeElementID> • <ComputeElementID> Name “default” ; #CPUs 8 ; #FreeCPUs 2 ; #Running jobs 4
UseCase Dynamo Modelling the turbulent dynamo in planets, stars and galaxies (solving the induction equation with a turbulent electromotive force) Moderate requirements: • a few MB to a few GB of main memory and <= 10 GB of local disk space • Runtime: hours to days; Fortran 90 code / compilers • Handling of plain files only • Concurrent start of multiple simulations with different input parameters
Preparation of Taskfarming • One directory for each model (parameter sets) containing input parameter files and the executable or code and Makefile • Job Description (Globus XML file for multijobs) • data location (input) • FQDN names of compute resources • data location (output)
Multi-Job XML .....
Start Taskfarming • Single sign-on to the grid: $> grid-proxy-init • Submit Multi-Job: $> globusrun-ws -submit -b -f job.xml -J -S -o epr.xml • Status requests & monitoring: $> globusrun-ws -status -j epr.xml $> globusrun-ws -monitor -s -j epr.xml
Multi-Job Flowchart (overview): the user obtains a proxy (grid-proxy-init), submits the multi-job description XML and receives a job EPR, which is then used for status requests and monitoring; the accepted multi-job is delegated as single-job descriptions (1..N) to grid resources, each of which accepts its job, stages in the input data, executes the task, stages out the output data and cleans up, after which the output data are saved and delivered back.
JSDL and RSL • Why consider another job description language? • Globus RSL, which is processed by GRAM, lacks some features that are required for a detailed description of many jobs • Simulations often require compilation before the actual job is run • Data mining, where the first stage of the job has to acquire the actual data to operate on from remote archives/databases • Defining workflows is next to impossible
JSDL and RSL • JSDL is developed by a GGF working group • AstroGrid-D started to develop a JSDL-to-RSL translator to meet the requirements of our use cases • Command-line interface • takes an XML file as input • produces an RSL file for the globusrun command $> jsdlproc jsdl-examples/nbody6.jsdl > temp.rsl && globusrun-ws -submit -staging-delegate -job-description-file temp.rsl -F <globushost>
GAT (Grid Application Toolkit) • GAT is not a new grid middleware • GAT is a framework that enables uniform access to different grid middleware: GAT provides easy access to different grid infrastructures • GRAM • Condor • Unicore • GridFTP • ... GAT is an open-source project
GAT (Grid Application Toolkit) • Applications call the GAT API for grid operations • Applications have to be linked against the GAT • Applications are independent of the underlying grid infrastructure • The GAT engine loads the available adaptors at runtime • On each call to the GAT API, the GAT engine decides which adaptor performs the grid operation • If a grid operation fails, another adaptor is tried • Default adaptors offer local operations • Grid applications can be compiled, linked and tested without grid services being available • The same application can then run in a complete grid environment without a new build
GAT (Grid Application Toolkit) • The GAT API does not change; changes, e.g. in Globus job submission, are handled inside the GAT Globus adaptor • GAT offers reliability: if one grid service is not available, another grid service is used • GAT is much easier to install than Globus • GAT now has a Java API, allowing pure client installations in combination with the Java CoG (an AstroGrid-D development) • GAT offers the grid with minimal effort for the end user
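Since the slides mention the new Java API, a minimal job-submission sketch in JavaGAT follows. It assumes JavaGAT 2.x class names and signatures and a placeholder host URI; exact signatures differ between GAT releases, so treat it as an illustration of the usage pattern, not as the project's code.

import org.gridlab.gat.GAT;
import org.gridlab.gat.GATContext;
import org.gridlab.gat.URI;
import org.gridlab.gat.resources.*;

public class GatSubmitSketch {
    public static void main(String[] args) throws Exception {
        GATContext context = new GATContext();

        // Describe the task; the application stays unaware of the middleware used
        SoftwareDescription sd = new SoftwareDescription();
        sd.setExecutable("/bin/hostname");
        JobDescription jd = new JobDescription(sd);

        // "any://" lets the GAT engine pick a suitable adaptor (Globus, local, ...);
        // the host name is a placeholder, not a real resource
        ResourceBroker broker =
            GAT.createResourceBroker(context, new URI("any://example-host"));
        Job job = broker.submitJob(jd);

        System.out.println("Job state: " + job.getState());
        GAT.end();
    }
}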
GAT (Grid Application Toolkit) • GAT does not replace any function of a grid middleware • Without an adaptor for a given grid middleware, GAT is useless for that middleware • GAT has no resource broker
GAT (Grid Application Toolkit) layered architecture: the application (user space) calls the GAT API; the GAT engine dispatches to GAT adaptors for the underlying middleware in grid space (GTK4 / Globus 2/3.x, PBS, SGE, DRMAA, Unicore).
Robotic Telescope Grid (T. Granzer) • Telescope sites shown: Arizona, Hawaii, Texas, La Palma, Tenerife, Australia, South Africa, Potsdam; partners include AIP, Liverpool JMU and U Göttingen • 24h observation, no reaction delay • Independent of local weather • Use Grid technology in an uplifted abstraction layer for robotic telescopes • IVOA (VOEvent) and HTN (Heterogeneous Telescopes Network)
LOFAR / GlowGrid • GLOW: German LOng Wavelength consortium • represents the interests of the German astronomy community in the LOFAR project • AIP, MPA and MPIfR are members of GLOW • Goals of GlowGrid: • Grid methods and infrastructure for the radio-astronomy community • Developing grid-based methods for “real time” processing of antenna signals • Long-term storage for LOFAR/GLOW data
LOFAR / GlowGrid • Partners: AIP, MPA, MPIfR, ZIB, FZJ, KIP • TLS, RUB, ZAH, TUM • Volume: 5 FTE x 3 years + hardware • The project is placed as a community project, but AstroGrid-D will manage it, and its work packages are added to our current structure
LOFAR station map (Bremen region)