AstroGrid-D Monitoring
AstroGrid-D Meeting @ AIP, 25.-26.2.2008
Frank Breitling, Stephan Braune
Contents
- Host Monitoring (for compute resources)
  - Status
  - Goals until the end of the project
  - Perspectives beyond the project
- Robotic Telescope Monitoring
  - Status
  - Goals until the end of the project
  - Perspectives beyond the project
Host Monitoring Status
- AGD monitoring solution available since Dec. 2007
- It builds on:
  - Audit Logging provided by Globus Toolkit V4.0.5 and later
  - PostgreSQL Database (DB)
  - DB Triggers
  - Usage Records (UR) XML format (http://staff.psc.edu/lfm/PSC/Grid/UR-WG/)
  - XML2RDF XSLT
  - Stellaris
  - SPARQL queries
- A test setup has been running at the AIP since Dec. 2007
AGD Monitoring Architecture
[Figure: architecture diagram. Labels: User Workstation with Globus client (globusrun_ws, globus_job_run); Globus grid resource with audit database and trigger; curl; Stellaris RDF database; browser with SPARQL queries and timelines.]
Earlier: status information via EPR files and monitoring.pl
Activation of Audit Logging in Globus for WS GRAM (globusrun-ws)

Changes in the Globus Toolkit configuration:

- In $GLOBUS_LOCATION/container-log4j.properties:

    ...
    # GRAM AUDIT
    log4j.category.org.globus.exec.service.exec.StateMachine.audit=DEBUG, AUDIT
    log4j.appender.AUDIT=org.globus.exec.utils.audit.AuditDatabaseAppender
    log4j.appender.AUDIT.layout=org.apache.log4j.PatternLayout
    log4j.additivity.org.globus.exec.service.exec.StateMachine.audit=false

- Output goes to a database (PostgreSQL or MySQL). The database connection has to be declared in $GLOBUS_LOCATION/etc/gram-service/jndi-config.xml:

    <resource ...>
      <resourceParams>
        ...
        <parameter>
          <name>url</name><value>jdbc:mysql://<host>[:port]/auditDatabase</value>
        </parameter>
        <parameter><name>user</name><value>globus</value></parameter>
        <parameter><name>password</name><value>foo</value></parameter>
        ...
      </resourceParams>
    </resource>

- The table is updated whenever a job is started or changes its status (contrary to SAGAS)
- The database content is converted into Usage Record format and sent to Stellaris via DB triggers
Activation of Audit Logging in Globus for Pre WS GRAM (globus-job-run)

Changes in the Globus Toolkit configuration:

- In $GLOBUS_LOCATION/log4j.properties:

    ...
    # GRAM AUDIT
    log4j.category.org.globus.exec.service.exec.StateMachine.audit=DEBUG, AUDIT
    log4j.appender.AUDIT=org.globus.exec.utils.audit.AuditDatabaseAppender
    log4j.appender.AUDIT.layout=org.apache.log4j.PatternLayout
    log4j.additivity.org.globus.exec.service.exec.StateMachine.audit=false

- Text file output has to be configured in $GLOBUS_LOCATION/etc/globus-job-manager.conf:

    -audit-directory /tmp/globus

- The file is converted into Usage Record format and sent to Stellaris via a cron job
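The cron-based path for Pre WS GRAM can be pictured with a small sketch. It is not the AGD cron job itself: the function name `process_new_audit_files`, the `done_dir` bookkeeping directory and the `handle` callback (standing in for "convert to Usage Record and send to Stellaris") are all assumptions, and the layout of the audit files is deliberately left untouched because the slide does not describe it.

```python
from pathlib import Path


def process_new_audit_files(audit_dir, done_dir, handle):
    """Hand every file in audit_dir to `handle`, then move it to done_dir.

    Assumptions (not from the slides): each file holds one audit record,
    `handle` is a hypothetical callback that converts and uploads it, and
    both directories live on the same filesystem so rename() works.
    """
    audit_dir, done_dir = Path(audit_dir), Path(done_dir)
    done_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    # Sort for a deterministic order; skip sub-directories such as done_dir.
    for path in sorted(p for p in audit_dir.iterdir() if p.is_file()):
        handle(path.read_text())
        # Moving the file out of the way means the next cron run skips it.
        path.rename(done_dir / path.name)
        processed.append(path.name)
    return processed
```

A cron entry would then call this function periodically with the directory given to `-audit-directory`; a file is only moved after `handle` returns, so a failed upload is retried on the next run.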
DB Trigger

- The triggers are installed in the PostgreSQL DB using:

    audit=# \i trigger.sql

- Documentation is available on the AGD intranet: http://mintaka.aip.de:8080/lenya/intranet/live/workpackages/wg2/GRAM_audit_logging.pdf

    CREATE FUNCTION update_stellaris() RETURNS "trigger" AS $update_stellaris$
      use strict;
      use URI;
      use Net::hostent;
      use XML::Writer;
      use HTTP::Request;
      use LWP::UserAgent;

      # $_TD->{new} is the row being inserted or updated.
      my $job_grid_id = URI->new($_TD->{new}{job_grid_id});
      # Hex-encode the query part of the job URI to get a record id.
      my $id = unpack("H*", $job_grid_id->query());
      my $host = gethost($job_grid_id->host())->name();

      # Build the Usage Record XML document in memory.
      my $usage_record = "";
      my $writer = XML::Writer->new(OUTPUT => \$usage_record, NEWLINES => 1, UNSAFE => 1);
      $writer->xmlDecl("UTF-8");
      $writer->startTag("JobUsageRecord", "xmlns" => "http://www.gridforum.org/2003/ur-wg#", ...);
      $writer->startTag("RecordIdentity");
      $writer->dataElement("LocalJobId", $_TD->{new}{local_job_id});
      $writer->endTag("RecordIdentity");
      .....
      $writer->raw($_TD->{new}{job_description});
      $writer->dataElement("success_flag", $_TD->{new}{success_flag});
      $writer->dataElement("finished_flag", $_TD->{new}{finished_flag});
      $writer->endTag("JobUsageRecord");
      $writer->end();

      # Upload the record to Stellaris with an HTTP PUT.
      my $req = HTTP::Request->new("PUT",
        "http://stellaris.astrogrid-d.org/files/hosts/".$host."/urs/".$id,
        HTTP::Headers->new(Content_Length => length($usage_record)),
        $usage_record);
      my $ua = LWP::UserAgent->new();
      my $res = $ua->request($req);
      .....
      return;
    $update_stellaris$ LANGUAGE plperlu;

    CREATE TRIGGER update_stellaris_trig
      BEFORE INSERT OR UPDATE ON gram_audit_table
      FOR EACH ROW EXECUTE PROCEDURE update_stellaris();
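For readers less familiar with PL/Perl, the core of the trigger can be restated as a minimal Python sketch. It only covers the fields visible on the slide (the elided parts stay elided), it skips the DNS lookup done by `gethost`, and it builds the PUT request without sending it. The function names `build_usage_record` and `stellaris_request` and the example row are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit
from urllib.request import Request

UR_NS = "http://www.gridforum.org/2003/ur-wg#"  # namespace used on the slide
STELLARIS_BASE = "http://stellaris.astrogrid-d.org/files/hosts"  # from the slide

# Serialize UR elements in the default namespace (xmlns="...").
ET.register_namespace("", UR_NS)


def build_usage_record(row):
    """Serialize the audit columns shown on the slide as a JobUsageRecord."""
    root = ET.Element(f"{{{UR_NS}}}JobUsageRecord")
    identity = ET.SubElement(root, f"{{{UR_NS}}}RecordIdentity")
    ET.SubElement(identity, f"{{{UR_NS}}}LocalJobId").text = str(row["local_job_id"])
    for column in ("success_flag", "finished_flag"):
        ET.SubElement(root, f"{{{UR_NS}}}{column}").text = str(row[column])
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def stellaris_request(row):
    """Build (but do not send) a PUT request like the one the trigger issues."""
    job_uri = urlsplit(row["job_grid_id"])
    # Same idea as Perl's unpack("H*", ...): hex-encode the query string.
    record_id = job_uri.query.encode("utf-8").hex()
    # Assumption: use the host name as written, without a DNS lookup.
    host = job_uri.hostname
    body = build_usage_record(row)
    url = f"{STELLARIS_BASE}/{host}/urs/{record_id}"
    return Request(url, data=body, method="PUT",
                   headers={"Content-Length": str(len(body))})


# Hypothetical audit row; the URI and values are made up for the example.
example_row = {
    "job_grid_id": "https://gridhost.example.org:8443/wsrf/services/Job?abc",
    "local_job_id": 42,
    "success_flag": "true",
    "finished_flag": "true",
}
```

Calling `stellaris_request(example_row)` yields a request whose URL ends in the host name followed by `/urs/` and the hex-encoded query, which mirrors how the trigger derives a stable resource name per job.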
SPARQL Queries for Usage Statistics

Query 1: the 25 most recently created jobs

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ur: <http://www.gridforum.org/2003/ur-wg#>
    PREFIX x2r: <http://www.astrogrid-d.org/2007/08/14-xml2rdf#>
    SELECT ?job_grid_id ?GlobalUserName ?SubmitHost ?executable ?creation_time
           ?StartTime ?EndTime ?wdv ?Count ?CPU_Time
    WHERE { graph ?g {
      ?n1 ur:JobIdentity ?JobIdentity .
      ?JobIdentity ur:job_grid_id ?job_grid_id .
      ?n1 ur:UserIdentity ?UserIdentity .
      ?UserIdentity ur:GlobalUserName ?GlobalUserName .
      ?n1 ur:creation_time ?creation_time .
      ?n1 ur:SubmitHost ?SubmitHost .
      OPTIONAL { ?n1 ur:StartTime ?StartTime .
                 ?n1 ur:EndTime ?EndTime . }
      OPTIONAL { ?n1 ur:WallDuration ?wall_duration .
                 ?wall_duration x2r:value ?wdv . }
      OPTIONAL { ?n1 ur:Resource ?res .
                 ?res x2r:value ?executable . }
      OPTIONAL { ?n1 ur:Count ?Count . }
      OPTIONAL { ?n1 ur:CPU_Time ?CPU_Time . }
    }}
    ORDER BY DESC(?creation_time)
    LIMIT 25

Query 2: summed CPU time per user, executable and submit host

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ur: <http://www.gridforum.org/2003/ur-wg#>
    PREFIX x2r: <http://www.astrogrid-d.org/2007/08/14-xml2rdf#>
    SELECT distinct ?GlobalUserName ?executable ?SubmitHost sum(?CPU_Time)
    WHERE { graph ?g {
      ?n1 ur:JobIdentity ?JobIdentity .
      ?JobIdentity ur:job_grid_id ?job_grid_id .
      ?n1 ur:UserIdentity ?UserIdentity .
      ?UserIdentity ur:GlobalUserId ?GlobalUserName .
      ?n1 ur:SubmitHost ?SubmitHost .
      ?n1 ur:CPU_Time ?CPU_Time .
      OPTIONAL { ?n1 ur:Resource ?res .
                 ?res x2r:value ?executable . }
    }}
    ORDER BY ?GlobalUserName
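The aggregation in the second query can also be done on the client side, which may help when a SPARQL endpoint does not support `sum`. The sketch below assumes the result rows have already been fetched and turned into dictionaries keyed by the query's variable names, and that `CPU_Time` is a plain number of seconds; both points are assumptions, as is the function name `cpu_time_per_user`.

```python
from collections import defaultdict


def cpu_time_per_user(rows):
    """Sum CPU_Time per (user, executable, submit host).

    `rows` is assumed to be a list of dicts keyed by SPARQL variable name.
    `executable` comes from an OPTIONAL pattern, so it may be missing.
    """
    totals = defaultdict(float)
    for row in rows:
        key = (row["GlobalUserName"], row.get("executable"), row["SubmitHost"])
        # Assumption: CPU_Time is numeric (seconds), not an ISO 8601 duration.
        totals[key] += float(row["CPU_Time"])
    return dict(totals)


# Made-up rows for illustration only.
example_rows = [
    {"GlobalUserName": "alice", "executable": "/bin/sim", "SubmitHost": "a.example.org", "CPU_Time": "10"},
    {"GlobalUserName": "alice", "executable": "/bin/sim", "SubmitHost": "a.example.org", "CPU_Time": "5.5"},
    {"GlobalUserName": "bob", "SubmitHost": "b.example.org", "CPU_Time": "2"},
]
```

Rows without an `executable` binding are grouped under `None`, which keeps jobs with missing resource information visible instead of dropping them.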
Goals until the end of the project
- Integrate monitoring info into Timeline and Resource Map
- Provide more SPARQL query templates (see svn://svn.gac-grid.org/software/monitoring/host/)
- Provide improved documentation and installation instructions
- Include all AGD institutes and resources in monitoring
- Move from test to production mode, i.e. solve the remaining problems
Solve unstable DB connection
- Audit Logging establishes a DB connection only once, i.e. the first time a job is submitted to Globus
- If the DB goes down, the connection is lost and no further data is received => a restart of the Globus container is necessary
- Solution: we have informed the GT developers via the mailing lists gt-user & gram-user
- Bug report: http://bugzilla.globus.org/globus/show_bug.cgi?id=5863
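The behaviour one would want instead of "connect once, never reconnect" can be sketched in a few lines. This is an illustration of the general reconnect-on-failure pattern, not the Globus appender: the class `ReconnectingWriter`, the `connect` factory and the connection's `write` method are all hypothetical.

```python
class ReconnectingWriter:
    """Write records through a connection that is re-opened after a failure.

    `connect` is a hypothetical factory returning an object with a
    write(record) method that raises ConnectionError when the DB is gone.
    """

    def __init__(self, connect, max_attempts=2):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._connect = connect
        self._max_attempts = max_attempts
        self._conn = None

    def write(self, record):
        last_error = None
        for _ in range(self._max_attempts):
            try:
                if self._conn is None:
                    # Open lazily, and again after every lost connection.
                    self._conn = self._connect()
                self._conn.write(record)
                return
            except ConnectionError as exc:
                # Drop the broken connection so the next attempt reconnects.
                self._conn = None
                last_error = exc
        raise last_error
```

With this pattern a database restart costs at most one failed attempt per record, instead of silently losing all records until the container is restarted.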
Add missing fields in audit logging
- Some important information is not provided by audit logging:
  - global job id (UUID format)
  - resource usage information as reported by the UNIX time command, i.e.:
    (i) the elapsed real time
    (ii) the user CPU time
    (iii) the system CPU time
  - end time of the job, in the same format as creation_time
  - name of submission client
  - name of execution host (and maybe also the number of used CPUs)
- Solution: we have informed the GT developers via the mailing lists gt-user & gram-user
- Bug report: http://bugzilla.globus.org/globus/show_bug.cgi?id=5864
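To make the requested fields concrete, the following sketch collects the three values reported by the UNIX time command for a child process, plus a UUID-format job id. It only illustrates what such a record could contain; it is not part of Globus, it works on Unix-like systems only (it relies on the `resource` module), and the function name `run_and_measure` and the dictionary keys are assumptions.

```python
import resource
import subprocess
import sys
import time
import uuid


def run_and_measure(argv):
    """Run a command and return time(1)-style usage figures for it."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    started = time.monotonic()
    completed = subprocess.run(argv, check=False)
    elapsed = time.monotonic() - started
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "global_job_id": str(uuid.uuid4()),                   # UUID format
        "elapsed_real_s": elapsed,                            # (i) real time
        "user_cpu_s": after.ru_utime - before.ru_utime,       # (ii) user CPU
        "system_cpu_s": after.ru_stime - before.ru_stime,     # (iii) system CPU
        "exit_code": completed.returncode,
    }


if __name__ == "__main__":
    # Measure a trivial child process as a smoke test.
    print(run_and_measure([sys.executable, "-c", "pass"]))
```

The CPU figures are taken as the difference of `RUSAGE_CHILDREN` before and after the run, so they cover only children that finished inside that window.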
Add Usage Record (UR) format
- Audit logging is not compatible with the UR format, the OGF standard for monitoring information
- Currently we construct URs via database triggers
- Solution: we have informed the GT developers via the mailing lists gt-user & gram-user
- Bug report: http://bugzilla.globus.org/globus/show_bug.cgi?id=5865
Simplify installation procedure
- Currently:
  - PostgreSQL has to be recompiled with Perl support
  - DB triggers have to be installed
  - Globus configuration is necessary
- Solution: we want to optimize the installation process, maybe with a Globus helper package
Upgrade to Stellaris V 0.2.0
- Currently a few problems still exist with Stellaris V 0.2.0
- We continue testing and report every problem to Mikael Högqvist
Perspectives beyond the project
- Define a common policy on data privacy, since AGD resources are shared with other grid communities (e.g. LRZ), which might have different restrictions on the logging of user information
- Suggest the AGD monitoring solution to other grid communities
Robotic Telescopes Status
- New vision of the RT project, reflected by its new name: OpenTel
- Corresponding project page: http://www.gac-grid.org/project-products/RoboticTelescopes.html
- OpenTel is an open network for robotic telescopes. Open means:
  - open standards
  - open source
  - open for telescopes to join
- OpenTel is for professional and amateur astronomers
- OpenTel is currently the only open network and therefore a unique and promising approach in robotic astronomy
Project History
- Progress so far
  - D2.4 Static metadata: FB, done (15.5.2007)
  - D2.7 Dynamic metadata / Monitoring: FB, 66% complete, publication expected in March
  - D5.3 First Integration of RTs: FB, done (31.7.2007)
- Goals until end of project
  - D5.5 Resource Broker: TR, work in progress. FB will help.
  - D5.8 Scheduler: FB, TR, Thomas G., to be done
Monitoring / Dynamic Metadata
Monitoring a network of robotic telescopes - Deliverable 2.7:
- STELLA-I & II as info providers for Stellaris
- Same database triggers as for host monitoring
- RDF Calendar format is used for scheduling info (understood by RDF tools)
- Trigger templates can be easily adjusted for other telescopes
- Software is collected in a package called "ottools"
- Timeline showing the observation schedule directly from the STELLA DB (http://photon.aip.de:25000/timeline/telescopes.html)
- Timeplot showing weather information (tbd)
Goals until the end of the project
- Provide a general solution for the integration of other telescopes. This requires:
  - Metadata management based on user certificates
  - Software package with tools and templates (ottools): svn://svn.gac-grid.org/software/OpenTel/ottools
  - Comprehensive documentation
- Improved user interfaces:
  - Timeline & Timeplot with menu for selection of telescopes, time windows, etc.
  - Timeplot displaying new metadata of time series (temperature, seeing, etc.)
  - Resource map displaying dynamic metadata
- Resource Broker (D5.5)
- Scheduler (D5.8)
- Integrate STELLA-I & STELLA-II
- First observation via the grid
Perspectives beyond the project
- Improve software, in particular the scheduler
- Perform more grid observations, more testing
- Perform first network observations
- Integrate more telescopes, in particular from hobby astronomers. Software contributions would be welcome
- Collaboration with other networks such as the LCOGT
- Attract and collaborate with the amateur astronomy and open source community
- Find an OpenTel logo