DAR Metadata Catalog Markus Heene, DWD markus.heene@dwd.de
Agenda
• Welcome
• Notes
• Performance Test - Infrastructure
• High level architecture
  • Geonetwork
  • terraCatalog
• Performance Tests
  • Requirements
  • Preconditions
  • Results
  • Remarks
• Resources
Notes
• The presented results are from May 2009
• Both software solutions have since released newer versions:
  • Geonetwork 2.6
  • terraCatalog 3.0
• The findings of the performance study were made available to both software providers
Performance Test - Infrastructure
• Components: test client, application server (Tomcat 5.5), Oracle 10g database
• Application server: CPU: 4 × AMD Opteron, 1800 MHz; RAM: 9,186,716 kB (about 8.8 GiB)
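The test client's job (driving concurrent catalogue searches against the application server and recording response times) can be sketched as below. This is a minimal Python sketch, not the tool used in the study: `run_load` and the `search` callable are hypothetical, and in a real run `search` would send one CSW request to the server.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_load(search, queries, sessions=20):
    """Send every query through `search`, using `sessions` concurrent workers.

    `search` is any callable that performs one catalogue request (hypothetical
    here); the result summarizes throughput and response times.
    """
    queries = list(queries)
    if not queries:
        raise ValueError("need at least one query")

    def timed(query):
        start = time.perf_counter()
        search(query)  # the response body is ignored; only the duration matters
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        latencies = list(pool.map(timed, queries))
    wall = time.perf_counter() - wall_start

    return {
        "requests": len(latencies),
        "searches_per_s": len(latencies) / wall if wall > 0 else float("inf"),
        "max_response_s": max(latencies),
        "mean_response_s": mean(latencies),
    }


if __name__ == "__main__":
    # Stand-in for a real CSW request: each "search" just sleeps for 50 ms.
    report = run_load(lambda q: time.sleep(0.05), range(200), sessions=20)
    print(report)
```

A thread pool fits this kind of client because each simulated session spends most of its time waiting on the network; `sessions` plays the role of the number of active sessions.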
Geonetwork: High level architecture
• Geonetwork (versions 2.2 and 2.4)
• Servlet container
  • Developed mainly for Jetty (migration to other servlet containers such as Tomcat or OC4J is possible)
  • Geonetwork consists of 3 different web applications that can interact
  • Several frameworks are used in its development: Jeeves, Struts, Spring, …
  • A system architecture redesign has been announced for the next generation of Geonetwork, removing the Jeeves framework (“Bringing data and metadata closer together”, FOSS4G 2008, Cape Town, by Jeroen Ticheler)
• Metadata handling
  • The metadata XML file is stored as a “large object” in the database (different database vendors are supported)
  • Search is mainly based on a Lucene index outside the database
  • <gmd:fileIdentifier> is limited to varchar2(250) in the basic installation
  • Building the Lucene index takes a very long time
• Additional remarks
  • Open source software
  • Stable solution so far (migration to another servlet container takes time)
  • Version 2.2 implements only some of the CSW queries
  • Some Z39.50 support is available; so far DWD has only limited experience with it
  • Production installations with up to 25,000 records are running (as far as we found)
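Because the basic installation limits `<gmd:fileIdentifier>` to varchar2(250), records can be screened before a long import run. A minimal sketch using Python's standard `xml.etree.ElementTree`; the function name and the byte-based length check are assumptions, not part of Geonetwork.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
# Column size reported for a basic Geonetwork installation: varchar2(250).
FILE_IDENTIFIER_LIMIT = 250


def check_file_identifier(xml_text, limit=FILE_IDENTIFIER_LIMIT):
    """Return (identifier, fits) for an ISO 19139 gmd:MD_Metadata record.

    Length is counted in UTF-8 bytes, the conservative choice if the column
    uses Oracle's default byte-length semantics.
    """
    root = ET.fromstring(xml_text)
    node = root.find("gmd:fileIdentifier/gco:CharacterString", NS)
    if node is None or not (node.text or "").strip():
        return None, False  # no usable identifier found
    identifier = node.text.strip()
    return identifier, len(identifier.encode("utf-8")) <= limit


if __name__ == "__main__":
    record = (
        '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" '
        'xmlns:gco="http://www.isotc211.org/2005/gco">'
        "<gmd:fileIdentifier><gco:CharacterString>"
        + "x" * 251
        + "</gco:CharacterString></gmd:fileIdentifier></gmd:MD_Metadata>"
    )
    print(check_file_identifier(record)[1])  # False: 251 bytes > 250
```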
terraCatalog 2.3: High level architecture
• terraCatalog 2.3
• Servlet container
  • Developed mainly for Tomcat (migration is possible but was not tried)
  • terraCatalog consists of several web applications that can interact
  • Frameworks are used consistently across all web applications
• Metadata handling
  • The metadata XML file is stored in the database and “mapped” into a relational model (database support for PostgreSQL and Oracle)
  • Search is a function of the database (Oracle Spatial and Oracle Text)
  • Mapping into the relational model causes conflicts with XML documents (e.g. title is limited to varchar2(255), and the same applies to abstract and keywords), so some valid ISO-conformant XML documents could not be imported into terraCatalog
  • The Oracle Spatial data type can store only half of the world, so the whole globe needs special treatment; we found Oracle errors in certain situations
• Additional remarks
  • Commercial software with support
  • Much more complete CSW implementation than Geonetwork 2.2
  • No Z39.50 search functionality (would require additional investment)
  • Production installations with up to 25,000 records are running
  • We found some bugs: SQL injection, Oracle errors, valid XML documents that could not be imported, and an error when exporting metadata as an XML document
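The two import problems on this slide (fixed-length columns and the half-globe limit) can be screened for before import. A minimal sketch, assuming ISO 19139 records and the 255-character limit named above; the function name, the byte-based length check and the 180° longitude threshold are rough heuristics assumed here, not terraCatalog or Oracle rules.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
IDENT = "gmd:identificationInfo/*/"  # any identification element, e.g. gmd:MD_DataIdentification
FIELDS = {
    "title": IDENT + "gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
    "abstract": IDENT + "gmd:abstract/gco:CharacterString",
    "keyword": IDENT + "gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString",
}
BBOX = IDENT + ("gmd:extent/gmd:EX_Extent/gmd:geographicElement/"
                "gmd:EX_GeographicBoundingBox/")
FIELD_LIMIT = 255  # varchar2(255) for title, abstract and keywords


def terracatalog_issues(xml_text, limit=FIELD_LIMIT):
    """List reasons an ISO 19139 record may fail a terraCatalog 2.3 import."""
    root = ET.fromstring(xml_text)
    issues = []
    for name, path in FIELDS.items():
        for node in root.findall(path, NS):
            size = len((node.text or "").encode("utf-8"))
            if size > limit:
                issues.append(f"{name}: {size} bytes > {limit}")

    west = root.find(BBOX + "gmd:westBoundLongitude/gco:Decimal", NS)
    east = root.find(BBOX + "gmd:eastBoundLongitude/gco:Decimal", NS)
    if west is not None and east is not None:
        w, e = float(west.text), float(east.text)
        # Longitude span, allowing for boxes that cross the 180° meridian.
        span = e - w if e >= w else e - w + 360.0
        if span >= 180.0:  # heuristic for "more than half of the world"
            issues.append(f"bbox spans {span:g} degrees of longitude")
    return issues
```

An empty list means neither check found a problem; it does not guarantee a successful import.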
Performance Tests - Requirements
• Requirements based on WMO and INSPIRE
• WMO (see WIS-TechSpec-8, DAR Catalogue Search and Retrieval, Technical Specification 1.1)
  • Response time < 2 sec
  • 40 combined searches (keyword and bounding box) per second
  • Minimum of 20 active sessions
• INSPIRE
  • Response time < 3 sec
  • Minimum of 30 active sessions
• DWD
  • Minimum of 100,000 metadata records
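The thresholds above can be written down as data and checked against a test run. A minimal sketch; the `Measurement` fields and `evaluate` are hypothetical, and mapping a partial pass to the "o" symbol of the results legend is an assumption about how that legend was applied.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    max_response_s: float  # slowest observed response time
    searches_per_s: float  # sustained combined searches per second
    sessions: int          # concurrent active sessions during the run
    records: int           # metadata records loaded in the catalogue


# Thresholds as listed above; "<" for response time, ">=" for everything else.
REQUIREMENTS = {
    "WMO": {"max_response_s": 2.0, "searches_per_s": 40, "sessions": 20},
    "INSPIRE": {"max_response_s": 3.0, "sessions": 30},
    "DWD": {"records": 100_000},
}


def evaluate(m: Measurement) -> dict:
    """Map each requirement set to "+", "o" or "-" (assumed: "o" = some checks pass)."""
    verdicts = {}
    for name, limits in REQUIREMENTS.items():
        checks = []
        for key, limit in limits.items():
            value = getattr(m, key)
            checks.append(value < limit if key == "max_response_s" else value >= limit)
        verdicts[name] = "+" if all(checks) else ("o" if any(checks) else "-")
    return verdicts
```

Each field corresponds to one bullet on the slide, so adding a requirement is a one-line change to `REQUIREMENTS`.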
Performance Tests - Preconditions
• Importing metadata
  • The practical package size was 5,000 metadata records per archive
  • Import takes a lot of time (5,000 records take about 45 to 60 minutes)
  • Importing metadata into terraCatalog generates gigabytes of redo logs (200 MB per minute)
• Formulating queries in CSW 2.0.2
  • The challenge was to write a query that both systems understood (Geonetwork 2.2 has only a limited CSW implementation)
  • Queries were parameterized for different result-set sizes (e.g. searching the title for “zyx” gives 0 hits, searching the title for “gts” gives 136,511 hits)
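A combined keyword and bounding-box query of the kind described above could look like the following CSW 2.0.2 GetRecords request, built with Python's standard library. This is a sketch, not the query used in the study: querying `dc:title` with `ogc:PropertyIsLike`, the `brief` element set and the longitude/latitude order in `gml:Envelope` are assumptions, and servers with partial CSW support may expect other queryables or axis orders.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"
GML = "http://www.opengis.net/gml"
for prefix, uri in (("csw", CSW), ("ogc", OGC), ("gml", GML)):
    ET.register_namespace(prefix, uri)


def get_records(title_pattern, bbox, max_records=10, result_type="results"):
    """Build a CSW 2.0.2 GetRecords body: title LIKE pattern AND inside bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat); the axis order is an assumption.
    """
    root = ET.Element(f"{{{CSW}}}GetRecords", {
        "service": "CSW", "version": "2.0.2", "outputSchema": CSW,
        "resultType": result_type, "maxRecords": str(max_records),
        # Prefixes that only appear inside text values must be declared explicitly.
        "xmlns:dc": "http://purl.org/dc/elements/1.1/",
        "xmlns:ows": "http://www.opengis.net/ows",
    })
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "brief"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    both = ET.SubElement(ET.SubElement(constraint, f"{{{OGC}}}Filter"), f"{{{OGC}}}And")

    # Keyword part: wildcard match on the title.
    like = ET.SubElement(both, f"{{{OGC}}}PropertyIsLike",
                         {"wildCard": "*", "singleChar": "?", "escapeChar": "\\"})
    ET.SubElement(like, f"{{{OGC}}}PropertyName").text = "dc:title"
    ET.SubElement(like, f"{{{OGC}}}Literal").text = title_pattern

    # Spatial part: records whose bounding box intersects the envelope.
    box = ET.SubElement(both, f"{{{OGC}}}BBOX")
    ET.SubElement(box, f"{{{OGC}}}PropertyName").text = "ows:BoundingBox"
    envelope = ET.SubElement(box, f"{{{GML}}}Envelope")
    min_lon, min_lat, max_lon, max_lat = bbox
    ET.SubElement(envelope, f"{{{GML}}}lowerCorner").text = f"{min_lon} {min_lat}"
    ET.SubElement(envelope, f"{{{GML}}}upperCorner").text = f"{max_lon} {max_lat}"
    return ET.tostring(root, encoding="unicode")
```

Varying `title_pattern` (for example `*zyx*` versus `*gts*`) changes the result-set size while the rest of the request stays fixed; `result_type="hits"` asks the server for the match count only.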
Performance Tests - Results
[Table: performance test results]
Legend: + (fulfilled), - (failed), o (partially)
Performance Tests - Results
[Table: performance test results against the INSPIRE and WMO requirements]
Legend: + (fulfilled), - (failed), o (partially)
Performance Tests - Remarks
Currently, it appears that neither system can handle 140,000 metadata records while meeting the INSPIRE and WMO requirements.
• Geonetwork fails to meet the requirement if the result set contains more than 10,000 hits (response time scales with the size of the result set)
• Geonetwork installation with 140,000 metadata records:
  • The first access of the GUI takes minutes!
• Deploying the Geonetwork 2.2 web app with around 3,000 metadata records takes hours
• terraCatalog fails to meet the requirement for combined searches
• terraCatalog could not meet the response time requirement for geographical searches
• terraCatalog raises errors if the search area touches the equator
• Fuzzy search for title, abstract, keywords … is a nice feature
• terraCatalog is up to 60 times faster than Geonetwork for simple queries
• Other solutions such as geowaySDI.NODE have also been tested only with 25,000 records
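Since terraCatalog failed when a search touched the equator and needs special treatment for the whole globe, a small set of boundary cases is worth keeping for regression tests of geographical search. A sketch; the function names and box values are hypothetical, chosen only to separate the situations the remarks mention.

```python
def touches_equator(bbox):
    """True if the box (min_lon, min_lat, max_lon, max_lat) reaches latitude 0."""
    _, min_lat, _, max_lat = bbox
    return min_lat <= 0.0 <= max_lat


def equator_cases(west=-10.0, east=30.0, depth=20.0):
    """Hypothetical bounding boxes for regression-testing geographical search."""
    return {
        "north_only": (west, 5.0, east, 5.0 + depth),
        "south_only": (west, -5.0 - depth, east, -5.0),
        "north_edge_on_equator": (west, 0.0, east, depth),
        "south_edge_on_equator": (west, -depth, east, 0.0),
        "crossing_equator": (west, -depth / 2, east, depth / 2),
        "whole_globe": (-180.0, -90.0, 180.0, 90.0),
    }


if __name__ == "__main__":
    for name, box in equator_cases().items():
        print(f"{name:24} {box} touches equator: {touches_equator(box)}")
```

Running each box as the spatial part of the same combined search, and recording both response time and errors, separates the equator failures from the general slowness of geographical searches.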
Resources
• WMO Wiki: http://www.wmo.int/pages/prog/www/WIS/wiswiki/tiki-index.php?page=geonetworkdoc
• Geonetwork: http://geonetwork-opensource.org/
• BlueNet: http://anzlicmet.bluenet.utas.edu.au/
• con terra: http://www.conterra.de/