Metadata Services on the GRID Nuno Santos ACAT’05 May 25th, 2005
Contents • Metadata on the GRID • ARDA-gLite Metadata Interface • The ARDA Implementation • Performance study: SOAP vs TCP Streaming
Metadata on the GRID • Metadata is data about data • Metadata on the GRID • Mainly information about files • Other information necessary for running jobs • Usually living on DBs • Need simple interface for Metadata access • Advantages • Easier to use by clients - no SQL, only metadata concepts • Common interface - clients don’t have to reinvent the wheel • Must be integrated in the File Catalogue • Also suitable for storing information about other resources
ARDA-gLite Metadata Interface • ARDA proposed an interface for Metadata access on the GRID • Designed jointly with the gLite/EGEE team • Incorporates feedback from GridPP • Endorsed by the EGEE standards committee (PTF) • Being implemented in gLite File Catalog (FiReMan) • Interface concepts • Metadata - Key-value pairs • Entry - Entities to which metadata is attached • Attribute – Holds information about an entry • Schema – A collection of attributes • Type – The type (int, float, string,…) • Name/Key – The name of the attribute • Value - Value of an entry's attribute • Entries are associated with schemas • Think of schemas as tables, attributes as columns, entries as rows
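The table analogy above can be sketched in a few lines of Python. This is only an illustration of the concepts, not the ARDA or gLite code; the `Schema` class, its `add_entry` method and the example names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Schema:
    """A schema acts like a table: attributes are its columns."""
    name: str
    attributes: Dict[str, str]  # attribute name -> type ("int", "string", ...)
    entries: Dict[str, Dict[str, str]] = field(default_factory=dict)  # rows

    def add_entry(self, entry: str, **values: str) -> None:
        """Attach an entry (a row) with values for some of the attributes."""
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise KeyError(f"attributes not in schema: {sorted(unknown)}")
        # Attributes without a value are stored as None, like NULL columns.
        self.entries[entry] = {k: values.get(k) for k in self.attributes}


jobs = Schema("/jobs", {"owner": "string", "events": "int"})
jobs.add_entry("run42", owner="alice", events="1000")
```

Values are kept as strings here because the interface's `Attribute` datatype carries its value as a `String`.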
Interface Operations • Schema management
    void createSchema(String schemaName, Attribute[] attributes)
    void dropSchema(String schemaName)
    void removeSchemaAttributes(String schemaName, String[] attributeNames)
    void addSchemaAttributes(String schemaName, Attribute[] attributes)
• Entry management
    void createEntry(MDEntry[] entries, String[] schemas)
    void removeEntry(String query)
    int setAttributes(String query, Attribute[] attributes)
    Attribute[] listAttributes(String entry)
Interface Operations • Searching and retrieving entries
    MDResult query(MDQuery query)
    MDResult nextQuery(String token, MDQuery query)
    void endQuery(String token)
  • The token-based design allows either stateful or stateless server implementations
• Datatypes
    Attribute { String schema; String name; String type; String value }
    MDEntry { String entry; Attribute[] attributes }
    MDQuery { String query; String queryType }
    MDResult { MDEntry[] entries; String token; Boolean done }
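To illustrate how a token lets a server stay stateless, here is a minimal Python sketch in which the token is simply the offset of the next chunk. It is an assumption-laden toy, not the ARDA implementation: the `ENTRIES` list, the substring matching, the chunk size and the snake_case function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical data set standing in for the metadata backend.
ENTRIES = ["/lhcb/run1", "/lhcb/run2", "/lhcb/run3", "/atlas/run1", "/lhcb/run4"]
CHUNK_SIZE = 2


@dataclass
class MDResult:
    entries: List[str]
    token: str
    done: bool


def _page(query: str, offset: int) -> MDResult:
    """Return one chunk of matches starting at offset."""
    matches = [e for e in ENTRIES if query in e]
    page = matches[offset:offset + CHUNK_SIZE]
    next_offset = offset + len(page)
    # The token encodes all the state the server needs for the next call.
    return MDResult(page, str(next_offset), next_offset >= len(matches))


def query(q: str) -> MDResult:
    return _page(q, 0)


def next_query(token: str, q: str) -> MDResult:
    return _page(q, int(token))


def fetch_all(q: str) -> List[str]:
    """Client-side loop: keep calling next_query until done is set."""
    result = query(q)
    entries = list(result.entries)
    while not result.done:
        result = next_query(result.token, q)
        entries.extend(result.entries)
    return entries
```

Because `nextQuery` receives the query again along with the token, a server can recompute the result set on every call, as done here, or use the token to look up a saved cursor in a stateful design.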
ARDA Prototype • Validate proposed interface • Architecture: • Metadata organized in a hierarchy • Schemas can contain sub-schemas • Can inherit attributes • Analogy to file system: • Schema ↔ Directory; Entry ↔ File • Stability with large responses • Send large responses in chunks • Otherwise preparing large responses could crash server • Stateful server • DB → Server – Data streamed using DB cursors • Server → Client – Response sent in chunks
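The hierarchy and attribute inheritance described above could look roughly like the following Python sketch. The `SchemaNode` class and the example schema names are hypothetical, and the rule that a sub-schema's own attributes override inherited ones is an assumption made for the example.

```python
class SchemaNode:
    """A schema in a directory-like tree; sub-schemas inherit attributes."""

    def __init__(self, name, attributes=None, parent=None):
        self.name = name
        self.own_attributes = dict(attributes or {})  # name -> type
        self.parent = parent
        self.children = {}

    def add_subschema(self, name, attributes=None):
        child = SchemaNode(name, attributes, parent=self)
        self.children[name] = child
        return child

    def all_attributes(self):
        """Own attributes plus everything inherited from ancestors."""
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.own_attributes}

    def path(self):
        """File-system style path, with the root schema shown as '/'."""
        if self.parent is None:
            return "/"
        return self.parent.path().rstrip("/") + "/" + self.name


root = SchemaNode("", {"owner": "string"})
lhcb = root.add_subschema("lhcb", {"run": "int"})
prod = lhcb.add_subschema("prod", {"events": "int"})
```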
ARDA Implementation • Backends • Currently: Oracle, PostgreSQL, SQLite • Two frontends • TCP Streaming • Chosen for performance • SOAP • Formal requirement of EGEE • Compare SOAP with TCP Streaming • Also implemented as standalone Python library • Data stored on filesystem
TCP Streaming Frontend • Text-based protocol (like SMTP, POP3,…) • Data streamed to client in single connection • Implementation • Server – C++, multiprocess • Clients – C++, Java, Python, Perl, Ruby • Example exchange
    Client: listattr entry
    Server: 0 entry value1 value2 … <EOT>
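A client for a text protocol of this kind mostly needs to read a status code and then collect lines until the end-of-transmission marker. The sketch below makes several assumptions that the slide does not confirm: that response fields are newline-separated, that the first line is a numeric status, and that the marker is the literal text `<EOT>` on its own line. The `read_response` helper is hypothetical.

```python
import io

EOT_MARKER = "<EOT>"  # assumption: end-of-transmission marker on its own line


def read_response(stream):
    """Read one response: a numeric status line, then lines up to the marker."""
    status = int(stream.readline().strip())
    lines = []
    for raw in stream:
        line = raw.rstrip("\n")
        if line == EOT_MARKER:
            return status, lines
        lines.append(line)
    raise ConnectionError("stream ended before the end-of-transmission marker")


# Stand-in for a socket file object, e.g. one obtained from socket.makefile("r").
fake_socket = io.StringIO("0\nentry\nvalue1\nvalue2\n<EOT>\n")
status, lines = read_response(fake_socket)
```

Reading line by line means the client can start processing data before the full response has arrived, which is the point of streaming over a single connection.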
SOAP Frontend • Most operations in interface implemented as simple SOAP calls • query() - based on iterators • Initial request – create session • Open cursor on DB • Return initial chunk of data and session token • Subsequent requests • Client calls nextQuery() using session token • Termination – session closed when: • End of data • Client calls endQuery() • Client timeout • Implementations • Server – gSOAP (C++) • Clients – Tested WSDL with gSOAP, ZSI (Python), AXIS (Java)
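The session lifecycle above (open a cursor, hand out a token, close on end of data, on endQuery() or on timeout) can be sketched as follows. This is a simplified stand-in, not the gSOAP server: the `QuerySessions` class is hypothetical, a Python iterator plays the role of the DB cursor, and the chunk size and timeout values are arbitrary.

```python
import itertools
import time
import uuid


class QuerySessions:
    """Server-side table of open query sessions, keyed by token."""

    def __init__(self, chunk_size=100, timeout=30.0, clock=time.monotonic):
        self._chunk_size = chunk_size
        self._timeout = timeout
        self._clock = clock
        self._sessions = {}  # token -> [cursor, time of last use]

    def query(self, rows):
        """Initial request: open a cursor and return the first chunk."""
        token = uuid.uuid4().hex
        self._sessions[token] = [iter(rows), self._clock()]
        return self.next_query(token)

    def next_query(self, token):
        """Return the next chunk; raises KeyError for a closed session."""
        self._expire_idle_sessions()
        cursor, _ = self._sessions[token]
        chunk = list(itertools.islice(cursor, self._chunk_size))
        # A short chunk means the cursor is exhausted. If the row count is an
        # exact multiple of chunk_size, one extra empty chunk signals the end.
        done = len(chunk) < self._chunk_size
        if done:
            del self._sessions[token]
        else:
            self._sessions[token][1] = self._clock()
        return chunk, token, done

    def end_query(self, token):
        """Client-initiated termination; closing twice is harmless."""
        self._sessions.pop(token, None)

    def _expire_idle_sessions(self):
        now = self._clock()
        for token in [t for t, (_, used) in self._sessions.items()
                      if now - used > self._timeout]:
            del self._sessions[token]
```

Passing the clock in as a parameter is a design choice for the example: it lets the timeout path be exercised without actually waiting.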
Current Uses of the ARDA prototype • Evaluated by LHCb-bookkeeping • Migrated bookkeeping metadata to ARDA prototype • 20M entries, 15 GB • Feedback valuable in improving interface and fixing bugs • Interface found to be complete • ARDA prototype showing good scalability • Ganga (LHCb, ATLAS) • User analysis job management system • Stores job status on ARDA prototype • Highly dynamic metadata
Performance Study • SOAP increasingly used as standard protocol for GRID computing • Promising web services standard - Interoperability • Some potential weaknesses • XML encoding increases message size (4x to 10x typical) • XML processing is compute and memory intensive • How significant are these weaknesses? What is the cost of using SOAP? • ARDA metadata implementation ideal for comparing SOAP with a traditional RPC protocol
Benchmark Description • Protocols • TCP-S – TCP Streaming • SOAP – Clients with gSOAP (C++), Axis (Java) and ZSI (Python) • Operations • ping – A null RPC • add – Adds an entry • get – Gets all attributes of an entry • get (bulk) – Gets all attributes of several entries in a single operation • Entries • 60 attributes (ints, floats and strings) • 700 bytes on average • HTTP Keepalive/Persistent connections • HTTP Keepalive increases HTTP performance, so it should improve SOAP performance • gSOAP supports Keepalive. Axis and ZSI don’t. • TCP-S uses persistent TCP connections to compare with HTTP Keepalive
SOAP Data Overhead • Measure size overhead of XML encoding • Ping • 1000 requests • Minimal payload – less than 5 bytes per request • SOAP overhead around 8 times • Get attributes in bulk • Retrieve 1000 entries • Around 800 KB of application data • Streaming in TCP • Iterators with SOAP – 4 KB average SOAP packet payload • With keepalive • SOAP overhead around 2.5 times
[Figure: total data transferred (in KB)]
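To give a feel for where the size overhead comes from, the sketch below wraps a tiny request in a bare SOAP 1.1 envelope and compares its size with a one-line text request. It does not reproduce the measurements on this slide: HTTP headers are ignored, real toolkits add further namespaces and type annotations, and the operation element, the `arg` element and the text request format are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 namespace


def soap_envelope(operation, argument):
    """Minimal SOAP 1.1 envelope; element names inside Body are made up."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{SOAP_NS}">'
        "<SOAP-ENV:Body>"
        f"<{operation}><arg>{escape(argument)}</arg></{operation}>"
        "</SOAP-ENV:Body>"
        "</SOAP-ENV:Envelope>"
    )


def text_request(operation, argument):
    """One-line request in the style of a text protocol (format assumed)."""
    return f"{operation} {argument}\n"


def size_ratio(operation, argument):
    """Bytes of the SOAP envelope divided by bytes of the text request."""
    soap_bytes = len(soap_envelope(operation, argument).encode("utf-8"))
    text_bytes = len(text_request(operation, argument).encode("utf-8"))
    return soap_bytes / text_bytes


print(f"envelope/text size ratio: {size_ratio('ping', 'x'):.1f}")
```

The fixed cost of the envelope dominates for small payloads and is amortised as the payload grows, which is consistent with ping showing a larger relative overhead than the bulk retrieval.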
SOAP Toolkits performance • Test protocol performance • No work done on the backend • Switched 100 Mbit/s LAN • Language comparison • TCP-S with similar performance in all languages • SOAP performance varies strongly with toolkit • Protocols comparison • Keepalive improves performance significantly • On Java and Python, SOAP is several times slower than TCP-S
[Figure: 1000 pings]
Single client results (LAN) • Compare performance of different operations • C++ clients (gSOAP) • When backend must do work, differences between gSOAP and TCP-S are small • Bulk operations very important for performance • getBulk 4x faster than get
[Figure: 1000 pings/1000 entries]
Single client results (WAN) • Client CERN, server Taiwan • ≈300 ms latency • Results dominated by latency • Execution time at server irrelevant • Large performance boost from latency hiding techniques: • keepalive – fewer TCP handshakes • bulk operations – fewer client/server interactions
[Figure: 1000 pings/1000 entries]
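A back-of-the-envelope model shows why latency dominates at 300 ms. The sketch below is a deliberate simplification and not a reproduction of the measurements: it assumes each new connection costs one extra round trip for the TCP handshake, each request costs one round trip, server time and transfer time are zero, and a bulk response fits in a single round trip. The function name and parameters are hypothetical.

```python
def estimated_seconds(n_ops, rtt, keepalive, bulk):
    """Estimate total time from round trips alone (toy latency model)."""
    requests = 1 if bulk else n_ops
    if keepalive:
        # One handshake for the whole run, then one round trip per request.
        round_trips = 1 + requests
    else:
        # Every request pays for a fresh handshake plus the request itself.
        round_trips = 2 * requests
    return round_trips * rtt


RTT = 0.3  # seconds, the CERN to Taiwan latency quoted above
for keepalive, bulk in [(False, False), (True, False), (True, True)]:
    t = estimated_seconds(1000, RTT, keepalive, bulk)
    print(f"keepalive={keepalive!s:5} bulk={bulk!s:5} about {t:.1f} s")
```

Under these assumptions, 1000 single operations take about 600 s without keepalive and about 300 s with it, while one bulk request over a kept-alive connection takes under a second. A real bulk transfer of this size would need more round trips, but the ordering of the three cases is the point.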
Scalability with Multiple Clients - Pings • Measure scalability of protocols • Switched 100 Mbit/s LAN • TCP-S 3x faster than gSOAP (with keepalive) • Poor performance without keepalive • Around 1,000 ops/sec (both gSOAP and TCP-S)
[Figure: 1000 pings]
Scalability with Multiple Clients - getAttr • Measure scalability with realistic payload • Switched 100 Mbit/s LAN • All tests with keepalive • Smaller difference between gSOAP and TCP-S • TCP-S 2x faster (1000 vs 500 entries/sec) • Poor performance of non-bulk operations • 100 entries/sec
[Figure: 1000 entries]
Conclusions • A common Metadata Interface was developed by ARDA and gLite • Endorsed by the EGEE standards committee • Interface validated by ARDA prototype • Prototype in use by LHCb (bookkeeping, Ganga) and ATLAS (Ganga) • SOAP performance studied using ARDA implementation • Toolkit performance varies widely • Large SOAP overhead (over 100%)