OGSA-DAI Status and Benchmarks, All Hands Meeting 2005, Nottingham, 22 September 2005
Overview • The all-new OGSA-DAI overview • Benchmarking and profiling work • Project collaboration • Future plans AHM2005
OGSA-DAI team • ESNW, Manchester • NeSC, Edinburgh • EPCC Team, Edinburgh • NEReSC, Newcastle • IBM Dissemination Team • IBM Development Team, Hursley
OGSA-DAI In One Slide • An extensible framework for data access and integration. • Expose heterogeneous data resources to a grid through web services. • Interact with data resources: • Queries and updates. • Data transformation / compression • Data delivery. • Customise for your project using • Additional Activities • Client Toolkit APIs • Data Resource handlers • A base for higher-level services • federation, mining, visualisation,…
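The activity model above chains query, transformation and delivery so that each step consumes the previous step's output. The minimal Python sketch below illustrates that idea under stated assumptions. It is not the OGSA-DAI (Java) API. The activity functions `sql_query`, `rows_to_csv` and `deliver_to`, and the `book` table, are hypothetical. An in-memory SQLite database stands in for a real data resource.

```python
import io
import sqlite3
from typing import Any, Callable, List

# Hypothetical activity type: takes the previous activity's output, returns its own.
Activity = Callable[[Any], Any]


def run_pipeline(activities: List[Activity]) -> Any:
    """Run activities in order, passing each one's output to the next."""
    data: Any = None
    for activity in activities:
        data = activity(data)
    return data


def sql_query(conn: sqlite3.Connection, sql: str) -> Activity:
    """Query activity: ignores its input and returns the result rows."""
    return lambda _ignored: conn.execute(sql).fetchall()


def rows_to_csv(rows: List[tuple]) -> str:
    """Transformation activity: render rows as comma-separated text."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def deliver_to(sink: io.StringIO) -> Activity:
    """Delivery activity: write the text to a sink and return characters written."""
    return lambda text: sink.write(text)


# In-memory SQLite stands in for a real data resource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO book VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

sink = io.StringIO()
written = run_pipeline([
    sql_query(conn, "SELECT id, name FROM book ORDER BY id"),
    rows_to_csv,
    deliver_to(sink),
])
print(repr(sink.getvalue()), written)
```

Because every activity shares one call shape, a project can add its own activity by writing another function of that shape.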
Extensibility Example [diagram: an OGSA-DAI service (GDS) whose engine runs an SQLQuery activity over JDBC against MySQL, extended with a Multiple SQL activity that sends SQL over JDBC to several databases]
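The extensibility example reuses one query activity across several SQL data resources behind a common interface. The sketch below illustrates that pattern. It assumes a hypothetical `SQLDataResource` wrapper and a hypothetical `multiple_sql_query` helper. Python DB-API connections take the place of JDBC, and in-memory SQLite databases stand in for separate DBMSs such as MySQL.

```python
import sqlite3
from typing import Dict, List


class SQLDataResource:
    """Hypothetical wrapper: any DB-API 2.0 connection behind one query interface."""

    def __init__(self, name: str, conn) -> None:
        self.name = name
        self.conn = conn

    def query(self, sql: str) -> List[tuple]:
        cur = self.conn.cursor()
        try:
            cur.execute(sql)
            return cur.fetchall()
        finally:
            cur.close()


def multiple_sql_query(resources: List[SQLDataResource], sql: str) -> Dict[str, List[tuple]]:
    """Run one SQL statement against every resource, keyed by resource name."""
    return {resource.name: resource.query(sql) for resource in resources}


def make_db(values: List[int]) -> sqlite3.Connection:
    """In-memory SQLite database standing in for a separate DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    return conn


resources = [SQLDataResource("db1", make_db([1, 2])), SQLDataResource("db2", make_db([10]))]
print(multiple_sql_query(resources, "SELECT SUM(x) FROM t"))
```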
Timeline, 2003 to 2005 [diagram: OGSI releases (Release 1, Release 1 interim, Release 2, Release 2 interim, Release 3, Release 3.1, Release 4, Release 5, Release 6), with OGSA-DAI WSRF 1.0 and OGSA-DAI WS-I 1.0 / OGSA-DAI WS-I 1.1 (OMII)]
Out with the old… [diagram: on the client side, clients use the Client Toolkit API and talk SOAP to the server; on the server, GDSFs and GDSs, with the DAISGR registry; data resources: relational, XML, files]
… in with the new! [diagram: on the client side, a generic Client Toolkit API talks SOAP to WSRF and WS-I data services; on the server, both share the DAI core and its data service resources (DSRs); data resources: relational, XML, files]
Changes in moving to WSRF/WS-I • Registry component (DAISGR) no longer supported • Hope to leverage third-party registration services • GRIMOIRES (http://www.omii.ac.uk/mp/mp_grimoires.htm) • Others … • GDS/GDSF roles combined • Use data services • Currently static services, but • Reconfigurable services • Improvements to the GDS • Data resource abstraction decoupled from the service • Renaming (consistent naming across platform versions) • Ability to enforce control-flow constraints (ordering activities) • Refactored exception framework • Temporary setbacks (we promise we’ll fix them) • No security model • No concurrency • Previously used GDSs for concurrency • Support now moving to the engine
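Enforcing control-flow constraints amounts to finding an activity order that respects every "must run after" relation. The sketch below does this with the standard-library `graphlib` module (Python 3.9+). This is an assumption made for illustration, not a description of how the OGSA-DAI engine orders activities. The activity names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def execution_order(after: Dict[str, Set[str]]) -> List[str]:
    """Return an activity order that satisfies every "runs after" constraint.

    `after` maps each activity name to the activities it must follow.
    Raises graphlib.CycleError if the constraints contradict each other.
    """
    return list(TopologicalSorter(after).static_order())


constraints = {
    "query": set(),
    "transform": {"query"},
    "deliver": {"transform"},
    "notify": {"deliver"},
}
print(execution_order(constraints))

try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("contradictory ordering constraints")
```

Rejecting cycles up front lets a request with impossible ordering fail before any activity touches the data.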
Benchmarking/Profiling • Establish a benchmark suite to: • Measure performance gains/losses between releases • Reveal implementation issues • Allow focused improvements • Establish best practice • Summer intern (Heather Kelly) produced results • Profiling lets us identify the particular areas causing poor performance in the benchmarks • Summer intern (Radoslaw Ostrowski) extended NetLogger and did some profiling • Most of the results are for OGSA-DAI R6 • One slide shows what is happening in R7
Configuration • Software: Tomcat 4.1.29, GT 3.2.1, OGSA-DAI OGSI R6.0, j2sdk 1.4.2_01 • Hosts: Windows XP Pro SP2 on an Intel PIII 863 MHz with 512 MB RAM; SunOS 5.9 on an UltraSPARC-IIe 502 MHz with 128 MB RAM • Network: 10 Mbit/s • Measure the time to: • Send an SQL query to the server • Return nRows rows • Sum the values in one of the columns • Do this 30 times • Calculate the mean and standard deviation • Repeat the process, increasing nRows by stepsize each time • Try various databases • Notes: • Time to establish the connection is not included in the JDBC runs • JDBC does not return results in WebRowSet format • Server is already running • Data source: littleblackbook, the test database included in the distributions
Some benchmarks • Relational query • StreamServlet requires two communications with the server • this could be improved • FTP not iterating over the result set • JDBC scales much better than SOAP • ResultSet implementations • The forwards-backwards implementation builds a DOM tree, giving a larger memory footprint
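The memory point above can be illustrated by contrasting two ways of converting a result set to XML. The first builds a complete element tree, so memory grows with the result size, much like the forwards-backwards implementation. The second is forward-only and emits each row as soon as it is read. This sketch is not OGSA-DAI's code. The element names `rows`, `row` and `col` are hypothetical and follow no real schema.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Iterator
from xml.sax.saxutils import escape


def rows_as_tree(cursor: sqlite3.Cursor) -> str:
    """Materialise the whole result as an element tree, then serialise it.

    Memory grows with the size of the result set.
    """
    root = ET.Element("rows")
    for row in cursor:
        row_el = ET.SubElement(root, "row")
        for value in row:
            ET.SubElement(row_el, "col").text = str(value)
    return ET.tostring(root, encoding="unicode")


def rows_streamed(cursor: sqlite3.Cursor) -> Iterator[str]:
    """Forward-only: emit each row's XML as soon as that row has been read."""
    yield "<rows>"
    for row in cursor:
        yield "<row>" + "".join(f"<col>{escape(str(v))}</col>" for v in row) + "</row>"
    yield "</rows>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "Ann"), (2, "B&B")])
query = "SELECT id, name FROM t ORDER BY id"

tree_xml = rows_as_tree(conn.execute(query))
streamed_xml = "".join(rows_streamed(conn.execute(query)))
print(tree_xml == streamed_xml, streamed_xml)
```

For this non-empty result both functions yield the same text. Only the streamed version can begin delivery before the last row has been read.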
Database comparison (OGSA-DAI WSRF 1.0, nRows = 10000, number of runs = 30, stepsize = 500)
Platform comparison (MySQL database, nRows = 10000, number of runs = 30, stepsize = 500)
Profiling: better RowSet conversion [chart: ResultSet to RowSet conversion]
R6->R7: removal of RowSet
Challenges • Intermediate representation • between multiple models (relational, XML, …) • XML WebRowSet is flexible (cf. GridMiner) but verbose • DFDL and GridFTP/parallel HTTP? • Query definition • translation of queries • Data transport and workflow • workflow is typically compute-driven • Move computation to data • mobile code activities? • data services hosted on the DBMS?
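The cost of an XML intermediate representation can be shown by encoding the same rows twice, once as XML and once as CSV, and comparing byte counts. The XML layout below is loosely modelled on the data section of a WebRowSet (`currentRow` / `columnValue`). It is a simplification that omits the properties and metadata sections, and the sample rows are synthetic.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import List, Sequence


def as_xml(rows: List[tuple]) -> bytes:
    """Row data as XML, loosely modelled on WebRowSet's data section (simplified)."""
    root = ET.Element("data")
    for row in rows:
        row_el = ET.SubElement(root, "currentRow")
        for value in row:
            ET.SubElement(row_el, "columnValue").text = str(value)
    return ET.tostring(root, encoding="utf-8")


def as_csv(columns: Sequence[str], rows: List[tuple]) -> bytes:
    """The same rows as CSV with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


columns = ["id", "name", "phone"]
rows = [(i, f"person{i}", f"555-{i:04d}") for i in range(1000)]
xml_size = len(as_xml(rows))
csv_size = len(as_csv(columns, rows))
print(f"XML {xml_size} bytes, CSV {csv_size} bytes, ratio {xml_size / csv_size:.1f}")
```

Every value in the XML form carries its own opening and closing tags. The CSV form pays only one separator per value.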
caBIG “Object-Oriented” view of data • Data types are well-defined and registered in a repository • Standardized metadata facilitates discovery • custom query language implemented as an activity
LEAD [diagram: satellite catalogs at IU, UA Huntsville, Univ. of Oklahoma, Millersville, UCAR Unidata and NCSA Illinois; each satellite replicates its contents to the master catalog]
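One way to picture satellite-to-master replication is an incremental pull: the master copies an entry only when it is new, or when the satellite holds a newer version. This is a sketch under assumptions. The per-entry `version` counter, the `Record` / `MasterCatalog` classes and the entry IDs are all hypothetical and are not LEAD's actual catalog design.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Record:
    version: int    # hypothetical: increases whenever the satellite updates the entry
    metadata: dict


@dataclass
class MasterCatalog:
    entries: Dict[Tuple[str, str], Record] = field(default_factory=dict)

    def replicate_from(self, site: str, satellite: Dict[str, Record]) -> int:
        """Copy new or newer entries from one satellite; return how many changed."""
        changed = 0
        for entry_id, record in satellite.items():
            key = (site, entry_id)
            current = self.entries.get(key)
            if current is None or record.version > current.version:
                # Copy the metadata so later edits at the satellite do not leak in.
                self.entries[key] = Record(record.version, dict(record.metadata))
                changed += 1
        return changed


master = MasterCatalog()
iu = {"radar-001": Record(1, {"type": "radar"})}
unidata = {"model-042": Record(3, {"type": "forecast"})}
print(master.replicate_from("IU", iu))                  # 1
print(master.replicate_from("UCAR Unidata", unidata))   # 1
print(master.replicate_from("IU", iu))                  # 0, nothing newer
iu["radar-001"] = Record(2, {"type": "radar", "qc": True})
print(master.replicate_from("IU", iu))                  # 1, newer version
```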
Users Group and DIALOGUE Workshops • 3rd Users Group meeting • June 1st • http://www.ogsadai.org.uk/docs/UG3/ • DIALOGUE Workshops • Data Integration Applications: Linking Organisations to Gain Understanding and Experience • Columbus, Edinburgh, Vienna, Indiana • Bringing together Data Integration middleware and application providers with users • http://www.datagrids.org
Future plans • A new version of the OGSA-DAI Engine • should look mostly the same externally • better support for concurrency, sessions and monitoring • see the Architecture paper/talk presented on Monday • Implementing new versions of specifications • DAIS specifications • Key things we will address after Release 7: • Performance • A security model that can be applied across platforms • Full transaction support, including compensating activities and distributed transactions • More data integration facilities • Better abstraction over DBMS variation
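Compensating activities undo work that has already been committed when a later step fails. This is the usual fallback when a single atomic rollback across resources is not available. The slide lists this as future work, so the sketch below is only a generic illustration of the compensation idea, not OGSA-DAI's planned design. The `run_with_compensation` helper and the logged steps are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical step: (action, compensation that undoes the action).
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> bool:
    """Run actions in order; if one fails, undo the completed ones in reverse order."""
    completed: List[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


log: List[str] = []


def fail() -> None:
    raise RuntimeError("resource unavailable")


ok = run_with_compensation([
    (lambda: log.append("insert A"), lambda: log.append("delete A")),
    (lambda: log.append("insert B"), lambda: log.append("delete B")),
    (fail, lambda: log.append("never runs")),
])
print(ok, log)
```

Compensations run in reverse, so later changes are undone before the earlier changes they may depend on.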
Conclusions • OGSA-DAI has had to undergo significant refactoring to keep stakeholders happy • Refactoring has allowed us to create an extensible framework that can be used for many data-related tasks • We need to identify the components and improvements that will be useful to users • There is clearly room for improvement in performance, and we are working on it
Further information • The OGSA-DAI Project Site: • http://www.ogsadai.org.uk • The DAIS-WG site: • http://forge.gridforum.org/projects/dais-wg/ • OGSA-DAI Users Mailing list • users@ogsadai.org.uk • General discussion on grid DAI matters • Formal support for OGSA-DAI releases • http://www.ogsadai.org.uk/support • support@ogsadai.org.uk • OGSA-DAI training courses
Core features of OGSA-DAI – I • A framework for building applications • Supports data access, insert and update • Relational: MySQL, Oracle, DB2, SQL Server, Postgres • XML: Xindice, eXist • Files – CSV, BinX, EMBL, OMIM, SWISSPROT,… • Supports data delivery • SOAP over HTTP • FTP; GridFTP • E-mail • Inter-service • Supports data transformation • XSLT • ZIP; GZIP • Supports security • X.509 certificate based security
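The ZIP and GZIP transformations listed above compress a payload before delivery, and the receiver reverses them. The sketch below shows both with Python's standard `gzip` and `zipfile` modules. The archive member name `results.xml` and the repetitive sample payload are hypothetical, and none of this code is taken from OGSA-DAI.

```python
import gzip
import io
import zipfile


def gzip_transform(data: bytes) -> bytes:
    """GZIP the payload before delivery."""
    return gzip.compress(data)


def zip_transform(data: bytes, member: str = "results.xml") -> bytes:
    """Wrap the payload in a ZIP archive as a single deflate-compressed member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, data)
    return buf.getvalue()


payload = b"<row><col>1</col><col>Ann</col></row>" * 1000
gz = gzip_transform(payload)
zp = zip_transform(payload)
print(len(payload), len(gz), len(zp))

# The receiver reverses each transformation.
restored_gz = gzip.decompress(gz)
with zipfile.ZipFile(io.BytesIO(zp)) as archive:
    restored_zip = archive.read("results.xml")
print(restored_gz == payload, restored_zip == payload)
```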
Core features of OGSA-DAI – II • A framework for building data clients • Client toolkit library for application developers • A framework for developing functionality • Extend existing activities, or implement your own • Mix and match activities to provide functionality you need • Highly-extensible • Customise our out-of-the-box product • Provide your own services, client-side support and data-related functionality • Comprehensive documentation and tutorials • Latest release supports GT3.2 (to be deprecated), GT4.0, and Axis 1.2 / OMII_2 using Java 1.4
OGSA-DAI Design Principles – I • Efficient client-server communication • Minimise where possible • One request specifies multiple operations • No unnecessary data movement • Move computation to the data • Utilise third-party delivery • Apply transforms (e.g., compression) • Build on existing standards • Fill in gaps where necessary
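"One request specifies multiple operations" means the client describes a whole chain of linked activities in one document and sends it in one round trip. It does not make one call per step. The sketch below builds such a document. The XML schema (`request`, `activity`, `type`, `name`, `input`) and the activity types are hypothetical and are not OGSA-DAI's request format.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (activity type, activity name, name of the activity whose output it reads)
Spec = Tuple[str, str, Optional[str]]


def build_request(specs: List[Spec]) -> str:
    """Bundle several linked activities into one request document (hypothetical schema)."""
    root = ET.Element("request")
    for kind, name, input_from in specs:
        activity = ET.SubElement(root, "activity", type=kind, name=name)
        if input_from is not None:
            activity.set("input", input_from)
    return ET.tostring(root, encoding="unicode")


request = build_request([
    ("query", "q", None),        # runs next to the data
    ("transform", "t", "q"),
    ("compress", "z", "t"),
    ("deliver", "d", "z"),       # e.g. third-party delivery to another host
])
print(request)
```

Four operations cost one exchange, and intermediate results stay on the server instead of crossing the network.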
OGSA-DAI Design Principles – II • Do not hide underlying data model • Users must know where to target queries • Data virtualisation is hard • Extensible architecture • Modular and customisable • e.g., to accommodate stronger security • Extensible activity framework • Cannot anticipate all desired functionality • Activity = unit of functionality • Allow users to plug-in their own
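"Activity = unit of functionality" together with "allow users to plug in their own" suggests a registry that maps activity names to user-supplied implementations. The sketch below assumes that design. The `activity` decorator, the `perform` function and the two example activities are hypothetical and are not OGSA-DAI's activity configuration mechanism.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: activity name -> implementation.
ACTIVITIES: Dict[str, Callable[..., Any]] = {}


def activity(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a user-supplied activity under a name."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        if name in ACTIVITIES:
            raise ValueError(f"activity {name!r} already registered")
        ACTIVITIES[name] = func
        return func
    return register


def perform(name: str, *args: Any) -> Any:
    """Look up an activity by name and run it; unknown names are rejected."""
    try:
        func = ACTIVITIES[name]
    except KeyError:
        raise LookupError(f"no activity named {name!r}") from None
    return func(*args)


@activity("upperCase")
def upper_case(text: str) -> str:
    return text.upper()


@activity("rowCount")
def row_count(rows: list) -> int:
    return len(rows)


print(perform("upperCase", "ogsa-dai"))        # OGSA-DAI
print(perform("rowCount", [(1,), (2,), (3,)])) # 3
```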
Data Integration challenges • Metadata extraction • define a common model, e.g. for database schemas? • Intermediate representation • between multiple models (relational, XML, …) • XML WebRowSet is flexible (cf. GridMiner) but verbose • DFDL and GridFTP/parallel HTTP? • Query definition • translation of queries • Data transport and workflow • workflow is typically compute-driven • Move computation to data • mobile code activities? • data services hosted on the DBMS?
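Metadata extraction into a common model means reading each DBMS's native catalogue and mapping it into one neutral structure. The sketch below does this for SQLite using its `sqlite_master` table and `PRAGMA table_info`. The `Table` / `Column` model is a hypothetical common model. In Java, JDBC's `DatabaseMetaData` would play the reading role.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    type: str
    nullable: bool


@dataclass
class Table:
    name: str
    columns: List[Column]


def extract_schema(conn: sqlite3.Connection) -> List[Table]:
    """Read table/column metadata from SQLite into a database-neutral model."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    tables = []
    for table in names:
        # table_info rows: (cid, name, type, notnull, default_value, pk).
        # Quoting assumes the table name contains no double quotes.
        cols = [Column(name=r[1], type=r[2], nullable=not r[3])
                for r in conn.execute(f'PRAGMA table_info("{table}")')]
        tables.append(Table(table, cols))
    return tables


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER NOT NULL, name TEXT)")
for t in extract_schema(conn):
    print(t.name, [(c.name, c.type, c.nullable) for c in t.columns])
```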
Contributing to OGSA-DAI • Additional functionality: • Provide activities which implement specific functionality • Provide extra client functionality • Provide different security mechanisms • Provide higher level components and applications • Different levels of contributions • Based on OGSA-DAI? • Works with OGSA-DAI? • Part of OGSA-DAI?