Introduction to OGSA-DAI
Neil Chue Hong, 15th February 2006, GGF16, Athens

Data Services: challenges
• Scale
  • Many sites, large collections, many uses
• Longevity
  • Research requirements outlive technical decisions
• Diversity
  • No “one size fits all” solutions will work
  • Primary Data, Data Products, Meta Data, Administrative data, …
• Many Data Resources
  • Independently owned & managed
  • Geographically distributed
• and I haven’t even mentioned security yet!

Use Cases for Data Services
• Data Filtering
  • single source producing large amounts of data distributed to many sites downstream
• Data Discovery
  • many sources, many query entry points in a linked system
• Data Translation
  • source to sink, conversion of data model / structure
• Data Federation
  • many sources, linked to provide view as a single source
• Data Replication
  • full or partial copies to improve throughput
• Data Integration (model aggregation)
  • e.g. integration of time variant data, streams, files
• Data Integration (knowledge expansion)
  • forming links between databases to increase knowledge

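The Data Federation use case above (many sources presented as a single source) can be sketched in a few lines. This is an illustration only and does not use any OGSA-DAI API: the two in-memory SQLite databases stand in for independently managed sites, and the names `make_source`, `federated_query`, `site_a`, `site_b` and the `sample` table are all hypothetical.

```python
import sqlite3


def make_source(rows):
    """Create a stand-in data resource: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (id INTEGER, value TEXT)")
    conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)
    return conn


def federated_query(sources, sql):
    """Run the same query on every source and merge the rows into one view.

    Each merged row is prefixed with the (hypothetical) source name so the
    caller can still tell where it came from.
    """
    merged = []
    for name, conn in sources.items():
        for row in conn.execute(sql):
            merged.append((name, *row))
    return merged


# Two independently populated sources, queried as if they were one.
sources = {
    "site_a": make_source([(1, "x"), (2, "y")]),
    "site_b": make_source([(3, "z")]),
}
view = federated_query(sources, "SELECT id, value FROM sample ORDER BY id")
```

A real federation layer would also have to reconcile differing schemas and security domains, which this sketch deliberately ignores.
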
Requirements on Data Services?
• Common Data Model, e.g. RowSet
• Common Query Language(s), e.g. XQuery, SQL
• Standard access to
  • data resource schema information
  • physical data resource information for optimisation purposes
  • data resource descriptive information for discovery / integration
• Single, seamless security model
• Dynamic publication and discovery
• Multiple, efficient delivery methods
• Move computation towards data
• Data aggregation functionality
• Replication information

OGSA-DAI In One Slide
• An engineered extensible framework for data access and integration.
• Expose heterogeneous data resources to a grid through web services.
• Interact with data resources:
  • Queries and updates.
  • Data transformation / compression
  • Data delivery.
• Customise for your project using
  • Additional Activities
  • Client Toolkit APIs
  • Data Resource handlers
• A base for higher-level services
  • federation, mining, visualisation, …

OGSA-DAI Philosophy
• We provide the basic, general functionality
  • e.g. querying relational databases, delivery mechanisms, schema extractors
• You add the specialist functionality
  • e.g. map overlays
• Several well-defined extension points
  • client toolkit
  • activity plugins
  • data resource accessor model

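To make the "activity plugin" extension point concrete, here is a minimal sketch of the general pattern: a framework supplies generic activities and a way to chain them, and a project adds its own. This is not the OGSA-DAI API (OGSA-DAI itself is a Java framework); the `Activity` base class, `run_pipeline`, and the `ToCSV` activity are hypothetical names, and SQLite stands in for a relational data resource.

```python
import gzip
import sqlite3
from abc import ABC, abstractmethod


class Activity(ABC):
    """Hypothetical extension point: one unit of work in a request."""

    @abstractmethod
    def process(self, data):
        """Consume the previous activity's output and return new output."""


class SQLQuery(Activity):
    """Generic activity: run a query against a relational data resource."""

    def __init__(self, connection, sql):
        self.connection = connection
        self.sql = sql

    def process(self, data):
        # The first activity in a chain ignores its (empty) input.
        return self.connection.execute(self.sql).fetchall()


class ToCSV(Activity):
    """Project-specific activity: turn rows into comma-separated text."""

    def process(self, rows):
        return "\n".join(",".join(str(v) for v in row) for row in rows)


class GZip(Activity):
    """Generic activity: compress text before delivery."""

    def process(self, text):
        return gzip.compress(text.encode("utf-8"))


def run_pipeline(activities, data=None):
    """Feed each activity's output into the next one, in order."""
    for activity in activities:
        data = activity.process(data)
    return data


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE protein (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO protein VALUES (?, ?)", [(1, "P53"), (2, "BRCA1")])

payload = run_pipeline([
    SQLQuery(conn, "SELECT id, name FROM protein ORDER BY id"),
    ToCSV(),
    GZip(),
])
```

The point of the design is that adding specialist behaviour (a map overlay, say) means writing one new `Activity` subclass rather than changing the engine.
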
[Diagram: OGSA-DAI architecture. Labels: Application; Client Toolkit; OGSA-DAI service; Engine; Activities (XPath, readFile, SQLQuery, GZip, XSLT, GridFTP); Data Resources (JDBC, XMLDB, File); Databases (DB2, SQL Server, MySQL, XIndice, SWISS PROT)]

[Diagram: labels include SQL (×4), JDBC (×4), OGSA-DAI service, Engine, SQLQuery (×2), Multiple SQL GDS, JDBC, MySQL]

Distributed Query Processing
• Higher level services building on OGSA-DAI
• Queries mapped to algebraic expressions for evaluation
• Parallelism represented by partitioning queries
  • Use exchange operators
[Diagram: partitioned query plan. Operator labels: table_scan (protein); table_scan termID=S92 (proteinTerm); reduce; exchange; hash_join (proteinId); op_call (Blast). Partition labels: 1; 2; 3,4]

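The idea of partitioning a query plan with exchange operators can be illustrated with a small sketch. It reuses the operator names from the slide (`table_scan`, `exchange`, `hash_join`) but the implementations are hypothetical and run in a single process; they are not taken from any distributed query processor. The `reduce` and `op_call (Blast)` steps are omitted, and the sample rows are invented for illustration.

```python
from collections import defaultdict


def table_scan(rows, predicate=lambda row: True):
    """Read a table, keeping only rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def exchange(rows, key, n_partitions):
    """Repartition rows by hashing the key, so that rows with equal keys
    always land in the same partition (and could be sent to the same node)."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def hash_join(left, right, key):
    """Join two row lists on key by building a hash index over the left side."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]


# Invented sample data.
protein = [
    {"proteinId": 1, "sequence": "MKV"},
    {"proteinId": 2, "sequence": "GAT"},
    {"proteinId": 3, "sequence": "TTL"},
]
protein_term = [
    {"proteinId": 1, "termID": "S92"},
    {"proteinId": 2, "termID": "S10"},
    {"proteinId": 3, "termID": "S92"},
]

N = 2  # number of partitions, i.e. the degree of parallelism
left_parts = exchange(table_scan(protein), "proteinId", N)
right_parts = exchange(
    table_scan(protein_term, lambda r: r["termID"] == "S92"), "proteinId", N
)

# Each partition pair can be joined independently; here they run in a loop.
result = []
for left, right in zip(left_parts, right_parts):
    result.extend(hash_join(left, right, "proteinId"))
result.sort(key=lambda row: row["proteinId"])
```

Because both inputs are partitioned on the join key, no partition ever needs rows held by another, which is what lets the per-partition joins run in parallel on separate nodes.
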
Map Retrieval: Integration
• Using security and extensibility (overlay)
[Diagram: labels include JDBC, NGS, Authentication, Oracle, Census, ODS 1, ODS 2, ODS 3, SQL/XML, SO-OGC, OGC Portlet, GIS, Application data]

MDS/GridFTP/GSI Integration
• Can publish any OGSA-DAI resource property to a local MDS Index Service
  • e.g. databaseSchema, activityTypes
  • information published is on a per-resource basis, and can differ for each resource
• Can transfer results via GridFTP rather than via SOAP
  • still working on tuning options
• Can use X509 certificates to secure services
  • but still a coarse grained security by default

Future plans: overview
• A new version of the OGSA-DAI Engine
  • better support for concurrency, sessions, monitoring and notification
• Implementing new DAIS specifications
• Key things that we will be addressing:
  • Performance (particularly format representation and transport)
  • Security Model which can be applied across platforms
  • Transactions provision
  • More data integration facilities
• Integration with other components
  • registries (e.g. GRIMOIRES)
  • workflow editors (e.g. Taverna)
• Working with new projects
  • e.g. CancerGrid, iSpider, GEODE

Future plans: Performance
• WebRowSet is not efficient
  • aim to use ResultSet and CSV instead where possible
• SOAP is not efficient
  • aim to use SOAP w/Attachments, MTOM
[Chart: ResultSet to RowSet conversion (work in progress, Jan06). Annotations: WebRowSet is larger; CSV scales better for output; conversion and validation takes the time]

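A rough feel for why an XML row format is larger than CSV can be had from a small size comparison. This is a sketch under stated assumptions, not a benchmark of OGSA-DAI: the XML below is a simplified stand-in whose element names are only loosely modelled on WebRowSet (the real format also carries properties and metadata sections), and the sample rows are invented.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(rows):
    """Serialise rows with one element per row and one per column value.

    Simplified stand-in for an XML row set; not the real WebRowSet schema.
    """
    root = ET.Element("data")
    for row in rows:
        row_el = ET.SubElement(root, "currentRow")
        for value in row:
            ET.SubElement(row_el, "columnValue").text = str(value)
    return ET.tostring(root, encoding="unicode")


def rows_to_csv(columns, rows):
    """Serialise rows as CSV with a single header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample data: 1000 two-column rows.
columns = ["id", "name"]
rows = [(i, f"protein{i}") for i in range(1000)]

xml_size = len(rows_to_xml(rows).encode("utf-8"))
csv_size = len(rows_to_csv(columns, rows).encode("utf-8"))
print(f"XML: {xml_size} bytes, CSV: {csv_size} bytes")
```

The per-value markup is repeated for every cell, so the XML overhead grows with the number of values, whereas CSV pays for column names once in the header. This only measures size; the slide also attributes time to conversion and validation, which this sketch does not measure.
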
From contribution to core
• One of a group of projects moving to GlobDev project (more later)
• Hope to use this as a way of encouraging collaborations and contributions
• Different levels of contributions
  • Based on OGSA-DAI?
  • Works with OGSA-DAI?
  • Part of OGSA-DAI?

Contributing to OGSA-DAI
• Additional functionality:
  • Provide activities which implement specific functionality
  • Provide extra client functionality
  • Provide different security mechanisms
  • Provide higher level components and applications

Further information
• The OGSA-DAI Project Site:
  • http://www.ogsadai.org.uk
• The DAIS-WG site:
  • http://forge.gridforum.org/projects/dais-wg/
• OGSA-DAI Users Mailing list
  • users@ogsadai.org.uk
• Formal support for OGSA-DAI releases
  • http://bugzilla.globus.org (OGSA-DAI)
• OGSA-DAI training courses (live and online)