GriPhyN - SDSC Research and Infrastructure
Reagan Moore
San Diego Supercomputer Center
Topics
• Research activities
  • Advanced query interfaces - Amarnath Gupta
  • Knowledge bases - Bertram Ludaescher
• Infrastructure development
  • SRB replication - Michael Wan
  • MCAT information catalog - Arcot Rajasekar
  • Grid Portals - Mary Thomas
  • WSDL web services - Arun Jagatheesan
• Grids
LIGO Support Opportunities • Pattern recognition in template and chirp-transform data using database technology • Derived data product optimization through optimization of input parameters - controlled parameter sweeps • Utilization of SRB/MCAT for storage of virtual data products
SDSS Support Opportunities • Federation of sky survey services • Development of a dynamic cross-match service between SDSS and other sky surveys • WSDL based web interface for sky survey services • UDDI based service directory • Build topic map providing relationships between “Strasbourg sky survey” attributes • Correlate attributes through physical laws as well as derived observations
Integration of XSIL and XQuery
• Quilt: an XML query language designed for heterogeneous data sources
• Authors: Don Chamberlin (IBM), Jonathan Robie (Software AG), and Daniela Florescu (INRIA)
• Quilt is built on previous XML query languages:
  • XPath, XQL, XML-QL, XMAS, Lorel, YATL
• Became the basis for a standard query language for XML, called XQuery

“List the titles of all books published by Addison Wesley after 1991, in alphabetic order.”

  FOR $b IN document("www.bn.com/bib.xml")//book
      [publisher = "Addison Wesley" AND @year > "1991"]
  RETURN $b/@year, $b/title SORTBY (title)
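For readers unfamiliar with Quilt syntax, the query above can be approximated in Python. This is only a sketch: the `BIB_XML` records are made-up stand-ins for `www.bn.com/bib.xml`, the function name `titles_after` is hypothetical, and the year is compared as an integer rather than as a string as in the Quilt predicate.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for www.bn.com/bib.xml (illustrative records only).
BIB_XML = """
<bib>
  <book year="1994"><title>Networks</title><publisher>Addison Wesley</publisher></book>
  <book year="1992"><title>Compilers</title><publisher>Addison Wesley</publisher></book>
  <book year="2000"><title>Databases</title><publisher>Other Press</publisher></book>
  <book year="1990"><title>Algorithms</title><publisher>Addison Wesley</publisher></book>
</bib>
"""


def titles_after(xml_text, publisher, year):
    """Return (year, title) pairs for matching books, sorted by title."""
    root = ET.fromstring(xml_text)
    hits = []
    for book in root.iter("book"):  # plays the role of //book
        same_publisher = book.findtext("publisher") == publisher
        newer = int(book.get("year")) > year  # numeric comparison (assumption)
        if same_publisher and newer:
            hits.append((book.get("year"), book.findtext("title")))
    return sorted(hits, key=lambda pair: pair[1])  # SORTBY (title)


result = titles_after(BIB_XML, "Addison Wesley", 1991)
```

The predicate in square brackets becomes the `if` test, and `SORTBY (title)` becomes the final `sorted` call.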
Extensible Scientific Interchange Language (XSIL)
• A flexible, XML based, hierarchical, extensible, transport language for scientific data objects

  <?xml version="1.0"?>
  <!DOCTYPE XSIL SYSTEM "xsil.dtd">
  <XSIL>
    <XSIL>
      <Array Name="hello" Type="double">
        <Dim>10</Dim>
        <Stream Encoding="Text" Type="Local" Delimiter=",">
          0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
        </Stream>
      </Array>
    </XSIL>
    <XSIL Type="Simple.Label" Name="Example">
      <Param Name="Message">Hello Auntie Joan</Param>
      <Param Name="FontSize">96</Param>
    </XSIL>
  </XSIL>
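A minimal sketch of how the XSIL example above might be read with Python's standard library. It assumes a text-encoded, one-dimensional `Array` of numbers and omits the XML declaration and DOCTYPE line; the helper names `read_arrays` and `read_params` are hypothetical and are not part of any XSIL toolkit.

```python
import xml.etree.ElementTree as ET

# The XSIL example from the slide, without the declaration and DOCTYPE.
XSIL_DOC = """<XSIL>
  <XSIL>
    <Array Name="hello" Type="double">
      <Dim>10</Dim>
      <Stream Encoding="Text" Type="Local" Delimiter=",">
        0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
      </Stream>
    </Array>
  </XSIL>
  <XSIL Type="Simple.Label" Name="Example">
    <Param Name="Message">Hello Auntie Joan</Param>
    <Param Name="FontSize">96</Param>
  </XSIL>
</XSIL>"""


def read_arrays(xsil_text):
    """Map each Array name to its list of float values (text streams only)."""
    root = ET.fromstring(xsil_text)
    arrays = {}
    for array in root.iter("Array"):
        stream = array.find("Stream")
        delimiter = stream.get("Delimiter", ",")
        values = [float(tok) for tok in stream.text.split(delimiter) if tok.strip()]
        declared = int(array.findtext("Dim"))
        if len(values) != declared:
            raise ValueError("Dim does not match the number of stream values")
        arrays[array.get("Name")] = values
    return arrays


def read_params(xsil_text):
    """Map each Param name to its text value (left as a string)."""
    root = ET.fromstring(xsil_text)
    return {param.get("Name"): param.text for param in root.iter("Param")}


arrays = read_arrays(XSIL_DOC)
params = read_params(XSIL_DOC)
```

The `Dim` check illustrates why the format carries both a declared shape and a delimited stream: a reader can validate the payload before using it.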
Quilt Extensions
• Added the concept of data types
  • Float, integer, and boolean versus string
• Added operator overloading
  • “Sum” on type string concatenates
  • “Sum” on type integer adds
• Added array operations
  • Get, set, element summation, array summation, subsequence, concatenate
Data Grids Linking Collections
[Diagram. Labels, in reading order:]
• Logical collection: elements, attributes
• Grid container: logical name, container metadata, element attributes (data model), elements
• Grid metadata catalog: export elements & attributes; available transforms
• Grid replica catalog: mapping of logical containers to physical files
• Transforms on elements; derived data process metadata
• Derived data products; derived data metadata
• Import into existing or new logical collection
SRB Status
• SRB Features
  • Demonstration of the ability to coordinate bulk metadata and bulk data loads
  • Aggregate files into a “container”, simultaneously write metadata into a file for bulk load into the MCAT information repository
  • Achieved file import rate of 250 files/second
• Development in progress
  • Improved error statement management
  • mySRB.html web interface for collection support
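The bulk-load idea (aggregate files into a container while writing their metadata to a file for a later bulk load) can be sketched as follows. This is an illustration, not the SRB implementation: a tar archive stands in for the SRB container, a CSV file stands in for the MCAT load file, and the function name `bulk_pack` and the `CONTAINER_NAME` column are assumptions.

```python
import csv
import io
import tarfile


def bulk_pack(files, container_name, container_buf, metadata_buf):
    """Pack files into one container and emit one metadata row per file.

    files: dict mapping a logical data name to its bytes payload.
    container_buf: binary file object that receives the tar container.
    metadata_buf: text file object that receives the CSV metadata rows.
    """
    writer = csv.writer(metadata_buf)
    writer.writerow(["DATA_NAME", "SIZE", "CONTAINER_NAME"])
    with tarfile.open(fileobj=container_buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))  # data goes into the container
            writer.writerow([name, len(payload), container_name])  # metadata row


container = io.BytesIO()
metadata = io.StringIO()
bulk_pack(
    {"run1/a.dat": b"abc", "run1/b.dat": b"hello"},
    "container-001",
    container,
    metadata,
)
```

Writing the data and the metadata in the same pass means the catalog needs one bulk insert per container rather than one transaction per file.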
MCAT Web Interface
• Provide collection management
  • Create a collection
  • Define collection attributes
  • Ingest data / move / replicate
  • Browse
  • Query
  • Annotate
  • Comment
• https://srb.npaci.edu/mySRB.html
Grid Portal Development • Integrate collection management of derived data products with Grid execution portal • Based on GridPort and SRB • Funded by GriPhyN, NPACI, NASA IPG
GridPort + SRB Architecture • With SRB capabilities, file access is direct, uniform • Uses same authentication as portal and other Grid services • Single SRB account access allows for more flexible data management
Other Data Grids • NSF - National Virtual Observatory • DOE - Particle Physics Data Grid - Babar • NSF - United Kingdom data grid • NSF - Distributed Terascale Facility
Astronomy Sky Survey Data Grid
[Layered architecture diagram. Numbered labels:]
1. Portals and Workbenches
2. Knowledge & Resource Management
3. Standard APIs and Protocols
4. Grid Security
5. Standard Metadata format, Data model, Wire format
6. Catalog Mediator, Data mediator
7. (no label recovered)
[Other labels: Bulk Data Analysis, Metadata View, Data View, Catalog Analysis, Concept space, Caching, Replication, Backup, Scheduling, Information Discovery, Metadata delivery, Data Discovery, Data Delivery, Catalog/Image Specific Access, Compute Resources, Catalogs, Data Archives, Derived Collections]
PPDG - Babar Support
• Installed SRB at Stanford
• Added Babar specific metadata attributes to MCAT catalog
• Developed ability to support “soft links” between collections
  • Allows same file to appear in multiple collections
  • Release in SRB version 1.1.9
• UK data grid (SRB / Condor / Globus)
  • Rutherford - opportunity for international demonstration of Babar data replication
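The “soft link” idea, where one file appears in several collections without being copied, can be sketched with a small in-memory catalog. The `Catalog` class, its method names, and the example paths are hypothetical and only illustrate the concept; they do not reflect how SRB or MCAT store links.

```python
class Catalog:
    """Toy catalog: collections hold logical names that point at shared objects."""

    def __init__(self):
        self.objects = {}      # object id -> physical path
        self.collections = {}  # collection name -> {logical name: object id}

    def ingest(self, collection, name, path):
        """Register a new data object and place it in a collection."""
        object_id = len(self.objects) + 1
        self.objects[object_id] = path
        self.collections.setdefault(collection, {})[name] = object_id
        return object_id

    def soft_link(self, src_collection, name, dst_collection):
        """Make an existing object visible in another collection (no copy)."""
        object_id = self.collections[src_collection][name]
        self.collections.setdefault(dst_collection, {})[name] = object_id

    def resolve(self, collection, name):
        """Return the physical path behind a logical name."""
        return self.objects[self.collections[collection][name]]


catalog = Catalog()
catalog.ingest("/babar/run1", "events.root", "/archive/0001/events.root")
catalog.soft_link("/babar/run1", "events.root", "/babar/selected")
```

Both collections resolve to the same physical path, and the catalog still holds a single data object.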
TeraGrid Wide Area Network
[Network map. Sites and hubs: Los Angeles, San Diego, Chicago, Indianapolis (Abilene NOC), Urbana, NCSA/UIUC, ANL, Starlight / NW Univ, UIC, Ill Inst of Tech, Univ of Chicago, Multiple Carrier Hubs. Networks: DTF Backbone, Abilene, I-WIRE. StarLight International Optical Peering Point (see www.startap.net).]
[Link legend: OC-48 (2.5 Gb/s, Abilene); Multiple 10 GbE (Qwest); Multiple 10 GbE (I-WIRE Dark Fiber)]
• Solid lines in place and/or available by October 2001
• Dashed I-WIRE lines planned for summer 2002
PACI 13.6 TF Linux TeraGrid
[System diagram; recoverable labels follow.]
• Argonne: 64 nodes, 1 TF, 0.25 TB memory, 25 TB disk
• Caltech: 32 nodes, 0.5 TF, 0.4 TB memory, 86 TB disk
• NCSA: 500 nodes, 8 TF, 4 TB memory, 240 TB disk
• SDSC: 256 nodes, 4.1 TF, 2 TB memory, 225 TB disk
• New servers: 32 quad-processor McKinley servers (128p @ 4GF, 8GB memory/server); 32 quad-processor McKinley servers (128p @ 4GF, 12GB memory/server); 16 quad-processor McKinley servers (64p @ 4GF, 8GB memory/server); IA-32 nodes
• Existing systems: 574p IA-32 Chiba City, 256p HP X-Class, 128p HP V2500, 128p Origin, 92p IA-32, 1024p IA-32, 320p IA-64, 1500p Origin, 1176p IBM SP Blue Horizon, Sun E10K, Sun Starcat, HR Display & VR Facilities
• Storage: HPSS, UniTree
• Network equipment: Chicago & LA DTF core switch/routers, Cisco 65xx Catalyst switch (256 Gb/s crossbar), Cisco 6509 Catalyst switch/router, Juniper M160, Juniper M40, Extreme Black Diamond, Myrinet Clos spine, Fibre Channel switch
• External networks: ESnet, HSCC, MREN/Abilene, Starlight, Calren, NTON, vBNS (OC-3, OC-12, OC-12 ATM, OC-48, GbE, and 10 GbE links)
Further Information http://www.npaci.edu/DICE
SDSC Storage Resource Broker & Meta-data Catalog
[Architecture diagram; recoverable labels follow.]
• Application interfaces: C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Prolog Predicate; Web
• Middle layer: SRB, MCAT, HRM, Dublin Core, DataCutter, Third-party copy, Remote Proxies, Resource, User, User Defined, Application Meta-data
• Databases: DB2, Oracle, Postgres
• Archives: HPSS, ADSM, UniTree, DMF
• File Systems: Unix, NT, Mac OSX
Replication Attributes
• DATA_NAME - Global SRB data object name
• DATA_REPL_ENUM - Replica copy number
• SIZE - Size of data in bytes
• DATA_TYP_NAME - Data type (primarily specification of the data format)
• DATA_CLASS_NAME - Logical classification of the data (description of the type)
• DATA_CLASS_TYPE - Classification type
• ACCESS_CONSTRAINT - Access restrictions on data
• DATA_COMMENTS
Replication Attributes (2)
• DATA_COMMENTS_TIMESTAMP - Time and date stamp for when comments were made on the data object
• REPL_TIMESTAMP - Time and date stamp when the owner modified the data object
• PATH_NAME - Physical path name of the data object
• DATA_CREATE_TIMESTAMP - Time and date stamp for when the data was created
• DATA_IS_DELETED - A flag that can be turned on to indicate a data object has been deleted, while retaining the data set on storage
• DATA_OWNER - Data object creator name
• DATA_OWNER_DOMAIN - Domain/group of the data object creator
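To make the attribute list concrete, here is a sketch of a replica record and a lookup that skips replicas flagged as deleted. The field names follow a subset of the attributes above; the Python types, the `live_replicas` helper, and the sample values are assumptions for illustration, not the MCAT schema.

```python
from dataclasses import dataclass


@dataclass
class ReplicaRecord:
    data_name: str                 # DATA_NAME: global SRB data object name
    data_repl_enum: int            # DATA_REPL_ENUM: replica copy number
    size: int                      # SIZE: size of data in bytes
    path_name: str                 # PATH_NAME: physical path of this replica
    data_owner: str                # DATA_OWNER: data object creator name
    data_is_deleted: bool = False  # DATA_IS_DELETED: soft-delete flag


def live_replicas(records, data_name):
    """Return non-deleted replicas of one data object, ordered by copy number."""
    matches = [
        rec for rec in records
        if rec.data_name == data_name and not rec.data_is_deleted
    ]
    return sorted(matches, key=lambda rec: rec.data_repl_enum)


records = [
    ReplicaRecord("survey.fits", 1, 2048, "/hpss/a/survey.fits", "moore"),
    ReplicaRecord("survey.fits", 0, 2048, "/unix/b/survey.fits", "moore"),
    ReplicaRecord("survey.fits", 2, 2048, "/unix/c/survey.fits", "moore", True),
    ReplicaRecord("other.dat", 0, 10, "/unix/b/other.dat", "moore"),
]
paths = [rec.path_name for rec in live_replicas(records, "survey.fits")]
```

One logical name maps to several physical paths; the deleted flag hides a replica from lookups while the bytes stay on storage.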
Quilt Extension (1) – Data Type
• Original Quilt: No difference between dt1.xml and dt2.xml

dt1.xml

  <bills>
    <bill name="S.10">
      <id type="string">21</id>
      … …
      <sponsor_id type="string">122</sponsor_id>
    </bill>
    <bill name="S.100">
      <id type="string">123</id>
      … …
      <sponsor_id type="string">203</sponsor_id>
    </bill>
    … …
  </bills>

dt2.xml

  <bills>
    <bill name="S.10">
      <id type="float">21</id>
      … …
      <sponsor_id type="float">122</sponsor_id>
    </bill>
    <bill name="S.100">
      <id type="float">123</id>
      … …
      <sponsor_id type="float">203</sponsor_id>
    </bill>
    … …
  </bills>

• After we add data type …
Quilt Extension (2) – Operator Overloading
Query 1: sum of id and sponsor_id (type = string)

  <results>
    FOR $bill in document("dt1.xml")//bill
    RETURN
      <bill $bill/@name>
        $bill//id,
        $bill//sponsor_id,
        <sum>$bill//id/text() + $bill//sponsor_id/text()</sum>
      </bill>
  </results>

Result:

  <results>
    <bill name="S.10">
      <id type="string"> 21 </id>
      <sponsor_id type="string"> 122 </sponsor_id>
      <sum> 21122 </sum>
    </bill>
    <bill name="S.100">
      <id type="string"> 123 </id>
      <sponsor_id type="string"> 203 </sponsor_id>
      <sum> 123203 </sum>
    </bill>
    … …
Quilt Extension (2) – Operator Overloading
Query 2: sum of id and sponsor_id (type = integer)

  <results>
    FOR $bill in document("dt2.xml")//bill
    RETURN
      <bill $bill/@name>
        $bill//id,
        $bill//sponsor_id,
        <sum>$bill//id/text() + $bill//sponsor_id/text()</sum>
      </bill>
  </results>

Result:

  <results>
    <bill name="S.10">
      <id type="integer"> 21 </id>
      <sponsor_id type="integer"> 122 </sponsor_id>
      <sum> 143.0 </sum>
    </bill>
    <bill name="S.100">
      <id type="integer"> 123 </id>
      <sponsor_id type="integer"> 203 </sponsor_id>
      <sum> 326.0 </sum>
    </bill>
    … …
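The behaviour of the two queries can be mimicked in Python: the `type` attribute selects how the element text is interpreted, and `+` then either concatenates or adds. This is a sketch of the idea only, not Kweelt code; the `CASTS` table and `typed_sum` are hypothetical names, and the numeric fragment uses `type="float"` so that the sum is `143.0`, as in the result shown for Query 2.

```python
import xml.etree.ElementTree as ET

# How each declared type turns element text into a Python value (assumption).
CASTS = {"string": str, "integer": int, "float": float}

STRING_BILL = (
    '<bill name="S.10"><id type="string">21</id>'
    '<sponsor_id type="string">122</sponsor_id></bill>'
)
FLOAT_BILL = (
    '<bill name="S.10"><id type="float">21</id>'
    '<sponsor_id type="float">122</sponsor_id></bill>'
)


def typed_sum(left, right):
    """Apply '+' to two elements according to their shared type attribute."""
    left_type = left.get("type", "string")
    right_type = right.get("type", "string")
    if left_type != right_type:
        raise TypeError("operands have different declared types")
    cast = CASTS[left_type]
    # For str, '+' concatenates; for int and float, '+' adds.
    return cast(left.text.strip()) + cast(right.text.strip())


def bill_sum(bill_xml):
    bill = ET.fromstring(bill_xml)
    return typed_sum(bill.find("id"), bill.find("sponsor_id"))


string_result = bill_sum(STRING_BILL)
float_result = bill_sum(FLOAT_BILL)
```

The same query text yields `"21122"` for string-typed data and `143.0` for float-typed data, which is the point of adding data types before overloading the operator.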
Quilt Extension (3) – Array Operation
[Class diagram: Value, ValueArray, ValueIntegerArray, ValueFloatArray, ValueStringArray, ValueBoolArray]
• Value: Interface for Kweelt base type
• ValueArray: Extends Value. Implements Compare and array-specific operations
  • Accessor – getter, setter
  • Element summation
  • Array summation
  • Subsequence
  • Zip, Unscroll, concatenation, etc.
Demo: http://pamina2.sdsc.edu/cgi-bin/kweelt/demo.cgi
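A rough Python analogue of the array classes listed above. The class names mirror the slide, but the method names and their semantics are assumptions: “element summation” is read here as summing the elements of one array, and “array summation” as the element-wise sum of two arrays of equal length. The actual classes extend Kweelt's Value interface and may behave differently.

```python
class ValueArray:
    """Base array type with the array-specific operations named on the slide."""

    element_type = staticmethod(lambda x: x)  # subclasses set a concrete cast

    def __init__(self, items):
        self.items = [self.element_type(x) for x in items]

    def get(self, index):
        return self.items[index]

    def set(self, index, value):
        self.items[index] = self.element_type(value)

    def element_sum(self):
        """Combine all elements of this array with '+'."""
        if not self.items:
            raise ValueError("cannot sum an empty array")
        total = self.items[0]
        for item in self.items[1:]:
            total = total + item
        return total

    def array_sum(self, other):
        """Element-wise '+' of two arrays of equal length."""
        if len(self.items) != len(other.items):
            raise ValueError("arrays differ in length")
        return type(self)(a + b for a, b in zip(self.items, other.items))

    def subsequence(self, start, stop):
        return type(self)(self.items[start:stop])

    def concatenate(self, other):
        return type(self)(self.items + other.items)


class ValueIntegerArray(ValueArray):
    element_type = int


class ValueStringArray(ValueArray):
    element_type = str


ints = ValueIntegerArray([1, 2, 3])
more = ValueIntegerArray([4, 5, 6])
words = ValueStringArray(["ab", "cd"])
```

Because `element_sum` relies on `+`, it adds for integer arrays and concatenates for string arrays, which ties the array operations back to the operator overloading extension.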