
Japanese & UK N+N Data, Data everywhere and … Prof. Malcolm Atkinson Director nesc.ac.uk



Presentation Transcript


  1. Japanese & UK N+N
     Data, Data everywhere and …
     Prof. Malcolm Atkinson, Director
     www.nesc.ac.uk
     3rd October 2003

  2. Discovery is a wonderful thing 

  3. Web Hits - Domain

  4. Theory: Models & Simulations → Shared Data
     Experiment & Advanced Data Collection → Shared Data
     Requires Much Engineering, Much Innovation
     Computing Science: Systems, Notations & Formal Foundation → Process & Trust
     Changes Culture, New Mores, New Behaviours
     Our job: Make the Party a Success every time
     Multi-national, Multi-discipline, Computer-enabled Consortia, Cultures & Societies

  5. Integration is our Focus
     • Supporting Collaboration
     • Bring together disciplines
     • Bring together people engaged in shared challenge
     • Inject initial energy
     • Invent methods that work
     • Supporting Collaborative Research
     • Integrate compute, storage and communications
     • Deliver and sustain integrated software stack
     • Operate dependable infrastructure service
     • Integrate multiple data sources
     • Integrate data and computation
     • Integrate experiment with simulation
     • Integrate visualisation and analysis
     • High-level tools and automation essential
     • Fundamental research as a foundation

  6. It’s Easy to Forget How Different 2003 is From 1993
     • Enormous quantities of data: Petabytes
     • For an increasing number of communities
     • Gating step is not collection but analysis
     • Ubiquitous Internet: >100 million hosts
     • Collaboration & resource sharing the norm
     • Security and Trust are crucial issues
     • Ultra-high-speed networks: >10 Gb/s
     • Global optical networks
     • Bottlenecks: last kilometre & firewalls
     • Huge quantities of computing: >100 Top/s
     • Moore’s law gives us all supercomputers
     • Ubiquitous computing
     • (Moore’s law)² everywhere
     • Instruments, detectors, sensors, scanners, …
     Derived from Ian Foster’s slide at SSDBM, July 03

  7. Tera → Peta Bytes
     • RAM time to move: 15 minutes (Terabyte); 2 months (Petabyte)
     • 1Gb WAN move time: 10 hours ($1000) (Terabyte); 14 months ($1 million) (Petabyte)
     • Disk Cost: 7 disks = $5000 (SCSI) (Terabyte); 6800 Disks + 490 units + 32 racks = $7 million (Petabyte)
     • Disk Power: 100 Watts (Terabyte); 100 Kilowatts (Petabyte)
     • Disk Weight: 5.6 Kg (Terabyte); 33 Tonnes (Petabyte)
     • Disk Footprint: Inside machine (Terabyte); 60 m² (Petabyte)
     Now make it secure & reliable!
     May 2003, Approximately Correct
     See also Distributed Computing Economics, Jim Gray, Microsoft Research, MSR-TR-2003-24
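
The WAN figures on this slide can be sanity-checked with simple arithmetic. The sketch below is not from the presentation; it assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits/s) and a 30.44-day average month. It computes the ideal line-rate transfer time, then scales the slide's 10-hour terabyte figure by 1000 to see whether it matches the quoted 14 months for a petabyte.

```python
# Back-of-envelope check of the "1Gb WAN move time" row.
# Assumptions (not stated on the slide): decimal units and a 30.44-day month.

TERABYTE = 10**12             # bytes
PETABYTE = 10**15             # bytes
LINK_BITS_PER_SECOND = 10**9  # nominal 1 Gb/s link


def ideal_transfer_hours(num_bytes, bits_per_second=LINK_BITS_PER_SECOND):
    """Hours needed to move num_bytes at full line rate, with no overhead."""
    return num_bytes * 8 / bits_per_second / 3600


def scaled_months(hours, factor, days_per_month=30.44):
    """Scale a duration in hours by factor and express it in months."""
    return hours * factor / 24 / days_per_month


ideal_tb_hours = ideal_transfer_hours(TERABYTE)
ideal_pb_days = ideal_transfer_hours(PETABYTE) / 24
slide_pb_months = scaled_months(10, PETABYTE // TERABYTE)

print(f"Ideal 1 TB transfer: {ideal_tb_hours:.2f} hours")
print(f"Ideal 1 PB transfer: {ideal_pb_days:.1f} days")
print(f"Slide's 10 hours scaled by 1000: {slide_pb_months:.1f} months")
```

Under these assumptions the slide's terabyte figure is several times the ideal line-rate time, which would be consistent with a link that sustains only a fraction of its nominal rate, and the petabyte figure is roughly the terabyte figure multiplied by 1000.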

  8. Dynamically Move computation to the data
     • Assumption: code size << data size
     • Develop the database philosophy for this?
     • Queries are dynamically re-organised & bound
     • Develop the storage architecture for this?
     • Compute closer to disk?
     • System on a Chip using free space in the on-disk controller
     • Data Cutter a step in this direction
     • Develop the sensor & simulation architectures for this?
     • Safe hosting of arbitrary computation
     • Proof-carrying code for data and compute intensive tasks + robust hosting environments
     • Provision combined storage & compute resources
     • Decomposition of applications
     • To ship behaviour-bounded sub-computations to data
     • Co-scheduling & co-optimisation
     • Data & Code (movement), Code execution
     • Recovery and compensation
     Dave Patterson, Seattle, SIGMOD 98
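
The assumption "code size << data size" can be illustrated with a toy comparison of bytes crossing the network. This is a minimal sketch, not anything from the presentation: `DataNode`, its methods, the 8-bytes-per-value accounting and the 200-byte code size are all hypothetical, and the sketch ignores the safe-hosting question the slide raises.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataNode:
    """Hypothetical storage node that can also host small computations."""
    readings: List[float]
    bytes_shipped: int = 0  # running total of bytes sent over the network

    def fetch_all(self) -> List[float]:
        """Move the data to the computation: every value crosses the network."""
        self.bytes_shipped += 8 * len(self.readings)  # assume 8 bytes per value
        return list(self.readings)

    def run(self, task: Callable[[List[float]], float], code_size: int) -> float:
        """Move the computation to the data: only code and result are shipped."""
        result = task(self.readings)
        self.bytes_shipped += code_size + 8  # code in, one 8-byte result out
        return result


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Strategy 1: pull a million readings to the client, then compute there.
node_a = DataNode([float(i) for i in range(1_000_000)])
pulled = mean(node_a.fetch_all())

# Strategy 2: ship the (assumed 200-byte) function to the node instead.
node_b = DataNode([float(i) for i in range(1_000_000)])
pushed = node_b.run(mean, code_size=200)

print(f"data to code: {node_a.bytes_shipped} bytes shipped, mean = {pulled}")
print(f"code to data: {node_b.bytes_shipped} bytes shipped, mean = {pushed}")
```

Both strategies give the same answer; the difference is only in how much has to move, which is the quantity the slide proposes to optimise.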

  9. Infrastructure Architecture
     • Data Intensive X Scientists
     • Data Intensive Applications for Science X
     • Simulation, Analysis & Integration Technology for Science X
     • Generic Virtual Data Access and Integration Layer: Job Submission, Brokering, Workflow, Structured Data Integration, Registry, Banking, Authorisation, Data Transport, Resource Usage, Transformation, Structured Data Access
     • OGSA
     • OGSI: Interface to Grid Infrastructure
     • Compute, Data & Storage Resources
     • Structured Data: Relational, XML, Semi-structured
     Distributed Virtual Integration Architecture

  10. Data Access & Integration Services
      Components: Client, Registry, Factory, Grid Data Service (GDS), XML / Relational database
      Diagram legend: SOAP/HTTP; service creation; API interactions
      1a. Request to Registry for sources of data about “x”
      1b. Registry responds with Factory handle
      2a. Request to Factory for access to database
      2b. Factory creates GridDataService to manage access
      2c. Factory returns handle of GDS to client
      3a. Client queries GDS with XPath, SQL, etc
      3b. GDS interacts with database
      3c. Results of query returned to client as XML
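
The registry, factory and service interaction on this slide can be sketched in a few lines. The code below is an in-process illustration only: it is not the OGSA-DAI API, it replaces SOAP/HTTP with plain method calls, and the class names, method names, the `stars` table and its contents are all hypothetical. It uses an in-memory SQLite database as the relational source.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: manages access to one database."""

    def __init__(self, connection):
        self._connection = connection

    def query(self, sql):
        # 3b. The service interacts with the database on the client's behalf.
        cursor = self._connection.execute(sql)
        columns = [description[0] for description in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_element = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_element, name).text = str(value)
        # 3c. Results of the query are returned to the client as XML.
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical factory: creates a service per request (2b) and returns it (2c)."""

    def __init__(self, connection):
        self._connection = connection

    def create_service(self):
        return GridDataService(self._connection)


class Registry:
    """Hypothetical registry: maps a topic to the factory that serves it."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find(self, topic):
        # 1a/1b. A request about a topic is answered with a factory handle.
        return self._factories[topic]


# Set up an illustrative data source and publish it in the registry.
database = sqlite3.connect(":memory:")
database.execute("CREATE TABLE stars (name TEXT, magnitude REAL)")
database.executemany(
    "INSERT INTO stars VALUES (?, ?)", [("Sirius", -1.46), ("Vega", 0.03)]
)
registry = Registry()
registry.register("stars", Factory(database))

# Client side: steps 1a to 3c in order.
factory = registry.find("stars")   # 1a, 1b
service = factory.create_service() # 2a, 2b, 2c
xml_result = service.query("SELECT name, magnitude FROM stars ORDER BY magnitude")  # 3a
print(xml_result)
```

The client never touches the database connection directly; it only holds handles obtained from the registry and the factory, which is the indirection the slide describes.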

  11. Future DAI Services
      Components: Problem Solving Environment; Semantic Metadata; Application Code (“scientific” application coding scientific insights); Data Access & Integration master; Client; Analyst; Registry; Factory; GDS and GDTS instances; XML database; Relational database; resources Sx and Sy
      Diagram legend: SOAP/HTTP; service creation; API interactions
      1a. Request to Registry for sources of data about “x” & “y”
      1b. Registry responds with Factory handle
      2a. Request to Factory for access and integration from resources Sx and Sy
      2b. Factory creates GridDataServices network
      2c. Factory returns handle of GDS to client
      3a. Client submits sequence of scripts; each has a set of queries to GDS with XPath, SQL, etc
      3b. Client tells analyst
      3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation

  12. A New World
      • What Architecture will Enable Data & Computation Integration?
      • Common Conceptual Models
      • Common Planning & Optimisation
      • Common Enactment of Workflows
      • Common Debugging
      • …
      • What Fundamental CS is needed?
      • Trustworthy code & Trustworthy evaluators
      • Decomposition and Recomposition of Applications
      • …
      • Is there an evolutionary path?

  13. Take Home Message
      • Information Grids
      • Support for collaboration
      • Support for computation and data grids
      • Structured data fundamental
      • Relations, XML, semi-structured, files, …
      • Integrated strategies & technologies needed
      • OGSA-DAI is here now
      • A first step
      • Try it
      • Tell us what is needed to make it better
      • Join in making better DAI services & standards

  14. NeSC in the UK
      Activities: Globus Alliance; HPC(x); Directors’ Forum; Helped build a community; Engineering Task Force; Grid Support Centre; Architecture Task Force; UK Adoption of OGSA; OGSA Grid Market; Workflow Management; Database Task Force; OGSA-DAI; GGF DAIS-WG; GridNet; e-Storm
      Map labels: National e-Science Centre; Edinburgh; Glasgow; Newcastle; Belfast; Manchester; Daresbury Lab; Cambridge; Oxford; Hinxton; RAL; Cardiff; London; Southampton

  15. www.nesc.ac.uk
