Grids and Peer-to-Peer Networks for e-Science
SPECTS, San Diego, July 17 2002
PTLIU Laboratory for Community Grids
Geoffrey Fox and Community Grid Staff and Students
Computer Science, Informatics, Physics
Indiana University, Bloomington IN 47404
http://grids.ucs.indiana.edu/ptliupages
gcf@indiana.edu
Summary
• Grid: a global computing infrastructure with a myriad of heterogeneous devices connected by diverse networks
  • Measure and study their performance
  • Related to, but different from, classical parallel computing performance studies
• Web services: new object models providing universality in a service model of electronic capability
  • Simulation, data access/storage, etc.
  • Nodes of application-level systems that one can model
• Systems involve multiple devices connected together; synchronizing them is the performance driver
• Communities or virtual organizations are e-Science collective systems
Trends of Importance
• Resources of increasing performance or functionality
  • Computers (ASCI, Earth Simulator to TeraGrid), storage, sensors, networks, PDAs
  • More and more data distributed around the world
• Applications of increasing sophistication
  • Size, multiple scales, multiple disciplines
  • Composing simulations from different disciplines
  • New algorithms and mathematical techniques
• Traditional computer science
  • Compilers, parallelism, objects, components
• Grid and Internet concepts and technologies
  • Enabling new applications
  • XML, Web Services, portals, collaboration
Projected Top 500 Until Year 2009
[Figure: performance of the 1st, 10th, 100th and 500th machines and the sum of all 500, projected in time; the Earth Simulator from Japan is marked (http://geofem.tokyo.rist.or.jp/)]
[Figure: network diagram of the PACI 13.6 TF Linux TeraGrid. Sites: NCSA (500 nodes, 8 TF, 4 TB memory, 240 TB disk), SDSC (256 nodes, 4.1 TF, 2 TB memory, 225 TB disk), Argonne (64 nodes, 1 TF, 0.25 TB memory, 25 TB disk) and Caltech (32 nodes, 0.5 TF, 0.4 TB memory, 86 TB disk), linked through Chicago and LA core switch/routers to Abilene, vBNS, ESnet, MREN, Calren and Starlight]
A Grid of 1000 distributed systems: e-Science links to all sensors, all desktops, all university systems, and the PDAs of all researchers.
Small Devices Increasing in Importance
• There is growing interest in wireless portable displays at the confluence of the cell phone and personal digital assistant markets
• By 2005, 60 million Internet-ready cell phones will be sold each year
• 65% of all broadband Internet accesses via non-desktop appliances
Integrating PDAs and supercomputers (etc.) implies very heterogeneous systems spanning traditional performance fields.
The HPCC Thrust has run its course?
• The 1990 HPCC 10-year initiative was largely aimed at parallel computing enabling large-scale simulations for a broad range of computational science and engineering problems
• It was in many ways a success: we have methods and machines that can (begin to) tackle most 3D simulations
  • ASCI simulations are particularly impressive
  • DoE is still putting substantial resources into basic software and algorithms, from adaptive meshes to PDE solver libraries
• Machines are still increasing in performance exponentially and should achieve petaflops in the next 7-10 years
• It is not obvious that there will be major changes in parallel computer architecture and methodology
e-Science
• e-Science implies integration of data and researchers around the model, and builds on
  • Parallel computers for simulation
  • Sensors (satellites or ground based) for data
  • Databases for knowledge
  • Networks to link people, computers and data
[Figure: diagram linking data, assimilation, information, simulation, model, datamining, ideas and reasoning, spanning information technology and computational science]
Classic Grid Architecture
[Figure: clients (users and devices) connect through middle-tier brokers and service providers (e.g. NetSolve, security, collaboration, computing, composition, content access) to resources such as databases; the middle tier becomes Web Services]
One e-Science Example: Astronomy Is Facing a Major Data Avalanche
Multi-terabyte sky surveys and archives (soon multi-petabyte), billions of detected sources, hundreds of measured attributes per source …
[Figure: total area of 3m+ telescopes in the world in m², and total number of CCD pixels in megapixels, as a function of time]
Growth over 25 years is a factor of 30 in glass and 3000 in pixels.
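The 25-year growth factors can be converted into implied annual rates. A minimal sketch, assuming smooth exponential growth over the whole period (the slide gives only the end-to-end factors of 30 and 3000):

```python
import math

def annual_growth(factor: float, years: float) -> float:
    """Compound annual growth rate implied by an overall growth factor."""
    return factor ** (1.0 / years) - 1.0

def doubling_time(factor: float, years: float) -> float:
    """Years per doubling, assuming steady exponential growth."""
    return years * math.log(2) / math.log(factor)

# Growth factors over 25 years, as quoted on the slide.
for label, factor in [("telescope glass area", 30), ("CCD pixels", 3000)]:
    rate = annual_growth(factor, 25)
    print(f"{label}: {rate:.1%} per year, doubling every {doubling_time(factor, 25):.1f} years")
```

Under this assumption the pixel count doubles roughly every 2.2 years, against about 5.1 years for glass area, so detector data grows much faster than collecting area.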
The Changing Style of Observational Astronomy
Astronomy at the desktop, not at the telescope
What is the NVO (National Virtual Observatory)? Content
• Source catalogs, image data
• Specialized data: spectroscopy, time series, polarization
• Information archives: derived & legacy data (NED, Simbad, ADS, etc.)
• Query tools
• Analysis/discovery tools: visualization, statistics
• Standards
What is the NVO? Components
• Service providers: query engines, compute engines
• Data providers: surveys, observatories, archives, software repositories
• Information providers: e.g. ADS, NED, ...
Cohen's Grid/P2P Use of Internet I
[Figure: estimate of Grid/P2P use of the Internet alongside a rival estimate (mainly digital video)]
Source: Robert B. Cohen, Ph.D., Cohen Communications Group (bcohen@bway.net, 212-986-7720), Global Grid Forum, Toronto, Feb 18 2002
Grid/P2P Use of Internet II
[Figure: traffic categories including S2S (server to server) and digital video "on demand", grouped as P2P and Grid]
Use of Object Technologies I
• Object and component technology, claimed as a commercial success, has not yet been a clear success in HPCC, nor indeed in modeling & simulation
• Object technologies do not naturally support either high performance or parallelism
  • C++ can be high performance, but Java (as a language) is not uniformly so (it is improving)
  • We suggest that Web Services could change this
• Fortran (including Fortran90) will continue to decline in importance and interest; the community should prefer not to use it
  • Its use will not attract the best students
• It is not essential to write modules in an object-oriented language
• It is essential to package modules in an object framework
Use of Object Technologies II
• There is an emerging HPCC component architecture allowing production of more modern libraries (integration infrastructure)
  • DoE has a very large CCA (Common Component Architecture) effort
• Package software ("system and applications") as distributed objects, not as traditional libraries
• CORBA, HLA, Java and Web Services are not naturally high performance as component models
  • High performance is often not essential for coarse-grain objects
  • Web Services support multiple implementations, allowing a performance/functionality trade-off
Object Size & Distributed/Parallel Simulations
• All interesting systems consist of linked entities
  • Particles, grid points, people, or groups thereof
• Linkage translates into message passing
  • Cars on a freeway
  • Phone calls
  • Forces between particles
• The amount of communication tends to be proportional to the surface area of an entity, whereas simulation time is proportional to its volume
  • So communication/computation scales as surface/volume and decreases in importance as entity size increases
• In parallel computing, communication is synchronized; distributed computing uses "self-contained objects" (whole programs) which can be scheduled asynchronously
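The surface/volume argument can be made concrete. A minimal sketch, assuming each entity is a cube of n × n × n grid points that exchanges one value per face point per step and does one unit of work per point (an idealization, not a measured cost model):

```python
def comm_to_comp_ratio(n: int) -> float:
    """Communication/computation ratio for an n x n x n block of grid points.

    Communication ~ surface: 6 faces of n*n points each.
    Computation   ~ volume:  n**3 points.
    """
    surface = 6 * n * n
    volume = n ** 3
    return surface / volume  # simplifies to 6 / n

for n in (10, 100, 1000):
    print(n, comm_to_comp_ratio(n))
```

The ratio falls as 6/n, so a block ten times larger on each side carries ten times less relative communication overhead.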
Some Problem Classes
[Figure: sets of grid points, sets of services (programs), sets of macroscopic objects]
• Classic HPCC: synchronized objects with regular time structure (communication overhead decreases as problem size increases)
  • Includes PDE and interacting-particle applications
  • Gives scaling parallelism on large MPPs
• Grid (Internet technology and commercial application integration): large objects with modest communication and without difficult time synchronization
  • Compose as independent (pipelined) services
  • Includes some approaches to multidisciplinary simulation linkage
• Hardest: smallish objects with irregular time synchronization
  • Event-driven simulations (HLA-RTI) are used here
What is a Web Service I
• A web service is a computer program, running on either the local or a remote machine, with a set of well-defined interfaces (ports) specified in XML (WSDL)
• In principle, the program can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be implemented in any way whatsoever
  • Interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining), but
  • The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python
• Web Services separate the meaning of a port (message) interface from its implementation
• This enhances/enables a reusable component model of ANY electronic resource
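To illustrate "XML messages (SOAP)", here is a minimal sketch that builds and parses a SOAP 1.1 envelope with Python's standard library. The operation `getTemperature`, the namespace `urn:example:sensor` and the `stationId` parameter are hypothetical; a real service's WSDL would define them.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:sensor"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SVC_NS)

def build_request(operation: str, params: dict) -> bytes:
    """Serialize an operation call as a SOAP 1.1 request envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")

def parse_request(message: bytes) -> tuple[str, dict]:
    """Recover the operation name and parameters from the envelope."""
    envelope = ET.fromstring(message)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body is the call
    operation = call.tag.split("}", 1)[1]            # strip the namespace prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params

msg = build_request("getTemperature", {"stationId": "IU-01"})
print(msg.decode())
print(parse_request(msg))
```

The message describes the call (operation and parameters) independently of whatever language implements the service, which is the separation of port meaning from implementation described above.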
What is a Web Service II
[Figure: services such as payment (credit card), security, catalog and warehouse shipping connected through WSDL interfaces]
• Web Services have the important implication that ALL interfaces are XML-message based. In contrast:
  • Most Windows programs have interfaces defined as interrupts due to user inputs
  • Most software has interfaces defined as methods, which might be implemented as messages, but this is often NOT explicit
[Figure: Web Services (WS) layered between raw resources and clients. Raw resources and raw data sit behind a (virtual) XML data interface; Web Services connect through XML WS-to-WS interfaces; a (virtual) XML knowledge (user) interface and a (virtual) XML rendering interface render to an XML display format for clients]
Details of WSDL Protocol Stack
Stack, top to bottom: UDDI or WSIL / WSFL / WSDL / SOAP or RMI / HTTP or SMTP or IIOP or RMTP / TCP/IP / Physical Network
• UDDI finds where programs are
  • Remote (distributed) programs are just Web Services
• WSFL links programs together (under revision?)
• WSDL defines the interface (methods, parameters, data formats)
• SOAP defines the structure of a message, including serialization of information
• HTTP is the negotiation/transport protocol
• TCP/IP is layers 3-4 of OSI
• Physical network is layer 1 of OSI
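The layering can be sketched by wrapping a SOAP payload in an HTTP/1.1 POST request, which TCP/IP would then carry. A minimal sketch; the host, path and SOAPAction value are hypothetical, and the request is only assembled, not sent:

```python
def http_post_for_soap(host: str, path: str, soap_action: str, envelope: bytes) -> bytes:
    """Assemble the raw bytes of an HTTP/1.1 POST that carries a SOAP 1.1 envelope."""
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: text/xml; charset=utf-8",  # media type used for SOAP 1.1
        f"Content-Length: {len(envelope)}",
        f'SOAPAction: "{soap_action}"',
    ]
    # Header lines end in CRLF; an empty line separates the headers from the body.
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + envelope

envelope = (b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
            b"<soap:Body/></soap:Envelope>")
request = http_post_for_soap("example.org", "/sensor",
                             "urn:example:sensor#getTemperature", envelope)
print(request.decode())
```

The same envelope could instead travel over SMTP or another transport from the stack; only this outer wrapping would change.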
e-Science/Grid/P2P networks are XML-specified resources connected by XML-specified messages
The implementation of resource and connection may or may not be XML
[Figure: software resources and a database, each wrapped in an XML skin, linked by message- or event-based interconnection]
What is a Grid Web Service?
• There are generic Grid system services: security, collaboration, persistent storage, universal access
  • OGSA (Open Grid Services Architecture) is implementing these as extended Web Services
• An Application Web Service is a capability used either by another service or by a user
  • It has input and output ports; data comes from sensors or other services
• Consider satellite-based sensor operations as a Web Service
  • Satellite management (with a web front end)
  • Each tracking station is a service
  • Image processing is a pipeline of filters, which can be grouped into different services
  • Data storage is an important system service
  • Big services are built hierarchically from "basic" services
• Portals are the user (web browser) interfaces to Web Services
Sensor Web Service
[Figure: a distributed Sensor Web Service; input Web Service ports accept sensor data in different formats, and output Web Service ports give universal sensor access for people and computers]
Application Web Services
[Figure: sensor data as a Web Service (WS), sensor management WS, simulation WS, data analysis WS and visualization WS; simulations built as multiple interdisciplinary programs (Prog1WS, Prog2WS); analysis built as multiple filter Web Services (Filter1WS, Filter2WS, Filter3WS); SLE (Space Link Extension) as a WS]
• Note that the service model integrates sensors, sensor analysis, simulations and people
• An Application Web Service is a capability used either by another service or by a user
  • It has input and output ports; data comes from users, sensors or other services
  • Big services are built hierarchically from "basic" services
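The "pipeline of filters" idea can be sketched with each filter as a small service that consumes and produces a stream of records, so filters can be regrouped into larger services. A minimal sketch; the filters `calibrate` and `threshold` and the record format are hypothetical stand-ins for image-processing stages:

```python
from typing import Callable, Iterable, Iterator

Record = dict
Filter = Callable[[Iterable[Record]], Iterator[Record]]

def calibrate(gain: float) -> Filter:
    """Hypothetical filter: scale each raw pixel value by a calibration gain."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        for r in records:
            yield {**r, "value": r["value"] * gain}
    return run

def threshold(minimum: float) -> Filter:
    """Hypothetical filter: drop records below a detection threshold."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        return (r for r in records if r["value"] >= minimum)
    return run

def compose(*filters: Filter) -> Filter:
    """Group several filters into one larger service with the same interface."""
    def run(records: Iterable[Record]) -> Iterator[Record]:
        stream = iter(records)
        for f in filters:
            stream = f(stream)
        return stream
    return run

image_service = compose(calibrate(2.0), threshold(5.0))
raw = [{"pixel": 0, "value": 1.0}, {"pixel": 1, "value": 3.0}, {"pixel": 2, "value": 4.5}]
print(list(image_service(raw)))
```

Because `compose` returns something with the same interface as a single filter, a composed service can itself be composed again, which mirrors building big services hierarchically from basic ones.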
The Application Service Model
• As the bandwidth of communication between services increases, one can support smaller services
• A service "is a component" and is a replacement for a library where performance allows
• Services (components) are a sustainable model of software development: each service has documented capability with standards-compliant interfaces
  • XML defines interfaces at several levels
  • WSDL at the service interface level and XSIL or equivalent for scientific data format
• A service can be written as Perl, Python, Java Servlet, Enterprise JavaBean, CORBA (C++ or Fortran) object …
• Communication protocol can be RMI (Java), IIOP (CORBA) or SOAP (HTTP, XML) …
Some General Grid or Web Services
Some Science Web Services
• These build on general (community) web services
Education as a Web Service
• Can link to Science as a Web Service and substitute educational modules
• "Learning Object" XML standards already exist from IMS/ADL (http://www.adlnet.org); the architecture needs updating
• Web Services for a virtual university include:
  • Registration
  • Performance (grading)
  • Authoring of curriculum
  • Online laboratories for real and virtual instruments
  • Homework submission
  • Quizzes of various types (multiple choice, random parameters)
  • Assessment data access and analysis
  • Synchronous delivery of curricula
  • Scheduling of courses and mentoring sessions
  • Asynchronous access, data mining and knowledge discovery
  • Learning Plan agents to guide students and teachers
Different Web Service Organizations
• Everything is a resource implemented as a Web Service, whether it be:
  • back-end supercomputers and a petabyte of data
  • Microsoft PowerPoint and this file
• All resources communicate via messages
• Grids and Peer-to-Peer (P2P) networks can be integrated by building both in terms of Web Services, with different (or in fact sometimes the same) implementations of core services such as registration, discovery, life-cycle, collaboration and event or message transport …
  • This gives a Peer-to-Peer Grid
Peer-to-Peer Grid
[Figure: integrating P2P and Grid/WS. Databases and JXTA peers connect through Web Service interfaces and event/message brokers, forming a democratic organization]
Role of Event/Message Brokers
• We will use events and messages interchangeably
  • An event is a time-stamped message
• Our systems are built from clients, servers and "event brokers"
  • These are logical functions; a given computer can have one or more of these functions
  • In P2P networks, computers are typically multifunction; in Grids one tends to have separate-function computers
  • Event brokers "just" provide message/event services; servers provide traditional distributed object services as Web Services
• Some functionalities depend only on the event itself and perhaps the data format; they do not depend on details of the application and can be shared among several applications
  • NaradaBrokering is designed to provide these functionalities
  • MPI provided such functionalities for all of parallel computing
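The client/broker split described on this slide can be sketched as a tiny in-process topic broker in which an event is a time-stamped message. A minimal sketch only; the `Event` and `Broker` classes are hypothetical and are not NaradaBrokering's API:

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Event:
    """An event is a time-stamped message."""
    topic: str
    payload: str
    timestamp: float = field(default_factory=time.time)

class Broker:
    """Hypothetical broker: publishers send to it, it forwards to matching subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Event], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, event: Event) -> int:
        """Deliver the event to every subscriber of its topic; return how many received it."""
        callbacks = self._subscribers.get(event.topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)

broker = Broker()
received: list[Event] = []
broker.subscribe("lecture/slides", received.append)
broker.publish(Event("lecture/slides", "next slide"))
broker.publish(Event("lecture/chat", "hello"))  # no subscriber for this topic
print([e.payload for e in received])  # ['next slide']
```

Nothing in `Broker` depends on what the payload means, which is the sense in which such functionality can be shared among several applications.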
NaradaBrokering implements an Event Web Service
[Figure: Web Service 1 and Web Service 2, each with WSDL ports, exchange messages through a broker containing a (virtual) queue and destination-source matching, routing, filter and workflow stages]
• Filter is mapping to a PDA or slow communication channel (universal access); see our PDA adaptor
• Workflow implements message processing
• Routing is illustrated by JXTA
• Destination-source matching is illustrated by JMS using a publish-subscribe mechanism
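The matching and filter stages can be sketched as topic matching followed by a per-subscriber transformation driven by a client profile, so a PDA or slow client receives a reduced message. A minimal sketch; the `ClientProfile` fields and the truncation rule are assumptions, not the actual PDA adaptor:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClientProfile:
    name: str
    max_payload_bytes: int  # hypothetical per-client limit (PDA, dial-up, Internet2 desktop)

@dataclass
class Message:
    topic: str
    payload: bytes

def filter_for(profile: ClientProfile, msg: Message) -> Optional[Message]:
    """Adapt a message to one client's profile; None means 'do not deliver'."""
    if len(msg.payload) <= profile.max_payload_bytes:
        return msg
    if profile.max_payload_bytes <= 0:
        return None
    # Truncation stands in for whatever reduction a real adaptor would apply.
    return Message(msg.topic, msg.payload[: profile.max_payload_bytes])

def deliver(msg: Message, subscriptions: dict[str, list[ClientProfile]]) -> dict[str, Message]:
    """Destination-source matching on topic, then a per-client filter."""
    delivered: dict[str, Message] = {}
    for profile in subscriptions.get(msg.topic, []):
        adapted = filter_for(profile, msg)
        if adapted is not None:
            delivered[profile.name] = adapted
    return delivered

subscriptions = {"slides": [ClientProfile("desktop", 1_000_000), ClientProfile("ipaq", 16)]}
result = deliver(Message("slides", b"x" * 100), subscriptions)
print({name: len(m.payload) for name, m in result.items()})  # {'desktop': 100, 'ipaq': 16}
```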
Features of Event Service I
• MPI nowadays aims at microsecond latency
• The Event Web Service aims at millisecond latency
  • Typical distributed system travel times are many milliseconds (to seconds for geosynchronous satellites)
  • A different performance/functionality trade-off
• Messages are not sent directly from publisher P to subscriber S but rather from P to broker B and from B to S
  • Synchronous systems: B acts as a real-time router/filter
  • Messages can be archived and software multicast
  • Asynchronous systems: B acts as an XML database and workflow engine
• Subscription is in each case roughly equivalent to a database query
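The asynchronous case, where the broker archives messages and a subscription behaves like a database query, can be sketched with an in-memory SQLite archive. A minimal sketch; the `ArchivingBroker` class, its table layout and the "topic plus time" subscription are assumptions for illustration (the slide describes an XML database, not SQL):

```python
import sqlite3
import time
from typing import Optional

class ArchivingBroker:
    """Hypothetical broker that archives every event so subscriptions can be replayed."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE events (ts REAL, topic TEXT, payload TEXT)")

    def publish(self, topic: str, payload: str, ts: Optional[float] = None) -> None:
        stamp = time.time() if ts is None else ts
        self.db.execute("INSERT INTO events VALUES (?, ?, ?)", (stamp, topic, payload))

    def subscribe(self, topic: str, since: float = 0.0) -> list[tuple[float, str]]:
        """The subscription is literally a query: matching topic, time stamp >= since."""
        cursor = self.db.execute(
            "SELECT ts, payload FROM events WHERE topic = ? AND ts >= ? ORDER BY ts",
            (topic, since),
        )
        return cursor.fetchall()

broker = ArchivingBroker()
broker.publish("seminar", "slide 1", ts=100.0)
broker.publish("seminar", "slide 2", ts=200.0)
broker.publish("chat", "hi", ts=150.0)
print(broker.subscribe("seminar", since=150.0))  # [(200.0, 'slide 2')]
```

A late-joining client can replay what it missed simply by choosing an earlier `since` value.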
Features of Event Web Service II
• In principle, message brokering can be virtual and compiled away, in the same way that WSDL ports can be bound in real time to the optimal transport mechanism
• All Web Services are specified in XML but can be implemented quite differently
  • Audio/video conferencing sessions could be negotiated using SOAP (raw XML) messages and agree to use certain video codecs transmitted by UDP/RTP
• There is a collection of XML Schema, called GXOS, specifying the event service and the requirements of message streams and their endpoints
  • One can sometimes compile message streams specified in GXOS to MPI or to a local method call
• The event service must support dynamic heterogeneous protocols
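The idea that one port can be bound either to a real message exchange or compiled away into a local call can be sketched as two interchangeable bindings of the same interface. A minimal sketch; the `Port` protocol, both binding classes and the `add` service are hypothetical, and JSON stands in for the XML/SOAP encoding purely to keep the example short:

```python
import json
from typing import Callable, Protocol

class Port(Protocol):
    def invoke(self, operation: str, **args: float) -> float: ...

class LocalBinding:
    """'Compiled away': the call goes straight to the implementation."""
    def __init__(self, impl: dict[str, Callable[..., float]]) -> None:
        self.impl = impl

    def invoke(self, operation: str, **args: float) -> float:
        return self.impl[operation](**args)

class MessageBinding:
    """The call is serialized to a message and decoded on the 'server' side."""
    def __init__(self, impl: dict[str, Callable[..., float]]) -> None:
        self.impl = impl
        self.bytes_sent = 0

    def invoke(self, operation: str, **args: float) -> float:
        wire = json.dumps({"op": operation, "args": args}).encode()  # request message
        self.bytes_sent += len(wire)
        request = json.loads(wire)
        result = self.impl[request["op"]](**request["args"])
        return json.loads(json.dumps(result))  # response message

def call_add(port: Port) -> float:
    """Client code sees only the port interface, not the binding."""
    return port.invoke("add", x=2, y=3)

service = {"add": lambda x, y: x + y}
for port in (LocalBinding(service), MessageBinding(service)):
    print(type(port).__name__, call_add(port))
```

Callers see the same interface; only the binding decides whether a message is actually built, which is the performance/functionality trade-off these slides describe.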
Features of Event Web Service III
• The event web service is naturally implemented as a dynamic distributed network
  • Required for fault tolerance and performance
• A new classroom joins my online lecture
  • A broker is created to handle the students: it multicasts my messages locally to the classroom and handles local messages between students with high performance
• Company X sets up a firewall
  • The event service sets up brokers on either side of the firewall to optimize transport through it
• Note that all message-based applications use the same message service
  • Web services imply ALL applications are (possibly virtual) message based
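A broker network must deliver each message to every community exactly once even when brokers are cross-linked. A minimal sketch, assuming a static broker topology and breadth-first flooding with duplicate suppression; the broker and client names echo the classroom and firewall examples on this slide but are hypothetical, and NaradaBrokering's actual routing may differ:

```python
from collections import deque

def software_multicast(links: dict[str, list[str]], clients: dict[str, list[str]],
                       origin: str) -> dict[str, list[str]]:
    """Forward one message across a broker network, visiting each broker once.

    links:   broker -> neighbouring brokers
    clients: broker -> locally attached clients
    Returns, for each broker reached, the clients it delivered to locally.
    """
    delivered: dict[str, list[str]] = {}
    seen = {origin}
    queue = deque([origin])
    while queue:
        broker = queue.popleft()
        delivered[broker] = clients.get(broker, [])  # local multicast to attached clients
        for neighbour in links.get(broker, []):
            if neighbour not in seen:  # suppress duplicates on cyclic topologies
                seen.add(neighbour)
                queue.append(neighbour)
    return delivered

links = {"campus": ["classroom", "firewall-in"], "classroom": ["campus"],
         "firewall-in": ["campus", "firewall-out"], "firewall-out": ["firewall-in"]}
clients = {"campus": ["lecturer"], "classroom": ["s1", "s2", "s3"],
           "firewall-out": ["companyX"]}
print(software_multicast(links, clients, "campus"))
```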
[Figure: a broker network providing the message/event service, linking several (P2P) communities, a resource and a database; brokers perform software multicast]
System Structure I
• Systems are a dynamic mix of structured and unstructured entities
• P2P systems like JXTA support unstructured systems realized by opportunistic messaging "broadcast locally" over a certain "network distance"
• Java Message Service (JMS) supports structured systems where clients (message endpoints) link to one of a known set of "central servers"
• The event system must support
  • Advertising capability: publish
  • Advertising need: subscribe, both for type and form of messages
  • Transport of designated messages/events
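The two styles can be contrasted in a few lines: opportunistic broadcast limited by network distance versus delivery through a known central server. A minimal sketch, assuming hop count as the measure of "network distance"; both functions are hypothetical and are not JXTA or JMS APIs:

```python
from collections import deque

def broadcast_locally(peers: dict[str, list[str]], origin: str, max_hops: int) -> set[str]:
    """Unstructured (P2P style): reach every peer within `max_hops` links of `origin`."""
    reached = {origin}
    frontier = deque([(origin, 0)])
    while frontier:
        peer, hops = frontier.popleft()
        if hops == max_hops:
            continue  # the message's hop budget is exhausted here
        for neighbour in peers.get(peer, []):
            if neighbour not in reached:
                reached.add(neighbour)
                frontier.append((neighbour, hops + 1))
    return reached - {origin}

def via_central_server(clients: list[str], sender: str) -> set[str]:
    """Structured (JMS style): every client is attached to one known server."""
    return set(clients) - {sender}

chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(sorted(broadcast_locally(chain, "a", max_hops=2)))  # ['b', 'c']
print(sorted(via_central_server(list(chain), "a")))       # ['b', 'c', 'd', 'e']
```

The local broadcast needs no central knowledge but only reaches nearby peers; the central server reaches everyone but every message passes through it.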
Traditional Collaboration Architecture, e.g. commercial WebEx (JMS style)
[Figure: a single collaboration server connecting clients and a database, giving a P2P illusion over a single server]
System Structure II
• One could think that the world is a well-defined structure of unstructured systems
  • Unstructured dynamic systems are P2P (JXTA) peer groups
  • Peer groups could be a cluster of students in a class for distance learning, or a cluster of Grid (OGSA) Web Services generated to support running a job
• But maybe it is a set of structured communities with unstructured connections
• NaradaBrokering needs to support both models and those in between
  • It currently has JMS mode, JXTA mode and native (most powerful) mode
• P2P is usually thought of as a set of "unruly, dangerous clients" but can equally well be used securely as a middleware interaction mode between web services
[Figure: Grid middleware and databases organized into middleware peer groups (MP = Middleware Peer)]
Community Grids Laboratory Activities I
• Core NaradaBrokering event service
  • Operation in JMS or JXTA mode to demonstrate integration of central and peer-to-peer modes
  • Focus is performance and capabilities (see later)
• Garnet synchronous collaboration environment, used for distance education and seminars
  • Built first on commercial JMS but ported to Narada; this shows that one can afford to use a message service in synchronous application sharing
• Interface of Garnet to PDAs with message size filtering and an optimized HHMS message service
  • This filtering is also needed for slow clients: a mix of dial-up and Internet2 clients in a collaboration
  • The event system supports (XML) client profiles
NaradaBrokering Performance Results
NaradaBrokering and JMS
[Figure: performance comparison at low rate with small messages]
NaradaBrokering and JXTA
[Figure: comparing pure JXTA, Narada-JXTA and direct P2P, for small and larger payloads]
There is a bug in JXTA that was only just fixed.
Narada-JXTA provides JXTA with guaranteed long-distance delivery.
JXTA is getting slower
[Figure: message paths compared: Client-JXTA-JXTA-Client, Client-JXTA-Narada-JXTA-Client, JXTA multicast, and pure Narada with 2 hops (Client-Narada-Client)]
Collaborative SVG Viewer
[Figure: Batik viewer on a PC in the PC collaboration system, shared from PC to PDA]
PowerPoint can be converted to SVG via Illustrator or Web export.
PDA Collaboration
[Figure: iPaq H3650 (WinCE 3.0, PersonalJava 1.1) connected over 11 Mbit/s IEEE 802.11b wireless through an event filter (GMSME) to GMS, where GMS = JMS or Narada]
Doing this now