The Anatomy and Physiology of the Grid Revisited • Nenad Medvidovic • USC-CSSE and Computer Science Department, University of Southern California • neno@usc.edu • http://csse.usc.edu/~neno/ • Collaborative work with Joshua Garcia, Ivo Krka, Chris Mattmann, and Daniel Popescu
What is the grid? • A distributed systems technology that enables the sharing of resources across organizations scalably, efficiently, reliably, and securely • Analogous to the electric grid
Why Study the Grid? • A highly successful technology • Deficiencies in the existing guidance for building grids • More to come • Grids are not easy to build • See CERN’s Large Hadron Collider • Their architecture was published very early • “anatomy” and “physiology” • Yet “What is (not) a grid?” is still a subject of debate
The Architectural Perspective • Grids are large, complex systems • Thousands of nodes or more • Span many agency boundaries • Qualities of Service (QoS) are critical • Scalability • Security • Performance • Reliability ... • Software architecture is just what the doctor ordered • The set of principal design decisions about a software system [Taylor, Medvidovic, Dashofy 2009]
So, What Did We Set out to Do? • Study grid’s reference requirements and architecture • Study the architectures of existing grid technologies • Compare the two • Knowing that there will likely be very few straightforward answers • Suggest how to fix any discrepancies • Knowing that there will likely be very few straightforward answers
Architecture Recovery Technique: Focus • Establish idealized architecture and candidate architectural style(s) • Identify data and processing components • Group implementation modules according to a set of rules • Map identified data and processing components onto an idealized architecture • Examine • Source code • Documentation • Runtime behavior • Tie to requirements satisfied by component
Rules of Focus • Group based on isolated classes • Group based on generalization • Group based on aggregation • Group based on composition • Group based on two-way association • Identify domain classes • Merge classes with a single originating domain class association into domain class • Group classes along a domain class circular dependency path • Group classes along a path with a start node and end node that reference a domain class • Group classes along paths with the same end node, and whose start node references the same domain class
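The relation-based grouping rules above (generalization, aggregation, composition, two-way association) can be sketched as a clustering step over a class graph. This is a minimal illustration, not the Focus tooling itself: the function name `group_classes`, the relation labels, and the example classes are all hypothetical, and the domain-class rules are not covered.

```python
from collections import defaultdict

# Relation kinds that the grouping rules on this slide treat as grounds for
# putting two classes in the same group (labels are illustrative).
GROUPING_RELATIONS = {
    "generalization",
    "aggregation",
    "composition",
    "two_way_association",
}


def group_classes(classes, relations):
    """Cluster classes connected by any grouping relation (union-find)."""
    parent = {c: c for c in classes}

    def find(c):
        # Walk to the representative, compressing the path on the way.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for src, dst, kind in relations:
        if kind in GROUPING_RELATIONS:
            parent[find(src)] = find(dst)

    groups = defaultdict(set)
    for c in classes:
        groups[find(c)].add(c)
    # Sort for a stable, readable result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical class names, for illustration only.
classes = ["Job", "BatchJob", "Scheduler", "Queue", "Logger"]
relations = [
    ("BatchJob", "Job", "generalization"),
    ("Scheduler", "Queue", "composition"),
    ("Scheduler", "Logger", "one_way_association"),  # not a grouping relation
]
print(group_classes(classes, relations))
```

In this sketch a class with no grouping relation, such as `Logger`, simply stays in a group of its own.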
Some Refinements to the Rules • Domain class rules • Class with large majority of outgoing calls • Exclusion rules • Class with large majority of incoming calls • Utility classes • Heavily passed data structures • Benchmarking and test classes • Additional groupings • By exception • By interface • By package if idealized architecture matches first-class component
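The call-direction refinements above can be approximated by counting incoming and outgoing calls per class. The sketch below is one possible reading: the 0.8 cutoff for a "large majority" is an assumption (the slides give no number), and the function name, labels, and example classes are hypothetical.

```python
from collections import Counter

# Assumed cutoff for "large majority"; the slides do not state a value.
MAJORITY = 0.8


def classify_by_calls(calls, majority=MAJORITY):
    """Label each class from (caller, callee) pairs observed between classes."""
    outgoing = Counter(caller for caller, _ in calls)
    incoming = Counter(callee for _, callee in calls)
    labels = {}
    for cls in set(outgoing) | set(incoming):
        total = outgoing[cls] + incoming[cls]  # always > 0 for a listed class
        if outgoing[cls] / total >= majority:
            labels[cls] = "domain-candidate"     # mostly calls other classes
        elif incoming[cls] / total >= majority:
            labels[cls] = "exclusion-candidate"  # mostly called, e.g. a utility
        else:
            labels[cls] = "undecided"
    return labels


# Hypothetical call trace, for illustration only.
calls = (
    [("Manager", "StringUtil")] * 3
    + [("Manager", "Executor")] * 2
    + [("Executor", "StringUtil")] * 2
    + [("Executor", "Manager")]
)
labels = classify_by_calls(calls)
```

Here `Manager` makes 5 of its 6 observed calls outward and is flagged as a domain candidate, while `StringUtil` is only ever called and is flagged for exclusion.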
Focus Rules for Distributed Systems • Infer distributor connectors from idealized architecture • Classes with methods and names similar to first-class components are domain classes • Classes importing network communication libraries are domain classes • main() functions often identify first-class components • Classes deployed onto different hosts must be grouped separately
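Three of the distributed-system rules above lend themselves to a simple pass over per-class metadata: network imports, `main()` entry points, and deployment host. The sketch assumes the metadata has already been extracted into a dictionary; the input shape, the function name, the example classes, and the list of network-library prefixes are assumptions to adapt to the system under study.

```python
# Assumed list of network-library prefixes (Java packages used as an example).
NETWORK_PREFIXES = ("java.net", "java.rmi", "javax.net")


def apply_distributed_rules(class_infos):
    """Apply three rules to {class: {'imports': [...], 'methods': [...], 'host': str}}."""
    domain, entry_points, by_host = set(), set(), {}
    for name, info in class_infos.items():
        # Rule: classes importing network communication libraries are domain classes.
        if any(imp.startswith(NETWORK_PREFIXES) for imp in info["imports"]):
            domain.add(name)
        # Rule: main() functions often identify first-class components.
        if "main" in info["methods"]:
            entry_points.add(name)
        # Rule: classes deployed onto different hosts are grouped separately.
        by_host.setdefault(info["host"], set()).add(name)
    return domain, entry_points, by_host


# Hypothetical classes and hosts, for illustration only.
class_infos = {
    "ManagerMain": {"imports": ["java.net.Socket"], "methods": ["main", "run"], "host": "manager-host"},
    "ExecutorMain": {"imports": ["java.rmi.Remote"], "methods": ["main"], "host": "executor-host"},
    "JobTable": {"imports": ["java.util.Map"], "methods": ["put", "get"], "host": "manager-host"},
}
domain, entry_points, by_host = apply_distributed_rules(class_infos)
```

The entry points are only candidates for first-class components, since the rule itself says "often".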
Discovered Discrepancies • Empty layers • Skipped layers • Upcalls • Multi-layer components
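The four kinds of discrepancy can be checked mechanically once components have been mapped to layers. The sketch below assumes the layer order of the original "anatomy" paper (Fabric at the bottom, Application at the top) and a strict layering rule in which a component may call only its own layer or the one directly below; the function name and the example components are hypothetical.

```python
# Bottom-to-top layer order, as in the original "anatomy" reference architecture.
LAYERS = ["Fabric", "Connectivity", "Resource", "Collective", "Application"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def find_discrepancies(component_layers, calls):
    """component_layers: component -> set of layers; calls: (caller, callee) pairs."""
    used = set().union(*component_layers.values())
    report = {
        "empty_layers": [layer for layer in LAYERS if layer not in used],
        "multi_layer": sorted(c for c, ls in component_layers.items() if len(ls) > 1),
        "upcalls": [],
        "skipped_layers": [],
    }
    for caller, callee in calls:
        src, dst = component_layers[caller], component_layers[callee]
        if len(src) != 1 or len(dst) != 1:
            continue  # layer is ambiguous; already reported as multi-layer
        s, d = RANK[next(iter(src))], RANK[next(iter(dst))]
        if d > s:
            report["upcalls"].append((caller, callee))         # lower layer calls upward
        elif s - d > 1:
            report["skipped_layers"].append((caller, callee))  # call bypasses a layer
    return report


# Hypothetical components and calls, for illustration only.
component_layers = {
    "App": {"Application"},
    "Broker": {"Collective"},
    "Executor": {"Resource"},
    "Transfer": {"Resource", "Collective"},
    "Storage": {"Fabric"},
}
calls = [("App", "Broker"), ("Executor", "Broker"), ("Broker", "Storage"), ("Transfer", "Storage")]
report = find_discrepancies(component_layers, calls)
```

Skipping calls that involve multi-layer components is a simplification; a real analysis would have to decide which layer such a component acts in for each call.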
What about Globus? • [Figure: architecture diagram; recoverable annotations: “Two layer boundary AND Upcall”, “upcall”, and “Couldn’t determine right ‘layer’”]
Revised Grid Architecture • The connectivity layer is eliminated • Explicitly addressing deployment view • Subsystem types rather than layer-oriented • Four architectural styles comprise the grid • Client/server • Peer-to-peer • Layered • Event-based • An improved classification of grid technologies
Grid Styles – C/S • Application components are clients to Collective components • e.g., application components query for resource component locations from collective components • Application components are clients to Resource components • e.g., direct job submission from application components to resource components • Resource components can act as clients to Collective components • e.g., resource components may obtain locations of other resource components through collective components
Grid Styles – P2P • Resource components are peers • e.g., a Grid Datafarm Filesystem Daemon (gfsd) instance makes requests for file data from other gfsds • Collective components are peers • e.g., iRODS agents communicate with each other to exchange data to create replicas
Grid Styles – Event-Based • Resource components notify Collective components that monitor them • e.g., executors send heartbeats to managers
Grid Architectural Styles – Layered • Collective or Resource components request services from Fabric components • e.g., iRODS agent accesses a DBMS with metadata
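The four style slides can be summarized as a lookup from a pair of subsystem types to the styles that permit that interaction. The table below only transcribes the interactions listed on the slides; the names `STYLE_RULES` and `styles_for` are hypothetical, and treating unlisted pairs as not permitted is an assumption.

```python
# (initiating subsystem type, target subsystem type) -> styles that allow it.
STYLE_RULES = {
    ("Application", "Collective"): {"client/server"},
    ("Application", "Resource"): {"client/server"},
    # Resource components may query Collective components (client/server)
    # or notify the Collective components that monitor them (event-based).
    ("Resource", "Collective"): {"client/server", "event-based"},
    ("Resource", "Resource"): {"peer-to-peer"},
    ("Collective", "Collective"): {"peer-to-peer"},
    ("Collective", "Fabric"): {"layered"},
    ("Resource", "Fabric"): {"layered"},
}


def styles_for(source_type, target_type):
    """Return the styles under which source_type may interact with target_type."""
    return STYLE_RULES.get((source_type, target_type), set())


# A heartbeat from an executor (Resource) to a manager (Collective):
print(styles_for("Resource", "Collective"))
# An interaction the slides do not list yields an empty set:
print(styles_for("Fabric", "Application"))
```

A recovered interaction that maps to an empty set would be a candidate violation to inspect by hand.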
Grid Technology Classification • Computational grid • Implementing all Collective components • e.g., Alchemi and Sun Grid Engine
Grid Technology Classification • Data grid • Job scheduling components in Collective subsystem are not required • e.g., Grid Datafarm and Hadoop
Grid Technology Classification • Hybrid • Resource components providing services either to perform operations on a storage repository or to execute a job or task • e.g., Gridbus Broker and iRODS • [Figure labels: Computational Resource, File Resource]
Correcting Violations in the Reference Architecture • Why were there originally so many upcalls? • Legitimate client-server and event-based communication • Why so many skipped layer calls? • The Fabric layer was at the wrong level of abstraction • Mostly utility classes that should be abstracted away • Why so many multi-layer components? • Connectivity layer was at the wrong level of abstraction • Not a layer, but utility libraries to enable connector functionality • Also accounts for skipped layer calls • Benefit of the deployment view • Essential for distributed systems • Helped to identify that the Fabric layer was not abstracted properly
Where Are We Currently? • There are remaining violations • Are they legitimate or a result of an improperly recast reference architecture? • Original Focus is not ideal for recovering systems of these types • Distributed systems realized by a middleware • A more automated approach that combines static and dynamic analysis would be preferable • Use the recast reference architecture to build a new grid • What are the overarching grid principles?
Evolving Grid Principles • A grid is a collection of logical resources (computing and data) distributed across a wide-area network of physical resources (hosts). • In a single grid-based application, the logical resources are owned by a single agency, while the physical resources are owned by multiple agencies. • All resources in a grid are described using a common meta-resource language. • Atomic-level logical resources are defined independently of the atomic-level physical resources. • The allocation of the atomic-level logical resources to the atomic-level physical resources can be N:M. • All computation in a grid is initiated by a client, which is a physical resource. The client sends the logical resources to the servers, which are also physical resources. A server can, in turn, delegate the requested computation to other physical resources. • All agencies that own physical resources in a grid must be able to specify policies that enforce the manner in which, and the extent to which, their physical resources can be used in grid applications.
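Two of these principles, the N:M allocation of logical to physical resources and agency-specified usage policies, can be illustrated with a small data model. Everything here is a hypothetical sketch: the class names, the policy form (a per-agency cap on logical resources per host), and the example hosts are assumptions, not part of the principles themselves.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Host:
    """A physical resource, owned by exactly one agency."""
    name: str
    agency: str


@dataclass
class Grid:
    # Hypothetical policy form: agency -> max logical resources per host it owns.
    policies: dict
    # N:M allocation stored as a set of (logical resource, host) pairs.
    allocation: set = field(default_factory=set)

    def load(self, host):
        """Number of logical resources currently placed on a host."""
        return sum(1 for _, h in self.allocation if h == host)

    def allocate(self, logical, host):
        """Place a logical resource on a host if the owning agency's policy allows it."""
        if (logical, host) in self.allocation:
            return True
        if self.load(host) >= self.policies[host.agency]:
            return False
        self.allocation.add((logical, host))
        return True


# Hypothetical hosts, agencies, and logical resources.
a = Host("node-a", "AgencyA")
b = Host("node-b", "AgencyB")
grid = Grid(policies={"AgencyA": 2, "AgencyB": 1})
results = [
    grid.allocate("dataset-1", a),  # one logical resource ...
    grid.allocate("dataset-1", b),  # ... placed on two hosts
    grid.allocate("job-7", a),      # one host carrying two logical resources
    grid.allocate("job-7", b),      # refused by AgencyB's cap of one per host
]
```

Storing the allocation as a set of pairs keeps the relation symmetric: neither side is limited to a single partner except where an agency's policy says so.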