
Data Grid, Cloud and Vertical RDBMS

Data Grid, Cloud and Vertical RDBMS. Presenter: Dipesh Gautam. Overview. Introduction Why Data Grid? High Level View Design Considerations Data Grid Services Topology Grids and Cloud Convergence of Grid and Cloud Vertical RDBMS Benefits of column-oriented layout. Introduction.



  1. Data Grid, Cloud and Vertical RDBMS Presenter: Dipesh Gautam

  2. Overview • Introduction • Why Data Grid? • High Level View • Design Considerations • Data Grid Services • Topology • Grids and Cloud • Convergence of Grid and Cloud • Vertical RDBMS • Benefits of column-oriented layout

  3. Introduction • Data Grid: an architecture or set of services that gives individuals or groups of users the ability to access, modify, and transfer large amounts of geographically distributed data. • The data may be replicated throughout the grid, outside the original administrative domain of the data. • Integration between users and the data is handled and controlled by the data grid middleware.

  4. Why Data Grid? • Large dataset sizes • Geographic distribution of users and resources • Computationally intensive analysis • No other existing architecture lets us apply these technologies together in large-scale application domains

  5. A High Level View

  6. Design Considerations • Mechanism Neutrality • Designed to be as independent as possible of low-level mechanisms • Defines interfaces that encapsulate the peculiarities of specific storage systems • Compatibility with Grid Infrastructure • Takes advantage of fundamental Grid infrastructure • Compatible with lower-level Grid mechanisms • Uniformity of Information Infrastructure • The same data model and interface are used to access the grid's metadata

  7. Data Grid Services • Middleware provides the following services: • Universal namespace • Data transport service • Data access service • Data replication service • Resource management system (RMS)

  8. Why Universal Namespace? • Many systems and networks are connected within a grid • Separate systems within the grid use different file naming conventions • Physical file names alone do not solve the problem of locating the data • The universal namespace provides logical file names • The Storage Resource Broker provides a service that maps between logical and physical file names • When a logical file name is requested, all matching physical file names are returned and the end user chooses the appropriate replica
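
The logical-to-physical mapping described above can be sketched as a small in-memory catalog. This is only an illustration, not the Storage Resource Broker's actual interface: the `ReplicaCatalog` class, the `lfn://` names, and the `example.org` hosts are all hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy catalog mapping one logical file name to many physical replicas.

    Hypothetical illustration; not the API of any real grid middleware.
    """

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_url):
        """Record that physical_url holds a copy of logical_name."""
        if physical_url not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_url)

    def lookup(self, logical_name):
        """Return every known physical location for a logical name."""
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
# Two sites store the same data under different local naming conventions.
catalog.register("lfn://experiment/run42.dat",
                 "gsiftp://site-a.example.org/data/run42.dat")
catalog.register("lfn://experiment/run42.dat",
                 "gsiftp://site-b.example.org/store/0042.bin")

# The caller receives all matching replicas and picks one.
replicas = catalog.lookup("lfn://experiment/run42.dat")
```

The lookup returns every matching physical name rather than choosing one, which mirrors the slide's point that the end user selects the replica.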

  9. Data Transport Service • Middleware service for data transfer • Treating each requested transfer as atomic makes the service fault tolerant • The transfer is resumed after each interruption until all requested data is received • Many possible strategies: • Restarting the entire transmission from the beginning • Resuming from the point of interruption, e.g., GridFTP resumes from the last acknowledged byte instead of restarting the entire transfer • Provides low-level access and connections between hosts for file transfer • Provides I/O functions that let users see remote files as if they were local to their system • Provides a high-level abstraction of data access and transfer between different systems, hiding the complexity and presenting the data to the user as a unified data source
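
The "resume from the point of interruption" strategy can be sketched as follows. This is a simulation under stated assumptions, not GridFTP code: `FlakySource` is a hypothetical source that drops the connection after a fixed number of chunks, and `resumable_fetch` simply restarts each attempt at the number of bytes already received.

```python
class FlakySource:
    """Hypothetical data source whose link drops after a few chunks."""

    def __init__(self, payload, chunk_size=4, chunks_per_session=2):
        self.payload = payload
        self.chunk_size = chunk_size
        self.chunks_per_session = chunks_per_session

    def stream(self, offset):
        """Yield chunks starting at offset, then simulate a dropped link."""
        sent = 0
        while offset < len(self.payload):
            if sent == self.chunks_per_session:
                raise ConnectionError("link dropped")
            chunk = self.payload[offset:offset + self.chunk_size]
            yield chunk
            offset += len(chunk)
            sent += 1


def resumable_fetch(source, total_size, max_attempts=20):
    """Fetch total_size bytes, resuming after each interruption."""
    received = bytearray()
    attempts = 0
    while len(received) < total_size:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("too many interruptions")
        try:
            # Resume at the last byte we hold instead of starting over.
            for chunk in source.stream(offset=len(received)):
                received.extend(chunk)
        except ConnectionError:
            continue  # keep what was received and reconnect
    return bytes(received), attempts


payload = b"geographically distributed data"
data, attempts = resumable_fetch(FlakySource(payload), len(payload))
```

Restarting from the beginning would instead clear `received` in the `except` branch, which with this source would never finish because no single session can deliver the whole payload.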

  10. Data Access Service • Works with the data transport service to provide security, access control, and management of data transfers within the grid • Provides a security service to authenticate users • Provides an authorization service whose access control ranges from simple file permissions to Access Control Lists (ACLs) and Role-Based Access Control (RBAC) • Provides an encryption service to protect the confidentiality of data in transit (e.g., SSL)
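
As a rough sketch of how the two authorization styles named above differ, the check below first consults a per-resource ACL and then falls back to role-based permissions. The role names, users, and the `is_allowed` helper are hypothetical and are not taken from any particular grid middleware.

```python
# Role-based access control: permissions attach to roles, not users.
# These role names are illustrative assumptions.
ROLE_PERMISSIONS = {
    "reader": {"read"},
    "curator": {"read", "write"},
}


def is_allowed(user, action, resource_acl, user_roles):
    """Return True if user may perform action on the resource."""
    # ACL: explicit per-user entries stored with the resource.
    if action in resource_acl.get(user, set()):
        return True
    # RBAC: the user inherits whatever their roles permit.
    return any(
        action in ROLE_PERMISSIONS.get(role, set())
        for role in user_roles.get(user, ())
    )


dataset_acl = {"alice": {"read", "write"}}   # ACL on one dataset
user_roles = {"bob": ["reader"]}             # role assignments

alice_can_write = is_allowed("alice", "write", dataset_acl, user_roles)
bob_can_read = is_allowed("bob", "read", dataset_acl, user_roles)
bob_can_write = is_allowed("bob", "write", dataset_acl, user_roles)
```

Authentication and encryption are out of scope here; the sketch assumes the caller's identity has already been verified.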

  11. Data Replication Service • Why replication? • Scalability • Fast access • User collaboration • Replicas are often placed close to the sites where users need them • Replication is controlled by a replica management system • The replica management system determines the need for replicas based on requests • Replicas are kept up to date by propagating changes made at one node to all other nodes in the grid

  12. Replica Update • Centralized model: a single master replica updates all others • Decentralized model: all peers update each other • The topology of node placement influences the update strategy
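
A minimal sketch of the two update models, assuming a simple version counter per replica. The `Replica` class and both functions are hypothetical, and conflict handling for concurrent writes in the decentralized case is deliberately left out.

```python
class Replica:
    """Hypothetical replica holding one versioned value."""

    def __init__(self, name):
        self.name = name
        self.version = 0
        self.value = None

    def apply(self, version, value):
        """Accept an update only if it is newer than what we hold."""
        if version > self.version:
            self.version, self.value = version, value


def centralized_update(master, others, value):
    """Only the master accepts writes, then pushes them to all others."""
    master.apply(master.version + 1, value)
    for replica in others:
        replica.apply(master.version, master.value)


def decentralized_update(origin, peers, value):
    """Any peer may accept a write and push it to every other peer."""
    origin.apply(origin.version + 1, value)
    for peer in peers:
        if peer is not origin:
            peer.apply(origin.version, origin.value)


a, b, c = Replica("a"), Replica("b"), Replica("c")
centralized_update(a, [b, c], "v1")          # a acts as the master
after_central = [r.value for r in (a, b, c)]
decentralized_update(b, [a, b, c], "v2")     # b originates the change
after_peer = [r.value for r in (a, b, c)]
```

The two functions look similar on purpose: the difference is which node is permitted to originate a change, not how it is propagated.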

  13. Replica Placement • Static replication • Uses a fixed set of replica nodes with no dynamic changes to the files being replicated • Dynamic replication • Based on the popularity of data • If the number of requests for a file exceeds the replication threshold, a replica is placed on the server that directly services the client, provided that storage is available • Replicas with a null access value are deleted dynamically • Adaptive replication • A dynamic threshold is computed from client request arrival rates over a period of time • Replicas that fall below the threshold and were not created in the current replication interval can be removed • Fair-share replication • Based on the access load and storage load of candidate servers • The server with the lowest access load is selected, because replicating onto a server with a higher access load degrades performance for all clients • Among candidate servers with the same access load, the one with the lowest storage load is selected • Many more replica placement strategies exist
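
The threshold test for dynamic replication and the fair-share selection rule can be sketched in a few lines. The `Server` fields, the load figures, and the threshold value are illustrative assumptions, not measurements or parameters from a real grid.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    access_load: int      # requests currently being served (assumed metric)
    storage_load: float   # fraction of storage in use, 0.0 to 1.0


def needs_replica(request_count, threshold):
    """Dynamic replication: replicate once demand exceeds the threshold."""
    return request_count > threshold


def pick_fair_share_server(candidates):
    """Fair-share: lowest access load wins, ties go to lowest storage load."""
    return min(candidates, key=lambda s: (s.access_load, s.storage_load))


candidates = [
    Server("site-a", access_load=120, storage_load=0.40),
    Server("site-b", access_load=80, storage_load=0.90),
    Server("site-c", access_load=80, storage_load=0.30),
]

chosen = None
if needs_replica(request_count=150, threshold=100):
    chosen = pick_fair_share_server(candidates)
```

Here site-b and site-c tie on access load, so the emptier site-c is chosen. An adaptive scheme would recompute `threshold` from recent request arrival rates rather than using a constant.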

  14. Resource Management System (RMS) • Core functionality of the data grid • Manages all actions related to storage resources • Fulfils user and application requests for data resources based on the type of request and on policies • Schedules the creation of replicas • Enforces policy and security within the data grid resources, including authentication, authorization, and access, so that systems with different administrative policies can interoperate • Enforces system fault tolerance and stability requirements

  15. Topology • Various topologies have been used to address the needs of the scientific community • Four major types of topology • Federation topology • Monadic topology • Hierarchical topology • Hybrid topology

  16. Federation Topology • Allows each institution to keep control over its data • The institution that receives a request from an authorized institution decides whether to send the data to the requester • The federation can be loosely or tightly integrated • Preferred by institutions that wish to share data from already existing systems

  17. Monadic Topology • All collected data is fed into a central repository • The central repository responds to all queries for data • There are no replicas in this topology • Well suited when all access to the data is local or within a single region with high-speed connectivity

  18. Hierarchical Topology • Suited to collaborations in which data from a single source is distributed to multiple locations around the world

  19. Hybrid Topology • Any combination of the other topologies • Suited to researchers working on projects who want to share their results to further research by making them readily available for collaboration

  20. Grids and Clouds • Grid • "Grid" refers to distributed computing in science and engineering • In grid computing, virtual organizations share computer resources over a network • Scientific research, collaboration • Shares local resources • Heterogeneous, real resources • Geographically distributed, locally owned and managed • Cloud • "Cloud" refers to a computer network in the context of network management • In cloud computing, anybody can access data and compute services over the internet • Web services, business apps • Makes huge data centers available • Homogeneous, virtualized resources • Geographically distributed, centrally owned and managed

  21. Convergence of Grid and Cloud • Users should consider the interoperability standards supported by both grid and cloud service providers • Interoperating clouds look like a grid

  22. Vertical RDBMS • Column-oriented DBMS • Stores data column-wise instead of row-wise • In a row-oriented DBMS, the values of each row are serialized and stored together: 1, Smith, Joe, 40000; 2, Jones, Mary, 50000; 3, Johnson, Cathy, 44000; • In a column-oriented DBMS, the values of each column are serialized together: 1, 2, 3; Smith, Jones, Johnson; Joe, Mary, Cathy; 40000, 50000, 44000;
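
The two serializations above can be reproduced with a short sketch. The function names are hypothetical, and real engines use binary page formats rather than text; the point is only the ordering of values.

```python
# The sample table from the slide: id, last name, first name, salary.
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize(groups):
    """Join each group of values with commas and end it with ';'."""
    return " ".join(", ".join(str(v) for v in group) + ";" for group in groups)


def serialize_row_oriented(rows):
    """Keep the values of each row adjacent."""
    return serialize(rows)


def serialize_column_oriented(rows):
    """Transpose first, so the values of each column are adjacent."""
    return serialize(zip(*rows))


row_layout = serialize_row_oriented(rows)
column_layout = serialize_column_oriented(rows)
```

`zip(*rows)` performs the transpose, which is the only difference between the two layouts.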

  23. Benefits of Column-Oriented Layout • Efficient when an aggregate must be computed over many rows but only over a notably smaller subset of columns • Efficient when writing a column for which new values are supplied for all rows at once • Suited to Online Analytical Processing (OLAP)-like workloads, which involve a smaller number of highly complex queries over all of the data, possibly terabytes in size
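
To make the aggregate benefit concrete, the sketch below sums one column under both layouts and counts how many stored values each approach has to read. The `values_read` counter is a crude stand-in for I/O cost, and the sample data and function names are assumptions for illustration only.

```python
# Row store: each record keeps all of its fields together.
records = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]

# Column store: each column keeps all of its values together.
columns = {
    "id": [1, 2, 3],
    "last_name": ["Smith", "Jones", "Johnson"],
    "first_name": ["Joe", "Mary", "Cathy"],
    "salary": [40000, 50000, 44000],
}


def total_salary_row_store(records):
    """Sum salaries; every record is read in full to reach one field."""
    total, values_read = 0, 0
    for record in records:
        values_read += len(record)
        total += record[3]
    return total, values_read


def total_salary_column_store(columns):
    """Sum salaries; only the salary column is read."""
    salaries = columns["salary"]
    return sum(salaries), len(salaries)


row_total, row_reads = total_salary_row_store(records)
col_total, col_reads = total_salary_column_store(columns)

# Rewriting a whole column at once is a single assignment here.
columns["salary"] = [s + 1000 for s in columns["salary"]]
```

Both layouts give the same total, but the row store reads four values per record while the column store reads one, and the gap widens as the table gains columns.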

  24. References • http://en.wikipedia.org/wiki/Data_grid • http://www.globus.org/toolkit/about.html • Martin Antony Walker, Grids and Clouds, http://www.ogf.org/OGF25/materials/1500/Grids+and+Clouds+OGF25+MAW.pdf • http://staff.science.uva.nl/~adam/courses/2004/documents/Course-DataGrid.ppt • http://en.wikipedia.org/wiki/Column-oriented_DBMS
