Grid Computing – Issues in Data grids and Solutions
Sudhindra Rao

Outline
• Grid Computing – introduction
• Computational Grids
• Data Grids
• Data Management
• Related Work
• Technologies – JavaSpaces, OceanStore
• Our research plan
• Discussion
OSCAR Lab

What is grid computing?
• Use a network of PCs
• Faster networks, cheaper PCs, a lot of idle time
• Easy to build, maintain, and scale
• Generic solution for scientific and business problems alike
• Examples of some form of grid computing: SETI@home, Argonne National Lab, Google, etc.

Why today?
[Diagram: forces converging on grid computing. Labels: Market dynamics, New opportunities, Maturing technology, World events.]
Goals: Efficiency, Profitability, Capabilities, Security, Manageability, Agility, Control
Other labels: Uncertainty, Complexity, Distribution

Applications – data grids
• Geographic distribution of data
• Computations on large-scale data
Compute-intensive analytics
• Value at risk
• Credit risk
• Real-time risk management
• Automated trade programs
OLAP data analysis
• Anti-money laundering
• Credit card (risk and customer data mining)
• Billing
Data center operations
• In-process system migration
• High fault tolerance
• Geographic data center independence for failover and business applications
Compute utility services
• Data center compute farms
• Corporate compute utility services creating a low-cost infrastructure similar to the electric grid

Evolution of distributed computing
[Diagram: distributed computing evolution. Stages shown: Pipes/sockets, Client/Server, Clusters, Data grids, Utility service, Grid Computing.]
Middleware: Data queues, Publish/Subscribe, Smart routing, File sharing, CORBA, Data translation

Compute grid
• Distributed pool of resources
• Completing a task for a user
• User requests and reserves resources
• Some kind of middleware manages resources and tasks
• Resilient and fault tolerant

Compute grid – coordinating set of tasks
[Diagram labels: Client, Client; Network pipe (1-1 connectivity), twice; Business AppServer; Server, Server; Data Storage; Data grid.]
Multiple applications/worker threads accessing a single datastore

Compute grid – coordinating set of tasks
Data grid – eliminates data access bottlenecks
Data grid – manages data
[Diagram label: Data Storage]

Data grid architecture
• Mechanism neutrality
• Policy neutrality
• Compatibility with compute grid
• Uniformity with information infrastructure
Services
• Storage service
• Grid storage API
• Metadata service

Data grid architecture
Expectations
• Coordination between compute and data grid
• Data delivery to facilitate task and resource management
• Sharing data distribution and location information
• Leveraging data locality
Guarantees
• Dependability
• Consistency
• Pervasiveness
• Security
• Inexpensive

Data delivery – QoS requirements
[Figure: data grid QoS (Level 0, Level 1) against application complexity (work, time, data, transactional). Example applications shown: Monte Carlo simulation, OLAP, real-time datamart.]
Combinations shown (work, time, data, transactional):
• Batch, synchronous, static data, nontransactional
• Atomic, synchronous, static data, nontransactional
• Atomic, asynchronous, static data, nontransactional
• Atomic, asynchronous, dynamic data, nontransactional
• Atomic, synchronous, static data, transactional
• Atomic, asynchronous, dynamic data, transactional
• Atomic, asynchronous, static data, transactional
• Batch, synchronous, static data, transactional

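
The four axes on the QoS slide (work, time, data, transactional) can be read as a small record type. The sketch below is only an illustration: the class and field names are assumptions, not part of the presentation, and because the slide text does not say which combination belongs to Level 0 or Level 1, no level is assigned.

```python
from dataclasses import dataclass
from enum import Enum


class Work(Enum):
    BATCH = "batch"
    ATOMIC = "atomic"


class Timing(Enum):
    SYNCHRONOUS = "synchronous"
    ASYNCHRONOUS = "asynchronous"


class Data(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


@dataclass(frozen=True)
class DeliveryRequirement:
    """Hypothetical record for one work/time/data/transactional combination."""
    work: Work
    timing: Timing
    data: Data
    transactional: bool

    def describe(self) -> str:
        tx = "transactional" if self.transactional else "nontransactional"
        return f"{self.work.value}, {self.timing.value}, {self.data.value} data, {tx}"


# The eight combinations listed on the slide, in slide order.
SLIDE_COMBINATIONS = [
    DeliveryRequirement(Work.BATCH, Timing.SYNCHRONOUS, Data.STATIC, False),
    DeliveryRequirement(Work.ATOMIC, Timing.SYNCHRONOUS, Data.STATIC, False),
    DeliveryRequirement(Work.ATOMIC, Timing.ASYNCHRONOUS, Data.STATIC, False),
    DeliveryRequirement(Work.ATOMIC, Timing.ASYNCHRONOUS, Data.DYNAMIC, False),
    DeliveryRequirement(Work.ATOMIC, Timing.SYNCHRONOUS, Data.STATIC, True),
    DeliveryRequirement(Work.ATOMIC, Timing.ASYNCHRONOUS, Data.DYNAMIC, True),
    DeliveryRequirement(Work.ATOMIC, Timing.ASYNCHRONOUS, Data.STATIC, True),
    DeliveryRequirement(Work.BATCH, Timing.SYNCHRONOUS, Data.STATIC, True),
]
```

A data grid could use such a record as the key when an application declares what kind of data delivery it needs; how the record maps to a service level is left open here.
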
Related Work
• Grid File System – provides primitives like a file system – Level 0 QoS
• NFSv4 – high performance, extensible, secure – in the works
• Secure File System – self-certifying paths, unique identifiers, global namespace, key-based certification

Technologies related to data grids – JavaSpaces
Source: “Make Room for JavaSpaces, Part I: Ease the Development of Distributed Apps with JavaSpaces”, Eric Freeman and Susanne Hupfer

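
JavaSpaces is built around a shared space into which processes write entries and from which other processes read or take entries that match a template. The sketch below imitates that idea in plain Python so it can run without Jini; it is not the JavaSpaces API (which is Java and also involves leases and transactions), and `TupleSpace` and its method bodies are hypothetical. As in JavaSpaces templates, a `None` field is treated as a wildcard.

```python
from typing import Optional


class TupleSpace:
    """Toy in-memory space with write/read/take matching (illustrative only)."""

    def __init__(self):
        self._entries = []

    def write(self, entry: dict) -> None:
        # Store a copy so later changes by the writer do not affect the space.
        self._entries.append(dict(entry))

    @staticmethod
    def _matches(entry: dict, template: dict) -> bool:
        # A template field set to None matches any value.
        return all(entry.get(k) == v for k, v in template.items() if v is not None)

    def read(self, template: dict) -> Optional[dict]:
        # Return a copy of the first matching entry and leave it in the space.
        for entry in self._entries:
            if self._matches(entry, template):
                return dict(entry)
        return None

    def take(self, template: dict) -> Optional[dict]:
        # Remove and return the first matching entry.
        for i, entry in enumerate(self._entries):
            if self._matches(entry, template):
                return self._entries.pop(i)
        return None
```

A worker in a compute grid could `take` a task entry, process it, and `write` a result entry back, which is the coordination pattern the article describes.
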
OceanStore
• Global replication of data
• Promiscuously caches data
• Version-based archival storage
• Applications can control their consistency requirements to manage performance
• Internal event monitors analyze access patterns to move data and provide redundancy

Grid Fabric – Integrasoft
• Business solution for financial institutions and share traders
• Designed to complement the compute grid
• Works closely with the compute grid to schedule tasks based on data availability
• Moves data closer to computation

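
Scheduling tasks based on data availability can be illustrated with a simple rule: run each task on the node that already holds the most of its input data, then move only what is missing. This is a minimal sketch under that assumption, not Integrasoft's algorithm; `pick_node`, `plan_transfers`, and the data layout are hypothetical.

```python
def pick_node(task_inputs, node_holdings):
    """Return the node that already holds the most of the task's inputs.

    task_inputs:   set of dataset names the task needs
    node_holdings: dict mapping node name -> set of dataset names stored there
    Ties are broken by node name so the result is deterministic.
    """
    def overlap(node):
        return len(task_inputs & node_holdings[node])

    return min(node_holdings, key=lambda node: (-overlap(node), node))


def plan_transfers(task_inputs, node, node_holdings):
    """List the datasets that still have to be moved to the chosen node."""
    return sorted(task_inputs - node_holdings[node])


holdings = {
    "node-a": {"prices", "trades"},
    "node-b": {"trades", "positions", "limits"},
    "node-c": set(),
}
chosen = pick_node({"trades", "positions", "rates"}, holdings)
missing = plan_transfers({"trades", "positions", "rates"}, chosen, holdings)
```

Here `chosen` is `"node-b"` and `missing` is `["rates"]`: two of the three inputs are already local, so only one dataset has to be delivered.
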
SOA and Data grids
• Moore’s law and Metcalfe’s law
• Network-based computation and grid computing with SOA
• Intelligent infrastructure – SONA
[Diagram labels: Business process, Delivers, has, WebServices, State, Requires, Data Grid.]

Web 2.0

Our research – Motivation
Issues in data management
• Data tightly coupled to computation
• Data cached locally
• Distribution is haphazard and reuse is minimal
• Data pulled by computation, not delivered
• Mechanisms still improvise based on experience with smaller systems

Data Grid and DBMS
[Diagram labels: Grid DBMS, Security, Transparency, Robustness, Efficiency, Intelligence, Fragmentation, Heterogeneity.]

Data grids as extended DBMS
Data grid – eliminates data access bottlenecks
Persistence mechanism – with data regions; indicates replicas, relations
[Diagram label: Data Storage]

Datacentric grids
• Automated space management and garbage collection
• Space and data objects lifetime mechanism
• I/O allocation on storage system
• Estimating access from magnetic storage
• Co-scheduling of compute and storage resources
• Space reservation dilemma
• Thin clients
• Code mobility towards data

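
One way to picture a lifetime mechanism with automated space management is to give every stored object an expiry time and to reclaim expired objects before admitting new ones. The sketch below assumes time-based lifetimes and a fixed byte capacity; the slide does not specify either, and `LeasedStore` and its methods are hypothetical. The clock is passed in explicitly so the behaviour is deterministic.

```python
class LeasedStore:
    """Toy storage space whose objects expire after a lifetime (illustrative only)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self._objects = {}  # name -> (size_bytes, expires_at)

    def used_bytes(self):
        return sum(size for size, _ in self._objects.values())

    def collect(self, now):
        """Garbage-collect objects whose lifetime has ended; return their names."""
        expired = [name for name, (_, expires_at) in self._objects.items()
                   if expires_at <= now]
        for name in expired:
            del self._objects[name]
        return sorted(expired)

    def put(self, name, size_bytes, lifetime, now):
        """Store an object if space allows after reclaiming expired objects."""
        self.collect(now)
        # Overwriting an object frees its old space first.
        old_size = self._objects.get(name, (0, 0))[0]
        if self.used_bytes() - old_size + size_bytes > self.capacity_bytes:
            return False  # space reservation refused
        self._objects[name] = (size_bytes, now + lifetime)
        return True
```

A refused `put` is where the space reservation dilemma shows up: the caller has to wait for lifetimes to run out, shorten lifetimes, or reserve space elsewhere.
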
Expected Results
• Can we move computation closer to data?
• Data grid with features of persistence?
• Performance improvement using tags?
• Loosely coupled data grid and compute grid?
• Scalability of unique naming in file systems?