Large dataset processing in the Cloud. Kevin Glenny and GridwiseTech team.
Simplified data-oriented system • Applications working on data • Internal or external data sources
IT systems are constantly growing • Increased number of users • Increased number of applications • Increased amount of data
IT systems are constantly growing • Infrastructure bottleneck
Example • Electronics manufacturer • 24/7 production • Report computation takes too long for decision making • 2.5 million transactions daily • 4 TB of data to manage
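To put the example's transaction volume in perspective, the daily figure can be converted into an average rate. This is a minimal sketch: the function name is hypothetical, and it gives only the average over a full day, since the slides say nothing about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_per_second(transactions_per_day: float) -> float:
    """Average transactions per second, assuming load spread evenly over 24 hours."""
    return transactions_per_day / SECONDS_PER_DAY


# 2.5 million transactions daily, as stated in the example.
rate = average_rate_per_second(2_500_000)
print(f"average rate: {rate:.1f} transactions/second")
```

That works out to roughly 29 transactions per second on average. Real load is rarely uniform, so peaks may be considerably higher.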
What is Cloud computing? • “Transparent access to capabilities using a pay-per-use business model” • Benefits: • Dynamic scaling • Pay-per-use • Off-shored administration
What are the delivery models? • SaaS (Software as a Service): SalesForce.com, 63,00 clients • PaaS (Platform as a Service): Google App Engine (2008), Microsoft Azure (2008) • IaaS (Infrastructure as a Service): Amazon Elastic Compute Cloud, 8.2 million instances launched since 2006
Application data processing • Database sharding (MySQL, PostgreSQL etc.) • NoSQL (Google's BigTable, Amazon's Dynamo etc.) • Data-grid (GigaSpaces XAP, Oracle Coherence, Infinispan etc.)
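The idea behind database sharding can be sketched as hash-based routing: each record key is hashed, and the hash decides which shard stores the record. The sketch below is an illustration only. `shard_for` and `ShardedStore` are hypothetical names, and in-memory dictionaries stand in for separate database servers; it is not how any of the products listed above implement partitioning.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same key, same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store; each dict stands in for one database shard."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("need at least one shard")
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"transaction-{i}", {"amount": i})

# Show how many records landed on each shard.
print([len(shard) for shard in store.shards])
```

Simple modulo routing moves most keys when the number of shards changes. Systems that scale dynamically typically use a scheme such as consistent hashing to limit that movement.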
Data-grid and sharding in the Cloud • Achievements: • Near real-time • Dynamic scaling (application and resources) • Pay-per-use • Reduced administration • High availability (HA) • All data processing and persistence in the Cloud
Remaining issues • Getting large datasets in and out of the Cloud • Bandwidth is limited on the client side • Resort to mailing hard drives! • Performance: 2% to 50% slowdown • Data security/privacy: trust • SLAs: plan for the worst
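A rough calculation shows why client-side bandwidth makes mailing hard drives a serious option. The sketch below estimates the ideal transfer time for the 4 TB dataset from the example. The link speeds are hypothetical, the units are decimal (1 TB = 10^12 bytes), and full, uninterrupted link utilization is assumed, so real transfers would take longer.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal time in seconds to move size_bytes over a fully utilized link."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / link_bits_per_second


DATASET_BYTES = 4 * 10**12  # 4 TB from the example, decimal units assumed

# Hypothetical uplink speeds, chosen only for illustration.
for label, bps in [("10 Mbit/s", 10e6), ("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    days = transfer_seconds(DATASET_BYTES, bps) / 86_400
    print(f"{label}: {days:.1f} days")
```

Under these assumptions the upload takes about 37 days at 10 Mbit/s and just under 4 days at 100 Mbit/s, which is slower than a courier in the first case and comparable in the second.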
Conclusions • In data-oriented systems, datasets grow and cause bottlenecks • Datasets in the Cloud can be processed using scalable technologies • Challenges remain • Main challenge: how to get the data to the Cloud?