
The Cloud and databases


chynna


Presentation Transcript


  1. The Cloud and databases Issues

  2. What kind of data management is a good fit with the cloud? • Analytical data management: data attributes • Far more reads than writes, so security and privacy less of an issue • Tend to have far greater data needs, so there is a need for more servers • The size of the data set grows over time and does not stabilize, so a better fit with expanding cloud server availability • Analytical applications often want data from multiple sources, and availability is much better in a cloud environment

  3. More on analytical processing • Analytical data management: system attributes • Shared nothing works better when access is mostly reads • ACID transactions do not need to be enforced, as there is no need for a single, global state for all users • Generally, statistical results are okay even if some very secure data is not discovered
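
The shared-nothing point on this slide can be illustrated with a minimal sketch. It assumes a toy data set of (region, amount) rows; the function names `partition`, `local_sum`, and `scatter_gather_sum` are hypothetical and not taken from the presentation. Each node owns its rows outright, answers a read-only query from local data, and only the small per-node subtotals are merged.

```python
from collections import defaultdict


def partition(rows, n_nodes):
    """Assign each row to exactly one node; nodes share no storage."""
    nodes = [[] for _ in range(n_nodes)]
    for key, value in rows:
        # Deterministic toy hash (assumption): sum of the key's bytes.
        nodes[sum(key.encode()) % n_nodes].append((key, value))
    return nodes


def local_sum(node_rows):
    """Read-only aggregation that touches only one node's own rows."""
    totals = defaultdict(int)
    for key, value in node_rows:
        totals[key] += value
    return dict(totals)


def scatter_gather_sum(nodes):
    """Run the local query on every node, then merge the subtotals."""
    merged = defaultdict(int)
    for node_rows in nodes:
        for key, subtotal in local_sum(node_rows).items():
            merged[key] += subtotal
    return dict(merged)


rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
nodes = partition(rows, n_nodes=3)
totals = scatter_gather_sum(nodes)
print(totals)
```

Because the query only reads, no node has to coordinate with another while it computes its subtotal, which is why a read-mostly workload suits this layout.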

  4. What is needed for a new generation of cloud dbs? • Focus on making use of broad parallelism and on a shifting/expanding set of servers • Looser notion of fault tolerance, as there is often no need to restart a query when it is interrupted or when one of its branches is killed • Need to be able to operate on data in multiple formats, encryptions, attribute domains, namespaces, schemas, database products – heterogeneity! • Must be able to sit underneath business intelligence systems
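
One way to read the "looser notion of fault tolerance" bullet is sketched below. It assumes a statistical query (a mean) over partitioned data in which a branch may be killed; the names `run_branch` and `tolerant_mean`, and the simulated failure set, are hypothetical. Instead of restarting, the query answers from the surviving branches and reports how much of the data it covered.

```python
def run_branch(partition, fail):
    """Compute (sum, count) for one partition; 'fail' simulates a killed branch."""
    if fail:
        raise RuntimeError("branch killed")
    return sum(partition), len(partition)


def tolerant_mean(partitions, failed=frozenset()):
    """Return (mean, coverage) using only the branches that answered."""
    total, count, answered = 0, 0, 0
    for index, part in enumerate(partitions):
        try:
            branch_sum, branch_count = run_branch(part, fail=index in failed)
        except RuntimeError:
            continue  # skip the dead branch instead of restarting the query
        total += branch_sum
        count += branch_count
        answered += 1
    if count == 0:
        raise RuntimeError("no branch answered")
    return total / count, answered / len(partitions)


partitions = [[10, 12, 11], [9, 13], [10, 10, 12, 8]]
full_mean, full_coverage = tolerant_mean(partitions)
partial_mean, partial_coverage = tolerant_mean(partitions, failed={1})
print(full_mean, full_coverage)
print(partial_mean, partial_coverage)
```

Returning the coverage alongside the estimate lets the caller decide whether the partial answer is good enough, which fits analytical workloads better than transactional ones.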

  5. Hybrid databases: is this the answer? • Folks don’t want to learn/buy/program new data management products • But folks do want commercial grade systems with professional support • Would make the transition from transaction apps to analytical apps easier – like with relational data warehousing • But would we end up with an inelegant mess?

  6. What about Object Databases? A return? • Blending a host language with a query language makes sense when queries involve complex calculations • It is easy to extend an o-o language with statistical procedures • The encapsulation of o-o languages is a good match with the wide and independent distribution of data in a cloud environment • O-O procedures could be built and deployed by distributed volunteers

  7. More on O-O DBs • Partial results could be maintained and kept up to date, with batch updating of raw data only infrequently • We know how to build multiple language interfaces to accommodate multiple o-o languages • O-O databases are a good match with service-based interfaces – see diagram on page 29
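
The idea of keeping partial results up to date while raw data arrives only in infrequent batches can be sketched as an encapsulated object. This is an illustration under assumptions, not a design from the presentation: the class `RunningMean` and its method `apply_batch` are hypothetical names.

```python
class RunningMean:
    """Keeps a partial result (count and total) instead of rescanning raw data."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def apply_batch(self, values):
        """Fold one infrequent batch of raw values into the partial result."""
        self.count += len(values)
        self.total += sum(values)

    @property
    def mean(self):
        """Answer the query from the maintained state only."""
        if self.count == 0:
            raise ValueError("no data has been applied yet")
        return self.total / self.count


stats = RunningMean()
stats.apply_batch([2, 4])   # first batch load
stats.apply_batch([6])      # a later batch load
print(stats.mean)
```

Callers see only `apply_batch` and `mean`, so the stored state could be moved or repartitioned without changing any client, which is the encapsulation argument made on the previous slide.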

  8. Object-oriented dbs: relevant research & dev. • Adaptive query processing and optimization in real time • Parallel and distributed database technology • Massively parallel systems • Shared nothing systems • Data management stream technology

  9. Problem: most business data right now is in a relational format • We don’t have truly massively parallel and distributed query models for relational data • We don’t have truly massively parallel and distributed data partitioning for relational data • To perform efficient and fluid analytical processing of data in the cloud, we would need to create new links quickly, but we won’t have a focused, fixed schema as we do in standard relational systems • Object extensions to relational systems don’t include method encapsulation, only expanded domains

  10. More cloud issues: centralized control? • Is the cloud trusted or anonymous? • Trusted, provider-specific commercial cloud solutions are much safer, centrally managed, and optimized as a single network, not as a mesh of networks • In many environments, even trusted, centralized environments, many machines are not properly managed and are controlled by immediate users • People don’t like their machines being co-opted, and so trust is not enough to guarantee dependability

  11. More on the cloud: Other applications? • Is analytical processing the only likely application? • There are many data sharing applications • There are many applications for selling access to bulk data • Data mining is a more focused form of analytical processing, but demands a very precise level of heterogeneity resolution and integration in the case of most medical and financial applications (and others)

  12. Data mining • Kinds of data (from Data Mining by Han and Kamber) • Relational dbs • Data warehouses • Transaction processing systems • Object-relational dbs • Time sequence and temporal dbs • Spatial dbs • Text dbs • Multimedia dbs • Legacy dbs • Data streams • The Web…

  13. Heterogeneity in databases: data mining implications • Note how broad the “Web” is on the previous slide • Includes countless hand-rolled dbs • Includes databases hidden by web development frameworks like Ruby on Rails • Includes data accessible only via specific APIs • Includes data accessible via XML with XPath and XQuery technology • Includes data stored in proprietary databases for applications like CAD, finance, animation, geography • The heterogeneity problem will only be solved by widespread collaboration on unifying standards
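
A small sketch of what heterogeneity resolution involves in practice: two sources describe the same kind of record with different formats, field names, and units, and each needs its own adapter to reach a common schema. The sample data, field names, and the functions `from_xml` and `from_json` are made up for illustration; the XML side uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample sources: same kind of record, different shape and units.
XML_SOURCE = (
    '<patients><patient id="p1"><name>Ada</name>'
    '<weight unit="lb">150</weight></patient></patients>'
)
JSON_SOURCE = '[{"patient_id": "p2", "full_name": "Grace", "weight_kg": 60.0}]'

LB_TO_KG = 0.45359237  # exact definition of the international pound


def from_xml(text):
    """Adapter: XML records (possibly in pounds) to the common schema."""
    root = ET.fromstring(text)
    for node in root.findall("./patient"):
        weight = node.find("weight")
        value = float(weight.text)
        if weight.get("unit") == "lb":
            value *= LB_TO_KG
        yield {
            "id": node.get("id"),
            "name": node.findtext("name"),
            "weight_kg": round(value, 2),
        }


def from_json(text):
    """Adapter: JSON records with different field names to the common schema."""
    for record in json.loads(text):
        yield {
            "id": record["patient_id"],
            "name": record["full_name"],
            "weight_kg": round(record["weight_kg"], 2),
        }


unified = list(from_xml(XML_SOURCE)) + list(from_json(JSON_SOURCE))
print(unified)
```

Every new source needs another hand-written adapter like these, which is the cost the slide's call for unifying standards is meant to reduce.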

  14. More on the Cloud: the future of transaction processing? • Will the rigidly centralized notion of OLTP survive? • Corporations are adapting to the cloud incrementally and using middleware to leverage their own clouds • With global business comes global data processing across time zones, which is often managed in a widely distributed fashion • There are large corporations that handle financial and retail transactions for other companies • Are people warming to the idea of managing their personal and small business data in the cloud, including document and other services?

  15. But the cloud is process-centric and not data-centric • Is the process-centric vs. data-centric issue about to reawaken? • The process folks kind of lost… • Data is seen more and more as a valuable resource, even if it is only “sold” indirectly • More of us are buying multimedia data • There are actually 3 models: process-centric, data-centric, and encapsulated • Some argue that the cloud is actually an encapsulated model and that, in fact, data movement is difficult to optimize due to the dynamic nature of the network • Object-oriented databases…?
