The world of highly heterogeneous data management
Highly heterogeneous computing • Applications • Desktop • Server-based • Web-based • Mixed • Ubiquitous computing • Mass produced computers • Layered persistence systems (e.g., disk arrays and solid state caches) • Mass collection of sensor data • Truly mobile computers • Embedded computers
The extended web • Ambient intelligence • Embedded computing in everyday devices • Soda machines that know what you drink • Electronic payments and shopping in many forms • The airport computer that adapts to a wide variety of visitors • The Internet of things • Coordinating the activities of dedicated computers • Using RFID tags
That word “polyglot” • Highly varied data needs in a single application • Multiple data stores • Multiple languages • Data • Business data • Data mining • Data warehousing • Sensor data • Document databases • Real time data, such as stock markets
Various models of polyglot data • Traditional: all data in a single relational product • Many separate databases • Shopping cart, session data, completed orders, warehouse-driven analysis • The new model? • Shopping cart and session data in a key-value store • Completed orders in a document store • Inventory and pricing in a legacy relational db • Customer social graph in a graph store
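The "new model" above can be sketched in a few lines. This is only an illustration under stated assumptions: the key-value, document, and graph stores are hypothetical in-memory stand-ins (a dict, a list, and a dict of sets), and only the legacy relational store is a real engine (SQLite in memory). The function names and the sample SKU are invented for the example.

```python
import sqlite3

# Hypothetical in-memory stand-ins for three of the four stores on the slide.
cart_store = {}      # key-value store: session id -> {sku: quantity}
order_store = []     # document store: one nested document per completed order
social_graph = {}    # graph store: customer -> set of connected customers

# Legacy relational database holding inventory and pricing.
inventory_db = sqlite3.connect(":memory:")
inventory_db.execute(
    "CREATE TABLE inventory (sku TEXT PRIMARY KEY, price REAL, qty INTEGER)"
)
inventory_db.execute("INSERT INTO inventory VALUES ('A1', 2.50, 10)")


def add_to_cart(session_id, sku, qty):
    """Session and cart data live in the key-value store."""
    cart = cart_store.setdefault(session_id, {})
    cart[sku] = cart.get(sku, 0) + qty


def checkout(session_id, customer):
    """Price the cart from the relational store, then save an order document."""
    cart = cart_store.pop(session_id, {})
    lines = []
    for sku, qty in cart.items():
        (price,) = inventory_db.execute(
            "SELECT price FROM inventory WHERE sku = ?", (sku,)
        ).fetchone()
        inventory_db.execute(
            "UPDATE inventory SET qty = qty - ? WHERE sku = ?", (qty, sku)
        )
        lines.append({"sku": sku, "qty": qty, "price": price})
    order = {
        "customer": customer,
        "lines": lines,
        "total": sum(line["qty"] * line["price"] for line in lines),
    }
    order_store.append(order)
    return order


def add_friend(a, b):
    """The customer social graph is stored as undirected edges."""
    social_graph.setdefault(a, set()).add(b)
    social_graph.setdefault(b, set()).add(a)


add_to_cart("s1", "A1", 2)
order = checkout("s1", "alice")
add_friend("alice", "bob")
```

Even in this toy version, a single checkout touches three stores with three different access styles, which is where the complexity costs of going polyglot come from.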
Services • All databases are wrapped or mediated by service layers • But modern services are often time-dependent • Applications • Real-time: financial, traffic, news services • Shopping services: interactive • Mobile services: buying gasoline
Problems with going polyglot • Cost of software • Highly complex environments • Cost of diverse programming professionals • Security of highly heterogeneous environments with outside services • Lots of new/old gateways • Untried technology • Unpredicted problems • Lack of adaptability • Very narrow forms of optimization • Unknown security issues • Will it be gone tomorrow?
Alternatives • File systems with new capabilities sitting underneath heterogeneous stores • Replication • Sharding • Clustering • Adaptable concurrency for speed/consistency tradeoffs • Bootstrapping on relational dbs, missing only horizontal scaling • Warehousing • Mining • Images, video, audio • Documents • Objects
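Sharding and replication from the list above can be illustrated with a small sketch. It assumes hash-based placement and a fixed replica count; the class name ShardedStore and its methods are hypothetical, and the shards are plain dicts rather than real servers.

```python
import hashlib


class ShardedStore:
    """Toy key-value store: keys are hash-sharded, each key kept on several shards."""

    def __init__(self, num_shards, replicas=2):
        if not 1 <= replicas <= num_shards:
            raise ValueError("replicas must be between 1 and num_shards")
        self.shards = [dict() for _ in range(num_shards)]
        self.replicas = replicas

    def _shard_ids(self, key):
        # A stable hash picks the primary shard; replicas go to the next shards in order.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        first = int.from_bytes(digest[:8], "big") % len(self.shards)
        return [(first + i) % len(self.shards) for i in range(self.replicas)]

    def put(self, key, value):
        # Write every copy (synchronous replication, the simplest consistency choice).
        for shard_id in self._shard_ids(key):
            self.shards[shard_id][key] = value

    def get(self, key):
        # Read from the first replica that still holds the key.
        for shard_id in self._shard_ids(key):
            if key in self.shards[shard_id]:
                return self.shards[shard_id][key]
        return None


store = ShardedStore(num_shards=4, replicas=2)
store.put("cart:42", {"sku": "A1", "qty": 2})
copies = sum(1 for shard in store.shards if "cart:42" in shard)
```

Writing to all replicas before returning favors consistency over speed. Acknowledging after the first write and copying the rest later would be the opposite end of the speed/consistency tradeoff mentioned in the list.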
Remaining challenges • Real-time apps • Managing media • Engineering designs • Finding, converting, and integrating mining data • Processing and filtering sensor data • Big Data: scientific and other data • Temporal data
An alternative technology: XML databases • XML data types • Available in common relational databases • Compatible with key-value and key-document applications • Namespaces and assertions and schema fragments • Available for sharing • Lots of Semantic Web development for reuse • XPath and XQuery searching • Well understood because of relational underpinnings and wide use • Highly flexible, can specialize for almost any domain • Suitable for schema-based and self-defining data
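As a small illustration of XPath-style searching over self-defining data, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath (no XQuery). The order document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical order data; the element and attribute names are assumptions.
doc = """
<orders>
  <order id="1" status="shipped"><customer>alice</customer><total>12.50</total></order>
  <order id="2" status="open"><customer>bob</customer><total>7.00</total></order>
  <order id="3" status="open"><customer>alice</customer><total>3.25</total></order>
</orders>
""".strip()

root = ET.fromstring(doc)

# Attribute predicate: every order whose status attribute is "open".
open_ids = [o.get("id") for o in root.findall("./order[@status='open']")]

# Child-text predicate: every order whose <customer> child has the text "alice".
alice_orders = root.findall("./order[customer='alice']")
alice_total = sum(float(o.findtext("total")) for o in alice_orders)
```

No schema is declared anywhere: the queries rely only on the tags present in the document, which is the "self-defining data" case from the slide.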
Assignment 5: extra credit • Create a simple app that manages spatial data • Don’t worry about indexing • Choose a 3D environment • Mining, oil reserves, etc. • Create data structures • Create queries that • Evaluate overlap and inclusion in 3-space • Evaluate distance between two points in 3-space • I’ll give you credit for two assignments, which will be added into your average for the assignments (so you can get over 100%) • If you really want to go for it, create an interactive interface
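One possible starting point for the spatial queries, offered as a sketch rather than a required design: it assumes regions are axis-aligned boxes and treats touching boundaries as overlapping. The names Point3, Box3, distance, overlaps, and contains are hypothetical. Real mining or reservoir volumes would need richer shapes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point3:
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Box3:
    """Axis-aligned box given by its minimum and maximum corners."""
    lo: Point3
    hi: Point3


def _coords(p):
    return (p.x, p.y, p.z)


def distance(a, b):
    """Euclidean distance between two points in 3-space."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(_coords(a), _coords(b))))


def overlaps(a, b):
    """True if the boxes share at least one point (their intervals meet on every axis)."""
    return all(
        a_lo <= b_hi and b_lo <= a_hi
        for a_lo, a_hi, b_lo, b_hi in zip(
            _coords(a.lo), _coords(a.hi), _coords(b.lo), _coords(b.hi)
        )
    )


def contains(outer, inner):
    """True if inner lies entirely inside outer (inclusion)."""
    return all(
        o_lo <= i_lo and i_hi <= o_hi
        for o_lo, o_hi, i_lo, i_hi in zip(
            _coords(outer.lo), _coords(outer.hi), _coords(inner.lo), _coords(inner.hi)
        )
    )


ore_body = Box3(Point3(0, 0, 0), Point3(10, 10, 10))
shaft = Box3(Point3(2, 2, 2), Point3(4, 4, 4))
tunnel = Box3(Point3(8, 8, 8), Point3(12, 12, 12))
```

With these sample boxes, shaft is included in ore_body, while tunnel overlaps ore_body without being included in it.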