8. Special database types Distributed databases • Distribution of data: • Several host sites. • Availability and reliability: replicated data • Distributed concurrency control • Distribution of users: • Client-server architecture • Web databases; three-tier architecture AdvDB-8 J. Teuhola 2015
Distributed databases: Requirements • Replication and partitioning of data • Maintenance of a location map for data • Query optimization for multiple hosts • Maintenance of consistency among replicas after update operations • Recovery from network failures • Partial usability when some hosts are down • Management and control of access rights
Distributed databases: Advantages • Improved efficiency by replication: data close to users, preferably in the local host. • Improved reliability by replication: When one host is down, others continue to operate. Data is accessible when one copy is available. • Transparency: The user does not need to know the location of data / replicas / partitions. • Extensibility: new nodes can be added to the network.
Example: distributed join • Relation R(X, Y, Z) stored in host A • Relation S(Z, W) stored in host B • Steps of natural join R * S for host A: • Send column R(Z) from A to B • Compute semijoin T(Z, W) = R(Z) * S(Z, W) in B • Send relation T back to A • Compute the final join R * T • Note: the last step can be replaced by concatenation if duplicates are maintained in W and T
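The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two hosts are simulated by in-memory lists, the sample rows are made up, and the function names (`project_z`, `semijoin_at_b`, `final_join_at_a`) are hypothetical, not part of any real system.

```python
# Hypothetical in-memory stand-ins for the two hosts; each tuple is one row.
R = [(1, "a", 10), (2, "b", 20), (3, "c", 30)]          # R(X, Y, Z) at host A
S = [(10, "w1"), (10, "w2"), (30, "w3"), (40, "w4")]    # S(Z, W) at host B


def project_z(r_rows):
    """Step 1 (host A): project R onto Z; duplicates are dropped before sending."""
    return {z for (_x, _y, z) in r_rows}


def semijoin_at_b(z_values, s_rows):
    """Step 2 (host B): keep only the S rows whose Z value occurs in R(Z)."""
    return [(z, w) for (z, w) in s_rows if z in z_values]


def final_join_at_a(r_rows, t_rows):
    """Step 4 (host A): natural join of R with the reduced relation T on Z."""
    w_by_z = {}
    for z, w in t_rows:
        w_by_z.setdefault(z, []).append(w)
    return [(x, y, z, w) for (x, y, z) in r_rows for w in w_by_z.get(z, [])]


z_sent = project_z(R)             # shipped from A to B
T = semijoin_at_b(z_sent, S)      # computed in B, then shipped back to A (step 3)
result = final_join_at_a(R, T)    # rows of R * S, as (X, Y, Z, W)
```

Only the Z column and the reduced relation T cross the network here; the S row with Z = 40 never leaves host B because no R row matches it.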
Deductive (logic) databases Main features: • ‘Data’ consists of facts and rules. • Declarative language to define them • Inference engine = deduction mechanism for solving queries Related areas: • Relational data model (esp. relational calculus) • Logic programming (Prolog) • Datalog: Subset of Prolog
Deductive databases: Example in Datalog
Facts: parent(x, y) means that y is x’s parent
  parent(peter,mary).
  parent(peter,paul).
  parent(mary,john).
  parent(paul,joan).
Rules: ancestor(x, y) means that y is x’s ancestor
  ancestor(X,Y) :- parent(X,Y).
  ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).
Queries: (1) ancestors of Peter, (2) descendants of Joan
  ?- ancestor(peter,?).
  ?- ancestor(?,joan).
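To make the inference step concrete, the two ancestor rules can be evaluated bottom-up until no new facts appear. The sketch below is a naive fixpoint loop in Python, not how a real Datalog engine is implemented; the names `derive_ancestors`, `ancestors_of_peter` and `descendants_of_joan` are made up for this illustration.

```python
# Facts from the slide: parent(x, y) means that y is x's parent.
parent = {
    ("peter", "mary"),
    ("peter", "paul"),
    ("mary", "john"),
    ("paul", "joan"),
}


def derive_ancestors(parent_facts):
    """Naive bottom-up evaluation of the two ancestor rules."""
    # Rule 1: ancestor(X,Y) :- parent(X,Y).
    ancestor = set(parent_facts)
    while True:
        # Rule 2: ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
        derived = {
            (x, y)
            for (x, z) in parent_facts
            for (z2, y) in ancestor
            if z == z2
        }
        if derived <= ancestor:      # fixpoint reached: nothing new was derived
            return ancestor
        ancestor |= derived


ancestor = derive_ancestors(parent)

# Query 1: ?- ancestor(peter,?).
ancestors_of_peter = sorted(y for (x, y) in ancestor if x == "peter")
# Query 2: ?- ancestor(?,joan).
descendants_of_joan = sorted(x for (x, y) in ancestor if y == "joan")

print(ancestors_of_peter)    # ['joan', 'john', 'mary', 'paul']
print(descendants_of_joan)   # ['paul', 'peter']
```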
Data warehouses • Support for decision making. • Derived, integrated and refined from operational databases. • No transaction processing, not quite up-to-date. • Multidimensional view of data (data cube) • OLAP = On-Line Analytical Processing. • Summary and multidimensional data. • Statistical analysis tools. • Data mining tools.
Example: data cube on sales • Sales values per salesman, product and date [Figure: data cube with axes Sales-person, Product and Date]
Example: ‘Star’ schema for data warehouse
‘Fact table’: Sales
  SalesTable(ProdNo, AreaNo, Date, Amount, Value)
‘Dimension tables’: Prod, Area, Time
  ProdTable(Prod-no, Name, Descr, Group)
  AreaTable(AreaNo, Name, Seller)
  TimeTable(Date, DayOfWeek)
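A typical warehouse query joins the fact table with some dimension tables and aggregates the result. The sketch below builds the star schema in an in-memory SQLite database and sums sales value per product group and seller. Several things are assumptions: the sample rows are invented purely for illustration, `Prod-no` is written `ProdNo`, and `Group` is renamed `ProdGroup` because GROUP is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProdTable (ProdNo INTEGER PRIMARY KEY, Name TEXT, Descr TEXT, ProdGroup TEXT);
CREATE TABLE AreaTable (AreaNo INTEGER PRIMARY KEY, Name TEXT, Seller TEXT);
CREATE TABLE TimeTable (Date TEXT PRIMARY KEY, DayOfWeek TEXT);
CREATE TABLE SalesTable (
    ProdNo INTEGER REFERENCES ProdTable(ProdNo),
    AreaNo INTEGER REFERENCES AreaTable(AreaNo),
    Date   TEXT    REFERENCES TimeTable(Date),
    Amount INTEGER,
    Value  REAL
);
""")

# Invented sample rows, only to make the query produce something.
conn.executemany("INSERT INTO ProdTable VALUES (?, ?, ?, ?)", [
    (1, "Bolt", "M6 bolt", "Hardware"),
    (2, "Nut", "M6 nut", "Hardware"),
    (3, "Glue", "Wood glue", "Chemicals"),
])
conn.executemany("INSERT INTO AreaTable VALUES (?, ?, ?)", [
    (1, "North", "Ann"),
    (2, "South", "Bob"),
])
conn.executemany("INSERT INTO TimeTable VALUES (?, ?)", [
    ("2015-03-02", "Mon"),
    ("2015-03-03", "Tue"),
])
conn.executemany("INSERT INTO SalesTable VALUES (?, ?, ?, ?, ?)", [
    (1, 1, "2015-03-02", 10, 5.0),
    (2, 1, "2015-03-02", 20, 4.0),
    (3, 2, "2015-03-03", 1, 7.5),
    (1, 2, "2015-03-03", 4, 2.0),
])

# Aggregate the fact table along two dimensions: product group and seller.
rows = conn.execute("""
    SELECT p.ProdGroup, a.Seller, SUM(s.Value)
    FROM SalesTable s
    JOIN ProdTable p ON p.ProdNo = s.ProdNo
    JOIN AreaTable a ON a.AreaNo = s.AreaNo
    GROUP BY p.ProdGroup, a.Seller
    ORDER BY p.ProdGroup, a.Seller
""").fetchall()

for row in rows:
    print(row)
```

Grouping by other dimension attributes (for example `DayOfWeek` from TimeTable) gives a different summarized view of the same fact table.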
XML databases: ‘semi-structured data’ • Storage and retrieval of XML documents: structured using nested pairs of tags • Flexible, hierarchical schema • Alternative implementations for XML databases: • Relational database: various alternatives • Object database: more direct mapping of the structure • Native XML database: built from scratch, tailored especially for this data type • Query Language: XQuery
Example document collection: 2 courses

<?xml version="1.0"?>
<course>
  <cname>Adv DB</cname>
  <teacher>Timo</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pirjo</student>
  </audience>
</course>

<?xml version="1.0"?>
<course>
  <cname>C++</cname>
  <teacher>Esa</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pia</student>
  </audience>
</course>
Illustration as tree structures [Figure: Course document 1 and Course document 2 drawn as trees]
Relational alternative 1: XML data type for a column
Courses-relation
cid   course document
c1    <?xml…?><course><cname>AdvDB</cname><teacher>Timo</teacher><audience><student>Pasi</student><student>Pirjo</student></audience></course>
c2    <?xml version="1.0"?><course><cname>C++</cname><teacher>Esa</teacher><audience><student>Pasi</student><student>Pia</student></audience></course>
Relational alternative 2: Non-typed nodes
Nodes-relation
node-id   element    parent   text-value
n1        course     -        -
n2        cname      n1       Adv DB
n3        teacher    n1       Timo
n4        audience   n1       -
n5        student    n4       Pasi
n6        student    n4       Pirjo
n7        course     -        -
n8        cname      n7       C++
…         …          …        …
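One way to populate such a Nodes-relation is to walk each document tree in document order and emit one row per element. The sketch below does this with Python's standard `xml.etree.ElementTree`; the function name `shred` is hypothetical, the XML declaration is left out of the sample strings for brevity, and `None` plays the role of the `-` entries in the table.

```python
import xml.etree.ElementTree as ET

# The two course documents from the example collection.
DOCS = [
    "<course><cname>Adv DB</cname><teacher>Timo</teacher>"
    "<audience><student>Pasi</student><student>Pirjo</student></audience></course>",
    "<course><cname>C++</cname><teacher>Esa</teacher>"
    "<audience><student>Pasi</student><student>Pia</student></audience></course>",
]


def shred(docs):
    """Return rows (node_id, element, parent, text_value) in document order."""
    rows = []

    def visit(elem, parent_id):
        node_id = f"n{len(rows) + 1}"              # n1, n2, ... in visiting order
        text = (elem.text or "").strip() or None   # None for pure container elements
        rows.append((node_id, elem.tag, parent_id, text))
        for child in elem:
            visit(child, node_id)

    for doc in docs:
        visit(ET.fromstring(doc), None)            # a document root has no parent
    return rows


nodes = shred(DOCS)
for row in nodes:
    print(row)
```

The first eight rows correspond to n1 to n8 in the table above; the remaining rows are the ones hidden behind the ellipsis.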
Relational alternative 3: Typed nodes
Courses
cid   cname    teacher
c1    Adv DB   Timo
c2    C++      Esa
Audience
student   cid
Pasi      c1
Pirjo     c1
Pasi      c2
Pia       c2
Digital libraries • Organized collection of information ( web) • Close to multimedia databases, but more focused on information retrieval features • Two types of users: • End users make retrievals • Librarians select, organize and maintain the collection. • Important: Metadata and annotations • Hard job: digitization of ‘real’ libraries
Spatial databases • Representations: Solid (2D, 3D), boundary, abstract (‘above’, ‘near’, ‘under’, ...) • Objects: points, line segments, rectangles • Spatial operations (intersection, nearest neighbor, spatial join, ...) • Important application area: GIS = Geographic Information System (objects on maps). • Temporal dimension may be included (movement, order of events)
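The three spatial operations named above can be illustrated on points and axis-aligned rectangles. This is a minimal sketch with brute-force scans and no spatial index; the type `Rect` and the function names are assumptions made for the example.

```python
import math
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    """Intersection test: the rectangles overlap (or touch) on both axes."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def nearest_neighbor(query, points):
    """Nearest neighbor by linear scan over (x, y) points, Euclidean distance."""
    return min(points, key=lambda p: math.dist(query, p))


def spatial_join(left, right):
    """Spatial join as a nested loop: index pairs of intersecting rectangles."""
    return [(i, j)
            for i, a in enumerate(left)
            for j, b in enumerate(right)
            if intersects(a, b)]


parks = [Rect(0, 0, 2, 2), Rect(10, 10, 12, 12)]
lakes = [Rect(1, 1, 3, 3), Rect(5, 5, 6, 6)]

print(spatial_join(parks, lakes))                            # [(0, 0)]
print(nearest_neighbor((0, 0), [(3, 4), (1, 1), (-2, 0)]))   # (1, 1)
```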
Scientific databases • Large amounts of observed data (raw, calibrated, validated, derived, interpreted) • Updated seldom - transaction processing not needed. • One form of data warehouse. • Metadata is crucial • Example of scientific database: genome and protein data in bioinformatics (sequences, 3D-structures)
Multimedia databases • Text, hypertext, images, graphics, audio, video • Applications: Media servers, audio/video-on-demand, document management, educational services, marketing, intelligent systems, digital libraries, medical information systems, etc. • Issues: Modeling (complex objects), design, storage of large objects (LOBs), compression, retrieval (indexes), performance (critical for audio/video).
Multimedia databases: Required features • Supports the main types of multimedia (MM) data • Can handle a very large number of MM objects • Supports high-performance, high-capacity storage management • Offers DB capabilities: Persistence, transactions, concurrency control, recovery from failures, querying with high-level declarative constructs, versioning, integrity constraints, security. • Offers information-retrieval capabilities: Exact-match retrieval, probabilistic (best-match) retrieval, content-based retrieval, ranking of results
Multimedia databases: Functional considerations • Interactive querying • Relevance feedback • Query refinement • Automatic feature extraction and indexing • Content- and context-based indexing of different media • Single- and multidimensional indexing
Multimedia databases: Functional considerations (cont.) • Clustering of media data on storage devices • Support for efficient access of very large media objects • Optimization of multimedia queries and retrieval, supported by sophisticated indexing • Replication, parallelism, distribution, scalability • Recent approach: NoSQL databases, with relaxed consistency requirements compared to traditional ACID (see Chapter 3)
NoSQL databases • “Not only SQL” • “Big Data” applications, e.g. search engines, social media, data streams, observation data • Traditional relational technology does not scale well to huge amounts of data. • Typical of NoSQL systems: • Requirement for very efficient retrieval • Real-time updating can be relaxed • Large-scale distribution is required
NoSQL approaches • Key–value stores, e.g. DynamoDB (Amazon) • Column stores, e.g. BigTable (Google), Cassandra (Apache) • Graph databases, e.g. Neo4j (open-source, Java-based) • Document stores, e.g. native XML databases
End of slides – Remember also the exercises!