Information Retrieval and Use

Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009

Mapping the logical model onto physical design • Entities become tables • More often than not! • Attributes become fields (columns) • Unique identifiers become primary keys • Relationships implemented by foreign key columns • Resolve M:N relationships by inserting intersection table

Mapping considerations • Independence • Privacy • Efficiency of queries

Denormalisation • Joins take time! • Split or merge normalised entities based on frequent associated use • Remove redundant relationships • Merge entities with 1:1 relationships • Use summary fields • Use summary tables and views

Using summary field(1) • Consider running a query “give the total value of all orders for customer X” How many joins?

Using summary field (2) • Note summary field in Orders table How many joins now?

Distributed database systems • Special rules apply!

The traditional model • One centralised database • Terminals at remote locations • Disadvantages • Networks are slow (esp WANS!) • Central machine does all processing • If central machine fails, database is down (Integrity, redundancy and disaster recovery considered in later lectures!)

The Client/Server model • Client – application – “front end” • Server – DBMS – “back end” • Still dependent on central database

Client responsibilities • Manages user interface • Accepts user data • Has local processing capability within the application • Generates database requests and transmits them via network to server • Receives results from server and formats them as required by application

Server responsibilities • Accepts database requests from client • Processes database requests • Handles security issues • Deals with concurrency issues • Optimizes queries • Handles recovery/rollback issues • Returns results to client

Distributed database architecture • A collection of logically related “sites”, connected together so that the users view is that of a single database at a single location. • Each site is a database in it’s own right • Not necessarily physically or geographically separated, but often are – and are logically separated.

Advantages • Organisations are distributed, why shouldn’t their data be? • Improved efficiency • Store data close to where it’s used

Types of DDS • Homogenous – same type of RDBMS at each site (easy!) • Heterogeneous – different types of DBMS at each site (not so easy!)

Implementation methods (1) • Fragmentation – splitting data between sites • Horizontal – row based – e.g. store all employee records for a location at that location • Vertical – column based – e.g. store all payroll columns in payroll department, all other employee data in HR • Either way, fragments must be able to be put back together!

Implementation methods (2) • Replication • Controlled duplication of data at more than one site • Update propagation?

Objectives (1) • Local autonomy • Local data locally owned and managed – minimal data requirements from remote sites. • No reliance on central site • Continuous operation • Reliability • Availability

Objectives (2) • Location independence • From user’s view, all data is at their site. • Fragmentation independence • Needs joins and unions to put fragments back together • Replication independence

Objectives (3) • Distributed query processing • Distributed transaction management • Transactions carried out by “agents” at distributed sites • Two-phase commit • Locking issues (later lecture)

Objectives (4) • Hardware independence • Operating system independence • Network independence • DBMS independence

DDS issues • Query processing • Optimisation even more important • Catalogue (data dictionary) management • Centralised? • Fully replicated? • Partitioned? • Combination of first and third?

DDS issues • Update propagation • An issue where replication is used. • “Primary copy” system • Recovery • Two-phase commit • Recovery • Locking strategies

Summary • Mapping the logical model • Denormalisation • Traditional database architecture • Client/server model • Distributed Database systems • Advantages • Objectives • Implementation methods • Issues

Further reading • Rolland chapter 10 • Hoffer chapters 12 • Denormalisation - click to follow the link!

Information Retrieval and Use