Distributed DBMSs

Distributed DBMSs • A distributed database is a single logical database that is physically distributed across computers on a network. • A homogeneous DDBMS has the same local DBMS at each site. • A heterogeneous DDBMS has at least two sites where the local DBMSs are different.

Characteristics of Distributed DBMSs • Location transparency means the user feels as though the entire database is at their location. • Replication transparency means the user is unaware of the behind-the-scenes replication of the data. • Fragmentation transparency means a logical object can be divided among the various locations on the network without the user being aware of it.
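A minimal sketch of location and fragmentation transparency, assuming a logical Customer table split horizontally across two sites. The site names, rows, and the `select_customers` helper are hypothetical and stand in for what a real DDBMS does internally.

```python
# Hypothetical sites, each holding one horizontal fragment of a single
# logical Customer table. A real DDBMS would keep these on separate machines.
SITES = {
    "boston": [{"id": 1, "name": "Ada", "city": "Boston"}],
    "denver": [
        {"id": 2, "name": "Grace", "city": "Denver"},
        {"id": 3, "name": "Edsger", "city": "Denver"},
    ],
}


def select_customers(predicate=lambda row: True):
    """Query the logical table; the caller never names a site or fragment."""
    rows = []
    for site_name in sorted(SITES):  # visit every fragment
        rows.extend(row for row in SITES[site_name] if predicate(row))
    return sorted(rows, key=lambda row: row["id"])


all_rows = select_customers()
denver_rows = select_customers(lambda row: row["city"] == "Denver")
print([row["name"] for row in all_rows])
print([row["name"] for row in denver_rows])
```

The caller works only with the logical table; which site stores which rows stays hidden behind the helper.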

Advantages of Distributed Databases • Local control of data • Increasing database capacity • System availability • Added efficiency

Disadvantages of Distributed Databases • Update of replicated data • More complex query processing • More complex treatment of shared updates • More complex recovery measures • More difficult management of the data dictionary • More complex database design

File Servers • A file server contains the files required by the individual workstations on the network.

Client/Server Systems • In a client/server system, the DBMS runs on the file server, and the user sends requests for specific data rather than for entire files.

Advantages of Client/Server Systems • More efficient than file server systems. • Possibility of distributing work among several processors. • Workstations need not be as powerful. • The user doesn’t need to learn any special commands or techniques.

Advantages of Client/Server Systems • Easier for users to access data from a variety of sources. • Provides a greater level of security than file server systems. • Powerful enough to replace expensive mainframe applications.

Data Warehouses • A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process.

Data Warehouse Architecture

Data Warehouse Structure

Why build a Data Warehouse? • To speed up the writing and maintaining of queries and reports by technical personnel • To more easily query and report data, on a regular basis, from multiple transaction processing systems and/or from external data sources • To provide a repository of transaction processing system data that contains data over a span of time

Why build a Data Warehouse? • To address security concerns • To provide a repository of "cleaned up" transaction processing system data that can be reported against and that does not necessarily require fixing the transaction processing systems

Data Errors • Incomplete: missing records or fields • Incorrect: wrong codes (or incorrect pairing of codes) • Incomprehensible: multiple fields in one field; many-to-many relationships; spreadsheet and word-processing files

Data Errors • Inconsistent: use or meaning of codes; business rules; timing; use of attributes; use of nulls or spaces
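A few of these error categories can be checked mechanically before data is loaded into a warehouse. The sketch below is illustrative only: the record layout, the list of valid state codes, and the `find_problems` helper are assumptions, not part of any particular tool.

```python
# Hypothetical list of valid codes; a real system would load this
# from a reference table.
VALID_STATE_CODES = {"MA", "CO"}

records = [
    {"id": 1, "state": "MA", "phone": "555-0100"},
    {"id": 2, "state": "XX", "phone": "555-0101"},  # wrong code
    {"id": 3, "state": "CO", "phone": None},        # missing field
    {"id": 4, "state": "CO", "phone": "   "},       # spaces used as a null
]


def find_problems(record):
    """Return a list of data-error labels for one record."""
    problems = []
    if record.get("state") not in VALID_STATE_CODES:
        problems.append("incorrect: unknown state code")
    phone = record.get("phone")
    if phone is None:
        problems.append("incomplete: missing phone")
    elif not phone.strip():
        problems.append("inconsistent: spaces used instead of null")
    return problems


report = {record["id"]: find_problems(record) for record in records}
for record_id, problems in report.items():
    print(record_id, problems)
```

Records with an empty problem list can be loaded as-is; the others would be corrected or set aside during the "clean up" step.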

Data Mining • Identify the goal • Assemble the relevant data • Choose your analysis methods • Decide which software tool is best for implementing the method • Run the analysis • Decide how to implement the results

Organizational Databases • Operational database: organized around a transaction; supports OLTP (record keeping); thousands of users; accesses few records at a time; response time in seconds • Data warehouse: organized around a subject; supports OLAP (decision support); a few hundred users; accesses many records at a time; response times in minutes

Organizational Databases • Operational database: primitive and detailed data; smaller (current data); highly normalized (many tables with few columns); dynamic (continuous online updates) • Data warehouse: derived and summarized data; larger (historical data); denormalized (few tables with many columns); periodic (batch updates)
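The contrast between detailed operational data and derived, summarized warehouse data can be sketched with Python's built-in sqlite3 module. The `sale` and `monthly_sales` tables and their rows are hypothetical; a real warehouse load would run as a scheduled batch job against separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Operational table: one detailed row per transaction.
conn.execute(
    "CREATE TABLE sale ("
    "sale_id INTEGER PRIMARY KEY, product TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sale (product, sale_date, amount) VALUES (?, ?, ?)",
    [
        ("widget", "2024-01-05", 10.0),
        ("widget", "2024-01-20", 15.0),
        ("gadget", "2024-01-11", 7.5),
        ("widget", "2024-02-02", 12.0),
    ],
)

# OLTP-style access: fetch a single record by its key.
one_sale = conn.execute(
    "SELECT product, amount FROM sale WHERE sale_id = ?", (1,)
).fetchone()

# Batch load: derive a summarized table from the detailed rows.
conn.execute(
    """
    CREATE TABLE monthly_sales AS
    SELECT product,
           substr(sale_date, 1, 7) AS month,
           COUNT(*) AS sale_count,
           SUM(amount) AS total_amount
    FROM sale
    GROUP BY product, substr(sale_date, 1, 7)
    """
)

# OLAP-style access: read many summarized rows at once.
summary = conn.execute(
    "SELECT product, month, sale_count, total_amount "
    "FROM monthly_sales ORDER BY product, month"
).fetchall()
print(one_sale)
print(summary)
```

The operational query touches one row; the warehouse query reads a table that was derived once, in batch, from all of them.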
