Section 20.1 Modes of Information Integration

Section 20.1 Modes of Information Integration Anilkumar Panicker CS257: Database SystemsID: 118

Agenda • Information Integration • Problems of Information Integration 1. Data type differences 2. Value differences 3. Semantic Differences 4. Missing Values • Modes of Information Integration 1. Federated Databases 2. Warehousing 3. Mediation

Information Integration • In simple terms “Information Integration” can be defined as the process of taking data that is stored in two or more databases (Information Sources) and building one large database from them. • The sources may be conventional databases or collection of web pages.

Motivation • The main motivation behind Information Integration is to bring Information from Disparate sources, possible from sources with varied schemas so that data can be queried as a single unit. • End user is oblivious to the contextual and typographical differences in the sources of data.

Example • Consider the example of an automobile company with 1000 dealers. • Each dealer maintains a database of their cars in stock. • The company wants to create an integrated database which contains information from all the 1000 dealers. • The integrated database can be useful in performing market analysis and determining market trends to adjust production.

Example cont.. • However, the 1000 dealers do not all use the same database schema. • Dealer 1 may store information in a single relation. • While dealer 2 might use a schema where options are separated into a second relation.

Example Cont.. • Dealer 1 Cars(sNo, model, color, aTran, cdPlr, ….) • Dealer 2 Autos(serial, model, color) Options(serial, option) • Not only is the schema different, but equivalent names have also changed

Problems of Information Integration • Data Type differences • Serial number might be represented by character string at one source, while other source might use integers. • Even if two sources use character string to store serial number, one may use fixed length while other may use variable length string.

Problems of Information Integration • Value Differences • The same concept might be represented by different constants at different sources • For e.g. The color black might be represented by code BL at one source, while BL may stand for color blue at other source.

Problems of Information Integration • Semantic Differences • Terms may be given different interpretations at different source. • One dealer might include trucks in Cars relation while other dealer includes only car information in Cars relation.

Problems of Information Integration • Missing Values • A source might not record information of a type that all or most of the other sources provide. • For e.g. a dealer might not record colors at all.

Federated Database Systems • Simplest method of Information Integration • One to One connection between all pairs of databases that need to talk to one another. • These connections allow one database system D1 to query another D2 in terms that D2 can understand.

A federated collection of four databases

Considering the schemas from slide 7,this is how Dealer 1 queries Dealer 2 for needed cars

Problem with federated database Systems • Large number of pieces of code must be written to provide communication between databases that need to talk. • If n databases want to talk with each other, then the total number of connections required is n (n-1)

Data Warehouses • Data from several sources is extracted and combined into a global schema. • The Integrated data is then stored at warehouse, which looks like an ordinary database to the user. • User can issue queries to warehouse exactly in the same manner in which he would to an ordinary database. • Updates to the warehouse are generally forbidden.

Data warehouse • Data warehouse stores integrated information in a separate database.

Three approaches to constructing the data in the Warehouse • Periodic reconstruction • Incremental Updates • Warehouse changed immediately in response to changes at the source.

Data warehouse • Periodic reconstruction • Most common approach. • Warehouse reconstructed periodically from the current data at source. • Disadvantages • Warehouse must be shut down for reconstruction • Data in the warehouse can become seriously outdated.

Data warehouse • Incremental Updates • Warehouse updated periodically based on the changes that have been made to source. • Involves smaller amounts of data. • Useful in situations where warehouse needs to be updated in a short period of time and the warehouse is large. • Disadvantage • Complex compared to periodic reconstruction.

Data warehouse • Warehouse changed immediately in response to changes at the source. • Involves too much communication and processing. • Difficult to implement for large warehouses. • Practical only in situations where the underlying source changes very slowly.

Extractor for Data warehouse • Considering the schemas on slide 7 this is how we can extract data from dealer 1

Mediators • Mediator supports a virtual view or collection of views. • Integrates several sources in the same way as a Warehouse does. • The major difference is that Mediator does not store any data.

Mediators • A mediator translates queries into the terms of the source and combines the answers

Mediators • No separate combiner is required as Mediator performs the function of combining the results. • Mediator is also responsible for determining where the queries should be directed.

Section 20.1 Modes of Information Integration

Section 20.1 Modes of Information Integration

Presentation Transcript

Information Integration

Information Integration

Information Integration

Section 20.1 Essential Elements of Advertising Section 20.2 Advertising Layout

Section 20.1

Section 20.1 Elements of Advertising

INFORMATION INTEGRATION

Information Integration

Information Integration

20.1 Section Objectives – page 529

Chapter 21.2 Modes of Information Integration

Information Integration

Information Integration

INFORMATION INTEGRATION

20.1 Section Objectives – page 529

Section 20.1

Foundations of Information Integration

20.1

20.1

Information Integration

INFORMATION INTEGRATION

Information Integration