Information and Security Analytics Lecture #1 Unit #1: Data Management: Overview Dr. Bhavani Thuraisingham May 27, 2010
Objective of the Unit • This unit provides an overview of the developments in data management. It also provides an overview of data management, information management and knowledge management and illustrates a framework • Reference: Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997
Outline of the Unit • What is Data Management? • Developments in Data Management • Current Status and Trends • Note on Data Administration • Data management, Information management, and Knowledge Management
What is data management • One proposal: Data Management = Database System Management + Data Administration • Includes data analysis, data administration, database administration, auditing, data modeling, database system development, database application development • The tutorial will focus mainly on database system aspects of data management
Developments in Database Systems • Network, hierarchical database systems • Relational database systems, transaction processing, distributed database systems • Heterogeneous database integration, migrating legacy databases • Next generation database systems: object-oriented, deductive, data warehousing, data mining, multimedia database systems, Internet databases
Current Status • System types shown: database systems, multimedia database systems, sensor database systems, data warehousing systems, heterogeneous database systems, data mining systems • Limited integration between the different types of systems • Often stovepiped by technology
Some Outstanding Problems • Problem areas: heterogeneous database integration; migrating legacy applications; multimedia database management; real-time database management; integration with other technologies • Issues listed under these areas (the column assignment is not recoverable from the slide text): data model, quality of service, distributed processing, semantic heterogeneity, modernization, enterprise modeling, index strategies, operating system services, mass storage, inferencing, transaction processing, schema transformation, synchronization, information management, data manipulation, integrity, knowledge management, security, active databases
Some Current Trends in Data Management • Heterogeneous database integration • Query, transactions, semantics, security and integrity • Migrating legacy databases • Fine-grained encapsulation, distributed objects • Multimedia databases • Query, model, quality-of-service, index • Data Warehousing • Building a warehouse, query • Data Mining • Multimedia databases, web data mining • Data management for collaboration • Architecture, transactions • Web databases and digital libraries • Query, transactions, index, security
Note on Data Administration • Identifying the data • Data may be in files, paper, databases, etc. • Analyzing the data • Is the data of good quality? • Is the data complete? • Data standardization • Should one standardize all the data elements and metadata? • Repositories for handling semantic heterogeneity? • Data modeling • Structure the data, model the data and the processes
Data, Information and Knowledge Management • Data Management • Data: stored in databases, files or some other media • Data management includes modeling, storing, retrieving and analyzing the data • Information Management • Information is what is obtained by making sense of the data, e.g., data with context • Information management is about modeling, storing, retrieving and analyzing the information • Knowledge Management • Knowledge is what is obtained when the information is understood; it enables one to take actions • Knowledge management is about utilizing the knowledge to improve the business of an organization
Data, Information and Knowledge Management: Alternative View: MITRE Model 1999/2000 • Layers, in the order listed on the slide: Decision Support; Knowledge Management; Information Management; Data Management; Communication, Network, Operating System, Middleware
Information and Security Analytics Lecture #1 Unit #2: Database Systems Dr. Bhavani Thuraisingham May 27, 2010
Objective of the Unit • This unit will provide an overview of the concepts and developments in database systems • Reference: Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997
Outline of the Unit • Concepts in database systems • Types of database systems
Concepts in Database Systems • Definition of a Database system • Early systems • Metadata • Architectural Issues • Schema, Functional • DBMS Design Issues • Other Issues • Database design, Administration
Database System • Consists of database, hardware, Database Management System (DBMS), and users • Database is the repository for persistent data • Hardware consists of secondary storage volumes, processors, and main memory • DBMS handles all users’ access to the database • Users include application programmers, end users, and the Database Administrator (DBA) • Benefits: reduces redundancy, avoids inconsistency, allows data to be shared, enforces standards, applies security restrictions, maintains integrity, balances conflicting requirements • The definition of a database management system used here is the one given in C. J. Date’s book (Addison Wesley, 1990)
An Example Database System (figure adapted from C. J. Date, Addison Wesley, 1990)
Early Systems: Hierarchical and Network Database Systems (figure: a network data model and a hierarchical data model, each drawn over SUPPLIERS, SUPPLIES, and PARTS record types)
Metadata • Metadata describes the data in the database • Example: Database D consists of a relation EMP with attributes SS#, Name, and Salary • Metadatabase stores the metadata • Could be physically stored with the database • Metadatabase may also store constraints and administrative information • Metadata is also referred to as the schema or data dictionary
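The EMP example above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS (an assumption, the slides name no product) and reads the metadata back from SQLite's own catalog. The column types are illustrative guesses.

```python
import sqlite3

# Hypothetical in-memory database standing in for "Database D" on the slide.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE EMP ("SS#" INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)')

# SQLite keeps its catalog in the sqlite_master table: one row per relation.
relations = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per attribute; index 1 is the attribute name.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(EMP)")]

print(relations)   # ['EMP']
print(attributes)  # ['SS#', 'Name', 'Salary']
```

Here the metadata is physically stored with the database, which is one of the options the slide mentions.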
Three-level Schema Architecture: Details • External level: Users A1, A2, A3 work with External Model A (External Schema A); Users B1, B2 work with External Model B (External Schema B) • External/Conceptual Mapping A and External/Conceptual Mapping B • Conceptual level: Conceptual Model (Conceptual Schema) • Conceptual/Internal Mapping • Internal level: Internal Model (Internal Schema) over the Stored Database
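One way to picture the external and conceptual levels is with SQL views over a base table. The sketch below assumes sqlite3 as the DBMS and a hypothetical EMP relation with made-up rows; the view names are illustrative and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: one base relation (hypothetical columns and values).
conn.execute("CREATE TABLE EMP (SSN INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [(1, "John", 20), (2, "Paul", 30)])

# External schema A: a view that hides salaries from one user group.
conn.execute("CREATE VIEW EMP_PUBLIC AS SELECT SSN, Name FROM EMP")

# External schema B: a view that exposes only an aggregate.
conn.execute("CREATE VIEW PAYROLL_TOTAL AS SELECT SUM(Salary) AS Total FROM EMP")

view_a = conn.execute("SELECT * FROM EMP_PUBLIC ORDER BY SSN").fetchall()
view_b = conn.execute("SELECT Total FROM PAYROLL_TOTAL").fetchone()[0]

print(view_a)  # [(1, 'John'), (2, 'Paul')]
print(view_b)  # 50
```

Each view definition plays the role of an external/conceptual mapping. The internal level (how SQLite lays out pages on disk) stays hidden from both user groups.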
Functional Architecture • Data Management: User Interface Manager, Schema (Data Dictionary) Manager (metadata), Security/Integrity Manager, Query Manager, Transaction Manager • Storage Management: File Manager, Disk Manager
DBMS Design Issues • Query Processing • Optimization techniques • Transaction Management • Techniques for concurrency control and recovery • Metadata Management • Techniques for querying and updating the metadatabase • Security/Integrity Maintenance • Techniques for processing integrity constraints and enforcing access control rules • Storage management • Access methods and index strategies for efficient access to the database
Other Issues • Database design • Generally a two-step process • Semantic data model to capture the entities of the application and the relationships between the entities • Generate the conceptual schema; theory of normal forms for relational databases • Research on object-oriented approaches for database design • Database Administration • Creating and deleting databases; backup and recovery, enforcing policies, auditing, etc.
Types of Database Systems • Relational Database Systems • Object Database Systems • Deductive Database Systems • Other • Real-time, Secure, Parallel, Scientific, Temporal, Wireless, Functional, Entity-Relationship, Sensor/Stream Database Systems, etc.
Relational Database: Informal Overview • Collection of tables, also called relations • Each table has one or more columns, also called attributes • Each table has zero or more rows, also called tuples • Elements of a row take values from a pool of legal values • The values of one or more columns in a row uniquely identify the row. These columns form an identifier (also called a key) • One identifier is designated as the unique identifier (also called the primary key) • Relational databases are queried using a language called SQL (Structured Query Language)
Relational Database: Example
Relation S:
S#   SNAME   STATUS   CITY
S1   Smith   20       London
S2   Jones   10       Paris
S3   Blake   30       Paris
S4   Clark   20       London
S5   Adams   30       Athens
Relation P:
P#   PNAME   COLOR   WEIGHT   CITY
P1   Nut     Red     12       London
P2   Bolt    Green   17       Paris
P3   Screw   Blue    17       Rome
P4   Screw   Red     14       London
P5   Cam     Blue    12       Paris
P6   Cog     Red     19       London
Relation SP:
S#   P#   QTY
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400
SQL: Data Manipulation • Select, Update, Delete, Insert
Examples:
SELECT S.S#, S.STATUS FROM S WHERE S.CITY = 'Paris'
SELECT * FROM S
SELECT S.*, P.* FROM S, P WHERE S.CITY = P.CITY
UPDATE P SET COLOR = 'Yellow', WEIGHT = WEIGHT + 5, CITY = NULL WHERE P# = 'P2'
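The statements above can be tried against the sample relations S and P. The following is a minimal sketch that assumes SQLite (via Python's sqlite3 module) as the engine. SQLite needs identifiers such as S# in double quotes, and the join is reduced to a row count to keep the output short; both are choices of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE S ("S#" TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT)')
conn.execute('CREATE TABLE P ("P#" TEXT PRIMARY KEY, PNAME TEXT, COLOR TEXT, '
             'WEIGHT INTEGER, CITY TEXT)')
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    ("S1", "Smith", 20, "London"), ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"), ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens")])
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)", [
    ("P1", "Nut", "Red", 12, "London"), ("P2", "Bolt", "Green", 17, "Paris"),
    ("P3", "Screw", "Blue", 17, "Rome"), ("P4", "Screw", "Red", 14, "London"),
    ("P5", "Cam", "Blue", 12, "Paris"), ("P6", "Cog", "Red", 19, "London")])

# Suppliers located in Paris.
paris_suppliers = conn.execute(
    'SELECT S."S#", S.STATUS FROM S WHERE S.CITY = ? ORDER BY S."S#"',
    ("Paris",)).fetchall()

# Number of supplier/part pairs located in the same city.
colocated = conn.execute(
    "SELECT COUNT(*) FROM S, P WHERE S.CITY = P.CITY").fetchone()[0]

# Update part P2, then read it back.
conn.execute('UPDATE P SET COLOR = ?, WEIGHT = WEIGHT + 5, CITY = NULL WHERE "P#" = ?',
             ("Yellow", "P2"))
p2 = conn.execute('SELECT COLOR, WEIGHT, CITY FROM P WHERE "P#" = ?', ("P2",)).fetchone()

print(paris_suppliers)  # [('S2', 10), ('S3', 30)]
print(colocated)        # 10
print(p2)               # ('Yellow', 22, None)
```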
Features of Object-Oriented Database Systems Suitable for Advanced Applications • Objects (support for large and variable sized data blocks) • Class hierarchy (reusability) • Instance variables, composite and complex objects (complex data structures) • Methods, and message passing (object encapsulation) • Pointer swizzling (performance) • Tighter integration with programming languages (application program support) • Special mechanisms for long transactions and concurrency control, multimedia information management, schema management, versions management, storage management
Concepts in Object Database Systems • Objects: every entity is an object • Example: Book, Film, Employee, Car • Class • Objects with common attributes are grouped into a class • Attributes or Instance Variables • Properties of an object class, inherited by the object instances • Class Hierarchy • Parent-child class hierarchy • Composite objects • Book object with paragraphs, sections, etc. • Methods • Functions associated with a class
Example Class Hierarchy • Document class: attributes ID, Name, Author, Publisher; Method1: Print-doc-att(ID); Method2: Print-doc(ID); instances D1, D2 • Journal subclass: additional attribute Volume #; instance J1 • Book subclass: additional attribute # of Chapters; instance B1
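The hierarchy on this slide maps directly onto classes in an object-oriented language. The sketch below is one possible rendering in Python. The attribute values and the output format of print_doc_att are made up for illustration, and only one of the two slide methods is shown.

```python
class Document:
    """Document class with the attributes named on the slide."""

    def __init__(self, doc_id, name, author, publisher):
        self.doc_id = doc_id
        self.name = name
        self.author = author
        self.publisher = publisher

    def print_doc_att(self):
        # Counterpart of Print-doc-att(ID): returns the attribute values as text.
        return f"{self.doc_id}: {self.name} by {self.author} ({self.publisher})"


class Journal(Document):
    """Journal subclass: inherits Document attributes and adds a volume number."""

    def __init__(self, doc_id, name, author, publisher, volume):
        super().__init__(doc_id, name, author, publisher)
        self.volume = volume


class Book(Document):
    """Book subclass: inherits Document attributes and adds a chapter count."""

    def __init__(self, doc_id, name, author, publisher, chapters):
        super().__init__(doc_id, name, author, publisher)
        self.chapters = chapters


# Instances J1 and B1 with hypothetical attribute values.
j1 = Journal("J1", "Sample Journal", "A. Author", "Sample Press", volume=3)
b1 = Book("B1", "Sample Book", "B. Writer", "Sample Press", chapters=12)

print(j1.print_doc_att())  # J1: Sample Journal by A. Author (Sample Press)
```

Both subclasses reuse the method defined on Document, which is the reusability point made for class hierarchies.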
Example Composite Object (figure: a Composite Document Object made up of a Section 1 Object and a Section 2 Object, with a Paragraph 1 Object and a Paragraph 2 Object below the section level)
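A composite object can be sketched as objects that hold references to their component objects. The class names below follow the slide; the fields, the sample text, and the word_count helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Paragraph:
    text: str


@dataclass
class Section:
    title: str
    paragraphs: List[Paragraph] = field(default_factory=list)


@dataclass
class CompositeDocument:
    title: str
    sections: List[Section] = field(default_factory=list)

    def word_count(self) -> int:
        # Walks the whole composition: document -> sections -> paragraphs.
        return sum(len(p.text.split()) for s in self.sections for p in s.paragraphs)


doc = CompositeDocument("Sample", [
    Section("Section 1", [Paragraph("first paragraph text"), Paragraph("second paragraph")]),
    Section("Section 2"),
])
total_words = doc.word_count()
print(total_words)  # 5
```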
Deductive Database Systems • Database systems augmented with inference engines to deduce new data from existing data and rules • Example • Rule: parent of a parent is a grandparent • Data: John is Jane’s parent; Jane is Robert’s parent • From the above, infer John is Robert’s grandparent • Loose and tight coupling architectures between the database system and inference engine
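The grandparent example can be written as a rule applied to stored facts. This is a minimal sketch in plain Python, not a full inference engine: the facts are the two from the slide, and the single rule is hard-coded as a join of the parent relation with itself.

```python
# Facts: (parent, child) pairs taken from the slide.
parent = {("John", "Jane"), ("Jane", "Robert")}


def grandparents(parent_facts):
    """Apply the rule grandparent(X, Z) <- parent(X, Y), parent(Y, Z)."""
    return {(x, z)
            for (x, y1) in parent_facts
            for (y2, z) in parent_facts
            if y1 == y2}


derived = grandparents(parent)
print(derived)  # {('John', 'Robert')}
```

In a loosely coupled architecture a function like this would run outside the DBMS and fetch the facts through queries; in a tightly coupled one the rule evaluation would be part of the database system itself.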
Current Status • Database Systems is a mature technology; numerous products and prototypes • Much work followed in distributed and heterogeneous databases • Current directions include web database management as well as data management support for novel applications including E-commerce, Bioinformatics and Geoinformatics • Work still continues on developing new kinds of database systems including stream/sensor database systems
Information and Security Analytics Lecture #1 Unit #3: Distributed and Heterogeneous Database Systems Dr. Bhavani Thuraisingham May 27, 2010
Objective of the Unit • This unit provides an overview of concepts in distributed and heterogeneous databases. In particular, definitions and functions are discussed • References: • Data Management Systems: Evolution and Interoperation, Thuraisingham, CRC Press, 1997 • Heterogeneous Information Exchange and Organizational Hubs, Kluwer, 2002, Editors: Bestougeff, Dubois and Thuraisingham
Outline of the Unit • Distributed Database Systems • Architecture, Data Distribution, Functions • Heterogeneous Database Integration • Federated Database Management • Client-Server Database Management • Migrating Legacy Databases • Current Status and Directions
A Definition of a Distributed Database System • A collection of database systems connected via a network • The software that is responsible for interconnection is a Distributed Database Management System (DDBMS) • Each DBMS executes local applications and should be involved in at least one global application (Ceri and Pelagatti) • Homogeneous environment
Architecture • Site 1: Database 1, DBMS 1, Distributed Processor 1 • Site 2: Database 2, DBMS 2, Distributed Processor 2 • Site 3: Database 3, DBMS 3, Distributed Processor 3 • The distributed processors are connected through a Communication Network
Distributed Processor • Network Interface • Distributed Query/Update Processor • Distributed Transaction Manager • Integrity/Security Manager • Distributed Metadata Management • Local DBMS Interface
Data Distribution S I T E 1 E M P 1 D E P T 1 D # S S # N a m e S a l a r y D # D n a m e M G R 1 0 1 J o h n 2 0 1 0 C . S c i . J a n e 2 0 2 P a u l 3 0 2 0 3 J a m e s 4 0 3 0 E n g l i s h D a v i d 2 0 4 J i l l 5 0 4 0 F r e n c h P e t e r 1 0 6 0 5 M a r y 2 0 6 J a n e 7 0 S I T E 2 E M P 2 D E P T 2 S S # N a m e S a l a r y D # D n a m e D # M G R 9 M a t h e w 7 0 5 0 5 0 J o h n M a t h 7 D a v i d 8 0 3 0 P h y s i c s P a u l 2 0 8 P e t e r 9 0 4 0
Distributed Database Functions • Distributed Query Processing • Optimization techniques across the databases • Distributed Transaction Management • Techniques for distributed concurrency control and recovery • Distributed Metadata Management • Techniques for managing the distributed metadata • Distributed Security/Integrity Maintenance • Techniques for processing integrity constraints and enforcing access control rules across the databases
Query Processing Example (Concluded) • A DQP (Distributed Query Processor) sits above the DBMS at each of the three sites and connects it to the network • Fragments (numbers in parentheses as given on the slide): EMP1 (20) at site 1; EMP2 (30) and DEPT2 (20) at site 2; EMP3 (50) and DEPT3 (30) at site 3 • Query at site 1: Join EMP and DEPT on D# • Move EMP2 to site 3; merge EMP1, EMP2, EMP3 to form EMP • Move DEPT2 to site 3; merge DEPT2 and DEPT3 to form DEPT • Join EMP and DEPT; move result to site 1
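The choice of site 3 as the place to assemble EMP and DEPT can be motivated with a simple transfer-cost estimate. The sketch below assumes the numbers in parentheses are fragment sizes, that fragments are placed at the site matching their suffix, and that cost is just the amount of data shipped. It ignores local processing and the cost of returning the result to site 1.

```python
# Fragment name -> (site holding it, size). Placement is inferred from fragment names.
fragments = {
    "EMP1": (1, 20),
    "EMP2": (2, 30), "DEPT2": (2, 20),
    "EMP3": (3, 50), "DEPT3": (3, 30),
}


def transfer_cost(target_site):
    """Units shipped if every fragment not already at target_site is moved there."""
    return sum(size for site, size in fragments.values() if site != target_site)


costs = {site: transfer_cost(site) for site in (1, 2, 3)}
best_site = min(costs, key=costs.get)

print(costs)      # {1: 130, 2: 100, 3: 70}
print(best_site)  # 3
```

Under these assumptions site 3 already holds the most data, so assembling the join there moves the least. A real optimizer would also weigh the size of the join result and the network characteristics.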
Transaction Processing Example • The DTM (Distributed Transaction Manager) is responsible for executing the distributed transaction • Issues: concurrency control, recovery, data replication • Site 1 is the coordinator for transaction Tj; sites 2, 3, and 4 are participants executing subtransactions Tj2, Tj3, and Tj4 • Two-phase commit: the coordinator queries the participants whether they are ready to commit; if all participants agree, the coordinator sends a request for the participants to commit
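The two-phase commit exchange can be simulated in a few lines. The following sketch runs in a single process with no networking, logging, or failure recovery. The Participant class and its fields are hypothetical, and the abort path is an addition of this sketch, since the slide only describes the case where all participants agree.

```python
class Participant:
    """A site executing one subtransaction of the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether this site is ready to commit.
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only on a unanimous yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok = two_phase_commit([Participant("site2"), Participant("site3"), Participant("site4")])
failed = two_phase_commit(
    [Participant("site2"), Participant("site3", can_commit=False), Participant("site4")])

print(ok)      # committed
print(failed)  # aborted
```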
Interoperability of Heterogeneous Database Systems • Database System A (Relational), Database System B (Object-Oriented), and Database System C (Legacy) are connected by a network • Transparent access to heterogeneous databases for both users and application programs • Query, transaction processing
Technical Issues on the Interoperability of Heterogeneous Database Systems • Heterogeneity with respect to data models, schema, query processing, query languages, transaction management, semantics, integrity, and security policies • Interoperability based on client-server architectures • Federated database management • Collection of cooperating, autonomous, and possibly heterogeneous component database systems, each belonging to one or more federations
Different Data Models • Nodes A, B, C, and D are connected by a network, each with its own database • Data models in use across the nodes: network model, object-oriented model, relational model, hierarchical model • Developments: tools for interoperability; commercial products • Challenges: global data model
Schema Integration and Transformation: An Approach • External Schemas I, II, and III • Global Schema: integrate the generic schemas • Generic schemas describing the relational, network, hierarchical, and object-oriented databases • Schemas describing the relational, network, hierarchical, and object-oriented databases • Challenges: selecting an appropriate generic representation; maintaining consistency during transformations; schema evolution
Semantic Heterogeneity • Semantic heterogeneity occurs when there is a disagreement about the meaning or interpretation of the same data • Example: Object O is interpreted as a passenger ship in the database at Node A and as a submarine in the database at Node B • Challenges: standard definitions; repositories
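A repository that records each node's interpretation makes such disagreements detectable. The sketch below is a toy illustration: the dictionary layout and the find_conflicts helper are assumptions, and only the Object O example comes from the slide.

```python
# Each node's local interpretation of shared objects (hypothetical repository layout).
interpretations = {
    "Node A": {"O": "passenger ship"},
    "Node B": {"O": "submarine"},
}


def find_conflicts(repo):
    """Return objects that different nodes interpret differently."""
    meanings = {}
    for node_terms in repo.values():
        for obj, meaning in node_terms.items():
            meanings.setdefault(obj, set()).add(meaning)
    return {obj: m for obj, m in meanings.items() if len(m) > 1}


conflicts = find_conflicts(interpretations)
print(conflicts)
```

Detecting the conflict is the easy part; agreeing on a standard definition for Object O is the challenge the slide points to.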