370 likes | 449 Views
Network, Object-Oriented and Other Database Models. University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management. Review. OLAP Data Mining. OLAP. Online Line Analytical Processing Intended to provide multidimensional views of the data
E N D
Network, Object-Oriented and Other Database Models University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management Database Management -- R. Larson
Review • OLAP • Data Mining Database Management -- R. Larson
OLAP • Online Line Analytical Processing • Intended to provide multidimensional views of the data • I.e., the “Data Cube” • The PivotTables in MS Excel are examples of OLAP tools Database Management -- R. Larson
Data Cube Database Management -- R. Larson
Operations on Data Cubes • Slicing the cube • Extracts a 2d table from the multidimensional data cube • Example… • Drill-Down • Analyzing a given set of data at a finer level of detail Database Management -- R. Larson
Data Mining • Data mining is knowledge discovery rather than question answering • May have no pre-formulated questions • Derived from • Traditional Statistics • Artificial intelligence • Computer graphics (visualization) Database Management -- R. Larson
Goals of Data Mining • Explanatory • Explain some observed event or situation • Why have the sales of SUVs increased in California but not in Oregon? • Confirmatory • To confirm a hypothesis • Whether 2-income families are more likely to buy family medical coverage • Exploratory • To analyze data for new or unexpected relationships • What spending patterns seem to indicate credit card fraud? Database Management -- R. Larson
Data Mining Applications • Profiling Populations • Analysis of business trends • Target marketing • Usage Analysis • Campaign effectiveness • Product affinity Database Management -- R. Larson
Data Mining Algorithms • Market Basket Analysis • Memory-based reasoning • Cluster detection • Link analysis • Decision trees and rule induction algorithms • Neural Networks • Genetic algorithms Database Management -- R. Larson
Hierarchical Database Systems Network Database Systems Object-Oriented Database Systems Inverted File and Flat File DBMS Today Database Management -- R. Larson
Hierarchical Model • Data items are structured in a Parent-Child hierarchical relationship • Data items are grouped into “logical record types”. Each of these approximately corresponds to a table in the relational model. • FOCUS permits “virtual segments” that are stored as files. Database Management -- R. Larson
pubid accno Has call Has copy BIBFILE CALLFILE LIBFILE accno Libid Callno libid Library publishes PUBFILE pubid Address, etc Has index Has subject INDXFILE SUBFILE accno subid subid subject Cookie ER Diagram Database Management -- R. Larson
BIBINFO PUBINFO INDXINFO CALLINFO SUBINFO LIBINFO Hierarchical Model for Cookie Database Management -- R. Larson
FOCUS BIBFILE Definition FILENAME = BIBFILE, SUFFIX = FOC, $ SEGNAME = BIBINFO, SEGTYPE = S, $ FIELD = ACCESSION NO, ALIAS = ACCNO, USAGE = A4, $ FIELD = AUTHOR, ALIAS = AU, USAGE = A30, $ FIELD = TITLE, ALIAS = TI, USAGE = A40, $ FIELD = LOCATION, ALIAS = LOC, USAGE = A20, $ FIELD = PUBLISHERID, ALIAS = PUBID, USAGE = A2, $ FIELD = DATE, ALIAS = D, USAGE = A4, $ FIELD = PRICE, ALIAS = PR, USAGE = D6.2, $ FIELD = PAGINATION, ALIAS = PAGIN, USAGE = A13, $ FIELD = ILLUSTRATION, ALIAS = ILL, USAGE = A9, $ FIELD = HEIGHT, ALIAS = HT, USAGE = I2, $ SEGNAME = PUBINFO, PARENT = BIBINFO, SEGTYPE = KU, CRFILE = PUBFILE, CRKEY = PUBLISHERID, $ SEGNAME = INDXINFO, PARENT = BIBINFO, SEGTYPE = S, $ FIELD = SUBID, ALIAS = SID, USAGE = A2, $ SEGNAME = CALLINFO, PARENT = BIBINFO, SEGTYPE = S, $ FIELD = LIBRARYID, ALIAS = LIBID, USAGE = A2, $ FIELD = CALL NUMBER, ALIAS = CALLNO, USAGE = A15, $ FIELD = COPIES, ALIAS = C, USAGE = I4, $ SEGNAME = LIBINFO, PARENT = CALLINFO, SEGTYPE = KU, CRFILE = LIBFILE, CRKEY = LIBRARYID, $ SEGNAME = SUBINFO, PARENT = INDXINFO, SEGTYPE = KU, CRFILE = SUBFILE, CRKEY = SUBID, $ Database Management -- R. Larson
PUBFILE Segment FILENAME = PUBFILE, SUFFIX = FOC, $ SEGNAME = PUBINFO, SEGTYPE = S, $ FIELD = PUBLISHERID, ALIAS = PUBID, USAGE = A2, FIELDTYPE = I, $ FIELD = PUBLISHER, ALIAS = PNAME, USAGE = A30, $ FIELD = PUB ADDRESS, ALIAS = PADDRESS, USAGE = A20, $ FIELD = PUB CITY, ALIAS = PCITY, USAGE = A15, $ FIELD = PUB STATE, ALIAS = PSTATE, USAGE = A2, $ FIELD = PUB ZIP, ALIAS = PZIP, USAGE = A5, $ FIELD = PUB PHONE, ALIAS = PPHONE, USAGE = A10, $ FIELD = PUB SHIP, ALIAS = SHIP, USAGE = I3,$ Database Management -- R. Larson
SUBFILE Segment FILENAME = SUBFILE, SUFFIX = FOC, $ SEGNAME = SUBINFO, SEGTYPE = S, $ FIELD = SUBID, ALIAS = SID, USAGE = A2, FIELDTYPE = I, $ FIELD = SUBJECT, ALIAS = SUB, USAGE = A32, $ Database Management -- R. Larson
LIBFILE Segment FILENAME = LIBFILE, SUFFIX = FOC, $ SEGNAME = LIBINFO, SEGTYPE = S, $ FIELD = LIBRARYID, ALIAS = LIBID, USAGE = A2, FIELDTYPE = I, $ FIELD = LIBRARY, ALIAS = LIB, USAGE = A42, $ FIELD = LIB ADDRESS, ALIAS = LADDRESS, USAGE = A25, $ FIELD = LIB CITY, ALIAS =LCITY, USAGE = A15, $ FIELD = LIB STATE, ALIAS = LSTATE, USAGE = A2, $ FIELD = LIB ZIP, ALIAS = LZIP, USAGE = A5, $ FIELD = LIB PHONE, ALIAS = LPHONE, USAGE = A10, $ FIELD = MONOPEN, ALIAS = MOP, USAGE = I4, $ FIELD = MONCLOSE, ALIAS = MCL, USAGE = I4, $ FIELD = TUEOPEN, ALIAS = TUOP, USAGE = I4, $ FIELD = TUECLOSE, ALIAS = TUCL, USAGE = I4, $ FIELD = WEDOPEN, ALIAS = WOP, USAGE = I4, $ FIELD = WEDCLOSE, ALIAS = WCL, USAGE = I4, $ FIELD = THURSOPEN, ALIAS = THOP, USAGE = I4, $ FIELD = THURSCLOSE, ALIAS = THCL, USAGE = I4, $ FIELD = FRIOPEN, ALIAS = FOP, USAGE = I4, $ FIELD = FRICLOSE, ALIAS = FCL, USAGE = I4, $ FIELD = SATOPEN, ALIAS = SATOP, USAGE = I4, $ FIELD = SATCLOSE, ALIAS = SATCL, USAGE = I4, $ FIELD = SUNOPEN, ALIAS = SUNOP, USAGE = I4, $ FIELD = SUNCLOSE, ALIAS = SUNCL, USAGE = I4, $ Database Management -- R. Larson
Hierarchic Querying • All searches must proceed from the “root” of the hierarchy, and traverse each segment containing required information Database Management -- R. Larson
FOCUS Query TABLE FILE BIBFILE PRINT AU OVER CALLNO AND TI AND LIB IF AU CONTAINS FLEXNER IF LIB CONTAINS MOFFITT END > NUMBER OF RECORDS IN TABLE= 1 LINES= 1 AUTHOR FLEXNER, ABRAHAM CALL NUMBER 370.65 TITLE UNIVERSITIES: AMERICAN, ENGLISH, GERMAN LIBRARY MOFFITT LIBRARY Database Management -- R. Larson
BIBINFO PUBINFO INDXINFO CALLINFO SUBINFO LIBINFO Hierarchical Query Processing Database Management -- R. Larson
Network Database Systems • Network DBMS are an evolutionary step from Hierarchical systems. • Hierarchical systems can be considered a subset of Network systems. Database Management -- R. Larson
History • Specifications for network systems came from CODASYL (Conference on Data and Systems Languages) -- The same fine folks who brought you COBOL. • The DTBG (Data Base Task Group) was founded in 1965 to specify a “standard language for manipulating records” • The result was a report (published in 1971) Database Management -- R. Larson
History • The CODASYL DBTG report contained specifications for: • A DDL - Data Definition Language • A DML - Data Manipulation Language • Inherent in the report was the underlying Network database structure. Database Management -- R. Larson
Components of DDL and DML • DDL is used to describe or define: • database records • individual data items • the associations that exist between record types • security • record positioning • The database definition created by the DDL is called the database schema • User views can also be defined in the DDL and are called subschemas Database Management -- R. Larson
DDL Continued • DDL is also used by the database designer to define all associations between record types • These associations are called Sets and are sometimes referred to as DTBG Sets. • Sets describe a one to many relationship between two distinct record types. • The record on the “one” side of the set is called the “owner” • The record on the “many” side of the set is called the “member” Database Management -- R. Larson
DDL Definitions • Example DDL for a DB (partial) RECORD NAME IS INVOICE; LOCATION MODE IS VIA CUSTOMER-INVOICE SET WITHIN ORDERENTRY; 02 INVOICE-ID PICTURE IS X(5). 02 INVOICE-DATE PICTURE IS 9(6). 02 INVOICE-AMOUNT TYPE IS BINARY. Etc... SCHEMA NAME IS SAMPLEDB. AREA NAME IS ORDERS. RECORD NAME IS CUSTOMER; LOCATION MODE IS CALC USING CUSTOMERID DUPLICATES ARE NOT ALLOWED; WITHIN ORDERENTRY; 02 CUSTOMERID PICTURE IS X(5). 02 CUSTOMER-NAME PICTURE IS X(30). 02 CUSTOMER-ADDRESS. 05 STREET PICTURE IS X(25). 05 CITY PICTURE IS X(15). 05 STATE PICTURE IS XX. 05 ZIPCODE PICTURE IS X(10). 02 CUSTOMER-TELEPHONE PICTURE IS X(13). Etc…. SET NAME IS CUSTOMER-INVOICE; OWNER IS CUSTOMER INSERTION IS FIRST MEMBER IS INVOICE MANDATORY AUTOMATIC LINKED TO OWNER SET SELECTION IS THRU CUSTOMER-INVOICE CURRENT OF SET. Database Management -- R. Larson
Set Definition and Pointers CUSTOMER INVOICE INVOICE INVOICE INVOICE Database Management -- R. Larson
Set Definitions and Pointers INVOICE Lst Fst LINE-ITEM O N P LINE-ITEM O N P LINE-ITEM O N P LINE-ITEM O N P LINE-ITEM O N P Database Management -- R. Larson
Object-Oriented DBMSBasic Concepts • Each real-world entity is modeled by an object. Each object is associated with a unique identifier (sometimes call the object ID or OID) Database Management -- R. Larson
Object-Oriented DBMSBasic Concepts • Each object has a set of instance attributes (or instance variables) and methods. • The value of an attribute can be an object or set of objects. Thus complex object can be constructed from aggregations of other objects. • The set of attributes of the object and the set of methods represent the object structure and behavior, respectively Database Management -- R. Larson
Object-Oriented DBMSBasic Concepts • The attribute values of an object represent the object’s status. • Status is accessed or modified by sending messages to the object to invoke the corresponding methods Database Management -- R. Larson
Object-Oriented DBMSBasic Concepts • Objects sharing the same structure and behavior are grouped into classes. • A class represents a template for a set of similar objects. • Each object is an instance of some class. Database Management -- R. Larson
Object-Oriented DBMSBasic Concepts • A class can be defined as a specialization of of one or more classes. • A class defined as a specialization is called a subclass and inherits attributes and methods from its superclass(es). Database Management -- R. Larson
Object-Oriented DBMSBasic Concepts • An OODBMS is a DBMS that directly supports a model based on the object-oriented paradigm. • Like any DBMS it must provide persistent storage for objects and their descriptions (schema). • The system must also provide a language for schema definition and and for manipulation of objects and their schema • It will usually include a query language, indexing capabilities, etc. Database Management -- R. Larson
employee Employee No Name Address Date hired Date of Birth calculateAge Hourly Salaried consultant Hourly Rate Annual Salary Stock Option Contract No. Date Hired calculateWage calculateStockBenefit AllocateToContract Generalization Hierarchy Database Management -- R. Larson
Inverted File DBMS • Usually similar to Hierarchic DBMS in record structure • Support for repeating groups of fields and multiple value fields • All access is via inverted file indexes to DBS specified fields. • Examples: ADABAS DBMS from Software AG -- used in the MELVYL system Database Management -- R. Larson
Flat File DBMS • Data is stored as a simple file of records. • Records usually have a simple structure • May support indexing of fields in the records. • May also support scanning of the data • No mechanisms for relating data between files. • Usually easy to use and simple to set up Database Management -- R. Larson