Information Resources Management April 24, 2001
Agenda • Administrivia • Object-Oriented & Databases • Data Warehousing • Data Mining • SQL Extensions • XML
Administrivia • Homework #8 • Homework #9 • Current Scores • Final Review Session?
OODBMS vs. ORDBMS • OODBMS - Object-Oriented • ORDBMS - Object-Relational • Appendix A
OODBMS • Persistent Objects • By class • By creation • By marking • By reference • Storage/Retrieval Methods
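Persistence by reference (often called persistence by reachability) means that any object reachable from a persistent object is itself stored. The small Python sketch below illustrates the idea. The `Node` and `PersistentStore` classes are hypothetical. A real OODBMS tracks reachability transparently and writes the objects to disk, while this sketch only computes which in-memory objects would be persisted once one object is marked as a named root.

```python
class Node:
    """A generic object that may reference other objects."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def reachable(root):
    """Collect every object reachable from root (depth-first), each exactly once."""
    seen = {}
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:      # already visited; also guards against cycles
            continue
        seen[id(obj)] = obj
        stack.extend(obj.children)
    return list(seen.values())

class PersistentStore:
    """Hypothetical store: marking an object as a root persists all it references."""
    def __init__(self):
        self.roots = {}

    def make_persistent(self, key, obj):
        # Persistence by marking: the application names a root explicitly.
        self.roots[key] = obj

    def persisted_objects(self):
        # Persistence by reference: everything reachable from a root is stored.
        result = {}
        for root in self.roots.values():
            for obj in reachable(root):
                result[id(obj)] = obj
        return list(result.values())

car = Node("car", [Node("engine"), Node("wheel")])
temp = Node("scratch")            # never referenced by a root, so never persisted
store = PersistentStore()
store.make_persistent("my_car", car)
print(sorted(o.name for o in store.persisted_objects()))  # ['car', 'engine', 'wheel']
```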
OODBMS - Benefits • Match • Programming • Methodology • Data types & structures • Ease of programming • Inheritance
OODBMS - Challenges • Standards • ODMG - Object Database Management Group • Performance • Database vs. persistent language • Loss of integrity, queries • Storage Space • Maturity
ORDBMS • Extensions to relational model • Complex data types • Inheritance • References • Migration path • Use existing applications and knowledge base
ORDBMS - Benefits • SQL • Existing Systems • Vendors
ORDBMS - Challenges • Standards • “Fit” with the development language • Programming Complexity
Storing data from an object-oriented system has been likened to parking your car in your garage. With an OODBMS, you simply park the car in the garage. With an (O)RDBMS, you must first completely disassemble the car and put each part in its designated place on a shelf, and then reverse the process the next time you want to go for a drive.
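To make the analogy concrete, here is a minimal Python sketch of the "disassemble and reassemble" round trip a relational store requires, using the standard-library `sqlite3` module. The `Car` and `Part` classes and the `car`/`part` tables are hypothetical; an OODBMS would store the object graph directly instead.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class Part:
    name: str
    shelf: str

@dataclass
class Car:
    car_id: int
    model: str
    parts: list = field(default_factory=list)

def park(conn, car):
    """'Disassemble' the object: one row for the car, one row per part."""
    conn.execute("INSERT INTO car (car_id, model) VALUES (?, ?)", (car.car_id, car.model))
    conn.executemany(
        "INSERT INTO part (car_id, name, shelf) VALUES (?, ?, ?)",
        [(car.car_id, p.name, p.shelf) for p in car.parts],
    )

def drive(conn, car_id):
    """'Reassemble' the object from its rows."""
    (model,) = conn.execute("SELECT model FROM car WHERE car_id = ?", (car_id,)).fetchone()
    rows = conn.execute(
        "SELECT name, shelf FROM part WHERE car_id = ? ORDER BY rowid", (car_id,)
    ).fetchall()
    return Car(car_id, model, [Part(n, s) for n, s in rows])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_id INTEGER PRIMARY KEY, model TEXT)")
conn.execute("CREATE TABLE part (car_id INTEGER REFERENCES car(car_id), name TEXT, shelf TEXT)")

original = Car(1, "sedan", [Part("engine", "A1"), Part("wheel", "B2")])
park(conn, original)
restored = drive(conn, 1)
print(restored == original)  # True
```

Every class in a real application needs this kind of mapping code in both directions, which is the programming overhead the analogy points at.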
Other Links • Object Database Management Group www.odmg.org • Object Database Newsgroup comp.databases.object
Data Warehouse • A subject-oriented, integrated, time-variant, nonvolatile collection of data • Usually all data for a corporation • Multidimensional database
Data Warehousing • Single location • Long-term storage • Greater availability • Separate “data” processing from day-to-day operations (performance) • All data is historical • Support data mining, et al.
Data Warehousing Questions • What data needs to be kept? • Where is it from? • How good is it? • How long should it be kept? • Can it be summarized? When? • Will it make sense? What is the schema? • When is it updated?
Data Warehousing - Benefits • Support for decision making tools • DSS, EIS, Data Mining • Separation of information and day-to-day processing • Unification - Centralization • Improved quality and consistency
Data Warehousing - Challenges • Costs: Storage, Setup, Maintenance • Historical data issues • Defining the warehouse schema • Doing the conversion • Implementation & every time • Keeping up with operational system changes • Answering the questions
Multidimensional Databases • Two views • Multidimensional tables • Star schema • Multidimensional table • each cell holds an attribute value • dimensions are “interesting” categories
Multidimensional Table • Cell - sales • Dimensions • day • person • store • item
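The multidimensional table can be pictured as a mapping from one value per dimension (day, person, store, item) to a sales cell. The Python sketch below is illustrative only: the sample cells are made up, and `roll_up` is a hypothetical helper that sums the cells over every dimension not being kept, which is the basic operation behind summary reports on such a table.

```python
from collections import defaultdict

# Each cell (the sales figure) is addressed by one value per dimension.
DIMENSIONS = ("day", "person", "store", "item")

cube = {
    ("Mon", "Ann", "S1", "milk"): 3.0,
    ("Mon", "Bob", "S1", "bread"): 2.5,
    ("Tue", "Ann", "S2", "milk"): 4.0,
}

def roll_up(cube, keep):
    """Sum the sales cells over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for key, sales in cube.items():
        totals[tuple(key[i] for i in idx)] += sales
    return dict(totals)

print(roll_up(cube, ["day"]))            # {('Mon',): 5.5, ('Tue',): 4.0}
print(roll_up(cube, ["store", "item"]))
```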
Star Schema • Multiple tables • Central table - data item (cell) • Surrounding tables - information about each category (dimensions)
Star Schema (diagram: central Sales table linked to the Day, Person, Store, and Item tables)
Star Schema Sales (Day, Person, Store, Item, sales) Day (Day, day info) Person (Person, person info) Store (Store, store info) Item (Item, item info)
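The star schema above can be sketched with Python's built-in `sqlite3` module. The column names (`day_id`, `city`, `category`, and so on) and the sample rows are assumptions for illustration. The query shows the typical "star join": group by attributes of the dimension tables and aggregate the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per member, with descriptive attributes
    CREATE TABLE day    (day_id INTEGER PRIMARY KEY, weekday TEXT);
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE store  (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE item   (item_id INTEGER PRIMARY KEY, category TEXT);
    -- Central fact table: one row per cell, keyed by all four dimensions
    CREATE TABLE sales (
        day_id    INTEGER REFERENCES day(day_id),
        person_id INTEGER REFERENCES person(person_id),
        store_id  INTEGER REFERENCES store(store_id),
        item_id   INTEGER REFERENCES item(item_id),
        amount    REAL,
        PRIMARY KEY (day_id, person_id, store_id, item_id)
    );
    INSERT INTO day    VALUES (1, 'Mon'), (2, 'Tue');
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO store  VALUES (1, 'Pittsburgh'), (2, 'Erie');
    INSERT INTO item   VALUES (1, 'dairy'), (2, 'bakery');
    INSERT INTO sales  VALUES (1, 1, 1, 1, 3.0), (1, 2, 1, 2, 2.5), (2, 1, 2, 1, 4.0);
""")

# Star join: group by dimension attributes, aggregate the fact table.
query = """
    SELECT st.city, i.category, SUM(s.amount)
    FROM sales s
    JOIN store st ON s.store_id = st.store_id
    JOIN item  i  ON s.item_id  = i.item_id
    GROUP BY st.city, i.category
    ORDER BY st.city, i.category
"""
for row in conn.execute(query):
    print(row)
```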
Building/Maintaining a Data Warehouse 1. Capture 2. Scrub 3. Transform 4. Load and Index
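The four steps can be sketched as a small Python pipeline. Everything here is hypothetical: the raw records are hard-coded in place of an operational system, the scrubbing rules (trim and upper-case store codes, reject unparseable amounts) are examples, and the warehouse is an in-memory `sqlite3` table of daily totals per store.

```python
import sqlite3

# 1. Capture: pull raw records from an operational source (hard-coded here).
def capture():
    return [
        {"date": "2001-04-23", "store": " s1 ", "amount": "3.00"},
        {"date": "2001-04-23", "store": "S1", "amount": "n/a"},   # bad amount
        {"date": "2001-04-24", "store": "s2", "amount": "4.50"},
    ]

# 2. Scrub: normalize formatting and drop records that fail validation.
def scrub(records):
    clean = []
    for r in records:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # reject unparseable amounts
        clean.append({"date": r["date"], "store": r["store"].strip().upper(), "amount": amount})
    return clean

# 3. Transform: reshape to the warehouse schema (here, daily totals per store).
def transform(records):
    totals = {}
    for r in records:
        key = (r["date"], r["store"])
        totals[key] = totals.get(key, 0.0) + r["amount"]
    return [(d, s, amt) for (d, s), amt in sorted(totals.items())]

# 4. Load and index: bulk insert, then build an index for query performance.
def load(conn, rows):
    conn.execute("CREATE TABLE IF NOT EXISTS daily_sales (day TEXT, store TEXT, amount REAL)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_daily_store ON daily_sales (store, day)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(scrub(capture())))
print(conn.execute("SELECT * FROM daily_sales ORDER BY day, store").fetchall())
# [('2001-04-23', 'S1', 3.0), ('2001-04-24', 'S2', 4.5)]
```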
Data Marts • Making specific data available • Different ones for different needs • (diagram: Operational Systems feed the DW, which feeds data marts DM1 and DM2)
Data Mining • Corporations have colossal amounts of data • Usually only used for very specific purposes (operations) • Automated attempt to learn from the data • Find statistical rules and patterns in the data • Example: Giant Eagle Advantage Card
Goals of Data Mining • Explanatory - Why? • Confirmatory - Is it? • Exploratory - ???
Approaches to Data Mining • Classification - identify rules that create groups • Association - find related conditions or events • Correlation - relationships between values • User guided - hypothesis driven • Automatic - data driven, AI based
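The Association approach is the classic market-basket analysis applied to loyalty-card data. Below is a minimal Python sketch of the two standard measures for single-item rules A -> B: support (fraction of baskets containing both) and confidence (fraction of baskets with A that also contain B). The baskets and thresholds are made up, and production tools use algorithms such as Apriori to scale to larger itemsets.

```python
from collections import Counter
from itertools import combinations

def association_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in set(b))
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(set(b)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):      # try the rule in both directions
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]
for lhs, rhs, s, c in association_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
# bread -> milk: support=0.75, confidence=1.00
# eggs -> milk: support=0.50, confidence=1.00
# milk -> bread: support=0.75, confidence=0.75
```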
Data Mining - Benefits • Use data • Learn new things • Improve decision making
Data Mining - Challenges • Time (human and/or computer) • Spurious results • Separating the wheat from the chaff • Availability of data • Amount of data • Changes in tools and technologies • Validity over time
Enhanced Data Analysis • Beyond SUM, COUNT, and AVG • SQL extensions (suggested) • GROUP BY … AS PERCENTILE • Specific percentiles • GROUP BY … WITH CUBE • Cross-tabulations • Statistical package interface • SAS, S-PLUS, others
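A cube-style grouping returns totals for every combination of the grouping columns, including "all" rows, which is what a cross-tabulation needs. The Python sketch below shows what such an extension computes, without assuming any particular DBMS syntax. The sample rows are hypothetical, `None` marks a rolled-up ("all") column, and the percentile uses the simple nearest-rank definition (other definitions interpolate).

```python
import math
from collections import defaultdict
from itertools import combinations

rows = [
    # (store, item, sales): hypothetical sample data
    ("S1", "milk", 3.0),
    ("S1", "bread", 2.5),
    ("S2", "milk", 4.0),
]
COLUMNS = ("store", "item")

def cube(rows, columns):
    """Totals for every subset of the grouping columns; None means 'all'."""
    result = defaultdict(float)
    for r in range(len(columns) + 1):
        for subset in combinations(range(len(columns)), r):
            for *dims, sales in rows:
                key = tuple(dims[i] if i in subset else None for i in range(len(columns)))
                result[key] += sales
    return dict(result)

def percentile(values, p):
    """Nearest-rank percentile, for 0 < p <= 100."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

totals = cube(rows, COLUMNS)
print(totals[(None, None)])                    # 9.5  (grand total)
print(totals[("S1", None)])                    # 5.5  (all items at store S1)
print(percentile([s for *_, s in rows], 50))   # 3.0  (median sale)
```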
Enhanced Data Analysis - Benefits • Greater functionality • Improved decision making
Enhanced Data Analysis - Challenges • Lack of standards • Understandability • Processing requirements • Cost of poorly written queries • “ad hoc” queries aren’t reviewed
Extending Relational DBs • Spatial and Geographic Databases • Multimedia Databases • Changing the data stored while retaining the benefits of relational databases
Spatial & Geographic DBs • Spatial - CAD • Geographic - GIS • Similar issue • How to store and retrieve such data
Spatial Databases • Geometric objects (2 or 3 dimensions) • Locations • Connections • Nonspatial information about each object • Substructures • Spatial integrity constraints • Two things can’t occupy the same space
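The constraint "two things can't occupy the same space" can be checked with a simple geometric test. This Python sketch makes a simplifying assumption: each object's footprint is an axis-aligned 2-D rectangle, and touching edges are allowed. The `Box` class and the layout are hypothetical, and a real spatial DBMS would use a spatial index (for example an R-tree) instead of comparing every pair.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Box:
    """Axis-aligned 2-D footprint of a spatial object (a simplification)."""
    name: str
    x1: float
    y1: float
    x2: float
    y2: float

    def overlaps(self, other):
        # Interiors intersect only if the boxes overlap on both axes;
        # boxes that merely share an edge do not violate the constraint.
        return (self.x1 < other.x2 and other.x1 < self.x2 and
                self.y1 < other.y2 and other.y1 < self.y2)

def violations(objects):
    """Return pairs of objects that break the 'no shared space' constraint."""
    return [(a.name, b.name) for a, b in combinations(objects, 2) if a.overlaps(b)]

layout = [
    Box("desk", 0, 0, 2, 1),
    Box("chair", 2, 0, 3, 1),           # touches the desk: allowed
    Box("cabinet", 1.5, 0.5, 2.5, 2),   # overlaps desk and chair: violation
]
print(violations(layout))  # [('desk', 'cabinet'), ('chair', 'cabinet')]
```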
GIS Databases • Raster Data (grid/pixel data) • Pictures - possibly over time • Maps • Vector Data • Locations • Connections • Nongeographic information
Spatial & Geographic DB - Benefits • DBMS • Specialized queries • Spatial & Geographic Data • “Standard” Data • Mix of the two • Integrity constraints
Spatial & Geographic DB - Challenges • Space requirements • Level of detail • Understandability - Complexity • Processing requirements • Compatibility between systems • Lack of standards
Multimedia Databases • Images, Audio, Video • Nonmultimedia data (text) about each • Database Enhancements • BLOBs (Binary Large Objects) • Similarity-based queries • Guaranteed steady rate • Synchronization of audio and video
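A similarity-based query returns the stored items that are "closest" to an example rather than those that match exactly. One common approach, sketched below in Python, represents each image by a feature vector and ranks by cosine similarity. The 3-bin colour histograms, file names, and the `most_similar` helper are all hypothetical; real systems use richer features and index structures.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query, catalog, k=2):
    """Rank stored items by similarity to the query's feature vector."""
    scored = sorted(catalog.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

# Hypothetical 3-bin colour histograms (red, green, blue) for stored images.
catalog = {
    "sunset.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg":  [0.1, 0.2, 0.7],
}
print(most_similar([0.7, 0.2, 0.1], catalog))  # ['sunset.jpg', 'forest.jpg']
```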
Multimedia Databases - Benefits • DBMS • Greater compression may be possible • “Paperless” office - document imaging • Workflow redesign - improvements • Greater availability
Multimedia Databases - Challenges • STORAGE • Specialized DBMS • Unity of database and network • Usually requires ATM (Asynchronous Transfer Mode) networking • Specialized hardware • “juke boxes” • optical disks
XML • What is it? • What isn’t it? • What are the goals? • Who controls it? • Who’s using it? • Beyond XML
What is XML? • eXtensible Markup Language • Markup language for “structured information” • “structured” - content & role of that content • markup - identify structures • “meta language for describing markup languages”
Huh? • Storing structured data in a text file • spreadsheet, address book, transactions (think EDI) • Looks like HTML, <tags>, but isn’t • Text is universal, but not efficient • Does disk space matter? • What about network capacity? • XML is license-free & platform-independent
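As an example of storing structured data in a text file, here is a hypothetical address book in XML, read and extended with Python's standard `xml.etree.ElementTree` module. The tag and attribute names are invented for this sketch; that is the point of XML, since the document itself defines which structures exist and what role each piece of content plays.

```python
import xml.etree.ElementTree as ET

# A hypothetical address book: tag names describe the role of each piece of content.
doc = """<addressBook>
  <contact id="1">
    <name>Ada Lovelace</name>
    <phone type="home">555-0100</phone>
  </contact>
  <contact id="2">
    <name>Charles Babbage</name>
    <phone type="work">555-0199</phone>
  </contact>
</addressBook>"""

root = ET.fromstring(doc)
for contact in root.findall("contact"):
    name = contact.findtext("name")
    phone = contact.find("phone")
    print(contact.get("id"), name, phone.get("type"), phone.text)

# Adding a new record is just adding elements; the structure travels with the data.
new = ET.SubElement(root, "contact", id="3")
ET.SubElement(new, "name").text = "Grace Hopper"
print(len(root.findall("contact")))  # 3
```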
What XML isn’t • HTML • SGML - Standard Generalized Markup Language - printing • Limited to current definitions (tags) • XML is the way to add new definitions • A relational database management system • A database, or is it?