Advanced Topics: selected slides from textbook and some extra slides. Knowledge and Information in Databases • Distinction • Deduction • Sources of Knowledge • Data mining • Limits of Knowledge • Data Warehouse. Object Oriented Databases • The components of an Object • Inheritance • Encapsulation • Methods • Persistence • Classes, Constructors and Interfaces • Multimedia
Modeling Reality [Figure: the enterprise (reality) mapped onto the database, with two sample tables (columns Num, Fitting, Cost and Day, Num, Status) and one entry marked "Wrong"] A good model generates a lasting design. Relations and information flows. “Truth is the conformity that exists between the thing (reality of the enterprise) and the description of it (database)” Saint Thomas Aquinas (c. 1225-1274)
IT conception of Knowledge. “Austin, the capital of Texas”: is this information or knowledge? • Information: an equality, p = c (p if and only if c). Example: Austin = capital of Texas. A fact, that is, true data previously corroborated or given as a postulate. • Knowledge: an implication, p => r (p IMPLIES r). Example: “Which is the best restaurant in Austin?” calls for a judgment: if City = Austin then the best restaurant is … An implication or a sound judgment (justified belief); a product of experience or of a committed selection made with the available information; a working hypothesis to be corroborated by the scientific method.
IT conception of Knowledge. “Truth is the conformity that exists between the thing (reality of the enterprise) and the description of it (database)” Saint Thomas Aquinas (c. 1225-1274). “Knowledge is experience, the rest is information” Albert Einstein (1879-1955)
IT conception of Knowledge • Information: on demand, inexpensive, uncommitted, contingently true, acquirable, objective, facts, proven truth, socially accepted • Knowledge: hard to find, willing to pay for it, highly committed, justified belief, subject to selection, subjective, product of experience, must be validated
The Scientific Method: the continuous cycle of science [Figure: cycle in which data from reality becomes information, then knowledge, then information again through experimentation (more data and observation), with feedback on each turn] • From data (facts) to information (initial truth) to knowledge (discovery) to information (truth) to knowledge (discovery) to information (truth) ... • With new hypotheses p, p => q (knowledge), opportunities are generated (innovation) • Feedback is continuously testing our observation and introspection
Prolog and deductive knowledge in databases
Deductive rules (simple deductive knowledge)
superior(x,y) <= supervise(x,y).
superior(x,y) <= supervise(x,z), superior(z,y).
subordinate(x,y) <= superior(y,x).
Facts (truth)
supervise(Franklin, John).
supervise(Franklin, Ronald).
supervise(Franklin, Tammy).
…
supervise(James, Franklin).
…
supervise(James, Jennifer).
supervise(Jennifer, Thomas).
Queries
Who is the superior of Jennifer? execute superior(x, Jennifer)
Is James the superior of John? [yes, no] execute superior(James, John)
Note that in this case, because the deduction is complete, knowledge becomes truth immediately
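The deduction on this slide can be sketched in Python as a small fixpoint computation. This is only an illustration, not a Prolog engine: it assumes the second rule is read recursively (superior(z,y) in the body), it uses only the supervise facts that are spelled out on the slide (the rows elided with "…" are left out), and the function names are made up for the sketch.

```python
# Facts copied from the slide; the rows elided with "…" are not guessed at.
SUPERVISE = {
    ("Franklin", "John"),
    ("Franklin", "Ronald"),
    ("Franklin", "Tammy"),
    ("James", "Franklin"),
    ("James", "Jennifer"),
    ("Jennifer", "Thomas"),
}


def superior_pairs(supervise):
    """Derive every (x, y) such that x is a superior of y.

    Rule 1: superior(x, y) <= supervise(x, y)
    Rule 2: superior(x, y) <= supervise(x, z), superior(z, y)
    The rules are re-applied until no new pair appears (a fixpoint).
    """
    derived = set(supervise)  # rule 1
    while True:
        new_pairs = {
            (x, y)
            for (x, z) in supervise
            for (z2, y) in derived
            if z == z2
        } - derived  # rule 2
        if not new_pairs:
            return derived
        derived |= new_pairs


def superior(x, y, supervise=SUPERVISE):
    """Yes/no query: is x a superior of y?"""
    return (x, y) in superior_pairs(supervise)


def superiors_of(y, supervise=SUPERVISE):
    """Open query superior(x, y): all x for a given y."""
    return {x for (x, who) in superior_pairs(supervise) if who == y}


def subordinates_of(x, supervise=SUPERVISE):
    """subordinate(y, x) <= superior(x, y)"""
    return {y for (boss, y) in superior_pairs(supervise) if boss == x}


print(superiors_of("Jennifer"))
print(superior("James", "John"))
```

The two print calls correspond to the two queries on the slide. The rules are stored as code and the facts as data, which mirrors the slide's split between deductive rules and facts.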
Using SQL with deductive rules
Facts (truth) {all rows}
parts(pn, pname, pcolor, pweight, pcity).
suppliers(sn, sname, sstatus, scity).
sp(pn, sn, qty).
Deductive rules (changing knowledge); _ marks an attribute whose value does not matter
red_parts(x) <= parts(x, _, 'red', _, _).
same_city(x, y) <= parts(x, _, 'red', _, z), sp(x, y, q), q > 0, suppliers(y, _, _, z).
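One way to read these rules in SQL terms is as views over the fact tables. The sketch below uses Python's built-in sqlite3 module; the sample rows are invented for illustration, and the second rule is read here as "red part x and supplier y are in the same city and y supplies x in a quantity greater than zero", which is an interpretation of the slide rather than something it states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts (pn TEXT, pname TEXT, pcolor TEXT, pweight REAL, pcity TEXT);
CREATE TABLE suppliers (sn TEXT, sname TEXT, sstatus INTEGER, scity TEXT);
CREATE TABLE sp (pn TEXT, sn TEXT, qty INTEGER);

-- Hypothetical sample rows, invented for this sketch.
INSERT INTO parts VALUES ('P1', 'nut', 'red', 12, 'London');
INSERT INTO parts VALUES ('P2', 'bolt', 'green', 17, 'Paris');
INSERT INTO parts VALUES ('P3', 'screw', 'red', 14, 'Rome');
INSERT INTO suppliers VALUES ('S1', 'Smith', 20, 'London');
INSERT INTO suppliers VALUES ('S2', 'Jones', 10, 'Paris');
INSERT INTO sp VALUES ('P1', 'S1', 300);
INSERT INTO sp VALUES ('P1', 'S2', 200);
INSERT INTO sp VALUES ('P2', 'S2', 400);
INSERT INTO sp VALUES ('P3', 'S1', 0);

-- Rule: red_parts(x) holds when parts has a row for x whose color is red.
CREATE VIEW red_parts AS
    SELECT pn FROM parts WHERE pcolor = 'red';

-- Rule: same_city(x, y) holds when red part x is supplied by y (qty > 0)
-- and the part city equals the supplier city.
CREATE VIEW same_city AS
    SELECT p.pn AS pn, s.sn AS sn
    FROM parts p
    JOIN sp ON sp.pn = p.pn AND sp.qty > 0
    JOIN suppliers s ON s.sn = sp.sn AND s.scity = p.pcity
    WHERE p.pcolor = 'red';
""")

red_parts = sorted(row[0] for row in conn.execute("SELECT pn FROM red_parts"))
same_city = sorted(conn.execute("SELECT pn, sn FROM same_city").fetchall())
print(red_parts)
print(same_city)
```

A view stores the rule, not its rows: when the facts change, the answers change with them, which is one reading of "changing knowledge" on the slide.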
How is knowledge stored anyway?
Facts (truth) {all rows}
parts(pn, pname, pcolor, pweight, pcity).
suppliers(sn, sname, sstatus, scity).
sp(pn, sn, qty).
Knowledge is stored as a rule. It looks like a table, but the operator is <= (is implied by) rather than =
red_parts(x) <= parts(x, _, 'red', _, _).
Market Basket Example • Where should detergents be placed in the store to maximize their sales? • Are window cleaning products purchased when detergents and orange juice are bought together? • Is soda typically purchased with bananas? Does the brand of soda make a difference? • How are the demographics of the neighborhood affecting what customers are buying? Source: Peter Bajcsy http://algdocs.ncsa.uiuc.edu/PR-20021116-1.ppt
Inductive knowledge [aka Data mining] Refers to the discovery of new information, in the form of patterns or rules, from past data. Facts (truth)
Inductive Rules
An itemset x is drawn from the domain values X, and a set y is selected from x, so y is a subset of x
Let z = x - y
If support(x) / support(z) > minimum confidence threshold
Then z => y
In Prolog notation:
buy_juice() <= morning(), buy_milk() is a valid K rule
buy_juice() <= morning(), buy_bread() is not a valid K rule
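A small worked example may help with the support and confidence test. The sketch below uses the usual market-basket definitions: support is the fraction of transactions containing an itemset, and the confidence of z => y is support(z ∪ y) / support(z). The baskets and the 0.7 threshold are invented for illustration, chosen so that the two rules on the slide come out valid and not valid respectively.

```python
# Hypothetical baskets, invented for this sketch; "morning" is treated
# as just another item recorded with the purchase.
TRANSACTIONS = [
    {"morning", "milk", "juice"},
    {"morning", "milk", "juice", "bread"},
    {"morning", "milk", "juice"},
    {"morning", "bread"},
    {"morning", "bread", "juice"},
    {"morning", "bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(z, y, transactions):
    """Confidence of the rule z => y: support(z | y) / support(z)."""
    return support(set(z) | set(y), transactions) / support(z, transactions)


def is_valid_rule(z, y, transactions, min_confidence):
    """Accept z => y when its confidence exceeds the chosen threshold."""
    return confidence(z, y, transactions) > min_confidence


MIN_CONFIDENCE = 0.7  # assumed threshold, not taken from the slide

milk_rule = is_valid_rule({"morning", "milk"}, {"juice"}, TRANSACTIONS, MIN_CONFIDENCE)
bread_rule = is_valid_rule({"morning", "bread"}, {"juice"}, TRANSACTIONS, MIN_CONFIDENCE)
print(milk_rule, bread_rule)
```

With these baskets, juice appears in three of the four morning-and-milk baskets but in only two of the four morning-and-bread baskets, so only the first rule clears the threshold.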
Association Rules • Market-Basket Model, Support, and Confidence • Apriori Algorithm • Sampling Algorithm • Frequent-Pattern Tree Algorithm • Partition Algorithm (using Natural Intelligence) • Other Types of Association Rules (AI learning) • Rule based systems and Ontologies
Knowledge from experience: Pieter Brueghel, Proverbs (1564-1638)
Knowledge Example and Knowledge Detail [Figures: three example images and two detail images] Source: Wikipedia
Applications of Data Mining • Proactive modeling: predicting a behavior • Identification: finding relations in the model rather than in reality • Classification: segmenting markets into classes • Optimization: focusing limited resources
Commercial Data Mining Tools • Still in their infancy • Innovation • Lots of testing and tuning • Expensive • Require a good mix of natural and artificial intelligence. An example: http://www.callminer.com/
Learning and Classification. “To understand is to perceive patterns” Sir Isaiah Berlin (1909-1997). Artificial Intelligence deals with knowledge and learning; learning may be achieved by • Traversing knowledge bases: rule-based and logic programming • Artificial selection: genetic and evolutionary algorithms • Adaptive methods: connectionism and feedback • Probability models: Bayesian and causal networks
BRETAM Model [Figure: performance metric plotted against time through the phases B, R, E, T, A, M. Phase labels: discovery, reproduction, empirical, theory, automation, maturity. Product labels: breakthrough, research, innovative, line, low cost, throw away.] Source: A Model for Forecasting Technology (B. Gaines, 1996)
Inflection Point in BRETAM [Figure: overlapping BRETAM sequences, each new breakthrough (B) starting while an earlier sequence is still in progress] • While a breakthrough is being corroborated by researchers, a new breakthrough is found, thus linking growth at the inflection point (before $d^2 f_n / dt^2 = 0$) • In this fashion breakthrough and research are combined • BRETAM phases occur horizontally (a new breakthrough) and vertically (at the same time) • Innovation requires that new products create new routines in work habits • A breakthrough reflects a new form not seen before
AI history using the BRETAM model, adapted from Gaines [Figure: staggered BRETAM sequences for successive layers (digital circuits, hardware, software, interaction, knowledge, acquisition, autonomy, cooperation, collaboration), grouped under Computing Sciences and Artificial Cognitive Sciences, on a time axis from 1940 to 2044 in 8-year steps, with generations 0, 1, 2, 3, 3 1/2, 4, 5. Product labels: invention, research, innovative, line, low cost, throw away.]
Some authors/dates/places/topics • Symbolic AI, 1956-1976: Herbert Simon (CMU); Marvin Minsky, John McCarthy (MIT); Ed Feigenbaum, B. Buchanan (Stanford) • Learning, 1950-1960 and 1980-2000: F. Rosenblatt, N. Wiener, W.R. Ashby; D. McCleland, Hopfield, Kohonen; P. Bock, A. Barto • Selection, 1970-1975 and 1990-2005: J. Holland (Michigan U); D. Goldberg (Alabama U)
Knowledge needs [Figure: levels of need resting on the Data Base] • Why: data mining • What (strategic): statistics • What (tactical): reports • How (daily operations): processes, transaction-oriented applications
Data Mining Source: Peter Bajcsy http://algdocs.ncsa.uiuc.edu/PR-20021116-1.ppt
Knowledge Discovery Source: Peter Bajcsy http://algdocs.ncsa.uiuc.edu/PR-20021116-1.ppt
Required effort • Arrows indicate the direction we hope the effort should go. Source: Peter Bajcsy http://algdocs.ncsa.uiuc.edu/PR-20021116-1.ppt
Purpose of Data Warehousing • Traditional databases are not optimized for data access alone; they have to balance the requirement of data access with the need to ensure the integrity of data. • Most of the time, users need only read access, but they need that access to be fast over a large volume of data. • Most of the data required for analysis comes from multiple databases, and these analyses are recurrent and predictable enough that specific software can be designed to meet the requirements. • There is a great need for tools that provide decision makers with information to make decisions quickly and reliably based on historical data. This functionality is achieved by Data Warehousing and Online Analytical Processing (OLAP).
Introduction, Definitions, and Terminology. Traditional databases are transactional. Data warehouses have the distinguishing characteristic that they are mainly intended for decision support applications. Applications that a data warehouse supports include: • OLAP (Online Analytical Processing), a term used to describe the analysis of complex data from the data warehouse. • DSS (Decision Support Systems), also known as EIS (Executive Information Systems), which support an organization’s leading decision makers in making complex and important decisions. • Data Mining, which is used for knowledge discovery, the process of searching data for unanticipated new knowledge.
Comparison with Traditional Databases • Traditional databases are transactional and are optimized for both access mechanisms and integrity assurance measures. Data warehouses are mainly optimized for appropriate data access. • Data warehouses place more emphasis on historical data, as their main purpose is to support time-series and trend analysis. • Compared with transactional databases, data warehouses are nonvolatile. • In transactional databases the transaction is the mechanism of change to the database. By contrast, information in a data warehouse is relatively coarse grained and is refreshed according to a careful choice of refresh policy, usually incremental.
Classification of Data Warehouses. Generally, data warehouses are an order of magnitude larger than the source databases. The sheer volume of data is an issue, and data warehouses can be classified accordingly: • Enterprise-wide data warehouses: huge projects requiring massive investment of time and resources. • Virtual data warehouses: provide views of operational databases that are materialized for efficient access. • Data marts: generally targeted to a subset of the organization, such as a department, and more tightly focused.
Data Modeling for Data Warehouses. Example of Two-Dimensional vs. Multi-Dimensional
Building A Data Warehouse. Acquisition of data for the warehouse when there is not a single DBMS: • The data must be extracted from multiple, heterogeneous sources. • Data must be formatted for consistency within the warehouse. • The data must be cleaned to ensure validity; the cleaning process is difficult to automate. • Back flushing: upgrading the data with cleaned data.
Building A Data Warehouse • Usage projections • The fit of the data model • Characteristics of available resources • Design of the metadata component • Modular component design • Design for manageability and change • Considerations of distributed and parallel architecture • Distributed vs. federated warehouses.
Functionality of a Data Warehouse. Functionality that can be expected: • Roll-up: Data is summarized with increasing generalization. • Drill-down: Increasing levels of detail are revealed. • Pivot: Cross tabulation is performed. • Slice and dice: Projection operations are performed on the dimensions. • Sorting: Data is sorted by ordinal value. • Selection: Data is available by value or range. • Derived attributes: Attributes are computed by operations on stored and derived values.
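Three of the operations listed above (roll-up, slice, pivot) can be sketched on a tiny in-memory cube. Everything here is an assumption made for illustration: the dimensions (product, region, quarter), the sales figures, and the function names. A real warehouse would run these operations inside an OLAP engine rather than over Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, sales).
FACTS = [
    ("P1", "East", "Q1", 100),
    ("P1", "East", "Q2", 150),
    ("P1", "West", "Q1", 80),
    ("P2", "East", "Q1", 60),
    ("P2", "West", "Q2", 90),
]
DIMS = ("product", "region", "quarter")


def roll_up(facts, keep):
    """Sum sales over every dimension not named in keep.

    Keeping fewer dimensions rolls up; keeping more drills down.
    """
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_on(facts, dim, value):
    """Fix one dimension to a single value and keep the matching rows."""
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]


def pivot(facts, row_dim, col_dim):
    """Cross-tabulate sales with row_dim down the side and col_dim across."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for row in facts:
        table[row[r]][row[c]] += row[3]
    return {key: dict(cols) for key, cols in table.items()}


print(roll_up(FACTS, ("product",)))            # most general level
print(roll_up(FACTS, ("product", "region")))   # drill down by region
print(slice_on(FACTS, "quarter", "Q1"))
print(pivot(FACTS, "product", "quarter"))
```

Roll-up and drill-down are the same function called with fewer or more dimensions, which is why the slide describes them as opposite directions along one hierarchy.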
Warehouse vs. Data Views. Views and data warehouses are alike in that they both have read-only extracts from the databases. However, data warehouses are different from views in the following ways: • Data warehouses exist as persistent storage instead of being materialized on demand. • Data warehouses are not usually relational, but rather multi-dimensional. • Data warehouses can be indexed for optimization. • Data warehouses provide specific support of functionality. • Data warehouses deal with huge volumes of data that are generally contained in more than one database.
Advanced Topics Object Oriented Databases • The components of an Object • Inheritance • Encapsulation • Methods • Persistence • Classes, Constructors and Interfaces • Multimedia
History of OO Models and Systems • Languages: Simula (1960’s), Smalltalk (1970’s), C++ (late 1980’s), Java (1990’s) • Experimental Systems: IRIS at H-P labs, Postgres - Montage - Illustra at UC/B • Commercial OO Database products: Ontos, Gemstone, O2 ( -> Ardent), Objectivity, Objectstore ( -> Excelon), Versant, Poet, Jasmine (Fujitsu – GM)
Overview of Object-Oriented Concepts • MAIN CLAIM: OO databases try to maintain a direct correspondence between real-world and database objects so that objects do not lose their integrity and identity and can easily be identified and operated upon • Object: Two components: state (value) and behavior (operations). Similar to a program variable in a programming language, except that it will typically have a complex data structure as well as specific operations defined by the programmer
Overview of Object-Oriented Concepts • In OO databases, objects may have an object structure of arbitrary complexity in order to contain all of the necessary information that describes the object. • In contrast, in traditional database systems, information about a complex object is often scattered over many relations or records, leading to loss of direct correspondence between a real-world object and its database representation.
Overview of Object-Oriented Concepts • To encourage encapsulation, an operation is defined in two parts: • The signature or interface of the operation specifies the operation name and arguments (or parameters). • The method or body specifies the implementation of the operation.
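The split between signature and method can be illustrated in Python with an abstract base class. This is only a sketch of the idea, not how any particular OO database declares operations; the Employee class, its attributes, and its operations are hypothetical names chosen for the example.

```python
from abc import ABC, abstractmethod


class EmployeeInterface(ABC):
    """Signatures only: operation names and arguments, no implementation."""

    @abstractmethod
    def raise_salary(self, percent: float) -> None:
        ...

    @abstractmethod
    def annual_salary(self) -> float:
        ...


class Employee(EmployeeInterface):
    """Methods: the bodies that implement each signature."""

    def __init__(self, name: str, monthly_salary: float) -> None:
        self.name = name
        # State is kept behind the operations; callers use the interface.
        self._monthly_salary = monthly_salary

    def raise_salary(self, percent: float) -> None:
        self._monthly_salary *= 1 + percent / 100

    def annual_salary(self) -> float:
        return 12 * self._monthly_salary


employee = Employee("Franklin", 1000.0)
employee.raise_salary(10)
print(employee.annual_salary())
```

Callers depend only on EmployeeInterface, so the body of raise_salary could change (for example, to cap the raise) without affecting them, which is the point of encapsulation on this slide.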
Object Identity, Object Structure, and Type Constructors • Type Constructors: In OO databases, the state (current value) of a complex object may be constructed from other objects (or other values) by using certain type constructors. • The three most basic constructors are atom, tuple, and set. Other commonly used constructors include list, bag, and array. The atom constructor is used to represent all basic atomic values, such as integers, real numbers, character strings, Booleans, and any other basic data types that the system supports directly.
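As a rough analogy, the constructors named above map onto familiar Python types. The mapping is an assumption made for illustration (Python is not an object database), and the Department fields are hypothetical, loosely modeled on a COMPANY-style schema.

```python
from collections import Counter
from typing import FrozenSet, NamedTuple, Tuple


class Department(NamedTuple):          # tuple constructor: named components
    dname: str                         # atom
    dnumber: int                       # atom
    locations: FrozenSet[str]          # set: unordered, no duplicates
    project_order: Tuple[str, ...]     # list: ordered, duplicates allowed


research = Department(
    dname="Research",
    dnumber=5,
    locations=frozenset({"Houston", "Bellaire", "Houston"}),
    project_order=("ProductX", "ProductY", "ProductX"),
)

# bag: unordered, duplicates are counted
hours_bag = Counter([8, 8, 10])

# array: fixed number of positions, accessed by index
quarterly_budget = [0.0] * 4
quarterly_budget[2] = 1500.0

# Identity versus state: two objects can hold equal values and still be
# distinct objects, much as two database objects with the same state
# would keep different object identifiers.
research_copy = Department(
    dname="Research",
    dnumber=5,
    locations=frozenset({"Houston", "Bellaire"}),
    project_order=("ProductX", "ProductY", "ProductX"),
)
same_state = research == research_copy
same_object = research is research_copy
print(same_state, same_object)
```

The set drops the repeated "Houston" while the list keeps the repeated "ProductX", which is the practical difference between the two constructors.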
Example 1, one possible relational database state corresponding to COMPANY schema