670 likes | 683 Views
Chapter 10. Data and Knowledge Management. Agenda. Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining. Data. Set of discrete, objective facts about events Business - structured records of transactions
E N D
Chapter 10 Data and Knowledge Management
Agenda • Information processing • Database • Data Administrator • The DBMS • Distributing data • Data warehousing and data mining
Data • Set of discrete, objective facts about events • Business - structured records of transactions • Little relevance or purpose
Information • Message with sender and receiver • Meant to change way receiver perceives something • Have an impact on his judgment / behavior
Data Processing • Contextualize - why was data gathered? • Categorize - what are its key components? • Calculate - analyze mathematically • Condense - summarize in more concise form
Information Processing • Compare - in kind and in time • Consequences - how used in decisions / actions • Connections - relation to other information • Conversation - what other people think about this information
Agenda • Information processing • Database • Data Administrator • The DBMS • Distributing data • Data warehousing and data mining
Database • Element • Types • Structure • Models • Creation • Topology
Element • Bit, byte, field, record, file, database • Entity, attribute, key field • Relation • Class, object
Database Types • Business database • Geographical information database • Knowledge database / deductive database • Multimedia database • Data warehouse • Data marts • Multimedia and hypermedia database • Object-oriented database
Database Structure • Data definition language • Schema & subschema • Data Manipulation language • Structured Query Language (SQL) • Query By Example (QBE) • Data dictionary
Database Models • Hierarchical • One to many • TPS or routine MIS • Network • Many to many • TPS or routine MIS • Relational • Normalization • Ad hoc reports or DSS • Object-oriented • E-commerce
Database Creation • Conceptual design • Logical view • Entity-relationship (ER) diagram • Normalization
Entity Relationship Diagram • Entity: object or concept • Relationship: meaning association between objects • Attribute: property of an object • Simple & Composite • Single-valued & multi-valued • Derived • Key • Primary key • Foreign key
Normalization • A technique for identifying a true primary key for a relation • Types • First normal form: not repeating group • Second normal form: every non-primary-key attribute is fully functionally dependent on the entire primary key • Third normal form: no transit dependency
Structured Query Language • Select • Join
SQL DML - SELECT • SELECT [DISTINCT|ALL] {* | [col-expr [AS newname]][,...] FROM table-name [alias] [,...] [WHERE condition] [GROUP by colm [, colm] [HAVING condition]] ORDER BY colm [, colm]
SQL DML - SELECT • SELECT attributes (or calculations: +, -, /, *) FROM relation • SELECT DISTINCT attributes FROM relation
Examples • SELECT stuname FROM student; • SELECT stuid, stuname, credit FROM student; • SELECT stuid, stuname, credit+10 FROM student; • SELECT DISTINCT major FROM student;
SQL DML - SELECT • SELECT attributes (or * wild card) FROM relation WHERE condition
Examples • SELECT * FROM student; • SELECT stuname, major, credit FROM student WHERE stuid = ‘S114’; • SELECT * FROM faculty WHERE dept = ‘MIS’;
SELECT - WHERE condition • AND OR • NOT IN • NOT IN BETWEEN • IS NULL IS NOT NULL • LIKE '%' multiple characters • LIKE ‘_’ single characters
Examples • SELECT * FROM faculty WHERE dept = ‘MIS’ AND rank = ‘full professor’; • SELECT * FROM faculty WHERE dept = ‘MIS’ OR rank = ‘full professor’; • SELECT * FROM faculty WHERE dept = ‘MIS’ NOT rank = ‘full professor’;
SELECT * FROM class WHERE room LIKE ‘B_S%’; • SELECT * FROM class WHERE room NOT LIKE ‘BUS%’; • SELECT productid, productname FROM inventory WHERE onhand BETWEEN 50 and 100;
SELECT companyid, companyname FROM company WHERE companyname BETWEEN ‘G’ AND ‘K’; • SELECT productid, productname FROM inventory WHERE onhand NOT BETWEEN 50 and 100; • SELECT companyid, companyname FROM company WHERE companyname NOT BETWEEN ‘G’ AND ‘K’;
SELECT facname FROM faculty WHERE dept IN (‘MIS’, ‘ACT’); • SELECT facname FROM faculty WHERE rank NOT IN (‘assistant’, ‘lecture’); • SELECT customername FROM customer WHERE emailadd IS NOT NULL;
SELECT customername FROM customer WHERE creditlimit IS NULL;
SELECT - aggregate functions • COUNT (*) • COUNT • SUM • AVG • MIN • MAX
Examples • SELECT COUNT(*) FROM student; • SELECT COUNT(major) FROM student; • SELECT COUNT(DISTINCT major) FROM student;
SELECT COUNT(stuid), SUM(credit), AVG(credit), MAX(credit), MIN(credit) FROM student;
SELECT - GROUP • GROUP BY • HAVING
Examples • SELECT major, AVG(credit) FROM student GROUP BY major HAVING COUNT(*) > 2; • SELECT course#, COUNT(stuid) FROM enrollment GROUP BY course# HAVING COUNT(*) > 2;
SELECT major, AVG(credit) FROM student WHERE major IN (‘MIS’, ‘ACT’) GROUP BY major HAVING COUNT(*) > 2;
SELECT - ORDER BY • ORDER BY • ORDER BY ... DESC
Examples • SELECT facname, rank FROM faculty ORDER BY facname; • SELECT facname, rank FROM faculty ORDER BY rank DESC, facname;
SELECT - JOIN Tables • Multiple tables in FROM clause • MUST have join conditions!!!
Examples • SELECT stuname, grade FROM student, enrollment WHERE student.stuid = enrollment.stuid;
SELECT enrollment.course#, stuname, major FROM class, enrollment, student WHERE class.course# = enrollment.course# AND enrollment.stuid = student.stuid AND facid = ‘F114’ ORDER BY enrollment.course#;
SUBQUERY, EXIST, NOT EXIST • SELECT s.stuname, major FROM student s WHERE EXIST (SELECT * FROM enrollment e WHERE s.stuid = e.stuid);
SELECT s.stuname, major FROM student s WHERE NOT EXIST (SELECT * FROM enrollment e WHERE s.stuid = e.stuid);
Database Creation • Physical design • Physical view • Data topology (organization) • Centralized • Distributed database • Replicated database • Partitioned • Organization & access method • Sequential file • Indexed sequential file • Direct or random file • Security • Logical, physical, and transmitting
Selection Criteria • User’ needs (type of application) • Compatibility • Portability • Reliability • Cost • Feature • Performance • Vendor’s support • Others?
Agenda • Information processing • Database • Data Administrator • The DBMS • Distributing data • Data warehousing and data mining
Data Administrator • Clean up data definitions • Control shared data • Manage distributed data • Maintain data quality
Clean Up Definitions • Synonyms / aliases • Standard data definitions • Names and formats • Data Dictionary • Active • Integrated
Control Shared Data • Local - used by one unit • Shared - used by two or more activities • Impact of proposed program changes on shared data • Program-to-data element matrix
Manage Distributed Data • Geographically dispersed • Whether shared data or not • Different levels of detail • Different management levels
Maintain Data Quality • Put owners in charge of data • Verify data accuracy and quality • Purge old data
Agenda • Information processing • Database • Data Administrator • The DBMS • Distributing data • Data warehousing and data mining
The DBMS Data Base Management System: software that permits a firm to: • Centralize data • Manage them efficiently • Provide accessto applications • Such as payroll, inventory