Introduction to Databases and Database Development Chapters 1, 9*, 10*, and 11*
Topics • Chapter 1 • The Database Environment • Database Development Process • Chapter 9 (Pages 409 – 410) • Big Data • Chapter 10 (Pages 444 – 445; 446 - 447) • Master Data Management • Data Federation • Chapter 11 (Pages 464 – 472, 486, 499 – 506) • Database Personnel • Metadata Management (e.g., Data Dictionaries) • Backup Facilities • Overview of Tuning the Database for Performance
Introduction to Databases Chapter 1
Evolution of Database Technologies (timeline figure, 1960's to 2000+) • Technologies shown: Traditional Files, Hierarchical, Network, Relational, Object, Object-Relational, MDDB, Federated, XML, NoSQL, …
Figure 1-3 Old file processing systems: Example (the figure highlights duplicate data)
Traditional File Processing Environment • Dates back to before we had databases • Still in use today, including • Backup of database systems • Import/export data between systems • Some statistical analyses • Disadvantages: • Program-data dependence = “structural” & “data” • Limited data sharing = “islands of automation” • Duplication of data = “redundancy” • Lengthy development times • Excessive program maintenance
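To make "program-data dependence" concrete, here is a minimal sketch, assuming a hypothetical fixed-width customer file (the layout, field widths, and the `parse_customer` helper are illustrative, not taken from the textbook). The program hard-codes the file layout, so a change to the file silently breaks it.

```python
# Hypothetical layout: name in columns 0-9, city in columns 10-19.
def parse_customer(record: str) -> dict:
    """Parse one fixed-width record using positions baked into the program."""
    return {"name": record[0:10].strip(), "city": record[10:20].strip()}


# A record written in the layout the program expects.
old_record = "Alice".ljust(10) + "Topeka".ljust(10)

# The file layout changes: a 5-character ID is now stored first.
new_record = "00042" + "Alice".ljust(10) + "Topeka".ljust(10)

good = parse_customer(old_record)
bad = parse_customer(new_record)  # same program, fields now misaligned

print(good)
print(bad)
```

Every program that reads this file carries its own copy of the layout, which is one reason file processing leads to excessive program maintenance.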
Database Environment, cont… • Hardware • Software • OS • Applications • User interface • Application programs • CASE tools and utilities • Database Management System • People • End Users • System Developers • Data & Database Administrators • Database • User data • Metadata (Repository) • Procedures • Backup/Recovery • Retention • Ownership/Location…
Advantages of Databases • Program-data independence • Improved data sharing • Minimal data redundancy • Improved data accessibility/responsiveness • Improved data consistency • Faster application development • Enforcement of standards • Improved data quality • Reduced program maintenance
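Program-data independence can be illustrated with a small sketch, assuming Python's built-in `sqlite3` module and a hypothetical `customer` table (the table, column names, and `customer_cities` function are assumptions for illustration). The application names the columns it needs, so a structural change that adds a column does not affect it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES ('Alice', 'Topeka')")


def customer_cities(connection):
    """Application code: asks for columns by name, not by physical position."""
    return connection.execute("SELECT name, city FROM customer").fetchall()


before = customer_cities(conn)

# The structure changes, but the application query is untouched.
conn.execute("ALTER TABLE customer ADD COLUMN customer_id INTEGER")

after = customer_cities(conn)
print(before, after)
```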
SDLC for this class • SDLC phases: Planning, Analysis, Design, Implementation • DB Activities in SDLC: Enterprise Modeling* • DB Scope, Requirements (Conceptual Data Model) • DB Design (Logical DB Design) • DB Design (Physical DB Design) • DB Implementation (Load, Test, Eval, Op) • DB Maintenance*
Enterprise Data Modeling • Determine organizational data requirements • Build enterprise data model • Outcome is a very high-level Entity-Relationship Diagram • See: http://da.ks.gov/kito/ITPlans/data_maps06.ppt and http://www.tdan.com/view-articles/5205
Conceptual Data Modeling • Determine user data requirements • Determine business rules • Build conceptual data model • outcome is an Entity-Relationship Diagram (conceptual schema)
Logical Database Design • Select database model • e.g., the Relational Model • Transform conceptual (ERD) into logical (relational) data model • Normalize data structures • Outcome is normalized, relational tables
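As a rough illustration of the "normalize data structures" step, the sketch below assumes a hypothetical flat order list in which each customer's city repeats on every order. It splits the data into two related tables using Python's built-in `sqlite3` module and then rejoins them to show that no information was lost. The table and column names are assumptions made for this example.

```python
import sqlite3

# Hypothetical unnormalized rows: (order_id, customer name, city, order date).
flat_orders = [
    (1, "Alice", "Topeka", "2024-01-05"),
    (2, "Alice", "Topeka", "2024-01-09"),
    (3, "Bob", "Wichita", "2024-01-10"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    city        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
""")

for order_id, name, city, order_date in flat_orders:
    # Store each customer once; repeated names are ignored via the UNIQUE constraint.
    conn.execute(
        "INSERT OR IGNORE INTO customer (name, city) VALUES (?, ?)", (name, city)
    )
    (customer_id,) = conn.execute(
        "SELECT customer_id FROM customer WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT INTO customer_order VALUES (?, ?, ?)",
        (order_id, customer_id, order_date),
    )

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

# Joining the normalized tables reproduces the original flat rows.
rejoined = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.order_date
    FROM customer_order AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(customer_count, rejoined)
```

The city for each customer is now stored in one place, so changing it requires a single update rather than one per order.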
Physical Database Design • Select database product (e.g., SQL Server) • Select storage device(s) • Design fields, records, files (physical schema) • outcomes are detailed, physical definitions for: • fields (data dictionary) • records (space requirements for physical structures)* • files (access methods) *Will not do in this class
Data Models • What is a model? • American National Standards Institute / Standards Planning and Requirements Committee (ANSI/SPARC) classifications: • Conceptual • External • Internal (Logical) • Physical* • Database design / development involves creating these data models
Figure 1-12 Three-schema architecture • Different people have different views of the database; these are the external schemas • The internal schema is the underlying design and implementation
Database Implementation • Create database file/table structures • Create views (external schema) • Establish access rights • Load test data • Write/test programs that process data • Install database (with production data) into production operations • outcomes are secured database tables loaded with data
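Several of the implementation steps above can be sketched in a few lines, assuming Python's built-in `sqlite3` module and a hypothetical `employee` table (all names and values here are illustrative). The sketch creates a table, defines a view that acts as an external schema hiding the salary column, and loads test data. SQLite has no user accounts, so the "establish access rights" step (for example, `GRANT` in a server DBMS) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create the table structure and a view (an external schema over the table).
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    salary REAL NOT NULL
);
CREATE VIEW employee_directory AS
    SELECT emp_id, name FROM employee;
""")

# Load test data.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Alice", 72000.0), (2, "Bob", 65000.0)],
)

# A directory user sees only the columns exposed by the view.
directory = conn.execute(
    "SELECT * FROM employee_directory ORDER BY emp_id"
).fetchall()
print(directory)
```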
Database Maintenance • Maintain database structures • Storage/space management • Performance, tuning • I/O Contention • CPU Usage • Application Tuning • Data availability • DBMS upgrades, "fixes" • Backup, recovery …….
Database Maintenance, cont… • Backup • Full • Incremental • Differential • Business Continuity • Data Replication ("fallback")
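The difference between the three backup types comes down to which changes each one captures. The sketch below is a simplified model, not a real backup tool: the `select_for_backup` function, the file names, and the integer timestamps are all hypothetical. A full backup copies everything, a differential copies what changed since the last full backup, and an incremental copies what changed since the last backup of any kind.

```python
def select_for_backup(kind, files, last_full, last_backup):
    """Return the names of files a backup of the given kind would copy.

    files:       mapping of file name -> last-modified time
    last_full:   time of the most recent full backup
    last_backup: time of the most recent backup of any kind
    """
    if kind == "full":
        return sorted(files)
    if kind == "differential":
        cutoff = last_full
    elif kind == "incremental":
        cutoff = last_backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, modified in files.items() if modified > cutoff)


# Hypothetical state: full backup at time 3, an incremental at time 6.
files = {"orders.dat": 1, "customers.dat": 5, "invoices.dat": 8}

full = select_for_backup("full", files, last_full=3, last_backup=6)
differential = select_for_backup("differential", files, last_full=3, last_backup=6)
incremental = select_for_backup("incremental", files, last_full=3, last_backup=6)
print(full, differential, incremental)
```

The trade-off follows from the cutoffs: incrementals are the smallest to take, but a restore needs the full backup plus every incremental since, while a differential restore needs only the full backup plus the latest differential.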
Traditional Administration Definitions • Data Administration: A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards • Database Administration: A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery
Data People Involved in SDLC • Data Administrators • Data Architects • Data Stewards • Data(base) Analysts/Designers • Business (Intelligence) Analyst • Data Mining Engineer; Big Data Engineer; Data Scientist; Business Analytics Engineer; … • (System/Traditional) DBAs • Application DBAs • Procedural DBAs • e-DBAs • Data Warehouse Administrators
Metadata Management • System Catalog • Part of DBMS • Data Dictionary • Typically passive • Extension of catalog metadata • Information Repository • Master Data Management (see ch. 10 pgs 444 – 445)
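A system catalog is metadata that the DBMS maintains about its own objects. As one small example, SQLite exposes its catalog through the `sqlite_master` table and the `PRAGMA table_info` statement; other products use different catalog views. The `product` table below is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT NOT NULL)"
)

# The catalog lists the objects the DBMS knows about.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(product)")]

print(tables)
print(columns)
```

A data dictionary typically extends this kind of catalog metadata with business-level descriptions such as definitions and ownership.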
Master Data Management • "Ensuring the currency, meaning, and quality of reference data within and across various subject areas" (pg 444) • Identify • Common Data Subjects • Common Data Elements • Sources of "the truth" • Cleanse • Update applications to reference Master Data repository • Ensures consistency of key data (not ALL data) throughout organization
Cloud Computing • Business Model • Computing resources on demand • Need-based architectures • Internet-based delivery • Pay as you go • History (VERY high-level and approximate; timeline figure spanning the 50's to the 2000's): Time-sharing, Utility Computing, Personal Computers, Virtual Machines, WWW, Grid Computing, Cloud Computing
Cloud Computing Services • Impacts to Data(base) Administration • See textbook page 469
"Big Data" Skills • ETL (extract, translate, load) • Data warehousing (MDDB) • Data mining techniques • Statistical modeling with tools such as R, SAS, or SPSS • Data visualization tools • Technologies for structured and unstructured data • Hadoop (an Apache project to provide an open-source implementation of frameworks for reliable, scalable, distributed computing and data storage.) • NoSQL • "NewSQL" See Big Data University for (mostly) free self-study training Source: http://blog.simplyhired.com/2012/05/carving-out-a-career-in-big-data.html#ixzz243BiPXjF
Summary • Evolution of Data Management • Description & disadvantages of file processing • Description of data management technologies • Database Concepts • Components of a DBMS Environment • Database Advantages • Database Development: • Overall SDLC • Database Activities in the SDLC • Data Models/Schemas • Types • What they represent • People Involved in SDLC (esp. DB) • Major job divisions and responsibilities • Newer job titles
Next Time… • Conceptual Data Modeling • Chapter 2: Modeling Data in the Organization • Chapter 3: Enhanced E-R Model • Assignment 1: DB Development ***DUE 1/23***