Data, Data Everywhere * • The Sloan Digital Sky Survey started in 2000. In its first few weeks it collected more data than had been amassed in the entire history of astronomy • By 2010, it had collected 140 terabytes of data • Its replacement, scheduled for 2016, will collect that amount of data every 5 days • In 2010, Walmart processed 1M customer transactions every hour • These feed databases estimated at 2.5 petabytes, the equivalent of 167 times the books in the U.S. Library of Congress • Facebook houses more than 40 billion photos * Excerpted from a Feb. 27th, 2010, Economist article
Data, Data Everywhere * • Decoding the human genome involves 3 billion base pairs. • The first time it was attempted, it took 10 years • It can now be accomplished in 1 week. • It is estimated that within the next few years, the amount of global data created will approach 2,000 Exabytes per year (1 Exabyte = 1,000 Petabytes) • Problem: It is estimated that the total amount of storage available will be approximately 100 Exabytes * Excerpted from a Feb. 27th, 2010, Economist article
Data, Data Everywhere *
• Kilobyte = 2^10 bytes (1,024 bytes): one page of typed text typically requires 2 KB
• Megabyte = 2^20 bytes (1,048,576 bytes): storing the complete works of Shakespeare requires 5 MB
• Gigabyte = 2^30 bytes (1,073,741,824 bytes): a 2-hour film requires 1-2 GB
• Tera(trillion)byte = 2^40 bytes (1,099,511,627,776 bytes): all of the books in the Library of Congress require 15 TB
• Peta(quadrillion)byte = 2^50 bytes (1,125,899,906,842,624 bytes): Google processes about 1 PB every hour
• Exa(quintillion)byte = 2^60 bytes (1,152,921,504,606,846,976 bytes): equivalent to 10 billion copies of The Economist
• Zetta(sextillion)byte = 2^70 bytes (1,180,591,620,717,411,303,424 bytes): the total amount of information in existence is estimated at 1.2 ZB
• Yotta(septillion)byte = 2^80 bytes (1,208,925,819,614,629,174,706,176 bytes)
* Excerpted from a Feb. 27th, 2010, Economist article
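The byte counts on this slide can be checked with a few lines of Python. This is only a minimal sketch; the names `PREFIXES` and `unit_sizes` are made up for illustration and are not part of the original slides.

```python
# Binary storage units: each step up multiplies the size by 2**10 = 1,024.
PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta", "Yotta"]


def unit_sizes():
    """Return {unit name: size in bytes}, using powers of two."""
    return {f"{prefix}byte": 2 ** (10 * (i + 1)) for i, prefix in enumerate(PREFIXES)}


sizes = unit_sizes()
for name, size in sizes.items():
    # The "," format option inserts thousands separators.
    print(f"{name:<10} {size:,} bytes")
```

Each unit is exactly 1,024 times the previous one, which is why the byte counts drift further from the round "trillion", "quadrillion" labels as the units grow.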
What is Data Resource Management?? • A managerial activity that applies information systems technologies to the task of managing an organization’s data resources to meet the information needs of its business stakeholders What does that mean?? • It’s a very fancy way of saying that we are going to talk about databases
What is a Database?? A way we can model (parts of) the real world (well, sort of) • A large, integrated collection of Data and Metadata • Entities (i.e., a person, place, object or event we wish to have information about). • Students • Physicians • Patients • Customers • The Attributes of that entity (i.e., characteristics). • GPA • Specialty • Illness • Balance Due • The Relationships between entities (i.e., how do entities interact). • One Physician has many Patients • A Patient has only one Physician
What is it, really?? Consider some information the University maintains: Name, Major, Tuition Paid, Address, Courses Taken, Tuition Owed, SSN, Grades Received, Grants/Scholarships. HOW is this information stored? You are an entity with attributes which vary. Within the University, different areas have different interests in you (i.e., the Registrar, the Bursar, etc.). Nonetheless, you are still part of the University as a whole.
How does this relate to a database? • You are an entity (class: student): Table • with attributes: Fields • which vary (your attributes can be different) • Within the University, different areas have different interests in you (i.e., the Registrar, the Bursar, etc.): Files • Nonetheless, you are still part of the University: Database
HOW does this relate to a database? Hierarchically: A Database consists of Files, which contain Records, which contain Fields, which may consist of a variety of data types. Example records: (Hernandez, Juan; 123456789; 72; 2.42) and (Jones, Mary; 234567890; 102; 3.87). Notice that there should always be a Key (Unique) Field
Alternatively (from smallest to largest component): • Character: A single alphabetic, numeric or other symbol • Field: A group of related characters • Entity: A person, place, object or event • Attribute: A characteristic of an entity • Record: A collection of attributes that describe an entity • File: A group of related records • Database: An integrated collection of logically related data elements
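The hierarchy of fields, records, files and database can be sketched in Python. This is a rough illustration only: the class name `StudentRecord` and the field names are invented here, and reading the third value in each sample record as credit hours and the fourth as GPA is an assumption, since the slides do not label those columns.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """A record: a collection of fields (attributes) describing one entity."""
    name: str
    student_id: str    # the key (unique) field
    credit_hours: int  # assumed meaning of the third value on the slide
    gpa: float         # assumed meaning of the fourth value on the slide


# A file is a group of related records; keying it by student_id
# makes the "always have a unique key field" rule explicit.
student_file = {}
for record in [
    StudentRecord("Hernandez, Juan", "123456789", 72, 2.42),
    StudentRecord("Jones, Mary", "234567890", 102, 3.87),
]:
    if record.student_id in student_file:
        raise ValueError(f"duplicate key: {record.student_id}")
    student_file[record.student_id] = record

# A database is an integrated collection of logically related files.
database = {"students": student_file}

print(database["students"]["234567890"].name)
```

Looking a record up by its key is the reason the key field must be unique: two records with the same `student_id` could not both be stored.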
Why Databases?? Databases were not always commonplace. Initially, there were no databases or Database Management Systems (DBMS). Individual applications were written to meet specific user needs (File Processing, or Traditional File Processing Systems). As business applications became more complex, it became apparent that there were too many problems associated with Traditional File Processing Systems.
What Problems?? Single Applications • A program was (generally) written for one and only one application (the user would specify their individual needs) Program-Data Dependence • Since each program was written for a specific data set, a change in the data, or data format, required a change in the program that uses the data
What Problems?? Data Redundancy • Duplicate data requires an update to be made to every file storing that data Lack of Data Integration • Data stored in separate files requires special programs for output, making ad hoc reporting difficult Data Input Errors • The more people who are required to enter data, the greater the likelihood that mis-entered data will be stored
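The redundancy problem can be illustrated with a small sketch. Everything here is hypothetical: two departments each keep their own copy of a student's address, the new address is invented, and `inconsistent_keys` is a helper written only for this example.

```python
# Two separate "files", each holding its own copy of the same student data.
registrar_file = {"123456789": {"name": "Saenz, Lupe", "address": "123 Mesa"}}
bursar_file = {"123456789": {"name": "Saenz, Lupe", "address": "123 Mesa"}}

# The student moves, but only the registrar's program is run
# (the new address is made up for illustration).
registrar_file["123456789"]["address"] = "9 Hypothetical Ave."


def inconsistent_keys(file_a, file_b, field):
    """Return the keys whose value for `field` differs between the two files."""
    return [
        key
        for key in file_a
        if key in file_b and file_a[key][field] != file_b[key][field]
    ]


print(inconsistent_keys(registrar_file, bursar_file, "address"))
```

With the address stored once in a shared database, a single update would be visible to both departments and this mismatch could not arise.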
How did databases come about?? 1960’s: North American Rockwell’s Moon Project • > 60% of all data used was duplicated in multiple data sets (redundancy) By the Mid 1960’s: • Rockwell/IBM Joint Venture to develop a DataBase Management System (DBMS) • Hierarchical in Nature Later: • IBM’s Information Management System (IMS)
How are databases different?? Database Management Approach • Consolidates data records into one database that can be accessed by many different application programs. • Software interface between users and databases • Data definition is stored once, separately from application programs
What is a DBMS?? Software that controls the creation, maintenance, and use of databases
What are the major functions of a DBMS ??? Database Development: • Defining and organizing the content, relationships and structure of the data needed to build the database • Specifying integrity constraints • Fixing of Access Rights (Authorization)
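As a hedged illustration of defining structure and integrity constraints, the sketch below uses Python's built-in `sqlite3` module. The table and column names are invented, borrowing the physician/patient relationship mentioned earlier in the slides. SQLite has no user accounts, so the access-rights step is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Data definition: structure, relationships and integrity constraints.
conn.executescript("""
CREATE TABLE physician (
    physician_id INTEGER PRIMARY KEY,
    specialty    TEXT NOT NULL
);
CREATE TABLE patient (
    patient_id   INTEGER PRIMARY KEY,
    illness      TEXT,
    -- each patient has exactly one physician, who must exist
    physician_id INTEGER NOT NULL REFERENCES physician(physician_id)
);
""")

conn.execute("INSERT INTO physician VALUES (1, 'Cardiology')")
conn.execute("INSERT INTO patient VALUES (10, 'Arrhythmia', 1)")

# A patient pointing at a physician that does not exist violates the constraint.
try:
    conn.execute("INSERT INTO patient VALUES (11, 'Flu', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

The DBMS, not the application program, refuses the bad row. That is the practical meaning of storing the constraint with the data definition.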
What are the major functions of a DBMS ??? Database Development: Entity Relationship Diagrams. Consider the following situation: A customer places an order. The order consists of parts. Diagram: Customer (entity), Places (relationship), Orders (entity), Contain (relationship), Parts (entity). Entity: An organization about which we wish to maintain information. Relationship: An association between entities.
What are the major functions of a DBMS ??? Database Maintenance: • Updating a database continually to reflect new business transactions and other events • Updating a database to correct data and ensure accuracy of the data
What are the major functions of a DBMS ??? Database Interrogation: • Capability of a DBMS to report information from the database in response to end users’ requests • Query Language: allows easy, immediate access to ad hoc data requests • Report Generator: allows quick, easy specification of a report format for information users have requested
What are the major functions of a DBMS ??? Database Interrogation: • Natural Language vs. SQL Queries
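To make the contrast concrete, here is one natural-language request next to a possible SQL equivalent, run through Python's built-in `sqlite3` module. The rows come from the Student table shown later in the slides. The lowercase table and column names are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id TEXT PRIMARY KEY, name TEXT, address TEXT, major TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        ("123456789", "Saenz, Lupe", "123 Mesa", "Finance"),
        ("234567890", "Chung, Mei", "37 5th St.", "INFOSYS"),
        ("345678901", "Adams, John", "54B Hague", "Accounting"),
        ("456789012", "Elam, Mary", "123-22 E St.", "INFOSYS"),
    ],
)

# Natural language: "Which students are INFOSYS majors?"
# SQL: the same request, stated as a query with a parameter.
rows = conn.execute(
    "SELECT name FROM student WHERE major = ? ORDER BY name",
    ("INFOSYS",),
).fetchall()
infosys_names = [row[0] for row in rows]

print(infosys_names)
```

The query says what is wanted (names where the major matches) and leaves how to find it to the DBMS, which is what makes ad hoc requests cheap.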
What are the major functions of a DBMS ??? Application Development: • End users, systems analysts, and other application developers can use the internal 4GL programming language and built-in software development tools provided by many DBMS packages to develop custom application programs.
What are the forms of a DBMS ??? Hierarchical: relationships between records form a hierarchy or treelike structure Network: data can be accessed by one of several paths because any data element or record can be related to any number of other data elements Relational: All data elements within the database are viewed as being stored in the form of simple tables
What are the forms of a DBMS ??? RDBMS
Table: Student (the header row holds the Field Names; each row is a Record; each cell is a Field)

StudentID   Name          Address        Major
123456789   Saenz, Lupe   123 Mesa       Finance
234567890   Chung, Mei    37 5th St.     INFOSYS
345678901   Adams, John   54B Hague      Accounting
456789012   Elam, Mary    123-22 E St.   INFOSYS
••••••      ••••••        ••••••         ••••••
What are the forms of a DBMS ??? RDBMS
Table: Student

StudentID   Name          Address        Major
123456789   Saenz, Lupe   123 Mesa       Finance
234567890   Chung, Mei    37 5th St.     INFOSYS
345678901   Adams, John   54B Hague      Accounting
456789012   Elam, Mary    123-22 E St.   Accounting
••••••      ••••••        ••••••         ••••••

[Tables: Balance and Department. Recoverable column labels include Student, Owed, Department and Faculty.]
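The point of keeping Student and Balance as separate tables is that they can be combined on a shared field when needed. The sketch below shows one way this might look with Python's built-in `sqlite3` module. The column names and the pairing of balances with students are hypothetical, since the Balance table's rows are not legible in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    student_id TEXT PRIMARY KEY,
    name       TEXT,
    major      TEXT
);
CREATE TABLE balance (
    student_id TEXT REFERENCES student(student_id),
    owed       REAL
);
INSERT INTO student VALUES ('123456789', 'Saenz, Lupe', 'Finance');
INSERT INTO student VALUES ('234567890', 'Chung, Mei', 'INFOSYS');
-- hypothetical balances, for illustration only
INSERT INTO balance VALUES ('123456789', 1502.36);
INSERT INTO balance VALUES ('234567890', 0.0);
""")

# Join the two tables on the shared student_id field.
rows = conn.execute("""
    SELECT s.name, b.owed
    FROM student AS s
    JOIN balance AS b ON b.student_id = s.student_id
    WHERE b.owed > 0
""").fetchall()

print(rows)
```

Neither table stores the other's data. The student's name lives only in `student`, and the join brings it together with the amount owed at query time.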
What are the forms of a DBMS ??? Multidimensional Database Structure • Variation of the relational model that uses multi-dimensional structures to organize data and express the relationships between data
What are the forms of a DBMS ??? Object-Oriented Database Structure • Can accommodate more complex data types including graphics, pictures, voice and text
What are the forms of a DBMS ??? Object-Oriented Database Structure Encapsulation: • data values and operations that can be performed on them are stored as a unit • Conceals the exact details of how a particular class works from objects that use its code or send messages to it Inheritance: • automatically creating new objects by replicating some or all of the characteristics of one or more existing objects
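Encapsulation and inheritance can be sketched with ordinary Python classes. This is an illustration of the two ideas only, not of how any particular object-oriented DBMS stores objects, and the class names are invented for the example.

```python
class Person:
    """Encapsulation: the data and the operations on it live together."""

    def __init__(self, name):
        self._name = name  # leading underscore marks an internal detail

    def describe(self):
        return self._name


class Student(Person):
    """Inheritance: a Student reuses everything a Person already has."""

    def __init__(self, name, gpa):
        super().__init__(name)
        self._gpa = gpa

    def describe(self):
        # Extend the inherited behaviour rather than rewriting it.
        return f"{super().describe()} (GPA {self._gpa:.2f})"


class Physician(Person):
    def __init__(self, name, specialty):
        super().__init__(name)
        self._specialty = specialty

    def describe(self):
        return f"{super().describe()} ({self._specialty})"


people = [Student("Jones, Mary", 3.87), Physician("Adams, John", "Cardiology")]
for person in people:
    # Callers send the same message; each class hides how it answers.
    print(person.describe())
```

Code that calls `describe()` never touches `_gpa` or `_specialty` directly, so either class can change its internals without breaking its callers.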
How do the DBMS structures compare ??? (These are your authors’ viewpoints) Hierarchical: best for structured, routine types of transaction processing. Network: best when many-to-many relationships are needed. Relational: best when ad hoc reporting is required.
How are databases developed ??? Database Development: Enterprise-wide database development is usually controlled by database administrators (DBA) Data Planning: • Database administrators and designers work with corporate and end user management to develop an enterprise model that defines the basic business process of the enterprise.
How are databases developed ??? Logical Schema: • data elements and relationships among them Physical Schema: • describes how data are to be stored and accessed on the storage devices of a computer system • Data Dictionary: catalog or directory containing metadata
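A data dictionary is itself data that can be queried. As a small sketch, SQLite keeps its catalog in a table called `sqlite_master` and reports column metadata through `PRAGMA table_info`. The `student` table here is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "student_id TEXT PRIMARY KEY, name TEXT NOT NULL, major TEXT)"
)

# Metadata about tables: SQLite's built-in catalog.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Metadata about fields: each row is (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(tables)
print(columns)
```

This describes the logical schema (tables, fields, types). How SQLite lays the rows out on disk, the physical schema, stays hidden from the query.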
How are databases developed ??? Logical vs. Physical Designs:
How are databases used??? Types of Databases: • Operational: store detailed data needed to support the business processes and operations of a company. Also called Subject Area Databases (SADB), Transaction Databases, or Production Databases. Examples: • Customer databases • Inventory databases • Human Resources databases
How are databases used??? Types of Databases: • Distributed: databases that are replicated and distributed in whole or in part to network servers at a variety of sites. A single logical database that is spread across computers at multiple locations • Replicated databases • Partitioned databases • Challenges: ensuring that data is constantly, consistently and concurrently updated
How are databases used??? Types of Databases: • External: contain a wealth of information available from commercial online services and from many sources on the World Wide Web • Commercial/Shareware/Freeware • Internet dominated
How are databases used??? Types of Databases: • Hypermedia: consist of hyperlinked pages of multimedia
How are databases used??? Types of Databases: Data Warehouses • Large database that stores data that have been extracted from the various operational, external, and other databases of an organization
How are databases used??? Types of Databases: Data Marts • Databases that hold subsets of data from a data warehouse that focus on specific aspects of a company, such as a department or a business process
How are databases used??? Types of Databases: Data Mining Uses: • Perform “market-basket analysis” to identify new product bundles • Find root causes of quality or manufacturing problems • Prevent customer attrition and acquire new customers • Cross-sell to existing customers • Profile customers with more accuracy
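At its simplest, market-basket analysis counts which products show up together in the same purchase. The sketch below does only that pair counting, on made-up baskets. Real data-mining tools go further, with measures such as support and confidence. The function name `pair_counts` is an assumption for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]

print(top_pair, top_count)
```

A pair that appears together far more often than the others is a candidate product bundle or cross-sell offer.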