UFCEUS-20-2: Web Programming
Lecture 7: Database Theory & Practice (1): Data Modelling
What is Data?
• A representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automated means. (sam.dgs.ca.gov/TOC/4800/4819.2.htm)
• Factual information, especially information organized for analysis or used to reason or make decisions. (www.florite.com/support/terminology.htm)
• The raw material of information. Refers mostly to the information entered into, and stored within, a computer or file. (www.angelfire.com/bc/nursinginformatics/glossary.html)
• Information stored on the computer system, used by applications to accomplish tasks. (www.krollontrack.com/legalresources/glossary.asp)
Data, Information & Knowledge
The WKIDN Hierarchy (Wisdom, Knowledge, Information, Data, Noise)
• noise: unstructured, unrelated, non-symbolic, unrecognised interference; e.g. the 'string': ?£^&**8…---┐€↨/
• data: symbolic or non-symbolic unstructured 'facts' about one or more domains; e.g. the 'strings': James 111081
• information: data + meaning (+ context); e.g. my son's name and date of birth (in an application form) or my friend's name and id (within an IRC service)
• knowledge: awareness of a domain or of the procedures used to attain goals; e.g. knowing when and where the above two uses are appropriate
• wisdom: intuitive and heuristic understanding of the limits of knowledge and of how and when to apply it; knowing when to reject information or question the validity of data; may be counter-intuitive and abductive
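The step from data to information can be sketched in a few lines of Python. This is only an illustration: the two record types, the function names, and the DDMMYY reading of the digits are assumptions made for the example, not part of the lecture.

```python
from dataclasses import dataclass
from datetime import date, datetime

# Data: two strings with no meaning attached (the values from the slide).
raw = ("James", "111081")


@dataclass
class ChildRecord:
    """Context 1 (assumed): an application form asking for a child's name and date of birth."""
    name: str
    date_of_birth: date


@dataclass
class ChatUser:
    """Context 2 (assumed): an IRC-style service that identifies users by name and numeric id."""
    nickname: str
    user_id: int


def as_child_record(data):
    name, digits = data
    # Assumption: in this context the digits encode a date as DDMMYY.
    return ChildRecord(name, datetime.strptime(digits, "%d%m%y").date())


def as_chat_user(data):
    name, digits = data
    # In this context the same digits are just an identifier.
    return ChatUser(name, int(digits))


# Information: the same data plus a meaning supplied by the context.
child = as_child_record(raw)
user = as_chat_user(raw)
print(child)
print(user)
```

Choosing which of the two functions to call for a given input is the part the slide calls knowledge: nothing in the raw strings says which interpretation is the right one.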
What is a Database?
• Any organized collection of information; it may be paper or electronic. (www.library.arizona.edu/rio/glossary.htm)
• A standardized collection of information in computerized format, searchable by various parameters; in libraries often refers to electronic catalogs and indexes. (library.wexler.hunter.cuny.edu/lyannott/thesis_guide/libraryterms.html)
• A database is a collection of information stored in a computer in a systematic way, such that a computer program can consult it to answer questions. The software used to manage and query a database is known as a database management system (DBMS). The properties of database systems are studied in information science. (en.wikipedia.org/wiki/Database)
What is data modelling?
• Data modelling is concerned with the design of the data content and structure of the database.
• Data modelling gives us a formal model of an organisation, which is achieved through the consolidation of the user requirements specification.
• The information gathered from fact-finding (analysis) is appraised and the basic data and data relationships are established.
• The result of the data analysis is a representation of the user's view of the data. It is independent of any DBMS software or hardware considerations.
• The model documents the structure of, and interrelationships between, the data. It is presented as a combination of simple diagrams and written definitions.
Why is the data model important?
• Leverage
- A small change to a data model may have a major impact on the system as a whole. In commercial information systems the programs are far more complex than the database and take much longer to specify and construct, but their content and structure are heavily influenced by the database design. Their structure will therefore need to reflect the way the data is organized: in other words, the data model.
- A well-designed data model can make programming simpler and cheaper. Even a small change to the model may lead to significant savings in total programming cost.
• Conciseness
- The data model is formal and concise. The time required to review a data model is considerably less than that needed to review functional specifications, which could amount to hundreds of pages.
• Data Quality
- Data is a valuable organizational asset. The data model plays a key role in ensuring good data quality by establishing a common understanding of what data is to be held and how to interpret it.
What makes a good data model? (1)
• Completeness
- Does the model support all the necessary data? Do we need to record something that is currently omitted?
• Non-redundancy
- Does the model specify a database in which the same fact could be recorded more than once? Recording the same data more than once (duplication) increases the amount of space needed to store the database, requires extra processes (and processing) to keep the various copies in step, and leads to consistency problems if the copies get out of step.
• Enforcement of Business Rules
- How accurately does the model reflect and enforce the rules that apply to the business's data? If rules correctly reflect the business requirement and are correctly enforced, the resulting database will be a powerful tool in enforcing correct practice and in maintaining data quality.
• Data Reusability
- Will the data stored in the database be reusable for purposes beyond those anticipated in the process model? This requirement is often expressed in terms of its solution: as far as possible, data should be organised independently of any specific application.
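Non-redundancy and rule enforcement can be sketched with Python's standard `sqlite3` module. The tables, columns, sample rows, and the two rules (an order quantity must be positive, an order must belong to a known customer) are hypothetical and chosen only to illustrate the points above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
-- Each customer's email is recorded exactly once (non-redundancy).
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE
);
-- Orders refer to the customer instead of repeating the customer's details.
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    quantity    INTEGER NOT NULL CHECK (quantity > 0)  -- assumed business rule
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'ada@example.org')")
conn.executemany(
    "INSERT INTO customer_order (customer_id, quantity) VALUES (?, ?)",
    [(1, 2), (1, 5)],
)


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# The model itself enforces the rules, whichever program issues the statement.
zero_quantity_rejected = rejected(
    "INSERT INTO customer_order (customer_id, quantity) VALUES (1, 0)")
unknown_customer_rejected = rejected(
    "INSERT INTO customer_order (customer_id, quantity) VALUES (99, 1)")

# One update changes the email for every order, so no copies can get out of step.
conn.execute("UPDATE customer SET email = 'ada@example.com' WHERE customer_id = 1")
emails = conn.execute(
    "SELECT DISTINCT c.email FROM customer_order o JOIN customer c USING (customer_id)"
).fetchall()
print(zero_quantity_rejected, unknown_customer_rejected, emails)
```

Had the email been copied onto every order row, the update would have needed to touch each copy, which is the extra processing and consistency risk the slide describes.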
What makes a good data model? (2)
• Stability and Flexibility
- How well will the model cope with possible changes to the business requirements? Can any new data required to support such changes be accommodated in existing tables? Alternatively, will simple extensions suffice? Or will major structural changes be required, with corresponding impact on the rest of the system?
- A data model is stable in the face of a change to requirements if it requires little or no modification. Models are more or less stable, depending on the level of change required.
- A data model is flexible if it can be readily extended to accommodate likely new requirements with only minimal impact on the existing structure.
• Elegance
- Does the data model provide a reasonably neat and simple classification of the data? Elegant models are typically simple, consistent, and easily described and summarized.
- The difference in development cost between systems based on simple, elegant data models and those based on highly complex ones can be considerable. There is a risk that a simple model ends up being complex and brittle as a result of incremental business changes over a long period without any rethinking of processes and supporting data.
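One way to picture stability is to compare two hypothetical designs against the same new requirement, here "record a second phone number for a person". The table names, columns, and sample values below are invented for the sketch, which uses Python's standard `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Design A: one column per kind of contact detail.
CREATE TABLE person_wide (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    phone     TEXT,
    email     TEXT
);
-- Design B: one row per contact detail.
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE contact_detail (
    person_id INTEGER NOT NULL REFERENCES person(person_id),
    kind      TEXT NOT NULL,
    value     TEXT NOT NULL,
    PRIMARY KEY (person_id, kind, value)
);
""")


def column_names(table):
    """List a table's column names using SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


columns_b_before = column_names("contact_detail")

# Design A needs a structural change, and every program using the table may be affected.
conn.execute("ALTER TABLE person_wide ADD COLUMN phone2 TEXT")

# Design B absorbs the requirement as ordinary data in existing tables.
conn.execute("INSERT INTO person VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO contact_detail VALUES (1, ?, ?)",
    [("phone", "0117 000 0001"), ("phone", "0117 000 0002")],  # made-up numbers
)

columns_a_after = column_names("person_wide")
columns_b_after = column_names("contact_detail")
phone_count = conn.execute(
    "SELECT COUNT(*) FROM contact_detail WHERE person_id = 1 AND kind = 'phone'"
).fetchone()[0]
print(columns_a_after, columns_b_after, phone_count)
```

Design B is more stable under this particular change, but it is not automatically the better model: a generic `kind` column makes it harder to enforce rules about specific contact types, so the gain in flexibility is paid for elsewhere.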
What makes a good data model? (3)
• Communication
- How effective is the model in supporting communication among the various stakeholders in the design of a system? Do the tables and columns represent business concepts that the users and business specialists are familiar with and can easily verify? Will programmers interpret the model correctly?
• Integration
- How will the proposed database fit with the organization's existing and future databases? Even when individual databases are well designed, it is common for the same data to appear in more than one database and for problems to arise in drawing together data from multiple databases. Are the coding schemes and definitions consistent? How easy is it to keep the different versions in step, or to assemble a complete picture?
What makes a good data model? (4)
• Conflicting Objectives
- The above aims will often conflict with one another. An elegant but radical solution may be difficult to communicate. An elegant model may exclude requirements that do not fit. A model that accurately enforces a large number of business rules will be unstable if some of those rules change. A model that is easy to understand because it reflects the perspectives of the immediate system users may not support reusability or integrate well with other databases.
- The goal is to develop a model that provides the best balance among these possibly conflicting objectives. As in other design disciplines, achieving this is a process of proposal and evaluation, rather than a step-by-step progression to the ideal solution. We may not realize that a better solution or trade-off is possible until we see it.
(Some) consequences of bad model/design
• database can't scale
• slow data access
• data gets lost and cannot be recovered
• specific data is hard to find
• transactions corrupt data
• queries cannot be optimized
• data is contradictory
• fields hold meaningless data
• transactions go wrong
• database design is complex and confusing
• records are hard to update
• security is flawed, leading to damage or theft
• users have to download much more data than needed
Database Design Stages & Deliverables (1)
• Conceptual, Logical and Physical Data Models
- The conceptual data model is a (relatively) technology-independent specification of the data to be held in the database. It is the focus of communication between the data modeler and business stakeholders, and it is usually presented as a diagram with supporting documentation.
- The logical data model is a translation of the conceptual model into structures that can be implemented using a database management system (DBMS), usually relational.
- The physical data model incorporates any performance considerations and is presented in terms of tables, columns, indexes etc. It will include a specification of physical storage and access mechanisms.
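The three deliverables can be sketched for a tiny, hypothetical domain in which students enrol on modules. The entity names, table definitions, and index are assumptions made for the example; SQLite (via Python's `sqlite3`) stands in for "a DBMS".

```python
import sqlite3

# Conceptual model: what the business talks about, with no DBMS detail.
# On a slide this would normally be a diagram; a plain dict stands in for it here.
conceptual_model = {
    "entities": ["Student", "Module"],
    "relationships": [("Student", "enrols on", "Module")],
}

# Logical model: the same content translated into relational structures.
# The many-to-many relationship becomes a table of its own.
LOGICAL_DDL = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE module (
    module_code TEXT PRIMARY KEY,
    title       TEXT NOT NULL
);
CREATE TABLE enrolment (
    student_id  INTEGER NOT NULL REFERENCES student(student_id),
    module_code TEXT    NOT NULL REFERENCES module(module_code),
    PRIMARY KEY (student_id, module_code)
);
"""

# Physical model: additions made for performance reasons only.
# Assumption: "who is enrolled on module X?" is a frequent query.
PHYSICAL_DDL = "CREATE INDEX idx_enrolment_module ON enrolment (module_code);"

conn = sqlite3.connect(":memory:")
conn.executescript(LOGICAL_DDL)
conn.executescript(PHYSICAL_DDL)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name NOT LIKE 'sqlite_%'")]
print(tables)
print(indexes)
```

Dropping the index would change how fast some queries run but not what the database means, which is why it belongs to the physical model and not the logical one.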
Database Design Stages & Deliverables (2)
• The Three-Schema Architecture and Terminology
- The internal schema describes how the data will be physically stored and accessed, using the facilities provided by a particular DBMS. It represents the foundations, electrical wiring, and hidden plumbing of the database.
- The conceptual schema describes the organization of the data into tables and columns.
- The external schemas specify views that enable different users of the data to see it in different ways. It is usual to provide one external schema that covers the entire conceptual schema and then to provide a number of external schemas that meet specific user requirements.
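In a relational DBMS, external schemas are commonly realised as views over the conceptual schema's tables. A minimal sketch with Python's `sqlite3`, using a hypothetical `employee` table and two made-up user groups:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the data organised into tables and columns.
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    department  TEXT NOT NULL,
    salary      INTEGER NOT NULL
);

-- External schemas: each user group sees only what it needs.
CREATE VIEW staff_directory AS
    SELECT name, department FROM employee;
CREATE VIEW payroll AS
    SELECT employee_id, salary FROM employee;

INSERT INTO employee VALUES (1, 'Ada', 'Engineering', 50000);
""")


def columns(relation):
    """Return the column names a table or view exposes."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [description[0] for description in cursor.description]


directory_columns = columns("staff_directory")
payroll_columns = columns("payroll")
directory_rows = conn.execute("SELECT * FROM staff_directory").fetchall()
print(directory_columns, payroll_columns, directory_rows)
```

The internal schema does not appear in the code at all: how rows are laid out on disk and how they are located is handled by the DBMS, which is the sense in which it is the "hidden plumbing".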
Where do data models fit in?
• Data-Driven Approaches: for example, Information Engineering (IE). These appeared in the late 1970s and have since evolved into parallel or "blended" approaches. The emphasis was on developing the data model before the detailed process model.
• Parallel (Blended) Approaches: provide simultaneous modelling of the data and the process models. Supported by CASE products.
• Object-Oriented Approaches: use conventional (relational) data models, as OO databases are not commonly used. Use of UML.
• Prototyping Approaches: Rapid Application Development (RAD) approaches have in many cases replaced the traditional waterfall approaches to systems development. They use a data-driven approach to data modelling.
• Agile Methods: a backlash against "heavy" methodologies; they value software over documentation, shared understanding, pair programming etc. The data model is developed early in the development process.
Who should be involved in data modelling?
• The system users, owners, and/or sponsors will need to verify that the model meets their requirements.
• Business specialists (sometimes called subject matter experts or SMEs) may be called upon to verify the accuracy and stability of business rules incorporated in the model.
• The data modeler has overall responsibility for developing the model and ensuring that other stakeholders are fully aware of its implications for them.
• Process modelers and program designers will need to specify programs to run against the database. They will want to verify that the data model supports all the required processes.
• The physical database designer (often an additional role given to the database administrator) will need to assess whether the physical data model needs to differ substantially from the logical data model to achieve adequate performance and, if so, propose and negotiate such changes.
• The systems integration manager (or other person with that responsibility, possibly an enterprise architect, data administrator, information systems planner, or chief information officer) will be interested in how the new database will fit into the bigger picture: are there overlaps with other databases, and does the coding of data follow organizational or external standards?
Bibliography / Readings / Home-based activities
Bibliography
• Data Modeling Essentials (3rd ed.), GC Simsion & GC Witt, Morgan Kaufmann, 2005
• Beginning Database Design Solutions, R Stephens, Wrox, 2009
• Data Modeling: A Beginner's Guide, A Oppel, McGraw-Hill Osborne, 2010