240 likes | 353 Views
Database . Chapter 1 – Pascal’s Practical Issues... The issue of Data Types. Fabian Pascal – Database Activist. General argument – that the Relational Model of Data is by far the best and most robust method for organising data.
E N D
Database Chapter 1 – Pascal’s Practical Issues... The issue of Data Types
Fabian Pascal – Database Activist • General argument – that the Relational Model of Data is by far the best and most robust method for organising data. • Relational model is the application of logic and math to database management.
Intro • Computer industry is driven by fads (just like the fashion industry) • Vendors profit from accelerated obsolescence • Users pay, vendors (helped by the trade media) profit • Profit comes largely from ignorance and luring users to the next fad
Larry Ellison (?) • CEO of Oracle • “Client Servers were a tremendous mistake and we are sorry that we sold it to you” (1999 ) • Only thing that should run on the desk top is a browser and word processing.
Trouble at mill • Database mgmt failures are largely due to failure of DBMS users to be properly educated – • The approach is often ad-hoc, cookbook approach... This is based on business culture • DBAs are often self-taught • Not exposed to database concepts, principles and methods
DB fundamentals • Are not product or vendor specific • Fundamentals are often deemed “theoretical” or “academic” • Industry focused on training (in a particular product)
The main issues • Unstructured data and complex data types • Business rules and integrity enforcement • Keys • Duplicate data • Normalisation and “denormalisation” • Entity subtypes and supertypes • Redundancy
What is a DBMS • Basically a deductive logic system • Derives new facts from a set of asserted facts • The derived facts are true if and only if : • The initial assertions are correct • The derivation rules are logically sound and consistent. • Everyone wants the right answers – but many are unaware that their practices and their use of DBMS software fails to adhere to these points
Careful what you wish for: Data Types and Complexity • Issue of Type (data type) is often misunderstood • A cause and a consequence (historically) by RDBMS to implement domains • So, the blame is placed on the “relational model” for lack of support for so-called complex data types
Fundamentals • A row in a database can be expressed in natural language – for eg (proposition) • Employee number is 100, the name is Jones, works in department HR, was hired 22/08/2009 and earns a salary of $45,000 • The general form... (predicate) • Employee number is EMP#, the name is ENAME, works in DEPT#, was hired on HIREDATE and earns salary SAL
So... • When specific values are substituted for the place holders the predicate reduces to an individual proposition. • Data Types are in essence the rules that define among other things, sets of valid values for a database. • Data types serve as conceptual pools of the permissible values.
Every database value is of a type... Or “typed” • And it consists of • A name • One or more possible representations of which ; • One is physically stored • At least one is declared to the users • Possible additional type constraints • A set of operators permissible on the type’s values.
Type v representations • How a types values are represented internally in physical storage is distinct to how we view it. • Also, a type can have multiple representations for eg. • Temperature can be Celsius or Fahrenheit (or simply warm/hot/cool/brass monkey • Actual representation should be hidden from users – Data Independence
Operators • At least one operator to select (obviously in SQL that is SELECT) • Many things are not meaningfully comparable even though those operators would allow for comparison • It wouldn’t make sense to compare TEMPERATURE with SALARY even if they were both represented as decimals
“Simple” Types • Atomic types • System-defined – but all DBMS’s come with their own set • INTEGER (INT) Declared representation INTEGER Type constraint >=-231 and <=231 Operators – set applicable to integers
Cont.. • Physical representations of system defined types are vendor specific (OS platform)
User-defined • Type EMP# • Declared representation CHAR(3) • Type constraint [A-Z][0-9][0-9] • A set of operators applicable to employee numbers (eg. select, =, but not < or >) • SALARY could be constrained by some business rules, not more than 150,000 for eg. Operators may include the general integer operators but probably not sqroot
User defined • Includes the ability to define and limit the set of operators that are associated with a given data type • Otherwise erroneous comparisons can occur. - Examples?
Erroneous comparisons? • The truth of facts, or, the correctness of data representations depends on whether or not the values are (a) consistent with the rules and (b) correspondent with the external world, that is not erroneous. • (a) is your responsibility as a DB designer. • (b) is almost invariably accidental.
Does it make sense • To compare shoe size with age (or IQ) ? • Sometimes perhaps... DBMS should support user-defined types but also user-defined functions Yes Oracle does!
Complex types • A type is a named set of values without any restrictions as to how those values are represented. • ANY representation is ok • Operators must be able to be defined for ANY such representation • What about a type FINGERPRINT?
How would be store the FINGERPRINT data type • How would we compare it? • How do we agree upon what kinds of operators are the right ones • We can all agree on INTEGER operators. • The implementation of operators for such types is not only complex but also not universally agreed upon.
Cont. • Is this a shortcoming of the relational model? • It’s not that relational can’t handle such data types... (it can) • Implementation of “complex” types is best left to those with expertise in the application domain where such types might be used (user defined) • And they can be defined for any desired representation / manipulation
“Unstructured Data” • Is there such a thing? • Isn’t all (meaningful) data structured ? • Info in text/images can be represented relationally – but at a cost (knowledge, time, effort) • Complex coding of types and associated operators • Perceptions are subjective ...