360 likes | 371 Views
Learn about surrogate keys, normalization, functional dependencies, and relational theory in SQL Server with examples and comparisons of key strategies. Enhance your database design knowledge with practical insights.
E N D
Tony Rogerson, SQL Server MVPTech Articles: http://sqlblogcasts.com/blogs/tonyrogerson Commentary: http://twitter.com/tonyrogerson SQL User Group: http://sqlserverfaq.com Hire Me: http://www.sql-server.co.uk Sets, Surrogates, Normalisation, Referential Integrity - the Theory with example Scaling considerations in SQL Server Non-Functional Dependencies
Who Am I? • July 86 started in IT on IBM mainframe – nearly 24 years – was 40 last Sat (10th!) • Went freelance in 98 • PL/1, CICS, System W, VB, VB.NET, C++, Application System • DB2, SQL Server since 4.21 • Studying on the MSC Business Intelligence course at Dundee Uni (see Mark W ;))
What I wanted to cover • Surrogates • Repeating Groups • Relation Valued Attributes • Multi Valued Groups • Normalisation • 1NF • 2NF • 3NF / BCNF • 4NF • Relations (Tables) • Set Theory • Functional Dependencies • Join Dependencies • Candidate Keys • Row by Row processing • Sub Queries • Indexing • Index Intersection • Indexing the FACT schema • Indexing the Relational schema • NULLs • Referential Integrity (Declarative and Procedural) • Foreign Keys • INSTEAD OF triggers • Inserting into Views • Relationships • Decompose • Domains
This 50 mins… • Thinking in Sets • Surrogate Keys • What they are • Comparison NEWID, NEWSEQUENTIALID, IDENTITY • Fragmenation • Normalisation • An introduction – what is it? Why use it? • Joins – Pre-filter problems, index intersection • Fragmentation again • Referential Integrity • Optimiser -> Query rewrite • Locking considerations around Foreign Keys and Declarative RI (using Triggers)
Thinking in Sets What I think you need to know
Functional Dependencies for Performance Hardware (Less IO’s cheaper kit) Thinking in Sets(SQL) Indexing Strategy Correct DatabaseDesign (Normalisation and Denormalisation) Thinking in Sets(Relational Model)
Relational Theory V Objects Real World has Objects which have Attributes, Attributes themselves can be Objects which may themselves have Attributes (which may be Objects etc…) Pub has SaleItems Drinks has Type (Lager, Wine, Spirit), Price Food has Type (Snacks, Starter, Main), Price has Identity Location … Address … Town etc… Name Operator (Tenant, Owner, Licensee etc…)
Modelled in .NET Public Class Pub Dim PubID as Integer Dim PubName as StringDim PubSales() As SaleItem Public Class SaleItem Dim priv_ItemSold As New Object ‘ Drink or Food Public Class Drink Public Class Food
What’s a Relation (Table)? Header Tuple Tuple Projection Body • To be a table it must have at least one candidate key (a key being an attribute that is unique) • Attribute we know as a Column • Tuple we know as a Row • Projection we know as the SELECT clause (the columns you’ve chosen)
Surrogate Keys What they are Comparison NEWID, NEWSEQUENTIALID, IDENTITY Fragmentation
Surrogate Keys • System Generated value used as the primary key. • NOT a replacement for natural candidate keys. • Value of not exposed outside the system boundary – so the user doesn’t see it because of validation. • Aids Concurrency with Key updates. • Removes overhead with composite key joins.
Surrogate Key Scope (Verification!) Surrogate Usage • Surrogate key scope: Application Plumbing • Natural key scope: External World Web Server SQLServer Web Server
Surrogate Keys - Choices • Random • NEWID • Application generated GUID • Sequential • IDENTITY, MAX(x)+1 • NEWSEQUENTIALID • Random causes poor performance because of disk latency – 8KB reads; yes – index nodes probably in buffer pool but leaf probably isn’t
Demo • INSERT comparison • NEWID • NEWSEQUENTIALID • IDENTITY
Normalisation An Introduction – what is it? Why use it? Joins – Pre-filter problems, index intersection Fragmenation
Normalisation (Dependency Theory) • Removes redundancy – reduces size of stored data • Joins, Joins and more dam joins! • Referential Integrity – more tables, more FK’s that helps the Optimiser – slows INSERTS/UPDATES, causes fragmentation • The Join (lookup) problem – filter then Join (try and get it to Merge Join)
1NF • To be a table it needs a key to make the tuples (rows) unique • All attributes (column) names need to be unique • The row/column intersection value needs to be Atomic • Get rid of Repeating Groups
What’s a Repeating Group? The same Attribute (column) has the same Domain (set of values) and occurs multiple times in the table but the instance of a domain can appear in any of the occurrences.
Example of Repeating Group The attendee Ester in session NonFuncDepend can be recorded in Attendee1 to 4 – it don’t matter which one Collapse the repeated attributes (the group) into its own table.
Repeating Group?? / RVA?? The Attributes modelled are Attendees and Job Roles; domain values exist (attendees) and (job roles) each Session has {JobRole, Attendee} Collapse into its own table.
NOT a Repeating Group Address 1 – 3 are separate domains with a completely different set of values Collapse the repeated attributes (the group) into its own table???
2NF – Sort the Keys out • Remove non-key columns not related to the entire primary key • Check Functional Dependencies to find columns to decompose
2NF - Example • Session# has functional dependency on SessionRoom
3NF/BCNF – Sort the Body out • Columns must be dependant on the whole key and nothing but the key so help me Codd. • Decompose FD’s in the table Body
3NF - Example • Attendee# -> {Registered, From, URL, Email} • From -> {URL}
Denormalisation Deliberately left NULL See Mark Whitehorn and Yasmin Ahmad session “Denormalisation – having your cake and eating it” This room after lunch
Demo • Joins • Pre-Filter problem • Index intersection
Referential Integrity Introduction Optimiser : Query Rewrite Locking Considerations around Foreign Keys and Declarative RI (using Triggers).
Referential Integrity Is.. • Declarative RI: Done with CONSTRAINTS eg. FK, CHECK • Procedural RI:Done with Triggers (coded in a procedure) • DRI gives information to the optimiser like permitted values (CHECK CONSTRAINT), if a row exists in another table (FK CONSTRAINT)
Demo • RI • Optimiser rewrite • Foreign Keys and Locking • Triggers and Locking – using for RI and READ_COMMITTED_SNAPSHOT
DRI (FK’s) - Locking • Be wary of blocking from FK’s look ups • Unaffected by READ_COMMITTED_SNAPSHOT • Uses Serialisable ISOLATION • FK’s have a benefit - Optimiser can use them
PRI (Triggers) - Locking • As per the rest of your queries – affected by the default isolation level for your database • READ_COMMITTED_SNAPSHOT causes a problem • Remember: Last good committed value returned rather than reader being blocked by the writer. • DRI is somewhat limited when faced with today’s more complex business data relationship rules
Summary Where are we with this then?
Further Reading • Mark Whitehorn and Yasmin Ahmad session after lunch • C J Date Edinburgh seminar on 13th/14thMay • http://www.justsql.co.uk/chris_date/cjd_edin_may_2010.htm • Live Meetings starting May – see http://sqlserverfaq.com