250 likes | 383 Views
Relational Databases. Find Databases here…. And here…. The “Deep Web”. Dynamic pages, generated from databases Not easily discovered using crawling Perhaps 400-500 times larger than surface Web Fastest growing source of new information. Name. Type. URL. Web Size (GBs).
E N D
The “Deep Web” • Dynamic pages, generated from databases • Not easily discovered using crawling • Perhaps 400-500 times larger than surface Web • Fastest growing source of new information
Name Type URL Web Size (GBs) National Climatic Data Center (NOAA) Public http://www.ncdc.noaa.gov/ol/satellite/satelliteresources.html 366,000 NASA EOSDIS Public http://harp.gsfc.nasa.gov/~imswww/pub/imswelcome/plain.html 219,600 National Oceanographic (combined with Geophysical) Data Center (NOAA) Public/Fee http://www.nodc.noaa.gov/, http://www.ngdc.noaa.gov/ 32,940 Alexa Public (partial) http://www.alexa.com/ 15,860 Right-to-Know Network (RTK Net) Public http://www.rtk.net/ 14,640 MP3.com Public http://www.mp3.com/ Deep Web • 60 Deep Sites Exceed Surface Web by 40 Times
Database Basics • What is a database? • Collection of data, organized to support access • Models some aspects of reality • Components of a relational database: • Field = an “atomic” unit of data • Record = a collection of related fields • Table = a collection of related records • Each record is one row in the table • Each field is one column in the table • Primary Key = the field that uniquely identifies a record • Database = a collection of tables
Why “Relational”? • Databases model some aspects of reality • A relational database views the world in terms of entities and relations between them
The Registrar Example • What do we need to know (i.e., model)? • Something about the students (e.g., first name, last name, email, department) • Something about the courses (e.g., course ID, description, enrolled students, grades) • Which students are in which courses
A First Try Put everything in a big table… Discussion: Why is this a bad idea?
Good Database Design • Save space • Save each fact only once • More rapid updates • Every fact only needs to be updated once • More rapid search • Finding something once is good enough • Avoid inconsistency • Changing data once changes it everywhere
Another Try... Student Table Department Table Course Table Enrollment Table
Approaches to Normalization • For simple problems: • Start with “binary relationships”: pairs of fields that are related • Group together wherever possible • Add keys where necessary • For more complicated problems: • Entity relationship modeling (LBSC 670)
Some Lingo • “Primary Key” uniquely identifies a record • e.g., student ID in the student table • “Foreign Key” is primary key in the other table • It need not be unique in this table
The Data Model Student Table Department Table Course Table Enrollment Table
Project SELECT Student ID, Department
Restrict WHERE Department ID = “HIST”
Join Student Table Department Table “Joined” Table
Relational Operations • Choosing columns: SELECT • Based on their label • Choosing rows: WHERE • Based on their contents • Joining tables: JOIN • These can be specified together SELECT Student ID, Dept WHERE Dept = “History” department ID = “HIST”
Some SQL • SQL = Structured Query Language • Used in many types of database systems
Select query • SELECT LastName, FirstName from StudentTable StudentTable
Select with Restriction • SELECT LastName, FirstName from StudentTable where DeptID = ‘HIST’ • Will return • Peters, Kathy • Smith, Chris
Select with Restriction • SELECT StudentID from EnrollmentTable where Grade > 81 Enrollment Table
Select with Join SELECT LastName, FirstName from StudentTable JOIN EnrollmentTable on StudentTable.StudentID =EnrollmentTable.StudentID where EnrollmentTable.Grade > 95 Results: Smith, John Enrollment Table Student Table
Discussion Point • How is a relational database different from a spreadsheet?