280 likes | 461 Views
Thomas Krichel 2010-02-03. LIS512 lecture 2 relational databases. relational databases. A relational database is a set of tables. There may be relations between the tables. Each table has a number of records. Each record has a number of fields . Let us look at this bottom up.
E N D
Thomas Krichel 2010-02-03 LIS512 lecture 2relational databases
relational databases • A relational database is a set of tables. There may be relations between the tables. • Each table has a number of records. Each record has a number of fields. • Let us look at this bottom up.
the entity • We saw last time that an entity is something we want to gather information about. • An entity may be a prime focus of interest, like group 1 entities in FRBR. • Or an entity may be only interesting in an entity because they have a relationship to entities that are the prime focus, like FRBR group 2 and 3 entities.
entity set and entity • In fact the talk in FRBR is a bit loose. • When FRBR talks about entities, it really means entity sets. • FRBR say that work is an entity. It really means that are workS, and each work is an something of interest. • “work” is an conceptual entity set, and each work is an entity.
attributes • An entity has attributes. • These are things we want to know about. • Say a person has a name, birthday, height and weight. • We can use that data to find out of the person is overweight.
records • A record is a collection of data elements and their values that pertain to one entity that is being described. • Name Thomas Krichel • Birthday 1965-06-05 • Data element names can also be called “attribute names” or “field name”. These terms are pretty much synonymous.
record syntax • The way to write out a record (it’s syntax) may vary. • Attribute: value syntax • Name: Thomas Krichel • Birthday: 1965-06-05 • XML syntax • <name>Thomas Krichel</name> • <birthday>1965-06-05</birthday>
tables • In a database, records are represented as lines in a table. The first line of the table lists the field names. Here is a sample table with two records |name | birthday | | Thomas Krichel| 1965-06-05| | Karl Marx | 1818-05-13|
identity • Consider the following table |name | birthday | | Thomas Krichel | 1965-06-05| | Thomas Krichel | 1965-06-05| • Is the record a duplicate? • Without further information, we can not tell.
key • To be able to find what records describe the same entities, databases use a construct called the key. • The key for each record must have a different value. • The name of the key field is not important, but it is important that the field is a key.
examples • Here is a correct example |key|name| birthday | |2 | Thomas Krichel | 1965-06-05| | 1 | Thomas Krichel| 1965-06-05| • Here is an incorrect example |key|name| birthday | |1 | Thomas Krichel | 1965-06-05| | 1 | Thomas Krichel | 1965-06-05|
identity • Identity is simple when it comes to person. • But when it comes to other entities we are interested in the FBRB world. • Identifiers and the handling of them is one of the most difficult issues of digital librarianship.
identifier schemes • There are many identifier schemes for group 1 entities • ISBN – ISSN • DOI – URL • Some of them may cover different manifestations / expression. • i.e. is the ISBN for a print book the same as for the ebook?
relations • Not all database system are relational. • The most widely used ones are relational. • That means you can build relationships between tables and enforce them. • I am coming to this from an old example.
example: Movie database ID | title | director | date M1 | Gone with the wind | F. Ford Coppola | 1963 M2 | Room with a view | Coppola, F Ford | 1985 M3 | High Noon | Woody Allan | 1974 M4 | Star Wars | Steve Spielberg | 1993 M5 | Alien | Allen, Woody | 1987 M6 | Blowing in the Wind | Spielberg, Steven | 1962 • Single table • No relations between tables, of course
problem with this database • All data wrong, but this is just for illustration. • Name covered inconsistently. There is no way to find films by Woody Allan without having to go through all spelling variations. • Mistakes are difficult to correct. We have to wade through all records, a masochist’s pleasure.
Better movie database ID | title |director|year M1 | Gone with the wind | D1 | 1963 M2 | Room with a view | D1 | 1985 M3 | High Noon | D2 | 1974 M4 | Star Wars | D3 | 1993 M5 | Alien | D2 | 1987 M6 | Blowing in the Wind | D3 | 1962 ID |director name | birth year D1 | Ford Coppola, Francis | 1942 D2 | Allan, Woody | 1957 D3 | Spielberg, Steven | 1942
Relational database • We have a one to many relationship between directors and film • Each film has one director • Each director has produced many films • Here it becomes possible for the computer • To know which films have been directed by Woody Allen • To find which films have been directed by a director born in 1942
enforcing relationships • Relational database software has ways to enforce relationships. • So when you change the record of M5 to say it was directed by director D9, it will complain that no such director has been defined.
Many-to-many relationships • Each film has one director, but many actors star in it. Relationship between actors and films is a many to many relationship. • Here are a few actors ID | sex | actor name |birth year A1 | f | Brigitte Bardot | 1972 A2 | m | George Clooney | 1927 A3 | f | Marilyn Monroe | 1934
Actor/Movie table actor id | movie id A1 | M4 A2 | M3 A3 | M2 A1 | M5 A1 | M3 A2 | M6 A3 | M4 … as many lines as required
relational databases • Relational databases are powerful within organizations that have a relatively centralized command and control structure, i.e. within a company and / or a government department. • The relational database model has problems when we are working in a coordinated fashion bet • Imagine the web working an a relational database model!
FRBR • FRBR really brings the way relational databases work to the library world. • But here is the problem: how to implement this. • Imagine a totally FRBRized, relational database library look like. • It would be fine if there is one Big Library Agency (BLA). • What would BLA do?
BLA job • At a starting BLA would first have to register all the works, and say what a work is. • That would include maintaining catalogs of composite works. • This job is problematic because of • the vague nature of the work • the scale of the job (who will pay)
currently • Libraries purchase items. • They can look up manifestation data from providers such as OCLC. • Finding out what expression this represents is more difficult. • Linking it to a work is almost impossible. One would need BLA for that.
a web decentralized digital library • Something that computer scientists dream of http://www.youtube.com/watch?v=OM6XIICm_qo • He completely avoids the idea of the meaning of data. • In a centralized organization, the meaning can be dictated. • In a decentralized world, the meaning as to be agreed upon.
outlook • In the next few weeks, we are looking at emerging ways to try to find some agreement • character set • record format • Then we are looking at library related standards doing similar things.
Thank you for your attention! Please switch off machines b4 leaving! http://openlib.org/home/krichel