
LIS508 lecture 2

LIS508 lecture 2. Thomas Krichel 2003-10-07. today's lecture. Recap on what we did last week. Encoding mark-up Databases. Recap. Computers deal with on/off signals called bits. Collections of these bits are binary numbers.

Presentation Transcript


  1. LIS508 lecture 2 Thomas Krichel 2003-10-07

  2. today's lecture
     • Recap on what we did last week.
     • Encoding mark-up
     • Databases

  3. Recap
     • Computers deal with on/off signals called bits.
     • Collections of these bits are binary numbers.
     • Texts are (basically) strings of characters. To represent text, we need to represent characters.
     • To make a character understandable to a computer, we associate a number with each character. The resulting mapping is a character set.
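As a small illustration of the recap, the sketch below maps each character of a short string to its number and shows that number as a pattern of eight bits. It assumes the ASCII character set (the slides do not name a particular one), and the sample string "LIS" is chosen only for the example.

```python
# Minimal sketch: characters -> numbers -> bits, assuming the ASCII character set.
text = "LIS"  # sample string chosen for illustration

# ord() returns the number associated with a character.
code_points = [ord(ch) for ch in text]

# format(n, "08b") writes that number as eight binary digits (bits).
bit_patterns = [format(n, "08b") for n in code_points]

for ch, n, bits in zip(text, code_points, bit_patterns):
    print(ch, n, bits)

# Encoding the whole string yields the bytes a computer would store.
encoded = text.encode("ascii")
```

The same string would map to different numbers under a different character set, which is why the character set must be known to read the bits back as text.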

  4. Beyond characters
     • There is more to text than a string of characters.
     • There is layout
       • titles
       • abstracts
       • mathematical formula spacing

  5. Layout
     • Layout can be conveyed by additional text that has special meaning. Examples:
       • LaTeX
       • HTML
       • PostScript
     • Another way is to do non-textual layout by adding some other digital signals. Examples:
       • DVI
       • MS Word
       • MS Powerpoint
     These non-textual formats cannot be shown in these slides!

  6. Example: LaTeX
     \bigskip\textbf{Class structure}
     Classes will be held in the computer lab in the Palmer School between 18:15 and 20:45. An optional practice session will last until 21:15.
     \begin{tabular}{@{}llll@{}}
     0&2003--09--23&introduction to the course &\\
     1&2002--09--30&bits bytes and characters &\\
     2&2003--10--07&databases and markup languages&\\

  7. Example: HTML
     <p><strong>Class structure</strong><p>Classes will be held in the computer lab in the Palmer School between 18:15 and 20:45. An optional practice session will last until 21:15.<p>Class details:
     <p><center><table width=100% border=1>
     <tr><td align=left> 0 </td><td align=left> 2003&#8211;09&#8211;23 </td><td align=left><a href="lis508w03a-00.ppt">introduction to the course</a> </td></tr>
     <tr><td align=left> 1 </td><td align=left> 2002&#8211;09&#8211;30 </td><td align=left><a href="lis508w03a-01.ppt">bits bytes and characters</a> </td>
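In textual mark-up such as HTML, the layout instructions are themselves characters, so a program can separate them from the content. The sketch below uses `html.parser` from Python's standard library to do this. The class name `MarkupSplitter` and the shortened, tidied input string are assumptions made for the example; they are not part of the course page.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Hypothetical helper: collects tag names and text content separately."""

    def __init__(self):
        super().__init__()
        self.tags = []   # mark-up: text with special meaning
        self.text = []   # content: what the reader actually sees

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        if data.strip():
            self.text.append(data.strip())


splitter = MarkupSplitter()
splitter.feed(
    "<p><strong>Class structure</strong></p>"
    "<p>Classes will be held in the computer lab.</p>"
)
splitter.close()

print(splitter.tags)
print(splitter.text)
```

A binary format such as DVI offers no comparable separation at the character level, which is why it needs a dedicated reader.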

  8. Example: PostScript Fc(Class)g(structur)o(e)-104 3956 y Fd(Classes)26b(will)g(be)e(held)g(in)h(the)f(computer)f(lab)i(in)f(the)h(P)o(almer)f(School)g(between)f(18:15)h(and)g(20:45.)36 b(An)25 b(optional)e(practice)h(session)-104 4055 y(will)d(last)g(until)f(21:15.)-104 4155 y(Class)i(details:)-104 4307 y(0)141 b(2003\22609\22623)94b(introduction)18 b(to)i(the)h(course)-104 4407 y(1)141 b(2002\22609\22630)94 b(bits)21 b(bytes)f(and)g(characters)-104 4507 y(2)141 b(2003\22610\22607)94 b(databases)20 b(and)g(markup)e(languages)-

  9. DVI (rendition, "class structure")
     1659: fntnum27 current font is ptmb8t
     1660: setchar67 h:=-820459+473168=-347291, hh:=-22
     1661: setchar108 h:=-347291+182183=-165108, hh:=-10
     1662: setchar97 h:=-165108+327680=162572, hh:=11
     1663: setchar115 h:=162572+254928=417500, hh:=27
     1664: setchar115 h:=417500+254928=672428, hh:=43
     1665: right3 163840 h:=672428+163840=836268, hh:=53
     1669: setchar115 h:=836268+254928=1091196, hh:=69
     1670: setchar116 h:=1091196+218232=1309428, hh:=83
     1671: setchar114 h:=1309428+290976=1600404, hh:=101
     1672: setchar117 h:=1600404+364376=1964780, hh:=124
     1673: setchar99 h:=1964780+290976=2255756, hh:=142
     1674: setchar116 h:=2255756+218232=2473988, hh:=156
     1675: setchar117 h:=2473988+364376=2838364, hh:=179
     1676: setchar114 h:=2838364+290976=3129340, hh:=197
     1677: right2 -11792 h:=3129340-11792=3117548, hh:=196
     1680: setchar101 h:=3117548+290976=3408524, hh:=214

  10. Databases
      • Databases are collections of data with some organization to them.
      • The classic example is the relational database.
      • But not all databases need to be relational databases.

  11. Relational databases
      • A relational database is a set of tables. There may be relations between the tables.
      • Each table has a number of records. Each record has a number of fields.
      • When the database is being set up, we fix
        • the size of each field
        • relationships between tables

  12. Example: Movie database
      ID | title               | director          | date
      M1 | Gone with the wind  | F. Ford Coppola   | 1963
      M2 | Room with a view    | Coppola, F Ford   | 1985
      M3 | High Noon           | Woody Allan       | 1974
      M4 | Star Wars           | Steve Spielberg   | 1993
      M5 | Alien               | Allen, Woody      | 1987
      M6 | Blowing in the Wind | Spielberg, Steven | 1962
      • Single table
      • No relations between tables, of course

  13. Problem with this database
      • All the data are wrong, but this is just for illustration.
      • Names are recorded inconsistently. There is no way to find films by Woody Allan without having to go through all spelling variations.
      • Mistakes are difficult to correct. We have to wade through all records, a masochist’s pleasure.
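The inconsistency problem can be seen by searching the single-table data for one exact spelling. This is a minimal sketch: the rows are copied from the movie table on the slide (deliberately wrong data included), while the variable names are assumptions made for the example.

```python
# Rows from the single-table movie database: (ID, title, director, date).
movies = [
    ("M1", "Gone with the wind", "F. Ford Coppola", 1963),
    ("M2", "Room with a view", "Coppola, F Ford", 1985),
    ("M3", "High Noon", "Woody Allan", 1974),
    ("M4", "Star Wars", "Steve Spielberg", 1993),
    ("M5", "Alien", "Allen, Woody", 1987),
    ("M6", "Blowing in the Wind", "Spielberg, Steven", 1962),
]

# An exact match only finds records that use this one spelling.
matches = [title for _id, title, director, _date in movies
           if director == "Woody Allan"]

# Six films, six distinct director strings: every record spells the name its own way.
distinct_spellings = {director for _id, _title, director, _date in movies}

print(matches)
print(len(distinct_spellings))
```

The search returns only "High Noon" and misses "Alien", which is stored under "Allen, Woody". Correcting a name has the same cost: every record has to be inspected.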

  14. Better movie database
      Movies:
      ID | title               | director | year
      M1 | Gone with the wind  | D1       | 1963
      M2 | Room with a view    | D1       | 1985
      M3 | High Noon           | D2       | 1974
      M4 | Star Wars           | D3       | 1993
      M5 | Alien               | D2       | 1987
      M6 | Blowing in the Wind | D3       | 1962
      Directors:
      ID | director name         | birth year
      D1 | Ford Coppola, Francis | 1942
      D2 | Allan, Woody          | 1957
      D3 | Spielberg, Steven     | 1942

  15. Relational database
      • We have a one-to-many relationship between directors and films
        • Each film has one director
        • Each director may have directed many films
      • Here it becomes possible for the computer
        • To know which films have been directed by Woody Allen
        • To find which films have been directed by a director born in 1942
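The two questions above can be tried out with SQLite, which ships with Python as the `sqlite3` module. This is only a sketch under stated assumptions: the table and column names (`directors`, `movies`, `director_id`, `birth_year`) and the helper function names are invented for the example, and the rows are the illustrative (deliberately wrong) data from the two tables of the better movie database, including the spelling "Allan, Woody" used there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE directors (
        id TEXT PRIMARY KEY, name TEXT NOT NULL, birth_year INTEGER);
    CREATE TABLE movies (
        id TEXT PRIMARY KEY, title TEXT NOT NULL,
        director_id TEXT REFERENCES directors(id),  -- the "one" side of one-to-many
        year INTEGER);
""")
conn.executemany("INSERT INTO directors VALUES (?, ?, ?)", [
    ("D1", "Ford Coppola, Francis", 1942),
    ("D2", "Allan, Woody", 1957),
    ("D3", "Spielberg, Steven", 1942),
])
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    ("M1", "Gone with the wind", "D1", 1963),
    ("M2", "Room with a view", "D1", 1985),
    ("M3", "High Noon", "D2", 1974),
    ("M4", "Star Wars", "D3", 1993),
    ("M5", "Alien", "D2", 1987),
    ("M6", "Blowing in the Wind", "D3", 1962),
])


def films_by_director(name):
    """Titles of films whose director has exactly this stored name."""
    rows = conn.execute(
        "SELECT m.title FROM movies m JOIN directors d ON m.director_id = d.id "
        "WHERE d.name = ? ORDER BY m.id", (name,))
    return [title for (title,) in rows]


def films_by_director_birth_year(year):
    """Titles of films whose director was born in the given year."""
    rows = conn.execute(
        "SELECT m.title FROM movies m JOIN directors d ON m.director_id = d.id "
        "WHERE d.birth_year = ? ORDER BY m.id", (year,))
    return [title for (title,) in rows]


print(films_by_director("Allan, Woody"))
print(films_by_director_birth_year(1942))
```

Because each director's name is stored once, a spelling correction is a single update to the `directors` table and every film picks it up through the join.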

  16. Many-to-many relationships
      • Each film has one director, but many actors star in it. The relationship between actors and films is a many-to-many relationship.
      • Here are a few actors
      ID | sex | actor name      | birth year
      A1 | f   | Brigitte Bardot | 1972
      A2 | m   | George Clooney  | 1927
      A3 | f   | Marilyn Monroe  | 1934

  17. Actor/Movie table
      actor id | movie id
      A1       | M4
      A2       | M3
      A3       | M2
      A1       | M5
      A1       | M3
      A2       | M6
      A3       | M4
      … as many lines as required

  18. SQL
      • Once we have the relational database, we can ask sophisticated questions:
        • Which director has had the most female actors working for him?
        • In which years were films shot that starred actors born between 1926 and 1935?
      • Such questions can be encoded in a language known as “structured query language” or SQL. All relational database vendors implement a dialect of SQL.
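Both questions can be written as SQL queries. The sketch below uses SQLite through Python's `sqlite3` module. The table and column names, including the linking table `appearances`, are assumptions made for the example, and only the rows listed on the slides are loaded. "Most female actors" is read here as the number of distinct female actors; counting appearances instead would be an equally valid reading and would give a tie on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE directors (id TEXT PRIMARY KEY, name TEXT, birth_year INTEGER);
    CREATE TABLE movies (id TEXT PRIMARY KEY, title TEXT,
                         director_id TEXT REFERENCES directors(id), year INTEGER);
    CREATE TABLE actors (id TEXT PRIMARY KEY, sex TEXT, name TEXT, birth_year INTEGER);
    -- Linking table: one row per (actor, movie) pair models many-to-many.
    CREATE TABLE appearances (actor_id TEXT REFERENCES actors(id),
                              movie_id TEXT REFERENCES movies(id));
""")
conn.executemany("INSERT INTO directors VALUES (?, ?, ?)", [
    ("D1", "Ford Coppola, Francis", 1942), ("D2", "Allan, Woody", 1957),
    ("D3", "Spielberg, Steven", 1942)])
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    ("M1", "Gone with the wind", "D1", 1963), ("M2", "Room with a view", "D1", 1985),
    ("M3", "High Noon", "D2", 1974), ("M4", "Star Wars", "D3", 1993),
    ("M5", "Alien", "D2", 1987), ("M6", "Blowing in the Wind", "D3", 1962)])
conn.executemany("INSERT INTO actors VALUES (?, ?, ?, ?)", [
    ("A1", "f", "Brigitte Bardot", 1972), ("A2", "m", "George Clooney", 1927),
    ("A3", "f", "Marilyn Monroe", 1934)])
conn.executemany("INSERT INTO appearances VALUES (?, ?)", [
    ("A1", "M4"), ("A2", "M3"), ("A3", "M2"), ("A1", "M5"),
    ("A1", "M3"), ("A2", "M6"), ("A3", "M4")])

# Question 1: director with the most distinct female actors.
top_director = conn.execute("""
    SELECT d.name, COUNT(DISTINCT a.id) AS n
    FROM directors d
    JOIN movies m ON m.director_id = d.id
    JOIN appearances ap ON ap.movie_id = m.id
    JOIN actors a ON a.id = ap.actor_id
    WHERE a.sex = 'f'
    GROUP BY d.id
    ORDER BY n DESC, d.id
    LIMIT 1
""").fetchone()

# Question 2: years of films starring actors born between 1926 and 1935.
years = [year for (year,) in conn.execute("""
    SELECT DISTINCT m.year
    FROM movies m
    JOIN appearances ap ON ap.movie_id = m.id
    JOIN actors a ON a.id = ap.actor_id
    WHERE a.birth_year BETWEEN 1926 AND 1935
    ORDER BY m.year
""")]

print(top_director)
print(years)
```

Each question follows the relations from table to table with `JOIN`, then filters and aggregates. Neither could be answered reliably from the single-table version of the database.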

  19. databases in libraries
      • Relational databases dominate the world of structured data
      • But they are not so popular in libraries
        • Slow on very large databases (such as catalogs)
        • Library data has nasty ad-hoc relationships, e.g.
          • Translation of the first edition of a book
          • CD supplement that comes with the print version
        These are difficult to deal with in a system where all relations and fields have to be set up at the start and cannot be changed easily later.

  20. http://openlib.org/home/krichel Thank you for your attention!
