Structured Documents

Presentation Transcript


  1. Structured Documents Week 3 LBSC 690 Information Technology

  2. Outline • Questions • Finishing networks • Building the Web • Building a better Web

  3. TCP/IP layer architecture • [Diagram of the four layers] • Application to Application: virtual network service • Transport to Transport: virtual link for end-to-end packets • Network to Network: virtual link for packets • Link to Link: link for bits

  4. The TCP/IP “Protocol Stack” • Link layer moves bits • Ethernet, cable modem, DSL • Network layer moves packets • IP • Transport layer provides services to applications • UDP, TCP • Application layer uses those services • DNS, SFTP, SSH, …

  5. User Datagram Protocol (UDP) • The Internet’s basic transport service • Sends every packet immediately • Passes received packets to the application • No delivery guarantee • Collisions can result in packet loss • Example: sending clicks on web browser
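
As a minimal sketch of the behavior listed on this slide, the Python below sends a single datagram over the loopback interface. The function name `send_one_datagram` and the one-second timeout are assumptions made for illustration; they are not from the lecture.

```python
import socket


def send_one_datagram(payload: bytes, timeout: float = 1.0):
    """Send one UDP datagram over loopback; return it, or None if it never arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(timeout)
        # No connection setup: the datagram is handed to the network at once.
        sender.sendto(payload, receiver.getsockname())
        try:
            data, _addr = receiver.recvfrom(2048)
            return data
        except socket.timeout:
            # UDP neither reports nor repairs a loss; the application must cope.
            return None
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(send_one_datagram(b"click"))
```

On loopback the datagram almost always arrives; the `None` branch is there because nothing in UDP itself promises that it will.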

  6. Transmission Control Protocol (TCP) • Built on IP, the same network-layer packet service that UDP uses • Guarantees delivery of all data • Retransmits missing data • Guarantees data will be delivered in order • “Buffers” subsequent packets if necessary • No guarantee of delivery time • Long delays may occur without warning
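
The "buffers subsequent packets" idea can be sketched as a small simulation. This is not real TCP: it numbers whole packets rather than bytes, and the function name `deliver_in_order` is a hypothetical one chosen for this example.

```python
def deliver_in_order(arrivals):
    """Return payloads in sequence order from (seq, payload) pairs.

    Pairs may arrive out of order or more than once (after a retransmission).
    """
    buffered = {}   # packets that arrived early, keyed by sequence number
    next_seq = 0    # the next sequence number the application is waiting for
    delivered = []
    for seq, payload in arrivals:
        if seq < next_seq or seq in buffered:
            continue  # duplicate: already delivered or already buffered
        buffered[seq] = payload
        # Release every packet that is now contiguous with what was delivered.
        while next_seq in buffered:
            delivered.append(buffered.pop(next_seq))
            next_seq += 1
    return delivered


if __name__ == "__main__":
    print(deliver_in_order([(0, "a"), (2, "c"), (1, "b"), (1, "b"), (3, "d")]))
```

If packet 1 never arrived, packets 2 and 3 would stay in the buffer indefinitely, which is the "long delays may occur without warning" case.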

  7. File Transfer Protocol (FTP) • Used to move files between machines • Upload (put) moves files from client to server • Download (get) moves files from server to client • Available using command line and GUI interfaces • Normally requires an account on the server • Userid “anonymous” provides public access • Web browsers incorporate anonymous FTP • Automatically converts end-of-line conventions • Unless you select “binary”

  8. Hands On: FTP • Start a cmd window • Type “ftp ftp.umiacs.umd.edu” • Log in anonymously with • User: anonymous • Password: your email address • Go download a file • Type “cd pub/gina/lbsc690/” • Type “binary” • Type “get hwOne.ppt” • Exit • Type “quit” • Try it again with a graphical FTP program • WS_FTP, for example

  9. Encryption • Secret-key systems (e.g., DES) • Use the same key to encrypt and decrypt • Public-key systems (e.g., PGP) • Public key: open, for encryption • Private key: secret, for decryption • Digital signatures • Encrypt with private key, decrypt with public key
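
To make the public/private key roles concrete, here is a toy sketch using textbook RSA with tiny primes. RSA itself is an assumption of this example (the slide names only DES and PGP), and numbers this small offer no security at all.

```python
# Toy values only: real keys are hundreds of digits long.
P, Q = 61, 53
N = P * Q                  # 3233, the modulus shared by both keys
PHI = (P - 1) * (Q - 1)    # 3120
E = 17                     # public exponent: published openly
D = pow(E, -1, PHI)        # private exponent: kept secret (needs Python 3.8+)


def encrypt(message: int) -> int:
    """Anyone can encrypt with the public key (E, N)."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Only the holder of the private key (D, N) can decrypt."""
    return pow(ciphertext, D, N)


def sign(message: int) -> int:
    """A signature is the message transformed with the private key."""
    return pow(message, D, N)


def verify(message: int, signature: int) -> bool:
    """Anyone can check a signature with the public key."""
    return pow(signature, E, N) == message


if __name__ == "__main__":
    secret = encrypt(65)
    print(secret, decrypt(secret), verify(65, sign(65)))
```

Messages here must be integers smaller than `N`. Practical systems add padding and use a secret-key cipher for the bulk of the data.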

  10. Encrypted Standards • Secure Shell (SSH) • Replaces Telnet • Secure FTP (SFTP)/Secure Copy (SCP) • Replaces FTP • Secure HTTP (HTTPS) • Used for financial and other private data • Wired Equivalent Privacy (WEP) • Used on wireless networks

  11. Network Abuse • Flooding • Excessive activity, intended to prevent valid activity • Worms • Like a virus, but self-propagating • Sniffing • Monitoring network traffic (e.g., for passwords)

  12. Encryption Issues • Key length • 128 bits balances speed and protection today • Trust infrastructure • How do you prevent “bait and switch”? • Who certifies a digital signature is valid?

  13. The World-Wide Web • [Diagram with labels: My Browser, Send Request, Proxy Server (local copy of page requested), Fetch Page, Internet, Remote Server, Page Requested]

  14. Web Standards • HTML • How to write and interpret the information • URL • Where to find it • HTTP • How to get it

  15. HyperText Transfer Protocol (HTTP)
      • Send request
          GET /path/file.html HTTP/1.0
          From: someuser@jmarshall.com
          User-Agent: HTTPTool/1.0
      • Server response
          HTTP/1.0 200 OK
          Date: Fri, 31 Dec 1999 23:59:59 GMT
          Content-Type: text/html
          Content-Length: 1354

          <html><body> <h1>Happy New Millennium!</h1> … </body> </html>
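
The request and response on this slide are plain text, so they can be built and taken apart with ordinary string handling. The sketch below assumes HTTP/1.0 framing (CRLF line endings, a blank line before the body). The helper names `build_get_request` and `parse_response` are made up for this example and do not come from any library.

```python
def build_get_request(path, headers):
    """Return the text of an HTTP/1.0 GET request for the given path."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line marks the end of the request headers.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split a response into (status code, reason, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only; header values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    print(build_get_request("/path/file.html", {"User-Agent": "HTTPTool/1.0"}))
    reply = ("HTTP/1.0 200 OK\r\n"
             "Date: Fri, 31 Dec 1999 23:59:59 GMT\r\n"
             "Content-Type: text/html\r\n\r\n"
             "<html><body><h1>Happy New Millennium!</h1></body></html>")
    print(parse_response(reply))
```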

  16. Uniform Resource Locator (URL) • Uniquely identify web pages on the WWW • Domain name • Directory path • File name • Example URL: http://www.clis.umd.edu/courses/schedules/fall2003.html • Domain name: www.clis.umd.edu • Directory path: /courses/schedules/ • File name: fall2003.html
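
The three parts of the example URL can be pulled apart with Python's standard library. This is a small sketch; the function name `url_parts` and the dictionary keys are choices made for this example.

```python
import posixpath
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the pieces named on the slide."""
    parts = urlsplit(url)
    # URL paths always use forward slashes, so posixpath is the right splitter.
    directory, filename = posixpath.split(parts.path)
    return {
        "scheme": parts.scheme,
        "domain": parts.hostname,
        "directory": directory,
        "file": filename,
    }


if __name__ == "__main__":
    print(url_parts("http://www.clis.umd.edu/courses/schedules/fall2003.html"))
```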

  17. HyperText Markup Language (HTML) • Simple document structure language for Web • Advantages • Adapts easily to different display capabilities • Widely available display software (browsers) • Disadvantages • Does not directly control layout

  18. Hands On: Learning HTML From Examples • Use Internet Explorer to find a page you like • http://www.glue.umd.edu/~oard • On the “View” menu select “Source” • Opens a notepad window with the source • Compare HTML source with the Web page • Observe how each effect is achieved

  19. Hands On: “Adopt” a Web Page • Modify the HTML source using notepad • For example, change the page to yours • Save the HTML source on your “M:” drive • In the “File” menu, select “Save As” • Select “All Files” and name it “test.html” • FTP it to your ~/pub directory on WAM • sftp wam.umd.edu • cd ../pub/ • put test.html • View it • http://www.wam.umd.edu/~(yourlogin)/test.html

  20. HTML Document Structure • “Tags” mark structure • <html>a document</html> • <ol>an ordered list</ol> • <i>something in italics</i> • Tag name in angle brackets <> • Not case sensitive • Open/Close pairs • Close tag may be optional (if unambiguous)

  21. Logical Structure Tags • Head • Title • Body • Headers: <h1> <h2> <h3> <h4> <h5> • Lists: <ol>, <ul> (can be nested) • Paragraphs: <p> • Definitions: <dt> <dd> • Tables: <table> <tr> <td> </td> </tr> </table> • Role: <cite>, <address>, <strong>, …

  22. Rendering • Different devices have different capabilities • Desktop • PDA • Rendering maps logical tags to physical layout • Controls line wrap, size, font… • Place the title in the page border • Render <h1> as 24pt Times • Render <strong> as bold • Somewhat browser-dependent • Internet Explorer and Netscape make different choices

  23. Physical Structure Tags • Font • Typeface: <font face="Arial"></font> • Size: <font size="+1"></font> • Color: <font color="#990000"></font> • http://webmonkey.wired.com/webmonkey/reference/color_codes/ • Emphasis • Bold: <b></b> • Italics: <i></i>

  24. Hypertext “Anchors” • Links make the Web a web! • Internal anchors: somewhere on the same page • <a href="#students">Students</a> • Links to: <a name="students">Student Information</a> • External anchors: to another page • <a href="http://www.clis.umd.edu">CLIS</a> • <a href="http://www.clis.umd.edu#students">CLIS students</a>
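
One way to see the internal/external distinction is to collect the anchors from a page programmatically. The sketch below uses Python's standard `html.parser`. The class name `AnchorCollector` and the sample markup (the slide's anchors written with plain double quotes) are assumptions for illustration.

```python
from html.parser import HTMLParser

SAMPLE_PAGE = """
<a href="#students">Students</a>
<a name="students">Student Information</a>
<a href="http://www.clis.umd.edu">CLIS</a>
<a href="http://www.clis.umd.edu#students">CLIS students</a>
"""


class AnchorCollector(HTMLParser):
    """Sort <a> tags into same-page links, other-page links, and named targets."""

    def __init__(self):
        super().__init__()
        self.internal = []   # href values that start with "#"
        self.external = []   # every other href value
        self.targets = []    # name values that internal links can point to

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is not None:
            bucket = self.internal if href.startswith("#") else self.external
            bucket.append(href)
        if "name" in attributes:
            self.targets.append(attributes["name"])


if __name__ == "__main__":
    collector = AnchorCollector()
    collector.feed(SAMPLE_PAGE)
    print(collector.internal, collector.external, collector.targets)
```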

  25. Images • <img src="URL"> or <img src="path/file"> • <img src="http://www.clis.umd.edu/IMAGES/head.gif"> • SRC: can be url or path/file • ALT: a text string • ALIGN: position of the image • WIDTH and HEIGHT: size of the image • Can use as anchor: • <a href=URL><img src=URL2></a> • Example: • http://www.umiacs.umd.edu/~daqingd/Image-Alignment.html

  26. Tables
        <table align="center">
          <caption align="right">The caption</caption>
          <tr align="LEFT"> <th>Header1</th> <th>Header2</th> </tr>
          <tr> <td>first row, first item</td> <td>first row, second item</td> </tr>
          <tr> <td>second row, first item</td> <td>second row, second item</td> </tr>
        </table>
      • Example: http://www.umiacs.umd.edu/~daqingd/Simple-Table.html
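
Because `<tr>`, `<th>` and `<td>` mark rows and cells explicitly, a program can recover the grid from the markup. The following is a rough sketch with Python's standard `html.parser`. The class name `TableRows` is hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser

SAMPLE_TABLE = """
<table align="center">
  <caption align="right">The caption</caption>
  <tr align="LEFT"><th>Header1</th><th>Header2</th></tr>
  <tr><td>first row, first item</td><td>first row, second item</td></tr>
  <tr><td>second row, first item</td><td>second row, second item</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None   # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])  # tolerate a cell that appears before any <tr>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


if __name__ == "__main__":
    parser = TableRows()
    parser.feed(SAMPLE_TABLE)
    for row in parser.rows:
        print(row)
```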

  27. Frames • Divide browser pages into separate sections • Useful when you want to scroll separately • Each section can display an HTML page • Example 1: menu frame on the left side of a page
        <frameset cols="10%,90%">
          <frame src="template.html">
          <frame src="images.html">
        </frameset>
      • Example 2: • http://www.hq.nasa.gov/alsj/frame.html

  28. Designing Web Pages • Key design issues: • Content: What do you want to publish? • Style: How do you want to present it? • Syntax: How can you achieve that presentation? • Sources of information • Online tutorials (Yahoo points to lots of these) • Technical materials (e.g., the HTML 4.0 spec)

  29. Some Style Guidelines • Design for generic browsers • And test on every version you wish to support • Provide appropriate “access points” • User needs and navigation strategies differ • Design useful navigational aids • A Web search may lead to the middle of a site • Include some indication of currency • Date of last update, “new” icons, etc. • Indicate who is responsible for the content • Helps readers assess authority

  30. Accessibility Guidelines • Design for device independence • Maintain backward compatibility • Provide alternative pages if necessary • Provide alternatives for aural and visual content • Alt tags for images, transcripts for audio • Make it easy for assistive devices to work • Combine structural markup and style sheets • Give a title to each frame • Use HTML tables only for tabular data • Use markup to indicate language switching

  31. HTML Editors • Goal is to create Web pages, not learn HTML! • Several are available • Macromedia Dreamweaver available commercially • In Netscape, “File” – “Edit Page” for Composer • Tend to use physical layout tags extensively • Detailed control can make hand-editing difficult • You may still need to edit the HTML file • Some editors use browser-specific features • Some HTML features may be missing entirely • File names may be butchered by FTP

  32. HTML Validators • Syntax checking: cross-browser compatibility • http://validator.w3.org • Style checking: improved accessibility • http://bobby.watchfire.com

  33. What’s Wrong with the Web? • HTML • Confounds structure and appearance (XML) • HTTP • Can’t recognize related transactions (Cookies) • URL • Links break when you move a file (PURL)

  34. What’s a Document? • Content • Structure • Appearance • Behavior

  35. History of Structured Documents • Early standards were “typesetting languages” • NROFF, TeX, LaTeX, SGML • HTML was developed for the Web • Too specialized for other uses • Specialized standards met other needs • Change tracking in Word, annotating manuscripts, … • XML seeks to unify these threads • One standard format for printing, viewing, processing

  36. Goals of XML • Metalanguage • A toolkit for designing markup languages • Unambiguous markup • Clear span of tags • Separate markup from presentation • Style info => stylesheet, so easy to change • Be simple

  37. A Family of Standards • Definition: DTD • Names known types of entities with “labels” • Defines part-whole and is-a relationships • Markup: XML • “Tags” regions of text with labels • Markup: XLink • Defines “hypertext” (and other) link relationships • Presentation: XSL • Specifies how each type of entity should be “rendered”

  38. XML Example • View “The Song of Wandering Aengus” • http://www.umiacs.umd.edu/~oard/teaching/690/fall05/notes/3/xml.htm • Built from three files • yeats01.xml • poem01.dtd • poem01.xsl

  39. An XML Example
        <?xml version="1.0"?>
        <!DOCTYPE POEM SYSTEM "poem01.dtd">
        <?xml-stylesheet type="text/xsl" href="poem01.xsl"?>
        <POEM>
          <TITLE>The Song of Wandering Aengus</TITLE>
          <AUTHOR>
            <FIRSTNAME>W.B.</FIRSTNAME>
            <LASTNAME>Yeats</LASTNAME>
          </AUTHOR>
          <STANZA>
            <LINE>I went out to the hazel wood,</LINE>
            <LINEIN>Because a fire was in my head,</LINEIN>
            <LINE>And cut and peeled a hazel wand,</LINE>
          </STANZA>
        </POEM>
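
Since every region of the poem is labeled, a program can pick out the title, author and lines without guessing from layout. The sketch below uses Python's standard `xml.etree.ElementTree` on just the `<POEM>` element, with the prolog lines left out for brevity. The function name `summarize` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

POEM_XML = """<POEM>
  <TITLE>The Song of Wandering Aengus</TITLE>
  <AUTHOR>
    <FIRSTNAME>W.B.</FIRSTNAME>
    <LASTNAME>Yeats</LASTNAME>
  </AUTHOR>
  <STANZA>
    <LINE>I went out to the hazel wood,</LINE>
    <LINEIN>Because a fire was in my head,</LINEIN>
    <LINE>And cut and peeled a hazel wand,</LINE>
  </STANZA>
</POEM>"""


def summarize(xml_text):
    """Return (title, author, [(tag, text), ...]) for a POEM document."""
    poem = ET.fromstring(xml_text)
    title = poem.findtext("TITLE")
    author = " ".join(
        poem.findtext(f"AUTHOR/{tag}") for tag in ("FIRSTNAME", "LASTNAME")
    )
    # Keep each line's tag so indented lines (LINEIN) stay distinguishable.
    lines = [
        (line.tag, line.text)
        for stanza in poem.iter("STANZA")
        for line in stanza
    ]
    return title, author, lines


if __name__ == "__main__":
    print(summarize(POEM_XML))
```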

  40. Document Type Definition (DTD)
        <!ELEMENT POEM ( (TITLE, AUTHOR, STANZA)* )>
        <!ELEMENT TITLE (#PCDATA) >
        <!ELEMENT AUTHOR (FIRSTNAME, LASTNAME) >
        <!ELEMENT FIRSTNAME (#PCDATA) >
        <!ELEMENT LASTNAME (#PCDATA) >
        <!ELEMENT STANZA (LINE | LINEIN)+ >
        <!ELEMENT LINE (#PCDATA) >
        <!ELEMENT LINEIN (#PCDATA) >
      • #PCDATA: span of text • a,b: a followed by b • a|b: either a or b • a*: 0 or more a’s • a+: 1 or more a’s

  41. Specifying Appearance: XSL
        <xsl:template match="POEM">
          <HTML>
            <BODY BGCOLOR="#FFFFCC">
              <xsl:apply-templates/>
            </BODY>
          </HTML>
        </xsl:template>
        <xsl:template match="TITLE">
          <H1>
            <FONT COLOR="Green">
              <xsl:value-of select="."/>
            </FONT>
          </H1>
        </xsl:template>
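
The template idea (one rendering rule per element type, applied recursively) can be imitated in a few lines of Python. This is not an XSLT processor, only a sketch of the matching logic. The names `render`, `render_children` and `TEMPLATES` are invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape


def render(element):
    """Apply the rule registered for this element's tag, like xsl:apply-templates."""
    template = TEMPLATES.get(element.tag, render_children)
    return template(element)


def render_children(element):
    """Default rule: emit the element's text, then render each child in order."""
    output = escape(element.text or "")
    for child in element:
        output += render(child) + escape(child.tail or "")
    return output


# One rule per element type, mirroring the two templates on the slide.
TEMPLATES = {
    "POEM": lambda e: '<HTML><BODY BGCOLOR="#FFFFCC">' + render_children(e) + "</BODY></HTML>",
    "TITLE": lambda e: '<H1><FONT COLOR="Green">' + escape(e.text or "") + "</FONT></H1>",
}


if __name__ == "__main__":
    poem = ET.fromstring("<POEM><TITLE>The Song of Wandering Aengus</TITLE></POEM>")
    print(render(poem))
```

Changing the look of every title means editing one entry in `TEMPLATES`, which is the same separation of markup from presentation that a stylesheet provides.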

  42. An XLink Example
        ……
        <poem xmlns:xlink="http://www.w3.org/1999/xlink">
          <author xlink:href="yeatsRDFS3.xml" xlink:type="simple">W. B. Yeats</author>
          <poems>
            <poem1 xlink:href="http://www.kirjasto.sci.fi/wbyeats.htm" xlink:type="simple">The Rose</poem1>
            <poem2 xlink:href="http://www.kirjasto.sci.fi/wbyeats.htm" xlink:type="simple">The Tower</poem2>
          </poems>
        </poem>
        ……
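
XLink attributes live in their own namespace, so any element can carry a link. As a small sketch, the Python below lists every element marked `xlink:type="simple"` in the fragment from this slide. The function name `simple_links` is an assumption of this example.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

SAMPLE = """<poem xmlns:xlink="http://www.w3.org/1999/xlink">
  <author xlink:href="yeatsRDFS3.xml" xlink:type="simple">W. B. Yeats</author>
  <poems>
    <poem1 xlink:href="http://www.kirjasto.sci.fi/wbyeats.htm" xlink:type="simple">The Rose</poem1>
    <poem2 xlink:href="http://www.kirjasto.sci.fi/wbyeats.htm" xlink:type="simple">The Tower</poem2>
  </poems>
</poem>"""


def simple_links(xml_text):
    """Return (tag, href, text) for each element that is a simple XLink."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        # ElementTree spells namespaced attributes as "{namespace-uri}localname".
        if element.get(f"{{{XLINK}}}type") == "simple":
            links.append((element.tag, element.get(f"{{{XLINK}}}href"), element.text))
    return links


if __name__ == "__main__":
    for link in simple_links(SAMPLE):
        print(link)
```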

  43. Some XML Applications • Text Encoding Initiative • For adding annotation to historical manuscripts • http://www.tei-c.org/ • Encoded Archival Description • To enhance automated processing of finding aids • http://www.loc.gov/ead/ • Metadata Encoding and Transmission Standard • Bundles descriptive and administrative metadata • http://www.loc.gov/standards/mets/

  44. What’s Wrong with the Web? • HTML • Confounds structure and appearance (XML) • HTTP • Can’t recognize related transactions (Cookies) • URL • Links break when you move a file (PURL)

  45. Cookies • Servers know users by IP address and port • Because that’s where they send the Web pages • Cookies preserve “state” • Server sends data to the browser • Browser later responds with the same data • A unique code (server-side state) • Information about the user (client-side state)
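
The round trip on this slide, in its server-side-state form, can be sketched with Python's standard `http.cookies` module. No network is involved: the three functions just pass header strings to each other. Their names, the cookie name `session`, and the `SESSIONS` table are all assumptions made for illustration.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # server-side state, keyed by the unique code stored in the cookie


def server_first_response(user_name):
    """Server: remember the user and return a Set-Cookie header carrying a code."""
    session_id = secrets.token_hex(8)
    SESSIONS[session_id] = {"user": user_name}
    cookie = SimpleCookie()
    cookie["session"] = session_id
    return cookie.output(header="Set-Cookie:")


def browser_next_request(set_cookie_header):
    """Browser: store what the server sent and echo it back in a Cookie header."""
    jar = SimpleCookie()
    jar.load(set_cookie_header.split(":", 1)[1].strip())
    return "Cookie: " + "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


def server_later_request(cookie_header):
    """Server: use the returned code to find the state saved earlier."""
    sent = SimpleCookie()
    sent.load(cookie_header.split(":", 1)[1].strip())
    return SESSIONS.get(sent["session"].value)


if __name__ == "__main__":
    header = server_first_response("alice")
    print(header)
    print(server_later_request(browser_next_request(header)))
```

For client-side state, the cookie value would hold the user information itself instead of a lookup code.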

  46. Persistent URLs • www.purl.org • [Diagram with labels: My Browser, PURL, PURL Server, URL, Resource Server, Page]
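
A PURL server is essentially a lookup table that answers each request with a redirect to the resource's current URL. The toy model below shows why links survive a move. The class `PurlServer`, the path `/lbsc690/notes`, and the example.org addresses are all hypothetical.

```python
class PurlServer:
    """Map stable PURL paths to wherever the resource currently lives."""

    def __init__(self):
        self._targets = {}

    def register(self, purl_path, current_url):
        """Create or update a mapping; call this again when the resource moves."""
        self._targets[purl_path] = current_url

    def resolve(self, purl_path):
        """Return (status, location), the way an HTTP redirect response would."""
        url = self._targets.get(purl_path)
        return (302, url) if url else (404, None)


if __name__ == "__main__":
    purl = PurlServer()
    purl.register("/lbsc690/notes", "http://old.example.org/notes.html")
    print(purl.resolve("/lbsc690/notes"))
    # The file moves: only the table changes, and published links keep working.
    purl.register("/lbsc690/notes", "http://new.example.org/690/notes.html")
    print(purl.resolve("/lbsc690/notes"))
```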

  47. Summary • Learning to build simple Web pages is easy • Which is good news for the homework! • All documents are structured documents • XML is a flexible markup language toolkit • The key is to understand its capabilities • XML editors can hide much of the complexity

  48. Before You Go! • On a sheet of paper (no names), answer the following question: What was the muddiest point in today’s class?
