The Internet CT101 – Computing Systems
Contents • The Internet • Architecture • Addressing • Protocols • DNS • E-Mail • WWW • Security
The Internet • The Internet: An internet that spans the world • Original goal was to develop a means of connecting networks that would not be disrupted by local disasters. • Today it has shifted from an academic research project to a commercial undertaking.
Internet Architecture • Internet Service Provider (ISP) • Tier-1 • Tier-2 • Tier-1 and tier-2 ISPs are networks of routers that collectively provide the Internet’s communication infrastructure • Access ISP: Provides connectivity to the Internet • Traditional telephone (dial-up connection) • Cable connections • DSL • Wireless
Internet Addressing • IP address: pattern of 32 bits (IPv4) or 128 bits (IPv6); 32-bit addresses are often represented in dotted decimal notation • E.g. the following 32-bit pattern represents an Internet address • 10001100 11001011 00001000 00010110 • Mnemonic address (alternative addressing system more suitable for humans) • Domain names • Top-Level Domains
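The relationship between the bit pattern on this slide and dotted decimal notation can be sketched in a few lines of Python. This is only an illustration; the function name `bits_to_dotted_decimal` is an assumption made for the example, not something from the slides.

```python
def bits_to_dotted_decimal(bits: str) -> str:
    """Convert a 32-bit pattern (spaces allowed) to dotted decimal notation."""
    digits = bits.replace(" ", "")
    if len(digits) != 32 or set(digits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")
    # Each group of 8 bits (one octet) becomes one decimal number.
    octets = [int(digits[i:i + 8], 2) for i in range(0, 32, 8)]
    return ".".join(str(octet) for octet in octets)


# The bit pattern shown on the slide.
print(bits_to_dotted_decimal("10001100 11001011 00001000 00010110"))
# prints 140.203.8.22
```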
Internet Software Layers • Application: Constructs message with address • Transport: Chops message into packets • Network: Handles routing through the Internet • Link: Handles actual transmission of packets
Internet Protocols - TCP/IP Protocol Suite • Transport Layer • TCP • UDP • Network Layer • IP (IPv4 and IPv6)
DNS • IP addresses are difficult to remember • An e-mail address made of JohnDoe@ followed by a numeric IP address would be difficult to remember • If JohnDoe’s mail server moved to another machine, his e-mail address would no longer be valid • Something like JohnDoe@wuzwuz.ucg.ie is more appropriate • Some mechanism is needed to translate wuzwuz.ucg.ie to the IP address • DNS was invented to solve this problem • It is a hierarchical, domain-based naming scheme and a distributed database system for implementing this naming scheme • Usage: • To map a name onto an IP address, an application program calls a library procedure, called the resolver, passing it the name as a parameter (e.g. gethostbyname() is a resolver) • The resolver sends a UDP packet to a local DNS server, which looks up the name and returns the IP address to the resolver • The resolver returns the IP address to the application, which can then establish a TCP connection with the destination (or send it UDP packets)
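As a rough sketch of the resolver call described on this slide, Python exposes the same library procedure as `socket.gethostbyname`. The wrapper name `lookup` and its `resolver` parameter are assumptions added here so the call can be swapped out for testing.

```python
import socket
from typing import Callable


def lookup(name: str, resolver: Callable[[str], str] = socket.gethostbyname) -> str:
    """Pass a host name to the resolver and return the IPv4 address it reports."""
    # By default this calls the system resolver, which may query a DNS server.
    return resolver(name)
```

For instance, `lookup("localhost")` normally returns the loopback address without contacting a remote DNS server; any other name depends on the network the program runs on.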
DNS Name Space • The Internet is divided into over 200 top-level domains • Each domain is divided into sub-domains, which are further partitioned, etc. • All domains can be represented by a tree • The leaves of the tree represent domains that have no sub-domains (but contain machines) • A leaf domain may contain a single host or represent a company and contain thousands of hosts • Top-level domains are either generic or country domains
Domain Names • Can be either absolute (ends with a period, e.g. eng.sun.com.) or relative (doesn’t end with a period) • Relative ones have to be interpreted in a context to find their true meaning • Both kinds refer to a specific node in the tree and all the nodes under it • Are case insensitive (edu, Edu, EDU are the same thing) • Component names can be up to 63 characters long and full names should not exceed 255 characters • There is no rule against registering under two top-level domains (sony.com and sony.nl) • Each domain controls how it allocates the domains under it • E.g. Japan has the domains ac.jp and co.jp, which mimic edu and com • The Netherlands doesn’t make this distinction • To create a new domain, permission is required from the domain that will include it; once created, it can create sub-domains without having to ask permission from the domains higher up.
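The naming rules on this slide (absolute versus relative names, case insensitivity, and the 63 and 255 character limits) can be checked with a small sketch. The helper names below are assumptions made for illustration, and real registries apply further rules that are not modelled here.

```python
MAX_LABEL = 63   # longest allowed component name, as stated on the slide
MAX_NAME = 255   # longest allowed full name, as stated on the slide


def is_absolute(name: str) -> bool:
    """Absolute names end with a period, e.g. 'eng.sun.com.'."""
    return name.endswith(".")


def has_valid_lengths(name: str) -> bool:
    """Check the component and full-name length limits."""
    stripped = name[:-1] if is_absolute(name) else name
    labels = stripped.split(".")
    return len(name) <= MAX_NAME and all(1 <= len(label) <= MAX_LABEL for label in labels)


def same_domain(a: str, b: str) -> bool:
    """Domain names are case insensitive, so compare them in lower case."""
    return a.lower() == b.lower()
```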
Name servers • In theory, one DNS server could service all requests • In practice it would be overloaded • To solve this, the DNS name space is divided into non-overlapping zones • Each zone contains some part of the tree and the name servers holding that zone’s information • A zone has a primary DNS server (gets its information from disk) • And one or more secondary DNS servers (get their information from the primary DNS server)
Name Servers – Lookup mechanism • In the example, a resolver on flits.cs.vu.nl is looking for the IP address of linda.cs.yale.edu (using a recursive query; some servers don’t implement recursive queries and instead return the address of the next server to try) • The resolver sends a query containing the domain name sought • The local name server forwards the query to the name server for the edu domain, which it finds in its database, etc. • Once the records get back to the cs.vu.nl name server, they are entered in a local cache in case they are needed later
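The recursive lookup with local caching can be imitated with a toy model. Everything in the sketch below is hypothetical: the server names, the referral table, and the address 192.0.2.10 (taken from a range reserved for documentation) are invented for illustration and say nothing about the real servers in the example.

```python
# Hypothetical servers: each maps a domain either to a referral or to an address.
SERVERS = {
    "edu-server": {"yale.edu": ("referral", "yale-server")},
    "yale-server": {"cs.yale.edu": ("referral", "cs-yale-server")},
    "cs-yale-server": {"linda.cs.yale.edu": ("address", "192.0.2.10")},
}


def find_record(server, name):
    """Return the record for the name itself or for a parent domain of it."""
    for suffix, record in SERVERS[server].items():
        if name == suffix or name.endswith("." + suffix):
            return record
    raise KeyError(name)


def recursive_query(server, name):
    """Follow referrals from server to server until an address is found."""
    kind, value = find_record(server, name)
    return value if kind == "address" else recursive_query(value, name)


class LocalNameServer:
    """Forwards queries it cannot answer and caches the results."""

    def __init__(self, first_server):
        self.first_server = first_server
        self.cache = {}
        self.forwarded = 0  # number of queries that had to leave the local server

    def resolve(self, name):
        if name not in self.cache:
            self.forwarded += 1
            self.cache[name] = recursive_query(self.first_server, name)
        return self.cache[name]


local = LocalNameServer("edu-server")
first = local.resolve("linda.cs.yale.edu")
second = local.resolve("linda.cs.yale.edu")  # answered from the cache
```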
Internet Corporation for Assigned Names & Numbers (ICANN) • Allocates IP addresses to ISPs who then assign those addresses within their regions. • Oversees the registration of domains and domain names.
Traditional Internet Applications • Electronic Mail (email) • Domain mail server collects incoming mail and transmits outgoing mail • Mail server delivers collected incoming mail to clients via POP3 or IMAP • File Transfer Protocol (FTP) • Telnet and SSH • WWW (World Wide Web)
More Recent Applications • Voice Over IP (VoIP) • Internet Radio • N-unicast • Multicast
E-Mail • Architecture and services • User agent • Message formats • Message transfer agents • SMTP • Final delivery
E-Mail Architecture • E-mail system consists of two parts • User agents, which allow people to read and send e-mail • Local programs that provide a command-based or graphical method for interacting with the e-mail system • Message transfer agents, which move the messages from source to destination • Typically system daemons or processes that run in the background and whose job is to move messages
E-Mail functions • E-mail system functions • Composition – refers to the process of creating messages and answers; although any text editor can be used for the text of the message, the system itself can provide assistance with addressing and numerous header fields attached to each message • Transfer – refers to moving messages from the originator to the recipient; this requires establishing a connection to the destination or some intermediate machine, outputting the message and releasing the connection
E-mail functions • E-mail system functions • Reporting – has to do with telling the originator what happened to the message: Was it delivered? Was it rejected? Was it lost? • Displaying – showing the incoming message is important so that people can read their e-mail; sometimes conversion or a special viewer is required (e.g. if the message is a PostScript file or an audio file) • Disposition – what the recipient does after the message has been received; possibilities include throwing it away before reading it, throwing it away after reading it, saving it and so on.
E-mail architecture and functions • Distinction between envelope and its contents; • envelope encapsulates the message and contains info needed for transporting the message, such as destination address, priority and security level • Message has two parts: headers (interpreted by the user agent) and body (info for the human recipient)
E-mail user agent • Sending e-mail • User must provide the message and the destination address (user@dns-address) • User agents may support mailing lists • Receiving e-mail • When a user agent is started, it looks at the user’s mailbox before displaying anything • Then it may announce the number of messages in the mailbox
E-mail message format • Basic ASCII e-mail messages use RFC 822 • Messages consist of a primitive envelope (described in RFC 821), some number of header fields, a blank line and then the message body • Each header field (logically) consists of a single line of ASCII text containing the field name, a colon and, for most fields, a value • RFC 822 was designed long ago and doesn’t clearly distinguish between the envelope fields and the header fields • This was revised in RFC 2822; however, it wasn’t possible to completely redo it due to its widespread usage
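The layout described here (header fields, a blank line, then the body) can be seen by parsing a small message with Python's standard `email` package. The addresses and text below are made up for the example.

```python
from email import message_from_string

# A hypothetical message: header fields, one blank line, then the body.
RAW = (
    "From: elinor@example.com\n"
    "To: carolyn@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "Are you free at noon?\n"
)

message = message_from_string(RAW)
headers = dict(message.items())   # field name -> value
body = message.get_payload()      # everything after the blank line

for name, value in headers.items():
    print(f"{name}: {value}")
print(body)
```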
Email message transfer • Message transfer mechanism is concerned with relaying messages from the originator to the destination • This can be done by establishing a transport-level connection between the source and the destination and then just transferring the message • SMTP – Simple Mail Transfer Protocol • Source machine establishes a TCP connection to port 25 on the destination machine, where the SMTP daemon listens. This daemon accepts the incoming connections and copies messages from them into the appropriate mailboxes • If a message can’t be delivered, an error report containing the first part of the undeliverable message is returned to the sender • It is a simple ASCII protocol
SMTP Protocol • Connection establishment (on port 25) • Data exchange • the sending machine (operating as the client) waits for the receiving machine (operating as the server) to talk first; • the server begins by sending a line of text giving its identity and telling whether it is prepared to receive mail; • if it is not, the client releases the connection and tries again later • If the server is willing to accept mail, the client announces whom the e-mail is coming from and whom it is going to • If such a recipient exists at the server end, the client gets the go-ahead to send the message • The client sends the message and the server acknowledges it • Connection is released
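To make the exchange more concrete, the sketch below lists the lines a client would send during such a session. The command names (HELO, MAIL FROM, RCPT TO, DATA, QUIT) come from the SMTP standard rather than from this slide; the function name, the client host name and the addresses are assumptions, and server replies and the escaping of lines that start with a period are left out.

```python
def smtp_client_commands(sender, recipient, body_lines, client_host="client.example.com"):
    """Return, in order, the lines a simple SMTP client sends to the server."""
    commands = [
        f"HELO {client_host}",       # client identifies itself after the server's greeting
        f"MAIL FROM:<{sender}>",     # whom the e-mail is coming from
        f"RCPT TO:<{recipient}>",    # whom it is going to
        "DATA",                      # sent once the server has accepted the recipient
    ]
    commands.extend(body_lines)      # the message itself
    commands.append(".")             # a line holding only a period ends the message
    commands.append("QUIT")          # connection is released
    return commands


for line in smtp_client_commands("elinor@example.com", "carolyn@example.com", ["Hello Carolyn"]):
    print(line)
```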
SMTP typical problems • Some old implementations can’t handle messages longer than 64KB • If the server and client have different timeouts, one of them may give up while the other is still busy, unexpectedly terminating the connection • In some situations infinite mail storms can be triggered • If host 1 holds mailing list A and host 2 holds mailing list B, and each list contains an entry for the other one, then a message sent to either list could generate a never-ending amount of e-mail traffic unless this is checked for • RFC 2821 defines ESMTP (Extended SMTP) • Clients wanting to use it should start with EHLO instead of HELO; if this is rejected, then the server is a regular SMTP server
Final delivery • Assuming that all machines can send and receive mail all the time, the e-mail model so far works • This model breaks down for people accessing the Internet over a dial-up connection • What happens when Elinor wants to send Carolyn e-mail and Carolyn is not currently online? • One solution is to have a message transfer agent on an ISP machine; since this transfer agent can be online all the time, e-mail can be sent 24 hours a day • This solution creates another problem: how does the user get e-mail from the ISP’s message transfer agent? • The solution is to create another protocol that allows user transfer agents (on client PCs) to contact the message transfer agent (on the ISP’s machine) and allows e-mail to be copied from the ISP to the user • One such protocol is POP3 (Post Office Protocol Version 3), RFC 1939
Final delivery • Sending and reading mail when the receiver has a permanent Internet connection and the user agent runs on the same machine as the message transfer agent. • Reading e-mail when the receiver has a dial-up connection to an ISP.
POP3 • Starts when the user starts the mail reader • Mail reader calls up the ISP (if there is no connection) and establishes a TCP connection with the message transfer agent on port 110 • Authorization • The user logs in by sending a username and password • Transactions • The user collects the e-mails and marks them for deletion • Update • Causes the marked e-mails to be deleted
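The three phases on this slide (authorization, transactions, update) behave like a small state machine. The toy model below is a sketch under that assumption: the class and method names are invented for illustration and it does not speak the real protocol over a network.

```python
class Pop3Mailbox:
    """Toy model of a POP3 session: messages are only removed in the update phase."""

    def __init__(self, user, password, messages):
        self._user, self._password = user, password
        self.messages = list(messages)
        self._marked = set()
        self.state = "AUTHORIZATION"

    def login(self, user, password):
        if self.state != "AUTHORIZATION":
            raise RuntimeError("login is only possible in the authorization phase")
        if (user, password) != (self._user, self._password):
            return False
        self.state = "TRANSACTION"
        return True

    def _require_transaction(self):
        if self.state != "TRANSACTION":
            raise RuntimeError("not in the transaction phase")

    def retrieve(self, index):
        self._require_transaction()
        return self.messages[index]

    def mark_deleted(self, index):
        self._require_transaction()
        self._marked.add(index)   # nothing is deleted yet

    def quit(self):
        self._require_transaction()
        self.state = "UPDATE"
        # Only now are the marked messages actually removed.
        self.messages = [m for i, m in enumerate(self.messages) if i not in self._marked]
```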
IMAP • POP3 works fine for users with one e-mail account with one ISP, accessed from one PC • If mail is accessed from different locations, the user may lose e-mails, security issues may appear, etc. • An alternative final delivery protocol is IMAP (Internet Message Access Protocol), defined in RFC 2060 • Instead of assuming that all messages will be downloaded and that the user will work offline after that (like POP3), IMAP assumes that all e-mail will remain on the server indefinitely in multiple mailboxes • Provides extensive mechanisms to read messages or parts of messages, and mechanisms to create, destroy and manipulate multiple mailboxes.
Web Mail • Various companies (e.g. Hotmail and Yahoo) provide e-mail service using Web mail. • Normal message transfer agents listen on port 25 for incoming SMTP connections • Messages are delivered to the user through special web pages; when the user goes to the e-mail Web page, a form is presented in which the user is asked for a login name and password.
WWW • Architectural Overview • Static Web Documents • Dynamic Web Documents • HTTP – The HyperText Transfer Protocol • Performance Enhancements • The Wireless Web
Architecture Overview • The Web is a collection of web pages • Each page contains links to other pages • Hypertext – the idea of having one page point to another; it is text, displayed on a computer, with references (hyperlinks) to other text that the reader can immediately follow • Browser – program to view pages • Hyperlinks – strings of text that are links to other pages • Example: • Typical web page • The page reached by clicking on Department of Animal Psychology.
Architectural Overview • Browser displays a page on the client machine • When a link is clicked, the browser sends a message to the abcd.com web server asking it for the page • When the page arrives, it is displayed; if it contains a hyperlink to a page on xyz.com and that link is clicked, the browser sends a message to the xyz.com server, and the process continues
Client side • Pages are named using URLs (Uniform Resource Locators) (e.g. http://www.abcd.com/products.html) • Name of the protocol (http) • DNS name of the machine where the page is located (www.abcd.com) • The name of the file containing the page (products.html) • When the link is selected: • Browser determines the URL (by reading the input) • Browser asks the DNS server for the IP address of www.abcd.com • DNS replies with the IP address • Browser makes a TCP connection on port 80 to that address • It sends a request asking for file /products.html • Web server www.abcd.com sends file /products.html • TCP connection is released • Browser displays all the text in /products.html • Web pages are written in a standard language called HTML • A page may consist of a formatted document in PDF format, an icon in GIF format, a video in MPEG format, a song in MP3 format, or any other format
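The three parts of the example URL can be pulled apart with Python's standard `urllib.parse` module, as a rough illustration of the first thing a browser does with a link. The helper name `describe_url` and the exact request text it builds are assumptions made for this sketch.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into protocol, machine name and file, plus a sample request."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "file": parts.path,
        "port": parts.port or 80,  # port 80 is used when the URL names none
        # A minimal HTTP request asking for the file (illustrative only).
        "request": f"GET {parts.path} HTTP/1.1\r\nHost: {parts.hostname}\r\n\r\n",
    }


info = describe_url("http://www.abcd.com/products.html")
print(info["protocol"], info["host"], info["file"], info["port"])
```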
Client side • The browser may have problems interpreting all of these formats … rather than making the browsers larger and larger, a more general solution is adopted. • When a server returns a page, it usually returns some information about the page • MIME type of the page • Pages of type text/html are just displayed directly • If MIME type is not of a built in type, then the browser consults an internal table with associations between MIME types and viewers • Two possibilities • Plug-ins – special modules that the browser loads in its memory space • Helper applications – separate process that takes as parameter the name of the file to display
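The lookup from MIME type to viewer can be pictured as a simple table. In the sketch below, the set of built-in types and the viewer table are hypothetical examples, not the contents of any real browser.

```python
# Assumption for illustration: types the browser can display by itself.
BUILT_IN_TYPES = {"text/html"}

# Assumption for illustration: the browser's table of MIME types and viewers.
VIEWERS = {
    "application/pdf": ("plug-in", "pdf-viewer"),
    "audio/mpeg": ("helper application", "music-player"),
}


def choose_handler(mime_type):
    """Decide how a page with the given MIME type is shown."""
    if mime_type in BUILT_IN_TYPES:
        return ("built-in", "display directly")
    # Not built in: consult the table of associations.
    return VIEWERS.get(mime_type, ("none", "no viewer configured"))


print(choose_handler("text/html"))
print(choose_handler("application/pdf"))
```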
Server side • Typical web server operations: • Accept TCP connection • Get the name of the file requested • Get the file (from disk) – this can be a lengthy operation, since every disk access takes on average 5ms (access time) plus the time to read the file (which depends on the file length); • Return the file to the client • Release the TCP connection • Improvements • Maintain a cache with the “n” most recently accessed files • Multithreaded server
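One way to keep the most recently accessed files in memory is a least-recently-used cache. The sketch below assumes that policy (the slide does not name one) and takes the disk-reading function as a parameter so the example runs without real files; the class and attribute names are invented for illustration.

```python
from collections import OrderedDict


class FileCache:
    """Keeps the most recently accessed files in memory to avoid disk reads."""

    def __init__(self, capacity, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk
        self.entries = OrderedDict()   # oldest entry first, newest last
        self.disk_reads = 0

    def get(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)    # mark as most recently used
            return self.entries[path]
        self.disk_reads += 1                  # cache miss: go to disk
        content = self.read_from_disk(path)
        self.entries[path] = content
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used file
        return content


cache = FileCache(2, lambda path: "contents of " + path)
cache.get("/products.html")
cache.get("/products.html")   # served from memory, no second disk read
```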
Server side • If too many requests come in each second, the CPU will not be able to handle the load, no matter how many disks are used in parallel • The solution is to add more nodes (computers), possibly with replicated disks (server farms) • A front end still accepts incoming requests but “sprays” them over multiple CPUs rather than multiple threads • Individual machines may be multithreaded and pipelined as before
URL – Uniform Resource Locator • Some common URLs
Stateless and cookies • The Web is stateless: there is no concept of a login session; the browser sends a request to the server and gets back a file, and the server then forgets that it has ever seen that particular client • A quick solution would be to track clients’ IP addresses (not reliable, since several clients may share one address behind NAT) • Cookies (introduced by Netscape) solve this problem by having the server supply additional information when a client requests a page; this information may include a cookie, which is a small (at most 4KB) file or string • A cookie contains up to five fields • When the browser sends a request for a page, it first checks whether it has a cookie associated with the domain the request is going to. If so, it appends this cookie to the request; the server receives it and interprets it any way it wants
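Both directions of the cookie exchange can be sketched with Python's standard `http.cookies` module. The cookie name, value and domain below are hypothetical, and only two of the possible fields (domain and path) are set.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(name, value, domain, path="/"):
    """Build the header a server could send to hand the browser a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    cookie[name]["path"] = path
    return cookie.output(header="Set-Cookie:")


def parse_cookie_header(header_value):
    """Read the cookies a browser appended to a later request."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


header = make_set_cookie_header("CustomerID", "497793521", "abcd.com")
received = parse_cookie_header("CustomerID=497793521")
print(header)
print(received)
```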
HTML – HyperText Markup Language • (a) HTML source code • Markup language containing explicit commands for formatting • (b) Formatted page
Hypertext Document Format • Encoded as text file • Contains tags to communicate with browser • Appearance • <h1> to start a level one heading • <p> to start a new paragraph • Links to other documents and content • <a href = . . . > • Insert images • <img src = . . . >
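A program can read these tags much as a browser does. The sketch below uses Python's standard `html.parser` module to collect the targets of the link and image tags; the sample page and the file name logo.gif are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the targets of <a href=...> and <img src=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        elif tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])


# A hypothetical page using the four tags listed on the slide.
PAGE = """<h1>Products</h1>
<p>See our <a href="http://www.abcd.com/products.html">product list</a>.</p>
<img src="logo.gif">"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links, collector.images)
```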