Introduction to Web Science Web 1.0
Introducing Web 1.0 • Packet switching network • IP Addressing • Internet Applications • The WWW and markup • Searching the WWW • Intelligent Agents • Internet Governance
Packet-Switched Networks (1) • Local area network (LAN) • Network of computers located close together • Wide area networks (WANs) • Networks of computers connected over greater distances • Circuit • Combination of telephone lines and closed switches that connect them to each other
Packet-Switched Networks (2) • Circuit switching is used in telephone communication • The Internet uses packet switching • Packet switching relies on computers called ‘routers’ and programs called ‘routing algorithms’
Packet-Switched Networks (3) • Information is divided into packets • It is passed from node to node • It is recomposed as one chunk on the destination server
Routing Packets • Routing computers • Computers that decide how best to forward packets • Routing algorithms • Rules contained in programs on router computers that determine the best path on which to send packets • Programs apply their routing algorithms to information they have stored in routing tables
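A minimal sketch of how a router program might consult its routing table. It assumes a longest-prefix-match rule (a common forwarding rule, not one the slides name); the table entries and next-hop names are hypothetical.

```python
import ipaddress

# Hypothetical routing table: destination network -> next hop (illustrative values only).
ROUTING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
    ipaddress.ip_network("126.0.0.0/8"): "router-a",
    ipaddress.ip_network("126.204.0.0/16"): "router-b",
}


def next_hop(destination: str) -> str:
    """Return the next hop of the most specific network that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [network for network in ROUTING_TABLE if address in network]
    # The default route 0.0.0.0/0 matches every IPv4 address, so matches is never empty.
    best = max(matches, key=lambda network: network.prefixlen)
    return ROUTING_TABLE[best]


print(next_hop("126.204.89.56"))
print(next_hop("8.8.8.8"))
```

The default route plays the role of a catch-all, so every packet has somewhere to go even when no specific entry matches.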
TCP/IP • Communications protocol suite • Packet switched protocol • No end-to-end connection is required • Each message broken down into small pieces called packets • Packets possibly routed to destination over different paths • Transmission Control Protocol (TCP) • Breaks messages into packets • Numbers packets in order • Reorders packets at the destination • Internet Protocol (IP) • Routes packets to the proper destination
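The splitting, numbering and reordering steps listed for TCP can be sketched as follows. This is a toy illustration, not real TCP: the function names and the 8-character packet size are assumptions made for the example.

```python
import random


def to_packets(message: str, size: int = 8) -> list[tuple[int, str]]:
    """Break a message into (sequence number, payload) packets."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets: list[tuple[int, str]]) -> str:
    """Reorder packets by sequence number and join their payloads."""
    return "".join(payload for _, payload in sorted(packets))


message = "Packets may take different paths to the destination."
packets = to_packets(message)
random.shuffle(packets)  # simulate packets arriving out of order
print(reassemble(packets) == message)
```

Because each packet carries its own sequence number, the destination can rebuild the message regardless of the order in which packets arrive.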
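Open Systems Interconnection (OSI) Model • A seven-layer reference model, often compared with (but distinct from) the TCP/IP protocol suite • Layers, from the highest to the lowest: Application, Presentation, Session, Transport, Network, Data Link, Physical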
IP Address • Internet (IPv4) addresses are based on a 32-bit number called an IP address • IP addresses are written as four numbers, each from 0 to 255, separated by periods • An address such as 126.204.89.56 uniquely identifies a computer connected to the Internet • IP subnetting conceptually divides a large network into smaller sub-networks
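To make the link between the dotted notation and the underlying 32-bit number concrete, here is a short sketch using Python's standard `ipaddress` module and the example address from the slide. The variable names are illustrative.

```python
import ipaddress

address = ipaddress.IPv4Address("126.204.89.56")

# The same address viewed as a single 32-bit integer.
as_int = int(address)

# Recover the four 8-bit numbers by shifting and masking.
octets = [(as_int >> shift) & 0xFF for shift in (24, 16, 8, 0)]

print(as_int)
print(f"{as_int:032b}")  # the 32 bits, zero-padded
print(octets)
```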
Without subnetting … • Explosion in size of IP routing tables. • Every time more address space was needed, the administrator would have to apply for a new block of addresses. • Any changes to the internal structure of a company's network would potentially affect devices and sites outside the organization. • Keeping track of all those different Class C networks would be a bit of a headache in its own right.
Benefits of Subnetting • Better Match to Physical Network Structure • Flexibility • Invisibility To Public Internet • No Need To Request New IP Addresses • No Routing Table Entry Proliferation
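As a rough illustration of subnetting, the sketch below divides one block of addresses into four smaller sub-networks with the standard `ipaddress` module. The 192.168.0.0/24 block and the choice of four subnets are assumptions made for the example.

```python
import ipaddress

# Hypothetical address block assigned to an organization.
network = ipaddress.ip_network("192.168.0.0/24")

# Borrow two host bits to create 2**2 = 4 equal-sized subnets.
subnets = list(network.subnets(prefixlen_diff=2))

for subnet in subnets:
    print(subnet, subnet.num_addresses)
```

From outside the organization the whole block can still be reached through one routing table entry, which is why the internal split stays invisible to the public Internet.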
IPv6 (or IP Next Generation) • Network Layer • Developed in 1994 • Will replace the IPv4 standard • IPv4's limits on network addresses will eventually lead to exhaustion of available addresses (by 2023) • IPv4 supports only 4,294,967,296 addresses (32 bits) • Improvements include • providing future cell phones and mobile devices their own unique & permanent addresses • support for about 3.4 × 10^38 addresses (128 bits)
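The address counts quoted on this slide follow directly from the number of bits in each address format, as this small check shows. The variable names are illustrative.

```python
# Number of distinct addresses available with 32-bit and 128-bit addressing.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

print(f"{ipv4_addresses:,}")
print(f"{ipv6_addresses:.1e}")

# How many times larger the 128-bit space is than the 32-bit one.
print(ipv6_addresses // ipv4_addresses == 2 ** 96)
```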
Domain Names • A Uniform Resource Locator (URL) consists of names and abbreviations that are much easier to remember than IP addresses • The HTTP protocol defines how an Internet resource is accessed • An address such as www.microsoft.com is called a domain name • Domain Name System (DNS) • A database of Internet names • DNS Servers convert Internet names to IP addresses • Top level domains
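A brief sketch of the parts of a URL, followed by a helper that asks DNS for the IP address behind a domain name. The path `/index.html` and the helper name `resolve` are assumptions for illustration. The helper is defined but not called here, since it needs network access and its result varies over time.

```python
import socket
from urllib.parse import urlsplit

# Hypothetical URL built around the domain name used on the slide.
url = "http://www.microsoft.com/index.html"
parts = urlsplit(url)

print(parts.scheme)    # the protocol used to access the resource
print(parts.hostname)  # the domain name
print(parts.path)      # the resource on that host


def resolve(domain_name: str) -> str:
    """Ask the system's DNS resolver for an IPv4 address (requires network access)."""
    return socket.gethostbyname(domain_name)
```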
Top-Level Domain Names • Internet Corporation for Assigned Names and Numbers (ICANN) • Responsible for managing domain names and coordinating them with IP address registrars
Domain Name case study • The web was not an ‘open’ place • Only one company, Network Solutions, sold .com, .net or .org domains • A domain cost 100 dollars, with a two-year minimum • Back then, there was a good chance you could still buy a dictionary word as a .com • In 2000, Network Solutions lost its monopoly position and domain prices dropped by over 95% • Since then its innovation has halted and Network Solutions has become one of thousands of anonymous domain registrars
Internet Applications • E-Mail • File transfers • Instant messaging (IM) • Newsgroups • Streaming audio and video • Internet telephony • World Wide Web (WWW)
E-Mail • Most popular and widely used Internet application • 30 billion e-mails sent every day • Spam – junk e-mail messages • Spam costs corporate America $9 billion per year • Every e-mail message contains a header that describes the source and destination of the message • E-mail messages are text, but may have attachments of many types of digital data • Viruses are often transmitted via e-mail
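The header and attachment structure of an e-mail message can be sketched with Python's standard `email` package. The addresses, subject and attachment bytes below are placeholders, not real data.

```python
from email.message import EmailMessage

msg = EmailMessage()

# Header fields describe the source and destination of the message.
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Lecture notes"

# The body is plain text.
msg.set_content("The slides are attached.")

# Attachments carry other kinds of digital data (placeholder bytes here).
msg.add_attachment(
    b"placeholder image bytes",
    maintype="image",
    subtype="png",
    filename="slide.png",
)

print(msg["From"], "->", msg["To"])
print(msg.get_content_type())
```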
SMTP, POP, and IMAP (1) • E-mail sent across the Internet is managed and stored by mail servers • Simple Mail Transfer Protocol (SMTP) is the standard for sending mail to the server • Post Office Protocol (POP) is the standard for retrieving mail from the server • The Internet Message Access Protocol (IMAP) is a newer e-mail retrieval protocol
Controlling Spam • Use complex email addresses rather than a name and surname combination • Why? Bots? Name directories? • Control exposure of your email address • How? JavaScript? JPEG? • Use multiple email addresses for different purposes • On what occasions? • Use content-filtering software • black list spam filter • white list spam filter • challenge-response using graphical challenges?
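A toy sketch of how black list and white list filtering might be combined with a simple content check. The addresses, the word list and the rule order (white list first, then black list, then content) are assumptions chosen for the example, not a description of any real filter.

```python
# Hypothetical lists; a real filter would load these from configuration.
WHITELIST = {"colleague@example.org"}
BLACKLIST = {"spammer@example.net"}
SPAM_WORDS = {"lottery", "winner"}


def classify(sender: str, subject: str) -> str:
    """Return 'inbox' or 'spam' for a message, checking lists before content."""
    if sender in WHITELIST:
        return "inbox"  # trusted senders are always accepted
    if sender in BLACKLIST:
        return "spam"   # known spammers are always rejected
    words = set(subject.lower().split())
    return "spam" if words & SPAM_WORDS else "inbox"


print(classify("colleague@example.org", "Lottery pool for the office"))
print(classify("spammer@example.net", "Hello"))
print(classify("stranger@example.com", "You are a winner"))
```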
E-Mail Case Study • Hotmail (1996) • First place to get a free email address, disconnected from an ISP • 4 years later, 30 million people worldwide were exchanging @hotmail email addresses • Bought by Microsoft at the end of 1997 for just 400 million dollars • 2007: the end of Hotmail • transformation to “Live” mail to become an integrated part of Microsoft’s “Live” family
File Transfers • File transfer protocol (FTP) • Protocol providing for transmission of a file between an Internet server and a user’s computer • Peer-to-peer (P2P) file sharing • Share data from one computer to another • Every user can be a server • Napster • Kazaa • Gnutella • Torrent • With P2P, every user on the network can make data available to every other user on the network
Instant Messaging • Allows user to create a private chat session with another user • IM started with AOL • IM sneaking into corporate networks • Many Web-based companies use IM technology for customer service • eBay
ICQ case study • ICQ is a play on “I seek you” • 1996: first easy-to-use instant messenger program where you could add friends to your list and see if they were online • Back then it was revolutionary for the masses and it became the ‘application’ everybody had installed • Acquired by AOL in June 1998 for a whopping $287 million • Eventually the program got too many additional features that made the application heavy and unorganized • Competition from AOL IM, Yahoo IM, and MSN Messenger increased, and friends on your ICQ list left the application, eventually resulting in a mass abandonment of the network
Usenet Newsgroups • Online, bulletin board discussion forums • Users post and read messages • More than 100,000 newsgroups • Millions of newsgroup readers • Important information resource, especially for technical issues and products • Newsgroup messages distributed using open standard • Many are uncensored
Streaming Audio and Video • Creating and sending audio and video files • Sports • Basketball at sports.yahoo.com • Major league baseball • News • Fox News • CNN radio • Business • ZDNet • Education • Warriors of the Net
Internet Telephony • Voice over Internet Protocol (VoIP) • Use your computer like a telephone • Software connects computers via the Internet and transmits voice data • Savings come from eliminating toll charges between locations
The World Wide Web • Collection of hyperlinked computer files on the Internet • Client-server application • Web servers • Web browsers as clients • WWW standards • Hypertext markup language (HTML) • Current standard for writing Web pages • Tags in HTML instruct the client browser how to format and display the Web page content • Hypertext transfer protocol (HTTP) • Establishes a connection between Web server and client • Extensible markup language (XML) • A meta-markup language • Gives meaning to the data enclosed within XML tags
Website case study • GeoCities: create your own free homepage on the web • 1997: fifth most popular website, with over 500,000 homepages created • Yahoo bought GeoCities two years later for $3.57 billion and started to actively commercialize the homepages with various advertising types, which proved to be their death sentence • With ‘real’ web hosting becoming affordable for anybody, the need for free homepages in this form vanished
Overview of Markup Languages • SGML is a rich meta language that is useful for defining markup languages • HTML is particularly useful for displaying Web pages • XML defines data structures for electronic commerce (and much more …)
http://www.w3.org/ Development of Markup Languages
Standard Generalized Markup Language • The ISO adopted SGML standard in 1986 • SGML is nonproprietary and platform-independent • SGML supports user-defined tags and architecture to complement the required richness of documents
Extensible Markup Language • XML is a descendant of SGML • XML allows designers to easily describe and deliver structured data from any application in a standard, consistent way • XML can be embedded within an HTML document • XML allows you to create your own customized markup language.
Learn XML in a slide • Tag – a piece of markup • An opening tag <name> • A closing tag </name> • Element – well-formed usage of tags • <name>Alexiei</name> • Attribute – properties • <name length="7">Alexiei</name> • Rules to keep XML well formed • Can be nested but not overlapping • Case sensitivity • Quoted attributes • Required end tag • Shorthand • <abc></abc> is equivalent to <abc/>
Some XML examples • <book>E-Commerce</booK> • <book pages=100>E-Commerce</book> • <book pages="100"><title>E-Commerce</book></title> • <book pages="100"><title>E-Commerce</title></book> • <book pages="100"> <title>E-Commerce</title> <author> <name>Gary</name> <surname>Schneider</surname> </author> </book>
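One way to check which of these examples obey the well-formedness rules is to hand each one to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is an assumption, and the typographic quotes from the slide are written as straight quotes, which XML requires.

```python
import xml.etree.ElementTree as ET

# The five slide examples, in order, with straight quotes around attribute values.
EXAMPLES = [
    '<book>E-Commerce</booK>',
    '<book pages=100>E-Commerce</book>',
    '<book pages="100"><title>E-Commerce</book></title>',
    '<book pages="100"><title>E-Commerce</title></book>',
    '<book pages="100"><title>E-Commerce</title>'
    '<author><name>Gary</name><surname>Schneider</surname></author></book>',
]


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = [is_well_formed(example) for example in EXAMPLES]
print(results)
```

The first three examples break the case-sensitivity, quoted-attribute and no-overlap rules respectively, so the parser rejects them; the last two are accepted.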
Processing a Request for an XML Page • Why go through all this hassle? • How would you go about displaying HTML on a • PC • Handheld • Mobile
Hypertext Markup Language • Tim Berners-Lee invented HTML • HTML is a document production language that includes a set of tags that define the format and style of a document • HTML is based on SGML • HTML is an instance of one particular SGML document type – Document Type Definition (DTD)
HTML Tags • An HTML document contains both document content and tags • The tags are the HTML codes inserted in a document to specify the format on screen • Each tag is enclosed in angle brackets (< >) • Most tags are two-sided – opening and closing tags • Well-formed tags, bots, meta tags? Why are they important?
HTML Links • Hyperlinks are bits of text that connect the current document to: • Another location in the same document • Another document on the same host machine • Another document on the Internet • Can they link to a toaster at home? • Hyperlinks are created using the HTML anchor tag • Two popular link structures: • Linear hyperlink structure • Hierarchical hyperlink structure
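To illustrate the three kinds of link targets, here is a small sketch that collects the `href` values of anchor tags with Python's standard `html.parser`. The sample page, the class name `LinkCollector` and the file name `notes.html` are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every anchor (<a>) tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page with one link of each kind.
PAGE = """
<p><a href="#top">Same document</a></p>
<p><a href="notes.html">Same host</a></p>
<p><a href="http://www.w3.org/">Elsewhere on the Internet</a></p>
"""

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)
```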
HTML Version History • HTML version 1.0 was introduced in 1991 • HTML 2.0 was released in Sept. 1995 • HTML 3.2 was introduced in 1997 • HTML 4.0 was released by W3C in Dec 1997 • HTML 4.01 was released in Dec 1999 • XHTML 1.0 became a W3C recommendation in Jan 2000
HTML Editors (1) • Low-end editors display HTML code on the screen and allow you to insert HTML tag pairs by clicking selected buttons • High-end editors are Web site builder programs that provide a rich environment displaying the Web page, not the HTML code • Microsoft FrontPage and Macromedia Dreamweaver are examples of Web site builders
Static versus Dynamic Pages • HTML and XML only display and exchange data • No interactivity; no processing of data • Scripting languages • Provides basic interactivity • Rollovers • Crawling text • JavaScript • VBScript • Full-featured Web programming • Java • Client side scripting or browser side scripting • Applets • J2EE • Common Gateway Interface (CGI) • Allows passing of data between a static HTML page and a computer program