Computer Science 1000 Information Searching I Redistribution of these slides without permission is strictly prohibited
World Wide Web – The Basics • our next topic examines how to find information on the web • we consider a few basic terms here (which you’re probably familiar with): • page/web page • link/hyperlink • site/web site • later in the semester, we will revisit web technologies in much more detail
World Wide Web • a system of linked documents accessed via the internet • often simply referred to as the web • sometimes used interchangeably with the internet, but this isn’t exactly correct • the internet is the global network of interconnected devices (computers, routers, etc.) that exchange data • the web refers to the documents being stored, the software that serves and retrieves them, and the protocols used for transmission
Web Page • a document stored and accessed on the web • identified by a unique URL (Uniform Resource Locator) • often referred to simply as a page • today’s web pages are very rich in content • text • images • hyperlinks • videos
Web Site • a collection of related web pages on the internet • typically belonging to a common organization or event • example • all pages served by the University of Lethbridge make up its website
Hyperlink • a part of a web page that refers to a different location • often just called a link • hyperlinks can reference: • another place on the same page • another webpage • hypertext: text containing hyperlinks
The Age of Information • the computer, internet, and web have changed how we interact with information • information storage • the amount of available information is significantly greater than even a generation ago, and it is growing rapidly • information transmission • large amounts of information are available with a single mouse click, and transfer almost immediately
Information Age – Rapid Onset • the situation has transformed tremendously in your lifetimes • consider the global information storage capacity: • in 1986: 2.6 exabytes (< 1 CD per person) • in 1993: 15.8 exabytes • in 2000: 54.5 exabytes • in 2007: 295 exabytes (61 CDs per person) • how does one successfully navigate such a mountain of digital content? Hilbert and López. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science 332(6025), 2011
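To put the figures on this slide in perspective, the short sketch below estimates the average yearly growth they imply. The dictionary simply restates the four data points above; the function name `annual_growth_rate` is made up for illustration, and the calculation assumes steady compound growth between the two chosen years, which is a simplification.

```python
# Global information storage capacity in exabytes, as listed on the slide.
capacity_eb = {1986: 2.6, 1993: 15.8, 2000: 54.5, 2007: 295.0}


def annual_growth_rate(start_year, end_year, data=capacity_eb):
    """Average compound growth per year between two listed years."""
    years = end_year - start_year
    return (data[end_year] / data[start_year]) ** (1 / years) - 1


rate = annual_growth_rate(1986, 2007)
print(f"average growth 1986-2007: {rate:.1%} per year")
```

Under that assumption, capacity grew by roughly a quarter every year over the two decades.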
Information Access • even in pre-internet days, there was a wealth of information • large-scale: library • medium-scale: Encyclopaedia set • small-scale: newspaper • strategies developed to manage information • categories • hierarchies • indices
Classification • systematic arrangement in groups or categories according to established criteria – Merriam-Webster • in other words, the information is categorized according to relevant features • consider our course notes: • terminology (4 sets of slides) • information searching (2-3 sets of slides) • etc ...
Classification • classification is not specific to digital information • library classification: • Library of Congress Classification • Dewey Decimal Classification
Classification • classification is not specific to digital information • newspaper classification
Classification • classification level of detail leads to tradeoffs • consider a coarse level of detail • e.g. taxonomy of living organisms • classify organisms according to Domain (Archaea, Bacteria, Eukarya) • advantage: small number of groups • disadvantage: each group is massive
Classification • classification level of detail leads to tradeoffs • consider a fine level of detail • e.g. taxonomy of living organisms • classify organisms according to Genus (Canis, Felis) • advantage: each group reasonably small • disadvantage: massive number of groups • solution: hierarchy
Hierarchy • a decomposition of classifications according to detail • hierarchies contain levels • at the top (root) level, there is typically a small number of broad categories • each category is decomposed into smaller categories • a classification group is defined by categorization at each level
Hierarchy • organism taxonomy hierarchy: • each Domain categorized into Kingdoms • e.g. Domain Eukarya: Kingdoms Protista, Animalia, Fungi, Plantae
Hierarchy • organism taxonomy hierarchy: • each Kingdom classified into Phyla • each Phylum classified into Classes • and so on ... http://ag.arizona.edu/pubs/garden/mg/entomology/intro.html
Hierarchy • an object is still categorized, but by multiple levels (instead of one) http://schoolworkhelper.net/scientific-taxonomy/
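The idea that an object is identified by its category at every level can be sketched with a nested dictionary. This is only an illustration: the tree below uses names from the earlier slides, deliberately skips the ranks between Kingdom and Genus, and the helper `find_path` is a hypothetical name.

```python
# Abbreviated taxonomy: Domain -> Kingdom -> Genus.
# Intermediate ranks (Phylum, Class, ...) are omitted to keep the sketch small.
taxonomy = {
    "Eukarya": {
        "Animalia": {"Canis": {}, "Felis": {}},
        "Plantae": {},
        "Fungi": {},
        "Protista": {},
    },
    "Bacteria": {},
    "Archaea": {},
}


def find_path(tree, target, path=()):
    """Return the chain of categories leading to target, or None if absent."""
    for name, children in tree.items():
        current = path + (name,)
        if name == target:
            return current
        found = find_path(children, target, current)
        if found:
            return found
    return None


print(" > ".join(find_path(taxonomy, "Felis")))
```

The returned tuple is the object's classification: one category per level, from broadest to narrowest.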
Hierarchy • facilitates efficient searching through exclusion • example (text): • suppose you have a collection of a million items • these items are organized into 10 equal-sized groups • each top-level group is also organized into 10 equal-sized subgroups • choosing the first category eliminates 900,000 items • choosing the second category eliminates 90,000 items • and so on …
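The arithmetic in this example can be checked with a few lines of code. The sketch assumes, as the slide does, that every group splits into equal-sized subgroups; the function name `search_by_exclusion` is made up for illustration.

```python
def search_by_exclusion(total_items, groups_per_level, levels):
    """Track how many items are ruled out each time one category is chosen.

    Assumes every category splits into equal-sized subcategories.
    """
    remaining = total_items
    history = []
    for level in range(1, levels + 1):
        kept = remaining // groups_per_level  # size of the chosen group
        eliminated = remaining - kept         # everything outside that group
        history.append((level, eliminated, kept))
        remaining = kept
    return history


steps = search_by_exclusion(1_000_000, 10, 6)
for level, eliminated, kept in steps:
    print(f"level {level}: eliminated {eliminated:>7,}  remaining {kept:>7,}")
```

With 10 groups per level, six choices are enough to narrow a million items down to one.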
Hierarchy • hierarchies are very popular • consider our previous examples: • Library of Congress Classification
Hierarchy • hierarchies are very popular • consider our previous examples: • Newspaper
Index • a detailed list of words, phrases, and/or topics indicating place of occurrence • in essence, it maps keywords of interest to their location • e.g. a page number • a bottom-up approach to information organization • as opposed to the top-down structure of a hierarchy • particularly popular in printed material • books, magazines, volumes, etc
Index • typically used on a small scale • books and volumes vs. libraries • made efficient through an organizational scheme • alphabetical is very common • some overlap with hierarchies • e.g. subtopics
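An index as described above, a mapping from keywords to the places they occur, can be sketched in a few lines. The page contents here are invented for illustration, and `build_index` is a hypothetical helper; a real book index would also select which words are worth listing.

```python
from collections import defaultdict

# Hypothetical page contents: page number -> text on that page.
pages = {
    1: "the web is a system of linked documents",
    2: "a hyperlink refers to a different location",
    3: "a web page is a document on the web",
}


def build_index(pages):
    """Map each word to the sorted list of pages on which it appears."""
    index = defaultdict(set)
    for page_number, text in pages.items():
        for word in text.split():
            index[word].add(page_number)
    # Alphabetical order, like the index at the back of a book.
    return {word: sorted(numbers) for word, numbers in sorted(index.items())}


index = build_index(pages)
print("web:", index["web"])
print("hyperlink:", index["hyperlink"])
```

Note the bottom-up construction: the index is assembled from the words found on each page, rather than from categories chosen in advance.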
Finding Information – The Web • as discussed, the amount of information on the web is immense • many of the discussed techniques for information finding also apply digitally • classification/hierarchies • indexing
Classification • many commercial websites have a classification structure • navigation bars
Hierarchies • many websites, especially large ones, will also arrange their categories in hierarchical fashion
Partition • a hierarchy where every object occurs only once • organism taxonomy – every species appears only once • some hierarchies are necessarily partitions • e.g. a particular book will only occur at one point in a library classification • however, a partition is not always natural • an object might have an inherent fit in more than one classification
Partitions • digital content is often stored using overlapping hierarchies (non-partition) • potentially more intuitive • with hyperlinking, it’s easy to accomplish (two links to the same page) • example (text): • Three Books for Frugal Fashionistas was stored on NPR’s website under: • Home > Arts & Life > Books > Three Books for Frugal Fashionistas • Home > Listen > Latest Program > Three Books for Frugal Fashionistas
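The difference between a partition and an overlapping hierarchy can be tested mechanically: a hierarchy is a partition only if no item is reachable by more than one path. The sketch below uses the two NPR paths from the slide plus one invented entry (marked as hypothetical); the function names are illustrative.

```python
from collections import Counter

# Each tuple is a path through the site; the last element is the item itself.
paths = [
    ("Home", "Arts & Life", "Books", "Three Books for Frugal Fashionistas"),
    ("Home", "Listen", "Latest Program", "Three Books for Frugal Fashionistas"),
    ("Home", "Arts & Life", "Books", "Another Article"),  # hypothetical entry
]


def is_partition(paths):
    """True if every item appears at exactly one place in the hierarchy."""
    counts = Counter(path[-1] for path in paths)
    return all(count == 1 for count in counts.values())


def locations(paths, item):
    """List every path under which the item is filed."""
    return [" > ".join(path) for path in paths if path[-1] == item]


print(is_partition(paths))
for location in locations(paths, "Three Books for Frugal Fashionistas"):
    print(location)
```

Because the same article is filed under two categories, the check reports that this hierarchy is not a partition.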
Indexes for the Web • unlike hierarchies, indexes are much less common on individual websites • site maps might be considered an index of sorts • however, there are analogous technologies to indexes that pertain to the web as a whole • Search Engines!