
jaslin group SAN FRANCISCO jaslin@jaslin

jaslin group SAN FRANCISCO jaslin@jaslin.com. California Coalition of Nurse Practitioners. Data Mining on the Internet “finding it in Cyberspace”. James A. Sanders, CAE jaslin group SAN FRANCISCO. WHAT WE WANT TO LEARN TODAY. What is available on the Internet



Presentation Transcript


  1. jaslin group SAN FRANCISCO jaslin@jaslin.com

  2. California Coalition of Nurse Practitioners Data Mining on the Internet “finding it in Cyberspace” James A. Sanders, CAE jaslin group SAN FRANCISCO

  3. WHAT WE WANT TO LEARN TODAY • What is available on the Internet • What are the best tools to obtain valuable products and services • How to configure a search on a browser • What are the best search sites • Why should I care?

  4. ARE COMPUTERS MALE? or FEMALE?

  5. Five reasons to believe computers are female... 1. No one but the Creator understands their internal logic. 2. The native language they use to communicate with other computers is incomprehensible to everyone else. 3. The message "Bad command or file name" is about as informative as, "If you don't know why I'm mad at you, then I'm certainly not going to tell you." 4. Even your smallest mistakes are stored in long-term memory for later retrieval. 5. As soon as you make a commitment to one, you find yourself spending half your paycheck on accessories for it.

  6. 5 REASONS TO BELIEVE COMPUTERS ARE MALE... 1. They have a lot of data, but are still clueless. 2. They are supposed to help you solve problems, but half the time they ARE the problem. 3. As soon as you commit to one you realize that, if you had waited a little longer, you could have obtained a better model. 4. In order to get their attention, you have to turn them on. 5. Big power surges knock them out for the rest of the night.

  7. WHAT QUESTIONS DO YOU WANT ANSWERED TODAY?

  8. WHAT YOU NEED TO GET ON THE INTERNET • COMPUTER • MODEM & PHONE LINE • SOFTWARE • ISP ACCOUNT • DOMAIN NAME • E-MAIL NAME • GOOD REFERENCE BOOKS • TIME!

  9. WHAT'S IT GOOD FOR…? • E-mail • Basic research • Limited marketing • Event registration • File downloads • News and information • Publishing • Resources & reference library • Listservers • Newsgroups

  10. WEB BROWSERS The Software you need for the “Web” NETSCAPE NAVIGATOR or MS INTERNET EXPLORER

  11. IP ADDRESS 205.158.47.110 Simple…huh? -- Uncle InterNIC (Internet Network Information Center)
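The dotted form on this slide is a readable way of writing a single 32-bit number. A minimal Python sketch, using the standard `ipaddress` module and the address from the slide, shows the conversion; the variable names are illustrative only.

```python
import ipaddress

# The address shown on the slide.
addr = ipaddress.IPv4Address("205.158.47.110")

# Each dotted part is one byte (0-255) of a 32-bit number.
octets = [int(part) for part in str(addr).split(".")]

# Recombine the four bytes by hand and compare with the library's answer.
by_hand = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

print(octets)
print(by_hand == int(addr))
```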

  12. ANATOMY OF A URL (Uniform Resource Locator) disk dir on ISP | unique domain name | http://www.jaslin.com/security/index.html site type | | sub-domain | high-level domain | ISP file browser sees
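The same URL can be taken apart programmatically. The sketch below uses Python's standard `urllib.parse.urlsplit`; how each piece lines up with the slide's labels (for example, the path directory as the "disk dir on ISP") is an assumption noted in the comments, not something the slide states explicitly.

```python
from urllib.parse import urlsplit

url = "http://www.jaslin.com/security/index.html"
parts = urlsplit(url)

# Protocol used to fetch the page.
scheme = parts.scheme

# Host name split on dots: sub-domain, domain name, high-level domain
# (assumed mapping to the slide's labels).
host_labels = parts.hostname.split(".")

# Directory and file on the server (assumed to be the slide's
# "disk dir on ISP" and the file the browser sees).
directory, _, filename = parts.path.rpartition("/")

print(scheme, host_labels, directory, filename)
```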

  13. HIGH-LEVEL DOMAINS .com COMMERCIAL .edu EDUCATION .gov GOVERNMENT .net WEB HOST .mil MILITARY .org OTHER / NON-PROFIT .uk United Kingdom .fr France

  14. FIVE MAJOR GROUPS OF THE INTERNET • Newsgroups and discussion lists • files available by FTP • bulletin boards and other services accessible using the telnet command • services organized using gopher software; • material using WWWeb software

  15. MEDIA TYPES ON THE INTERNET • TEXT • IMAGES • AUDIO • VIDEO • PERSONAL COMMENTS • PROGRAMS

  16. Surfing is browsing without tools.

  17. SEARCHING THE INTERNET • Releases the true value of the Internet! • Basic research for information • Use SEARCH ENGINES to find info • Spiders index the web continuously • Boolean logic - OR, AND & NOT • Use multiple search engines • Advanced vs. standard search

  18. WHAT DO YOU WANT TO FIND? • A keyword • A person • A URL • A phrase • A geographic location • A concept • A title

  19. TYPES OF SEARCH ENGINES • Keyword • Alta Vista • HotBot • Webcrawler • Concept / hierarchical • Yahoo • Infoseek • Meta Search • Search.com

  20. COMPONENTS OF SEARCHES • Standard Boolean • Advanced Boolean • Proximity Searching • Required Terms • Prohibited Terms • Wildcards • Case Sensitivity

  21. SEARCH ENGINES VARY ACCORDING TO: • Size of the index • Frequency of updating the index • Search options • Speed of returning a result set • Result set presentation • Relevancy of the items included in a result set • Overall ease of use.

  22. Search “Rules of Thumb” • Enter precise search terms or phrases to limit the search. • Use the required / prohibited term operators. • Enter singular terms. Most search engines will find the substring to generalize a subject. • Use wildcards where allowed. • Do not use common, generic search terms (book vs. "book binding"). • Enter multiple spellings where appropriate (Khaddafi Quadafy Kaddafi Qadaffi...). • Use Booleans and especially proximity & adjacency operators to increase the relevancy. • Be persistent and creative. It's a big web out there!

  23. Searching for information on the Internet is more an ART than a SCIENCE. You should be prepared to spend time looking for something, and still come up empty.

  24. Some Popular Web Search Engines • Alta Vista • Yahoo • Lycos • WebCrawler • InfoSeek • MetaSearch • Dogpile • Northern Light

  25. AltaVista is the premier search engine on the web. • It has the largest, most inclusive indices • allows searching of both the web and many Usenet Newsgroups • It provides both simple and advanced searches

  26. AaBbCc Case Sensitivity • Search terms entered in lower case letters are not case sensitive. • Capitalized terms (or accented letters) make the term case sensitive. • HotDog finds only the terms spelled exactly with that capitalization • hotdog finds all occurrences
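The case rule on this slide can be written as a small function. This is a sketch of the rule as described, not AltaVista's actual code; `term_matches` is a hypothetical helper name, and accented letters are not handled.

```python
def term_matches(query_term: str, word: str) -> bool:
    """Apply the slide's rule: lower-case queries ignore case,
    queries containing capitals must match exactly."""
    if query_term == query_term.lower():
        # All lower case: match any capitalization of the word.
        return word.lower() == query_term
    # At least one capital letter: require an exact match.
    return word == query_term


words = ["hotdog", "HotDog", "HOTDOG"]
print([w for w in words if term_matches("hotdog", w)])
print([w for w in words if term_matches("HotDog", w)])
```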

  27. Required / Excluded Words • Require a word: prepend it with a + symbol, as in +HotDog. • Exclude a word: prepend it with a - symbol, as in +"F. Scott Fitzgerald" -Gatsby or +Lincoln -automobile
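A rough sketch of how the + and - prefixes could be interpreted, written in Python. The helper names `parse_query` and `matches` are hypothetical, matching is a simplified case-insensitive substring test, and quoted phrases are split with the standard `shlex` module.

```python
import shlex


def parse_query(query):
    """Split a query into required (+), excluded (-) and plain terms."""
    required, excluded, plain = [], [], []
    for token in shlex.split(query):  # keeps "quoted phrases" together
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            plain.append(token)
    return required, excluded, plain


def matches(query, text):
    """True if text has every required term and no excluded term."""
    required, excluded, _ = parse_query(query)
    lowered = text.lower()
    has_required = all(term.lower() in lowered for term in required)
    has_excluded = any(term.lower() in lowered for term in excluded)
    return has_required and not has_excluded


print(parse_query('+"F. Scott Fitzgerald" -Gatsby'))
print(matches("+Lincoln -automobile", "Abraham Lincoln was the 16th president"))
print(matches("+Lincoln -automobile", "Lincoln automobile dealership"))
```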

  28. Wildcard Characters • The asterisk (*) is AltaVista's wildcard character. • butt* will get: • butt • butts • butter • button The asterisk cannot be used at the beginning or in the middle of words. It will substitute for up to 5 additional lower case letters.
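The wildcard rule described on this slide (trailing asterisk only, standing for up to 5 extra lower case letters) can be approximated with a regular expression. This is a sketch of the stated rule, with `wildcard_to_regex` as a hypothetical helper name.

```python
import re


def wildcard_to_regex(term):
    """Translate a term such as 'butt*' into a compiled pattern."""
    if "*" in term[:-1]:
        # The slide says * is not allowed at the start or in the middle.
        raise ValueError("the asterisk is only allowed at the end of a word")
    if term.endswith("*"):
        # Up to 5 additional lower case letters after the stem.
        return re.compile(re.escape(term[:-1]) + r"[a-z]{0,5}")
    return re.compile(re.escape(term))


pattern = wildcard_to_regex("butt*")
for word in ["butt", "butts", "butter", "button", "butterflies"]:
    print(word, bool(pattern.fullmatch(word)))
```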

  29. Confidence Rankings • AltaVista will assign a confidence ranking to the hits it returns based on the following: • The query terms are found in the first few words of the document (especially the title of web pages). • The query terms are found in close proximity to one another in the document. • The document contains more of the search terms than other documents.
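To make the three criteria concrete, here is a toy scoring function in Python. The weights, the `early_window` cutoff and the function name `confidence` are all invented for illustration; the slide does not say how AltaVista actually combines these signals, and punctuation is not stripped from words.

```python
def confidence(query_terms, title, body, early_window=10):
    """Toy score: more terms, earlier terms and closer terms rank higher."""
    words = (title + " " + body).lower().split()
    terms = list(dict.fromkeys(t.lower() for t in query_terms))  # drop repeats
    where = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    found = [t for t in terms if where[t]]

    # Criterion 3: the document contains more of the search terms.
    score = float(len(found))

    # Criterion 1: terms appear in the first few words (title comes first).
    score += sum(1 for t in found if where[t][0] < early_window)

    # Criterion 2: the first two terms found appear close to one another.
    if len(found) >= 2:
        gap = min(abs(i - j) for i in where[found[0]] for j in where[found[1]])
        score += 1.0 / gap
    return score


focused = confidence(["pet", "care"], "pet care basics",
                     "how to look after animals at home")
passing = confidence(["pet", "care"], "gardening tips",
                     "a long page about soil and seeds and watering and "
                     "sunlight that mentions care only once and a pet much later")
print(focused > passing)
```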

  30. SEARCH SYNTAX EXAMPLES • horses AND carriages • "Abraham Lincoln" AND "civil war" • ("Abraham Lincoln") AND NOT ("civil war") (Note: Do NOT use x NOT y, it must be x AND NOT y.) • "Thomas Middleton" OR "Beaumont and Fletcher" • (dogs OR cats) AND ("pet care") • "William Shakespeare" NEAR internet • (illegal AND immigrant) AND NOT (Mexico) • alien OR ufo • alien AND NOT ufo • football AND (rugby OR soccer)
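Under the hood, Boolean operators behave like set operations on the lists of pages that contain each word: AND is an intersection, OR is a union, and AND NOT is a difference. The Python sketch below illustrates this with a tiny made-up document collection; `build_index` and `hits` are hypothetical helper names, not part of any search engine.

```python
def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


def hits(index, term):
    """Documents containing the term (empty set if none)."""
    return index.get(term.lower(), set())


# Made-up sample pages, for illustration only.
docs = {
    1: "alien sighting reported",
    2: "ufo and alien stories",
    3: "ufo photographs",
    4: "football and rugby scores",
    5: "football news",
    6: "soccer results",
}
index = build_index(docs)

alien_or_ufo = hits(index, "alien") | hits(index, "ufo")        # alien OR ufo
alien_and_not_ufo = hits(index, "alien") - hits(index, "ufo")   # alien AND NOT ufo
football_query = hits(index, "football") & (
    hits(index, "rugby") | hits(index, "soccer")
)                                                               # football AND (rugby OR soccer)

print(sorted(alien_or_ufo), sorted(alien_and_not_ufo), sorted(football_query))
```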

  31. PROXIMITY & ADJACENCY EXAMPLES • Use NEAR/n, where n is the number of words apart the two search terms should be: Shakespeare NEAR/5 Internet. • If a range is not entered, NEAR will return hits on documents where the words are next to each other, in either order. • To control the specific order in which two words must appear next to each other, you may use the ADJ operator: reverse ADJ osmosis.
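A simplified Python sketch of the two operators follows. It assumes "n words apart" means the word positions differ by at most n, which is one reasonable reading of the slide; the function names `near` and `adj` are illustrative and punctuation is not stripped.

```python
def positions(words, term):
    """Indexes at which term occurs in the word list."""
    term = term.lower()
    return [i for i, w in enumerate(words) if w == term]


def near(text, a, b, n):
    """True if a and b occur within n word positions, in either order."""
    words = text.lower().split()
    return any(abs(i - j) <= n
               for i in positions(words, a)
               for j in positions(words, b))


def adj(text, first, second):
    """True if second appears immediately after first."""
    words = text.lower().split()
    second_positions = set(positions(words, second))
    return any(i + 1 in second_positions for i in positions(words, first))


print(near("Shakespeare on the modern Internet", "Shakespeare", "Internet", 5))
print(adj("reverse osmosis filters", "reverse", "osmosis"))
print(adj("reverse osmosis filters", "osmosis", "reverse"))
```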

  32. Yahoo is not a search engine, but strictly a hierarchically arranged subject index. • It has developed over a long time, with lots of editorial care, so the quality is very high. • Browsing Yahoo is the best way to surf for good sites when you don't know (or perhaps care) where exactly you are going. • It is also the best way to find good 'starter' sites, from which you can branch out to more specialized ones.

  33. YAHOO RETURNS 3 TYPES OF INFO • Yahoo categories that match the search term • Actual matching end-sites • The Yahoo categories from which the various pages are indexed

  34. IN YAHOO USER CAN CONTROL • Though you cannot create very sophisticated searches as with the search engines, you can control: • where to search - Usenet or Email • whether to OR or AND the search terms • search on substrings (find whole words from partial strings) • number of matches per page

  35. METASEARCH ENGINES Search engine of search engines search.com Dogpile The Internet Sleuth
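The idea of a "search engine of search engines" can be sketched in a few lines of Python: send the same query to several engines, merge the result lists, and drop duplicates. The two engine functions below are stand-ins that return placeholder example.com addresses; they do not call any real service, and the ranking rule (pages returned by more engines come first) is an assumption made for illustration.

```python
from collections import Counter


def engine_a(query):
    # Placeholder results, not from a real search engine.
    return ["http://example.com/a", "http://example.com/b"]


def engine_b(query):
    # Placeholder results, not from a real search engine.
    return ["http://example.com/b", "http://example.com/c"]


def metasearch(query, engines):
    """Merge results, ranking pages returned by more engines first."""
    votes = Counter()
    first_seen = {}
    for engine in engines:
        for url in dict.fromkeys(engine(query)):  # de-duplicate per engine
            votes[url] += 1
            first_seen.setdefault(url, len(first_seen))
    # Most votes first; ties keep the order in which pages were first seen.
    return sorted(votes, key=lambda u: (-votes[u], first_seen[u]))


merged = metasearch("nurse practitioners", [engine_a, engine_b])
print(merged)
```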

  36. BOTS AND INTELLIGENT AGENTS • Intelligent agents are software entities that assist people and act on their behalf • Strictly speaking, all bots are "autonomous": able to react to their environments and make decisions without prompting • Bona fide bots are programs with personality

  37. When looking for people, you will usually be looking for one of the major information pieces: • Address • Phone number • E-mail address • Personal information

  38. TODAY WE COVERED... • What is available on the Internet • What are the best tools to obtain valuable products and services • How to configure a search on a browser • What are the best search sites • Why should I care?

  39. YOU ARE NOW…. MASTER OF CYBERSPACE!

  40. Data Mining on the Internet “finding it in Cyberspace”

  41. “I think there's a world market for about 5 computers.” Thomas J. Watson, Chairman of the Board, IBM (around 1948)

  42. “There is no reason anyone would want a computer in their home.” Ken Olsen, president, chairman and founder of Digital Equipment Corp., 1977

  43. jaslin group SAN FRANCISCO jaslin@jaslin.com
