An Overview of Artificial Intelligence
TOPICS • AI Background • How did we get here and Why? • Natural Language Processing (NLP) • How do you deal with Symbolic Representations? • Neural Networks • How can machines be made to emulate humans?
AI Background (Image: robot from the 1921 play "R.U.R.") • People have always been fascinated with giving machines human abilities • c. 270 BC: A Greek engineer named Ctesibius made organs and water clocks with movable figures. • Jacques de Vaucanson (1709-1782) created a mechanical duck that ate and drank with realistic motions of head and throat, produced the sound of quacking, and could pick up cornmeal and swallow, digest, and excrete it. • Mary Shelley’s book Frankenstein (1818) • 1921: R.U.R. (Rossum's Universal Robots): a play by Karel Čapek • "Robot" comes from the Czech word "robota" (forced labor) • The movie Frankenstein (1931) • Science fiction writer Isaac Asimov first used the word "robotics" to describe the technology of robots and predicted the rise of a powerful robot industry (1941)
AI Background • People have always tried (unsuccessfully) to figure out how the brain works • McCulloch and Pitts (1943) developed a (workable) mathematical model of how the brain's networks of neurons function (binary, since firing is an ‘all-or-none’ process) • Influenced John von Neumann (1945: Stored Programs) • Led to the use of Neural Networks (discussed later) • Encouraged the development of Perceptrons (Learning Systems) • Turing Test (1950) • Newell and Simon (1954) • Conceived of using computer programming languages to build theories of human symbolic behavior • Showed how a wide range of cognitive processes in problem solving and problem understanding can be explained in information-processing terms and modeled with computer programs.
AI Background • Arthur Samuel’s Checkers Program (1955) • First ‘Learning’ Program • Performed a look-ahead search from each current position • Saved a description of each board position encountered during play together with its backed-up value determined by the minimax procedure • "If the program is now faced with a choice of board positions whose scores differ only by the ply number, it will automatically make the most advantageous choice, choosing a low-ply alternative if winning and a high-ply alternative if losing" (Samuel, 1959, p. 80). • Dartmouth Workshop (1956) • Introduction of the term AI • First conference on robotics • LISP (1958) • The first programming language dedicated to AI
AI Background • Dendral (1965) • First (?) Expert System • Chemical analysis of organic compounds using mass spectroscopy • Shakey the Robot (1970) • The first mobile robot using AI Programming • MYCIN (1975) • Once MYCIN has determined the most likely cause of infection and accounted for the patient's allergies, it suggests a course of medication • Uses rules like, "If the infection is primary bacteremia, and the site of the culture is one of the sterile sites, and the suspected portal of entry of the organism is the gastrointestinal tract, then there is suggestive evidence that the identity of the organism is bacteroides." • Because physicians distrusted MYCIN, it was the first ES to provide explanations
AI Background • LISP Machines (LISPM) (c. 1980) • A computer optimized to run Lisp efficiently and to provide a good environment for programming in it • 1985: Over 100 US companies offered AI-oriented technologies for sale • In 1986-87 the demand for AI systems decreased, and the industry lost almost half a billion dollars • ?? Why the Change ??? • A lack of applications relative to theory • 1991: Desert Storm • AI-based technologies were used in missile systems, heads-up displays, and other advancements. • AI once again becomes a “Hot Topic”
AI Background ?? What are Computers Better at than Humans ??? • A ‘Typical’ Computer vs. A ‘Typical’ Human: • Fast Calculations vs. Fast Recall • Short-Term Memory (RAM) vs. Long-Term Memory • Sequential Processing vs. Massive Parallelism • Ah ….. Fast Calculations vs. Fault Tolerance • Ah ….. Fast Calculations vs. Dealing with Ambiguity • Ah ….. Fast Calculations vs. Adapting to Circumstances • Ah ….. Fast Calculations vs. Creativity • Ah ….. Fast Calculations vs. Learning • Ah ….. Fast Calculations vs. Associations • Ah ….. Fast Calculations vs. Procreating -- Alright – That’s pushing it!! You Win!! Humans are Superior to Computers !!
Natural Language Processing (NLP) • Symbolic Manipulation • Uses (Existing and Future): • Information Retrieval (IR) • Internet/Automated Search Engines/Web-Crawlers • Document Classification • Word-Processing Assistance (WP “Wizards”) • Expert Systems • Indexing (Textbook) • Keyword Classification • E-Mail Routing • Extensions: • Voice Response • Voice Recognition
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Suppose you wish to get general information about E-Commerce • Retrieve all articles having the key word “E-Commerce” • Articles Retrieved: • WSJ: E-Commerce Stocks Down This Week • MISQ: E-Commerce Strategies • NewsWeek: E-Commerce: Who’s Using It • Elle: Buying Clothes at E-Commerce Sites • CACM: Designing E-Commerce Webs • (Figure legend: Useful Articles / Unrelated Articles)
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Assume there are a total of 50 documents • Of those, assume only 11 are relevant • Recall: the percentage of the relevant articles that were retrieved • Where are we now?? About 3 of the 11 relevant articles (27%) were retrieved
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Assume there are a total of 50 documents • Of those, only 11 are relevant • Recall: the percentage of the relevant articles that were retrieved • Precision: the percentage of the retrieved articles that are relevant (useful) • Where are we now?? About 3 of the 9 articles retrieved (33%) are relevant
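The recall and precision figures on these slides can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the function names are illustrative and not part of the original presentation.

```python
def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of all relevant documents that the search actually returned."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved: int, retrieved_total: int) -> float:
    """Share of the returned documents that are actually relevant."""
    return relevant_retrieved / retrieved_total


# Numbers from the slides: 11 relevant documents exist,
# 9 documents were retrieved, and 3 of those 9 are relevant.
r = recall(3, 11)
p = precision(3, 9)
print(f"recall = {r:.0%}, precision = {p:.0%}")
```

Note that the total of 50 documents does not enter either formula; it only matters for how much irrelevant material the search had to avoid.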
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems ?? How do Internet Search Engines Retrieve Documents ??? • “Bag of words” Approach • Count of Simple occurrence frequencies (For listing order) • No attention paid to inter-word relationships • No attempt made to characterize documents • Problems: • Words are ambiguous • Words are used in different forms • Words are used synonymously ?? WHY ??? Can’t the process be improved ??? -- Stay Tuned --
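A minimal sketch of the "bag of words" idea described on this slide, assuming a toy in-memory collection. The document snippets, the tokenizing regular expression, and the function names are all made up for illustration; real search engines are far more elaborate.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each word; word order is thrown away."""
    return Counter(re.findall(r"[a-z0-9\-]+", text.lower()))


def rank(query: str, documents: dict) -> list:
    """List documents containing the query term, most occurrences first."""
    term = query.lower()
    scores = {name: bag_of_words(body)[term] for name, body in documents.items()}
    hits = [(name, count) for name, count in scores.items() if count > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical snippets, loosely based on the article titles used earlier.
docs = {
    "stocks": "E-Commerce stocks down this week as e-commerce sales slow",
    "fashion": "Buying clothes at e-commerce sites",
    "bridge": "A game of bridge",
}
ranking = rank("e-commerce", docs)
print(ranking)
```

The ranking depends only on raw counts, which is why the problems listed on the slide (ambiguity, word forms, synonyms) go unnoticed by this approach.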
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: picking out names such as “John Smith” or “Olusegun Obasanjo” in running text • Where are we now?? Over 95% Accuracy
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): in a text that mentions “John Smith”, “Olusegun Obasanjo”, “The President of Nigeria”, and “Mr. Obasanjo”, the last three refer to the same person • Where are we now?? About 85% Accuracy
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • The bridge of one’s nose • The bridge of a pair of glasses • The bridge over a river • The bridge of a ship • A dental bridge • A guitar bridge • A game of bridge
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • Need to disambiguate relative to: • Hand-constructed Senses (heuristics) • English Dictionaries • Bilingual Dictionaries • Thesauruses
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • Non-contributory Terminology • There is a need to parse terms/phrases to reduce searches • The Information Systems are used …. → Information System • The Information Systems → Information System • Information Systems → Information System • The problem is How to do it
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • Non-contributory Terminology • Some words can readily be eliminated (Stop Words): a, an, be, is, not, or, the, to, was, were • This can sometimes be problematic: • Search for “IS” (the common abbreviation for Information Systems) • Search for the phrase “to be or not to be” (from Hamlet)
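A small sketch of stop-word removal using exactly the ten words listed on this slide. The function name and the whitespace tokenization are simplifying assumptions. It also shows the two problem cases from the slide: both queries lose every word.

```python
# The ten stop words listed on the slide.
STOP_WORDS = {"a", "an", "be", "is", "not", "or", "the", "to", "was", "were"}


def remove_stop_words(query: str) -> list:
    """Split on whitespace, lowercase, and drop every stop word."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


print(remove_stop_words("The Information Systems"))  # stop word "the" is dropped
print(remove_stop_words("IS"))                       # nothing is left to search for
print(remove_stop_words("to be or not to be"))       # nothing is left to search for
```

Because the filter lowercases before comparing, the abbreviation “IS” is indistinguishable from the verb “is”; keeping case would rescue that query but not the Hamlet phrase.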
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • Non-contributory Terminology • Some words/phrases can readily be eliminated (Stop Words) • Prefix/Infix Removal (Stemming) • Prefix Removal: megavolt → volt • Infix Removal: un-bloody-likely → unlikely • Still Problematic: • Isn’t megavolt a relevant search term? • Does un-bloody-likely need additional parsing?
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • Non-contributory Terminology • Some words/phrases can readily be eliminated (Stop Words) • Prefix/Infix Removal (Stemming) • Suffix Removal (Stemming) • If a word ends in “ies” but not “eies” or “aies”: “ies” → “y” • Queries → Query • Hierarchies → Hierarchy • Berries → Berry • Glossaries → Glossary • BUT, what about: • Series → Sery ???
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • Non-contributory Terminology • Some words/phrases can readily be eliminated (Stop Words) • Prefix/Infix Removal (Stemming) • Suffix Removal (Stemming) • If a word ends in “es” but not “aes”, “ees”, or “oes”: “es” → “e” • Loves → Love • Mandates → Mandate • Cares → Care • Envelopes → Envelope • BUT, what about: • Cactuses → Cactuse ???
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • Non-contributory Terminology • Some words/phrases can readily be eliminated (Stop Words) • Prefix/Infix Removal (Stemming) • Suffix Removal (Stemming) • If a word ends in “s” but not “us” or “ss”: “s” → “” (eliminate) • Wants → Want • Bananas → Banana • Walks → Walk • Maniacs → Maniac • BUT, what about: • Has → Ha ???
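The three plural-stripping rules from the last three slides can be sketched as one function. The function name is illustrative, and the order in which the rules are tried (“ies”, then “es”, then “s”, falling through when an exception applies) is an assumption, since the slides present the rules one at a time.

```python
def strip_plural(word: str) -> str:
    """Apply the three suffix rules from the slides, most specific first."""
    w = word.lower()
    # Rule 1: "ies" -> "y", unless the word ends in "eies" or "aies".
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"
    # Rule 2: "es" -> "e", unless the word ends in "aes", "ees", or "oes".
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-1]
    # Rule 3: drop a final "s", unless the word ends in "us" or "ss".
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]
    return w


for example in ["Queries", "Envelopes", "Maniacs", "Series", "Cactuses", "Has"]:
    print(example, "->", strip_plural(example))
```

The last three examples reproduce the failures the slides point out: the rules are purely orthographic, so they cannot tell a plural from a word that merely ends in the same letters.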
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Identity Recognition: • Associations (Identifying co-referential items): • Ambiguous Terminology: • Non-contributory Terminology • Some words/phrases can readily be eliminated (Stop Words) • Prefix/Infix Removal (Stemming) • Suffix Removal (Stemming) • Additional Considerations: • Words ending in “ed”, “ing”, “ational”, “ation”, “able”, “ism”, etc. • Additional Problems: • Bed → B? Be? • Able → “”? • Prism → Pr? • Fling → Fl?
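To see why these additional suffixes are troublesome, here is a deliberately naive stripper that removes the first matching suffix with no further checks. The function name and the longest-first ordering are assumptions made for this sketch; it is not a recommended stemmer.

```python
# Suffixes named on the slide, ordered longest first so that
# "ational" is tried before "ation".
SUFFIXES = ("ational", "ation", "able", "ing", "ism", "ed")


def naive_strip(word: str) -> str:
    """Remove the first matching suffix, with no check on what remains."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix):
            return w[: -len(suffix)]
    return w


for example in ["Walked", "Bed", "Able", "Prism", "Fling"]:
    print(example, "->", repr(naive_strip(example)))
```

“Walked” comes out as expected, but the other four collapse to fragments or to the empty string, which is the problem the slide raises.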
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Document Classification Problems • The goal is to be able to classify any document as: Music, Sports, Business, Mud Slinging • Although it is too often: Other
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Document Classification Problems • Simple Classifications (“Bag of Words” - Webcrawlers) • Documents are scanned for words/phrases • Lists of most frequently occurring words/phrases are maintained • Problems: • Massive Lists needed • Very Slow: How many websites and documents at each site are there? How often are new sites added? How often are documents added to existing sites? How long to determine frequencies? • Spamming: Looking for an article on Jennifer Aniston?? SEX SEXY MONICA LEWINSKY JENNIFER LOPEZ CLAUDIA SCHIFFER CINDY CRAWFORD JENNIFER ANISTON GILLIAN ANDERSON MADONNA NIKI TAYLOR SEXY SEXY ELLE MACPHERSON KATE MOSS CAROL ALT TYRA BANKS FREDERIQUE KATHY IRELAND PAM ANDERSON KAREN MULDER VALERIA MAZZA SHALOM HARLOW AMBER VALLETTA LAETITA CASTA SEXY SEXY BETTIE PAGE HEIDI KLUM PATRICIA FORD DAISY FUENTES KELLY BROOK SEX SEXY MONICA LEWINSKY ……..
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Document Classification Problems • Another approach is to count how often the words/terms of a given document occur in each category's word list • Document Word Counts: Organization (24), Profit (16), Volume (12), ... • Music Words: Tempo, Symphony, Volume, ... (6 Matches) • Business Words: Profit, Stock Value, Assets, ... (42 Matches) • Sports Words: Baseball Game, Points scored, Teams, ... (12 Matches) • M-Sling Words: So’s-your-old-man, You Stink, Liar, Liar, ... (22 Matches) • Result: Business Document • Problem: Establishing Category Lists
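The category-matching approach on this slide can be sketched as follows. The category lists are abbreviated, single-word versions of the ones shown (so “Stock Value” becomes “stock”), the scoring by summed word counts is an assumption, and the resulting scores are not meant to reproduce the match totals on the slide.

```python
from collections import Counter

# Abbreviated, hypothetical category lists based on the slide.
CATEGORY_WORDS = {
    "Music": {"tempo", "symphony", "volume"},
    "Business": {"profit", "stock", "assets"},
    "Sports": {"baseball", "points", "teams"},
}


def classify(word_counts: Counter) -> tuple:
    """Score each category by the summed counts of its words in the document."""
    scores = {
        category: sum(word_counts[word] for word in words)
        for category, words in CATEGORY_WORDS.items()
    }
    best = max(scores, key=scores.get)
    # A document matching no list at all falls into the catch-all class.
    if scores[best] == 0:
        return "Other", scores
    return best, scores


# Word counts of the example document from the slide.
document = Counter({"organization": 24, "profit": 16, "volume": 12})
label, scores = classify(document)
print(label, scores)
```

Note that “volume” counts toward Music even in a business document, and “organization” matches nothing; both effects illustrate why establishing good category lists is the hard part.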
Natural Language Processing (NLP) • Underlying Problems • Document Retrieval Problems • Document Analysis Problems • Document Classification Problems ?? Is it worth it ??? • Research shows it is: • Hull, D.A. (1996): Stemming algorithms: A case study for detailed evaluation, in Journal of the American Society for Information Science, 47(1): 70-84 • Web Search engines almost never use it ?? WHY ??? • Time • Lack of Consistency (So Far) • Complexity • User Expectations • Cost • Foreign Language Usage
????????????? Any Questions (Please !!!) ?????????????