NEVER-ENDING LANGUAGE LEARNER
Students: Nguyễn Hữu Thành, Phạm Xuân Khoái, Vũ Mạnh Cầm
Instructor: Lê Hồng Phương, PhD
Hà Nội, January 11, 2014
Idea: Build a structured knowledge base (KB)
• What is a KB?
  • Categories: cities, companies, sports teams, …
  • Relations: hasOfficeIn(organisation, location)
  • Noun phrases
• What is a structured KB?
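The slide above can be made concrete with a minimal sketch of a KB that stores category and relation instances over noun phrases. The class name `KnowledgeBase`, its methods, and the sample facts are illustrative assumptions, not NELL's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy KB (hypothetical layout): categories and relations over noun phrases."""
    # category name -> set of noun phrases believed to belong to it
    categories: dict = field(default_factory=dict)
    # relation name -> set of (arg1, arg2) noun-phrase pairs
    relations: dict = field(default_factory=dict)

    def add_category_instance(self, category, noun_phrase):
        self.categories.setdefault(category, set()).add(noun_phrase)

    def add_relation_instance(self, relation, arg1, arg2):
        self.relations.setdefault(relation, set()).add((arg1, arg2))

    def is_a(self, noun_phrase, category):
        return noun_phrase in self.categories.get(category, set())


kb = KnowledgeBase()
kb.add_category_instance("city", "Toronto")
kb.add_category_instance("company", "Toyota")
# Illustrative fact only, chosen to match the hasOfficeIn(organisation, location) signature.
kb.add_relation_instance("hasOfficeIn", "Toyota", "Toronto")
```

Keeping categories and relations as separate maps mirrors the slide's distinction between unary predicates (categories) and binary predicates (relations).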
Idea: Structured Knowledge Base
[Figure: example knowledge graph. Entities such as Toronto, Canada, Detroit, Maple Leafs, Red Wings, NHL, Stanley Cup, Maple Leaf Gardens, Skydome, Toyota, Hino, Prius, GM, and Globe and Mail are typed by categories such as city, country, company, team, league, stadium, hospital, airport, radio, paper, politician, writer, and automobile, and linked by relations such as uses equipment, play, plays in, hired, hometown, competes with, won, acquired, created, member, and economic sector.]
Ideas: using Machine Learning
• Machine learning: a branch of artificial intelligence concerned with the construction and study of systems that can learn from data.
Ideas
[Diagram: NELL shown together with the initial ontology, seed examples, the Web, human trainers, and the Knowledge Base (KB)]
Ideas: the task
• Run 24x7, forever
• Each day:
  1. Reading task: extract more facts from the web to populate the initial ontology.
  2. Learning task: learn to read (perform task 1) better than yesterday.
NELL Architecture
[Diagram: the Knowledge Base holds Beliefs and Candidate facts; the Knowledge Integrator sits between the two; the Subsystem Components (CPL, CSEAL, CMC, RL) draw on Data Resources]
Coupled Pattern Learner (CPL)
• Learns to extract category and relation instances and patterns from unstructured text.
• Learns contextual patterns that act as high-precision extractors for each predicate.
• E.g.:
  • "Trang An is the name of a girl." (Trang An la ten mot co gai.)
  • "Trang An is the name of a company." (Trang An la ten mot cong ty.)
• Such patterns are used to improve precision.
Input/Output
• Input:
  • A large text corpus
  • An initial ontology containing the predicates (categories and relations)
• Output:
  • Proposed instances and contextual patterns for each predicate
Input: An ontology O, and a text corpus C
Output: Trusted instances/patterns for each predicate
for i = 1, 2, ..., ∞ do
    foreach predicate p in O do
        EXTRACT candidate instances/contextual patterns using recently promoted patterns/instances;
        FILTER candidates that violate coupling;
        RANK candidate instances/patterns;
        PROMOTE top candidates;
    end
end
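The loop above can be sketched in a few lines of Python. This is a heavily simplified illustration, not CPL itself: the corpus is assumed to be pre-processed into (noun phrase, context) pairs, ranking is plain co-occurrence counting, and "violating coupling" is reduced to mutual exclusion between categories. The function name `cpl_iteration`, the `mutex` map, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter


def cpl_iteration(corpus, instances, patterns, mutex, top_k=1):
    """One simplified EXTRACT / FILTER / RANK / PROMOTE round (illustrative only)."""
    new_instances = {p: set(v) for p, v in instances.items()}
    new_patterns = {p: set(v) for p, v in patterns.items()}
    for pred in instances:
        # Instances of mutually exclusive predicates act as negative evidence.
        rival_instances = set()
        for rival in mutex.get(pred, ()):
            rival_instances |= instances[rival]
        rival_contexts = {ctx for phrase, ctx in corpus if phrase in rival_instances}

        # EXTRACT candidate patterns from promoted instances, FILTER those that
        # also occur with rival instances.
        cand_patterns = Counter(
            ctx for phrase, ctx in corpus
            if phrase in instances[pred]
            and ctx not in patterns[pred]
            and ctx not in rival_contexts
        )
        # EXTRACT candidate instances from promoted patterns, FILTER those that
        # are already believed to belong to a rival predicate.
        cand_instances = Counter(
            phrase for phrase, ctx in corpus
            if ctx in patterns[pred]
            and phrase not in instances[pred]
            and phrase not in rival_instances
        )
        # RANK by co-occurrence count and PROMOTE the top candidates.
        new_patterns[pred].update(c for c, _ in cand_patterns.most_common(top_k))
        new_instances[pred].update(n for n, _ in cand_instances.most_common(top_k))
    return new_instances, new_patterns


# Toy corpus of (noun phrase, context) pairs; "_" marks the noun-phrase slot.
corpus = [
    ("Hanoi", "mayor of _"), ("Toronto", "mayor of _"),
    ("Nokia", "products of _"), ("Samsung", "products of _"),
    ("Hanoi", "_ announced"), ("Nokia", "_ announced"), ("Samsung", "_ announced"),
]
instances = {"city": {"Hanoi"}, "company": {"Nokia"}}   # seed examples
patterns = {"city": set(), "company": set()}
mutex = {"city": {"company"}, "company": {"city"}}

for _ in range(2):  # the real loop never ends; two rounds suffice here
    instances, patterns = cpl_iteration(corpus, instances, patterns, mutex)
```

In this toy run the ambiguous context "_ announced" is never promoted, because it occurs with seeds of both mutually exclusive categories.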
Example: "Samsung has just released a clip mocking Nokia's new product."
Coupled SEAL
[Diagram: CSEAL connected to Beliefs, the Internet, and New candidate facts]
Coupled SEAL • SEAL (Set Expander for Any Language): expands entities automatically by utilizing resources from the Web • CSEAL adds mutual-exclusion and type-checking constraints
Coupled SEAL
• Coupled SEAL: a semi-structured extractor
• Queries the internet with sets of beliefs from each category or relation, and mines lists and tables for instances
• Uses mutual-exclusion relationships to provide negative examples for filtering overly general lists and tables
• Issues 5 queries per category and 10 queries per relation, fetching 50 web pages per query
• Probabilities are assigned as in CPL
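The filtering idea on this slide can be illustrated with a small sketch. It does not query the web or reproduce SEAL's wrapper induction; it assumes lists have already been mined from pages and only shows how negative examples from mutually exclusive categories reject overly general lists. The function `expand_from_lists`, the `min_seed_hits` threshold, and the sample lists are hypothetical.

```python
def expand_from_lists(lists, seeds, negatives, min_seed_hits=2):
    """Collect new candidates from lists that match the seeds and contain no negatives."""
    candidates = set()
    for items in lists:
        items = set(items)
        if len(items & seeds) < min_seed_hits:
            continue  # the list is not clearly about this category
        if items & negatives:
            continue  # overly general list: it mixes in a mutually exclusive category
        candidates |= items - seeds
    return candidates


# Hypothetical lists, standing in for lists and tables mined from fetched pages.
web_lists = [
    ["Hanoi", "Toronto", "Detroit"],
    ["Hanoi", "Toronto", "Toyota", "Nokia"],  # mixes cities and companies
    ["Toyota", "Nokia", "Samsung"],
]
city_candidates = expand_from_lists(
    web_lists,
    seeds={"Hanoi", "Toronto"},   # current beliefs for the category "city"
    negatives={"Toyota"},         # belief from the mutually exclusive category "company"
)
```

Only the first list survives both checks, so "Detroit" is the single new candidate.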
Coupled SEAL • Example:
Coupled Morphological Classifier (CMC)
[Diagram: CMC connected to the KB, Data Resources, and New candidate facts]
• CMC classifies noun phrases (NPs) based on various morphological features (words, capitalization, affixes).
Coupled Morphological Classifier
• Ex1: Bach Mai hotel → hotel(Bach Mai)
• Ex2: Mai → person(Mai)
• Ex3: tradition → noun(tradition)
Coupled Morphological Classifier
• Beliefs from the KB are used as training instances
• CMC examines candidate facts proposed by other components and classifies up to 30 new beliefs per predicate in each iteration
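As a rough illustration of the morphological features mentioned above (words, capitalization, affixes), the sketch below turns a noun phrase into a set of feature strings that a classifier could consume. The feature names, the three-character affix length, and the function `morphological_features` are assumptions for the example; the classifier itself is not shown.

```python
def morphological_features(noun_phrase, affix_len=3):
    """Extract simple word, capitalization, and affix features from a noun phrase."""
    features = set()
    for word in noun_phrase.split():
        lower = word.lower()
        features.add("word=" + lower)
        features.add("cap=" + ("upper" if word[0].isupper() else "lower"))
        # Only emit affixes for words longer than the affix itself.
        if len(lower) > affix_len:
            features.add("prefix=" + lower[:affix_len])
            features.add("suffix=" + lower[-affix_len:])
    return features


feats = morphological_features("Bach Mai hotel")
```

For "Bach Mai hotel", the word feature `word=hotel` is the kind of cue that would support a prediction such as hotel(Bach Mai).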
Rule Learner (RL)
[Diagram: RL connected to Beliefs, Candidate facts, and New candidate facts]
• RL takes the categories and relations in the KB as input and infers new relation instances for the KB.
Rule Learner
• Example 1: playSport(Rooney, football) → athlete(Rooney), sport(football)
• Example 2: isCapital(Hanoi, Vietnam), liveIn(Thanh, Hanoi), roommate(Thanh, Khoai), roommate(Khoai, Cam) → liveIn(Thanh, Vietnam), roommate(Thanh, Cam), liveIn(Khoai, Hanoi), …
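Example 2 can be reproduced with a small forward-chaining sketch. Note the limits: the three rules are written by hand to match the slide rather than learned from data, and facts are plain (relation, arg1, arg2) tuples. The function `apply_rules` and this encoding are assumptions for illustration.

```python
def apply_rules(facts):
    """Forward-chain three hand-written rules until no new fact appears."""
    facts = set(facts)
    while True:
        new = set()
        for (r1, a, b) in facts:
            for (r2, c, d) in facts:
                # liveIn(x, city) and isCapital(city, country) -> liveIn(x, country)
                if r1 == "liveIn" and r2 == "isCapital" and b == c:
                    new.add(("liveIn", a, d))
                # roommate(x, y) and roommate(y, z) -> roommate(x, z)
                if r1 == "roommate" and r2 == "roommate" and b == c and a != d:
                    new.add(("roommate", a, d))
                # roommate(x, y) and liveIn(x, place) -> liveIn(y, place)
                if r1 == "roommate" and r2 == "liveIn" and a == c:
                    new.add(("liveIn", b, d))
        if new <= facts:
            return facts  # fixed point reached
        facts |= new


kb_facts = {
    ("isCapital", "Hanoi", "Vietnam"),
    ("liveIn", "Thanh", "Hanoi"),
    ("roommate", "Thanh", "Khoai"),
    ("roommate", "Khoai", "Cam"),
}
inferred = apply_rules(kb_facts)
```

The result contains the three conclusions listed on the slide, plus further facts that follow from chaining the same rules again (for instance about Cam), which is what the trailing "…" on the slide suggests.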
Rule Learner • Some kinds of Rule Learner Systems: OneR, Ridor, PART, JRip, ConjunctiveRule. • Clip: https://www.youtube.com/watch?v=5On-tDeu2ic
Initial result
• Running 24x7 since January 12, 2010
• Inputs:
  • An ontology defining >600 categories and relations
  • 10-20 seed examples of each
  • 100,000 web search queries per day
  • ~5 minutes/day of human guidance
• Result:
  • KB with >15 million candidate beliefs, growing daily
  • Learning to reason, as well as read
  • Automatically extending its ontology
Initial result • Demo: • http://rtw.ml.cmu.edu/rtw/kbbrowser/beverage:beer
References
• NELL article: http://www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf
• NELL KB browser: http://rtw.ml.cmu.edu/rtw/kbbrowser/beverage:beer
• Video lecture: http://videolectures.net/akbcwekex2012_mitchell_language_learning/
• Tom Mitchell's seminar: http://www.youtube.com/watch?v=51q2IajH94A
• RL: http://mydatamining.wordpress.com/2008/04/14/rule-learner-or-rule-induction/