
CONCEPT MODELING:

This research review explores concept modeling as a way of organizing and understanding information in a machine-processable manner. It discusses the challenges of creating uniform and efficient structures and bridging the gap between natural language and machine-processable models. The review also examines the use of patents and conceptual indexing in concept modeling.


Presentation Transcript


  1. CONCEPT MODELING: A Research Review. Đorđe Popović, Ognjen Šćekić, Veljko Milutinović. IPSI Belgrade, for SUN Microsystems. January – December 2006.

  2. What Is Concept Modeling? • A way of modeling reality: • Identifying concepts • Identifying relations among concepts • Organizing the concepts in a knowledge base, allowing an "intelligent" way to search and process this data. • Why do we need concept modeling? To make electronic resources not only machine-processable, but also machine-understandable!

  3. Challenges • How to create a model that has a uniform structure, and is powerful enough to capture the essence of any concept? • How should these models be linked into an efficient structure? • How can we bridge the gap between natural language and a machine-processable model?

  4. 7 Ws – PROs and CONs [Diagram: a central concept surrounded by the seven Ws: Which, When, What, Why, Where, Who, (W)How.] • Ultimate goal: from a specific to a general model! • WHAT associations provide general facts about any concept.

  5. Why Start with Patents? • Described by a very formal, structured language – claims. • Each patent is a novel concept. • Definition of one patent is usually based on another one.

  6. Structure of a Patent Document • General info about the patent (can be used for Which, When, Why, Where, Who and How) • Description – not well structured • References to related patents • Claims – primary target for What • Abstract of the patent

  7. Conceptual Indexing (1) • What is conceptual indexing? “New technique for organizing information to support subsequent access that can dramatically improve your ability to find the information you need, with less hassle and with better results.” (William A. Woods) • Conceptual indexing combines techniques of: • Knowledge representation • Natural language processing • Classical techniques for indexing words and phrases • Bridges the gap between natural language and a machine-processable model.

  8. Conceptual Indexing (2) Conceptual indexing technology is a combination of: • Concept extractor: identifies phrases to be indexed. • Concept assimilator: analyzes a concept phrase to determine its place in the conceptual taxonomy. • Conceptual retrieval system: uses the conceptual taxonomy to make connections between requested and indexed items. Figure 1 – Main components of a conceptual indexer
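
The three components listed on this slide can be illustrated with a minimal Python sketch. It is not the conceptual indexer described by Woods: the tiny taxonomy, the function names, and the substring-based extraction are all assumptions made for illustration, where a real system would rely on linguistic analysis.

```python
# Hypothetical toy taxonomy: each concept maps to its more general parent.
TAXONOMY = {
    "synthetic grass": "ground cover",
    "ground cover": "surface",
    "playing surface": "surface",
}


def extract_concepts(text: str) -> list[str]:
    """Concept extractor: return the known phrases that occur in the text."""
    lowered = text.lower()
    return [phrase for phrase in TAXONOMY if phrase in lowered]


def assimilate(concept: str) -> list[str]:
    """Concept assimilator: place a concept in the taxonomy by listing
    the concept followed by all of its more general ancestors."""
    chain = [concept]
    while chain[-1] in TAXONOMY:
        chain.append(TAXONOMY[chain[-1]])
    return chain


def retrieve(query: str, indexed: dict[str, list[str]]) -> list[str]:
    """Conceptual retrieval: return documents holding a concept that equals
    the query or is a specialisation of it according to the taxonomy."""
    q = query.lower()
    return sorted(
        doc for doc, concepts in indexed.items()
        if any(q in assimilate(c) for c in concepts)
    )


# Example usage with two made-up documents.
indexed = {
    "doc1": extract_concepts("A synthetic grass field"),
    "doc2": extract_concepts("A hard playing surface"),
}
print(retrieve("ground cover", indexed))
print(retrieve("surface", indexed))
```

A query for the general concept "surface" reaches both documents through the taxonomy even though neither text contains that word on its own as an indexed phrase, which is the kind of connection the retrieval component is meant to make.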

  9. Hybrid Approach: Indices + RDF/OWL • Conceptual indices • RDF/OWL • Motivation: Use the advantages of one approach to eliminate the drawbacks of the other.

  10. Conceptual Indices vs. RDF/OWL

  11. Why not Use Ontologies Alone? • If we want to use an ontology we have 2 choices: • Use an existing, well-established ontology that might not suit our needs. • Create a new ontology which does suit our needs: • We can create several different ontologies, depending on how we want to capture the information. • Problems arise when we want to merge ontologies. • This approach works fine within a closed community with specific needs: • There already exists a well-defined basic ontology structure. • Community members have a good knowledge of how to model new concepts in terms of the existing ones.

  12. Why not Use Indices Alone? • For example, let us take the simplest possible definition, for a bird: bird¹ – a creature with wings and feathers that lays eggs and can usually fly. • Our index might then contain the following associations: creature, wings, feathers, eggs, fly. • A conceptual index does not offer the possibility to state the fact that some birds do not fly! ¹ Word definition taken from Longman Dictionary of Contemporary English, 3rd edition, 1995.

  13. Hybrid Approach (1) • An index of associations represents a simple model, similar to what humans have in mind when they first think of a bird. • Having enough associations, one can create a model with a considerable degree of accuracy. • RDF/OWL statements provide a means for expressing additional (but very important) information (e.g. there are birds that cannot fly!). • We believe this is good enough for most applications.
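
The bird example can be sketched as a small Python model that combines an index of associations with explicit statements. The triples below only imitate the shape of RDF statements; the predicate names `is_a` and `cannot` and the penguin example are illustrative assumptions, not part of the proposed system or of any RDF/OWL vocabulary.

```python
# Index of associations for "bird", taken from the dictionary definition.
indices = {
    "bird": {"creature", "wings", "feathers", "eggs", "fly"},
}

# Explicit statements as (subject, predicate, object) triples.
# Predicate names and the penguin example are hypothetical.
statements = {
    ("penguin", "is_a", "bird"),
    ("penguin", "cannot", "fly"),
}


def associations(concept, indices, statements):
    """Combine a concept's own and inherited associations, then remove
    anything an explicit statement rules out."""
    result = set(indices.get(concept, set()))
    for subject, predicate, obj in statements:
        if subject == concept and predicate == "is_a":
            result |= indices.get(obj, set())
    excluded = {
        obj for subject, predicate, obj in statements
        if subject == concept and predicate == "cannot"
    }
    return result - excluded


print(sorted(associations("bird", indices, statements)))
print(sorted(associations("penguin", indices, statements)))
```

The index alone would describe a penguin as something that flies; the single explicit statement corrects that without changing the index.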

  14. Hybrid Approach (2) • It is important to keep track of how many times a term is mentioned, because it affects its descriptive power. • Example: U.S. Patent 6,989,179 – “Synthetic grass sport surfaces”, claims section: 1. synthetic grass [10] 2. playing surface [9] … • These terms represent the essence of what is being described!

  15. Hybrid Approach (3) • However, this is only because we know what “synthetic grass” and “playing surface” are! At some level, we need to have some intrinsic, built-in knowledge base of basic concepts! • All the other concepts can then be described in terms of these basic concepts. • Solution: Conceptual indexers are equipped with a knowledge base of basic terms.

  16. Patent Model – Conceptual Index • A patent’s Claims section is scanned and processed by a conceptual indexer. • The result is a descriptive index, associated with the patent (its size is approx. 1–5% of the full text). • This index can be seen as an ordered list of the patent’s WHAT associations (terms, phrases, sentence fragments). • An entry in the descriptive index contains a low-level concept and the number of its occurrences.
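
A descriptive index of this shape (concepts with occurrence counts, ordered by count) can be sketched in a few lines of Python. This is a simplification under stated assumptions: the list of basic terms stands in for the indexer's built-in knowledge base, plain substring counting replaces real linguistic processing, and the claims text is a made-up sentence, not the wording of any actual patent.

```python
from collections import Counter


def build_descriptive_index(claims_text, basic_terms):
    """Return (term, occurrences) pairs, most frequent first.

    Terms that never occur in the text are left out of the index.
    """
    lowered = claims_text.lower()
    counts = Counter({term: lowered.count(term.lower()) for term in basic_terms})
    return [(term, n) for term, n in counts.most_common() if n > 0]


# Made-up claims text, for illustration only.
claims = (
    "A synthetic grass playing surface comprising synthetic grass fibres, "
    "wherein the synthetic grass is laid over a base."
)
basic_terms = ["playing surface", "synthetic grass", "immune response"]

for term, occurrences in build_descriptive_index(claims, basic_terms):
    print(f"{term} [{occurrences}]")
```

The output uses the same `term [count]` notation as the synthetic grass example in the slides, with the most frequently mentioned concept first.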

  17. Patent Model – RDF/OWL • For a different application, a different RDF/OWL model needs to be devised. • For describing patents this model could be used to capture explicitly stated information: • Patent number and other numbers (WHICH) • Inventor, examiner, attorney, … (WHO) • Date when the patent was filed (WHEN) • Explicit references to similar patents (WHICH) • etc… • Each W can have multiple sub-categories that are application-specific!
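
As a rough illustration of how explicitly stated information might be attached to the Ws, the sketch below stores statements as plain (subject, predicate, object) tuples and groups them by W. It deliberately uses no RDF library; the `w:subcategory` predicate naming is an assumption made here, and the values in angle brackets are placeholders rather than real patent data.

```python
from collections import defaultdict

PATENT = "US6989179"  # the patent number used as an example in the slides

# Predicates follow a hypothetical "<W>:<sub-category>" naming scheme.
# Values in angle brackets are placeholders, not real data.
triples = [
    (PATENT, "which:patentNumber", "6,989,179"),
    (PATENT, "who:inventor", "<inventor name>"),
    (PATENT, "who:examiner", "<examiner name>"),
    (PATENT, "when:filed", "<filing date>"),
    (PATENT, "which:references", "<related patent number>"),
]


def group_by_w(triples):
    """Group statements by their W, keeping (sub-category, value) pairs."""
    grouped = defaultdict(list)
    for _subject, predicate, value in triples:
        w, _, sub_category = predicate.partition(":")
        grouped[w.upper()].append((sub_category, value))
    return dict(grouped)


for w, entries in group_by_w(triples).items():
    print(w, entries)
```

Splitting each predicate into a W and a sub-category mirrors the remark that each W can have several application-specific sub-categories.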

  18. Patent Model – Creation Figure 2 – Creation of a patent model: Claims section is processed by the conceptual indexer to produce an index associated with the patent. Additional information about the concept is captured by RDF/OWL statements, into a predefined, application-specific structure.

  19. Patent Model – Result Figure 3 – Patent model: WHAT associations are contained in a descriptive index. Other Ws are expressed through RDF/OWL statements.

  20. Patent Model – Big Picture • Descriptive indices are re-processed by the conceptual indexer, to form the system index. • Each entry in the system index retains links to the descriptive indices it originates from, and vice versa. • This structure allows us to: • Perform quick searches of the existing patents • Add/remove patents easily
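
One way to picture the two-way links between the system index and the descriptive indices is an inverted index, sketched below in Python. The class name, method names, and patent identifiers are hypothetical, and the sketch skips the re-processing step performed by the conceptual indexer: it simply maps each term to the patents whose descriptive index contains it.

```python
from collections import defaultdict


class SystemIndex:
    """Toy system index: term -> patents, plus patent -> descriptive index."""

    def __init__(self):
        self.descriptive = {}             # patent id -> {term: occurrences}
        self.postings = defaultdict(set)  # term -> set of patent ids

    def add(self, patent_id, descriptive_index):
        """Add a patent, replacing any earlier entry with the same id."""
        self.remove(patent_id)
        self.descriptive[patent_id] = dict(descriptive_index)
        for term in descriptive_index:
            self.postings[term].add(patent_id)

    def remove(self, patent_id):
        """Remove a patent and every link that points to it."""
        for term in self.descriptive.pop(patent_id, {}):
            self.postings[term].discard(patent_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term):
        """Return the ids of all patents whose index contains the term."""
        return sorted(self.postings.get(term, set()))


index = SystemIndex()
index.add("P1", {"synthetic grass": 10, "playing surface": 9})
index.add("P2", {"playing surface": 4})
print(index.search("playing surface"))
index.remove("P1")
print(index.search("synthetic grass"))
```

Because each patent keeps its own descriptive index, adding or removing a patent only touches the entries for the terms that patent actually uses.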

  21. Figure 4 – Top-level scheme

  22. Patent Model – Patent Relations Two ways of establishing relations among patents: • Via RDF/OWL statements, using automated reasoners. Problem: referential integrity and consistency. • Via the system index (implicit links). Problem: inexact, based on probability.

  23. Patent Model – Implicit Links (1) • Descriptions of similar concepts (patents) usually make frequent use of similar or even identical terms. • By determining overlapping terms we create dynamic, implicit links among similar concepts. • The number of such implicit links can be used to express similarity among concepts. • The algorithm for determining the similarity needs to be tweaked empirically.

  24. Patent Model – Implicit Links (2) • For example: When describing two different vaccines we would probably make frequent use of terms like: vaccine, inactivated antigens, immune response, etc.
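
The idea of implicit links can be sketched in Python as the set of terms two descriptive indices share. The slides only say that the number of links expresses similarity and that the algorithm needs empirical tuning, so dividing by the number of distinct terms (the Jaccard coefficient) is an assumption made here, as are the occurrence counts and the extra terms "adjuvant" and "nasal spray".

```python
def implicit_links(index_a, index_b):
    """Terms that occur in both descriptive indices."""
    return set(index_a) & set(index_b)


def similarity(index_a, index_b):
    """Share of distinct terms the two indices have in common (0.0 to 1.0)."""
    all_terms = set(index_a) | set(index_b)
    if not all_terms:
        return 0.0
    return len(implicit_links(index_a, index_b)) / len(all_terms)


# Illustrative descriptive indices; the counts are made up.
vaccine_a = {"vaccine": 12, "inactivated antigens": 5,
             "immune response": 4, "adjuvant": 2}
vaccine_b = {"vaccine": 9, "immune response": 6,
             "inactivated antigens": 3, "nasal spray": 2}
grass = {"synthetic grass": 10, "playing surface": 9}

print(sorted(implicit_links(vaccine_a, vaccine_b)))
print(similarity(vaccine_a, vaccine_b))
print(similarity(vaccine_a, grass))
```

This version ignores the occurrence counts; weighting shared terms by how often they are mentioned would be one of the empirical adjustments the slides anticipate.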

  25. Advantages & Drawbacks • Advantages • Reduced complexity (a great reduction of direct links between concepts) • Fast search and retrieval (as the result of using indices) • Scalability • Drawbacks • Use of indices implies loss of precision

  26. Conclusion • Our idea is still in the first stage of development. • Its key advantages are its general applicability and reduced complexity. • Further research is needed to explore the quality and feasibility of the proposed solution. • However, we expect that the combination of OWL/RDF structures and indices might produce a satisfactory performance/exactness ratio.

  27. References • W. A. Woods, L. A. Bookman, A. Houston, R. J. Kuhns, P. Martin, S. Green, "Linguistic Knowledge Can Improve Information Retrieval", Proc. of the Applied Natural Language Processing Conference (ANLP-2000), Seattle, 2000. • O. Scekic, P. Bojic, "An Overview of OWL and its Role in Semantic Web Architecture", YU-INFO 06, Kopaonik, Serbia & Montenegro, 2006. • B. V. Dobrov, N. V. Loukachevitch, T. N. Yudina, "Conceptual Indexing Using Thematic Representation of Texts", Scientific Research Computer Center of Moscow State University, Moscow, 1998. • S. Omerovic, D. Savic, S. Tomazic, "A Survey of Concept Modeling", Faculty of Electrical Engineering, University of Ljubljana, Slovenia (to appear). • W. A. Woods, "Conceptual Indexing: A Better Way to Organize Knowledge", Technical report, Sun Microsystems Laboratories, 1998. • http://www.uspto.gov – U.S. Patent Office

  28. CONCEPT MODELING: Revisited with Details. A Proposed Hybrid Approach to Patent Modeling. Đorđe Popović (popajce@ptt.yu), Ognjen Šćekić (ogi@cg.yu), Veljko Milutinović (vm@etf.bg.ac.yu)

  29. Initial Assignment • January 2006, initial assignment: Get acquainted with different ways of Concept Modeling, in general. • More specifically, explore the possibilities offered by RDF and OWL. • One of the ideas: Use the 7 Ws – WHAT, WHO, WHEN, WHERE, WHY, WHICH, (W)HOW.

