
CoBase: Scalable and Extensible Cooperative Information System

CoBase: Scalable and Extensible Cooperative Information System. Wesley W. Chu Computer Science Department University of California, Los Angeles http://www.cobase.cs.ucla.edu. Conventional Query Answering. Need to know the detailed database schema Cannot get approximate answers

Presentation Transcript


  1. CoBase: Scalable and Extensible Cooperative Information System Wesley W. Chu Computer Science Department University of California, Los Angeles http://www.cobase.cs.ucla.edu

  2. Conventional Query Answering • Need to know the detailed database schema • Cannot get approximate answers • Cannot answer conceptual queries
     Cooperative Query Answering • Derive approximate answers • Answer conceptual queries

  3. Cooperative Queries • Find a nearby friendly airport that can land an F-15 • Find hospitals with facilities similar to St. John’s near LAX • Find a seaport with railway facility in Los Angeles
     CoBase provides: Relaxation, Approximation, Association, Explanation
     [Diagram labels: CoBase Servers, Domain Knowledge, Heterogeneous Information Sources]

  4. Generalization and Specialization [Diagram: three levels, More Conceptual Query, Conceptual Query, Specific Query; adjacent levels are linked by Generalization and Specialization arrows]

  5. Type Abstraction Hierarchy (TAH): Provide multi-level knowledge representations. Chemical-Suit Size TAH (a non-numerical TAH), by level:
     All_Sizes
     Small_Size, Large_Size
     Very_Small, Small_to_Medium, Large_to_Extra_Large, Very_Large
     XXXS, XXS, S, M, L, XL, XXL
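A minimal Python sketch of how a non-numerical TAH such as the Chemical-Suit Size hierarchy could be stored and used. The node names come from the slide; the parent-child edges, the `PARENT` table, and the functions `generalize`, `specialize`, and `relax` are assumptions made for illustration, not CoBase's actual data structures.

```python
# Child -> parent edges for the Chemical-Suit Size TAH.
# Node names are from the slide; the exact edges are assumed.
PARENT = {
    "Small_Size": "All_Sizes", "Large_Size": "All_Sizes",
    "Very_Small": "Small_Size", "Small_to_Medium": "Small_Size",
    "Large_to_Extra_Large": "Large_Size", "Very_Large": "Large_Size",
    "XXXS": "Very_Small", "XXS": "Very_Small",
    "S": "Small_to_Medium", "M": "Small_to_Medium",
    "L": "Large_to_Extra_Large", "XL": "Large_to_Extra_Large",
    "XXL": "Very_Large",
}


def generalize(node, levels=1):
    """Move up the hierarchy by the given number of levels (stops at the root)."""
    for _ in range(levels):
        node = PARENT.get(node, node)
    return node


def specialize(node):
    """Return every leaf value covered by a TAH node."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(specialize(child))
    return leaves


def relax(value, levels=1):
    """Generalize a value, then specialize the concept back to concrete values."""
    return specialize(generalize(value, levels))


print(relax("S"))     # sizes accepted after one level of relaxation
print(relax("S", 2))  # sizes accepted after two levels
```

Relaxing a condition such as size = S by one level would accept every value under the same parent concept; each further level widens the accepted set.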

  6. Type Abstraction Hierarchy (TAH) (Location Example): CA at the root; regions S. CA, C. CA, N. CA; cities San Diego, LA, Long Beach, Davis, SF, San Jose, Palo Alto, Sacramento

  7. Relaxation Agent: Use a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchy) to relax the following for matching: • query conditions • constraints

  8. Query Relaxation [Flowchart: the Query runs against the Database. Answers? Yes: Display. No: Relax Attribute using TAHs, then Query Modification, and the modified query runs again]
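The relaxation loop on this slide can be sketched in Python as follows: run the query, and if the database returns no answers, generalize the condition one TAH level and retry. The TAH fragment, the row layout, and the names `PARENT`, `leaves`, and `query_with_relaxation` are hypothetical; the city-to-region assignment is assumed for illustration only.

```python
# Hypothetical TAH fragment (child -> parent); city placement is assumed.
PARENT = {
    "Bizerte": "NE Tunisia", "Tunis": "NE Tunisia",
    "Gafsa": "SW Tunisia", "El Borma": "SW Tunisia",
    "NE Tunisia": "Tunisia", "SW Tunisia": "Tunisia",
}


def leaves(node):
    """Return the set of leaf values under a TAH node."""
    kids = [child for child, parent in PARENT.items() if parent == node]
    if not kids:
        return {node}
    return set().union(*(leaves(kid) for kid in kids))


def query_with_relaxation(rows, attr, value, max_levels=2):
    """Return (answers, concept used, relaxation level).

    The condition attr = value is tried first; while there are no answers,
    the value is replaced by its parent concept in the TAH and retried.
    """
    node, level = value, 0
    while True:
        allowed = leaves(node)
        answers = [row for row in rows if row[attr] in allowed]
        if answers or level == max_levels or node not in PARENT:
            return answers, node, level
        node, level = PARENT[node], level + 1


airports = [
    {"name": "A", "city": "Tunis"},
    {"name": "B", "city": "Gafsa"},
]
print(query_with_relaxation(airports, "city", "Bizerte"))
```

Returning the concept and level alongside the answers lets a front end display how far the query was modified, which is what the Display step in the flowchart suggests.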

  9. Visualization of Relaxation Process. Query: Find seaports in the given region. [Map labels: given region, relaxed region]

  10. Relaxation Control Primitives • not-relaxable, e.g., not-relaxable runway-length • relaxation-order, e.g., relaxation-order (runway length, location) • preference-list • unacceptable-list • answer-size • relaxation-level

  11. Relaxation Primitives • ^ (approximate), e.g., ^ 9 am • between • within • near-to (context-sensitive), e.g., Airport near-to LAX, Restaurant near-to UCLA • similar-to ... base-on, e.g., Airport similar-to LAX base-on (traffic, runway)

  12. Similar-to: Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width. select aport_name, runway_length, runway_width from runways, countries where aport_name similar-to ‘Bizerte’ based-on ((runway_length 1.0) (runway_width 2.0)) and country_state_name = ‘Tunisia’ and countries.glc_cd = runways.glc_cd

  13. Similar-to Result: The similar-to module ranks the returned answers according to mean-squared error.
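A sketch of how ranking by weighted mean-squared error might look in Python. The slides do not say how attributes are scaled, so dividing each difference by the attribute's observed range is an assumption, as are the function name `similar_to`, the row layout, and the sample data; the weights 1.0 and 2.0 mirror the based-on clause of the Bizerte query.

```python
def similar_to(rows, target_name, weights, at_least=3):
    """Rank rows by weighted mean-squared error against the target row.

    Each attribute difference is divided by that attribute's observed range
    (an assumed normalization) so that lengths and widths are comparable.
    """
    target = next(row for row in rows if row["name"] == target_name)
    ranges = {
        attr: (max(r[attr] for r in rows) - min(r[attr] for r in rows)) or 1.0
        for attr in weights
    }

    def error(row):
        total = sum(
            weight * ((row[attr] - target[attr]) / ranges[attr]) ** 2
            for attr, weight in weights.items()
        )
        return total / sum(weights.values())

    candidates = [row for row in rows if row["name"] != target_name]
    return sorted(candidates, key=error)[:at_least]


# Hypothetical runway data, for illustration only.
runways = [
    {"name": "Bizerte", "runway_length": 8000, "runway_width": 150},
    {"name": "A", "runway_length": 8000, "runway_width": 100},
    {"name": "B", "runway_length": 6000, "runway_width": 150},
    {"name": "C", "runway_length": 7900, "runway_width": 145},
]
ranked = similar_to(runways, "Bizerte", {"runway_length": 1.0, "runway_width": 2.0})
print([row["name"] for row in ranked])
```

Because runway width carries twice the weight, airport A (same length, much narrower runway) ranks below airport B (same width, much shorter runway) in this sample.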

  14. Unacceptable List. Operator Constraint: Avoid Northern Tunisia! [Diagram: the CoBase Relaxation Manager trims the Type Abstraction Hierarchy. Full TAH: Tunisia with Central Tunisia, NE Tunisia, SW Tunisia, NW Tunisia, ... and values Gafsa, El Borma, Bizerte. Trimmed TAH: Tunisia with Central Tunisia, SW Tunisia and values El Borma, Gafsa]
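One way to picture the trimming step in Python: rebuild the hierarchy from the root and skip every subtree named in the unacceptable list. The dictionary layout, the function `trim`, and the placement of cities under regions are assumptions for illustration; the slide only shows the before and after pictures.

```python
# Hypothetical TAH as node -> list of children; city placement is assumed,
# and regions whose children are elided on the slide are left empty here.
TAH = {
    "Tunisia": ["Central Tunisia", "NE Tunisia", "NW Tunisia", "SW Tunisia"],
    "Central Tunisia": [],
    "NE Tunisia": ["Bizerte"],
    "NW Tunisia": [],
    "SW Tunisia": ["Gafsa", "El Borma"],
}


def trim(tah, root, unacceptable):
    """Return a copy of the TAH without the unacceptable nodes and their subtrees."""
    trimmed = {}

    def visit(node):
        kids = [kid for kid in tah.get(node, []) if kid not in unacceptable]
        if node in tah:
            trimmed[node] = kids
        for kid in kids:
            visit(kid)

    visit(root)
    return trimmed


# "Avoid Northern Tunisia" expressed as an unacceptable list.
trimmed = trim(TAH, "Tunisia", {"NE Tunisia", "NW Tunisia"})
print(trimmed)
```

Any later relaxation that walks the trimmed hierarchy can no longer generalize into the excluded regions, so Bizerte is never proposed as an approximate answer.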

  15. TAH Generation for Numerical Attribute Values • Relaxation Error • Difference between the exact value and the returned approximate value • The expected error is weighted by the probability of occurrence of each value • DISC (Distribution Sensitive Clustering) is based on the attribute values and frequency distribution of the data
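The slide gives no formula, so the following Python sketch assumes one common formalization: the relaxation error of a cluster is the expected absolute difference between a requested value and a returned value, with both drawn according to their frequency in the data. The function names and the greedy binary cut are illustrative stand-ins for a distribution-sensitive clustering step, not the DISC algorithm itself.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x_i - x_j| over a cluster, weighted by value frequencies."""
    counts = Counter(values)
    n = len(values)
    return sum(
        (ci / n) * (cj / n) * abs(xi - xj)
        for xi, ci in counts.items()
        for xj, cj in counts.items()
    )


def best_binary_cut(values):
    """Split sorted values into two clusters with the lowest weighted error.

    Returns (error, left, right), or None when all values are equal.
    """
    xs = sorted(values)
    n = len(xs)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # never separate identical values
        left, right = xs[:i], xs[i:]
        err = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or err < best[0]:
            best = (err, left, right)
    return best


print(relaxation_error([0, 10]))
print(best_binary_cut([1, 1, 2, 10, 11]))
```

Applying the cut recursively to each side would yield a multi-level hierarchy in which frequent, closely spaced values end up in the same low-level cluster.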

  16. TAH Generation for Non-numerical Attribute Values: Pattern Based Knowledge Induction (PBKI) • Rule-based approach • Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships) • Provides an attribute correlation value (measures how well the rules apply to the database)

  17. Type Abstraction Hierarchy (TAH): Provide multi-level knowledge representations.
     Location Name TAH: Tunisia at the root; regions Central Tunisia, NE Tunisia, SW Tunisia, ...; values El Borma, Bizerte, Djedeida, Tunis
     Runway Length TAH: All at the root; Short (0 ... 700), Medium (700 ... 1K), Long (1K ... 5K)

  18. Associative Query Answering: Provide relevant information not explicitly asked by the user. User Query: List all airports with runway length between 8500 and approximately 10000 feet. [Diagram: Query Answers, followed by Associated Attributes and Answers for User Type = Planner and for User Type = Pilot]

  19. CoBase and GLAD Integration

  20. CoBase Functionality • Provide approximate matching • Find HETs with a capacity of approximately 5 tons • Provide conceptual query answering • Find “Earth Moving” Equipment • Provide content-sensitive spatial queries • Find storage sites near selected location • (Integration with MATT map server) • Provide relaxation control • Relaxation order • Not-relaxable • At-least (answer set, quantity on hand)

  21. Cooperative Operations Added to GLAD • Implicit Query Relaxation • Explicit Query Relaxation • Approximate operator • Similar-to/based-on • Spatial relaxation • Relaxation Control • Relaxation-order • Not-relaxable • At-least (answer-set size, quantity on hand)

  22. CoBase Features Added to GLAD • Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.) • Display the query relaxation process • modified query conditions (value, spatial) • type abstraction hierarchies • Rank returned answers with similarity measures e.g., spatial relaxation ranks answers according to their distance from the selected location

  23. CoBase and GLAD TIE [Architecture diagram components: Knowledge Base, Report Collection, Spatial Area Selection, Filter Editor, Display Generator, Query Collection, NSNs, Object Cache, Report, Query Constructor, CoBase Query Editor, CoBase Relaxation Manager, GLAD Data Cache, CoBase Data Source Manager, Databases]

  24. GLAD Query: Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. select nsn, price, pax_capacity_qty, capacity_wt_ston from nsn_description where (upper(class) = '7' and upper(cbs_category_nomen) = 'AIRCRAFT' and price < 700000 and pax_capacity_qty > 10 and upper(combat_type) = 'I' and capacity_wt_ston <= 2)

  25. CoGLAD Query with Relaxation Control Operators: Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. Attribute passenger capacity is not relaxable. Relax price first and then capacity weight. select nsn, price, pax_capacity_qty, capacity_wt_ston from nsn_description where (upper(class) = '7' and upper(cbs_category_nomen) = 'AIRCRAFT' and price < 700000 and pax_capacity_qty > 10 and upper(combat_type) = 'I' and capacity_wt_ston <= 2) not-relaxable pax_capacity_qty relaxation-order price capacity_wt_ston
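A Python sketch of how the not-relaxable and relaxation-order controls could steer relaxation of numeric conditions. Widening each bound by a fixed 10% per step is an assumption standing in for TAH-driven relaxation, and `run`, `relax_with_control`, and the sample rows are hypothetical names and data, not part of CoBase or GLAD.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def run(rows, conds):
    """Return the rows that satisfy every (operator, bound) condition."""
    return [
        row for row in rows
        if all(OPS[op](row[attr], bound) for attr, (op, bound) in conds.items())
    ]


def relax_with_control(rows, conds, relaxation_order, not_relaxable,
                       step=0.1, max_rounds=5, at_least=1):
    """Relax conditions in the given order until at_least answers are found.

    Attributes in not_relaxable are never touched. Each relaxation widens a
    bound by `step` (an assumed fixed fraction, not a TAH lookup).
    """
    conds = dict(conds)
    answers = run(rows, conds)
    for _ in range(max_rounds):
        for attr in relaxation_order:
            if len(answers) >= at_least:
                return answers, conds
            if attr in not_relaxable:
                continue
            op, bound = conds[attr]
            widened = bound * (1 + step) if op in ("<", "<=") else bound * (1 - step)
            conds[attr] = (op, widened)
            answers = run(rows, conds)
    return answers, conds


# Hypothetical rows, for illustration only.
aircraft = [
    {"nsn": "X1", "price": 750000, "pax_capacity_qty": 12, "capacity_wt_ston": 2},
    {"nsn": "X2", "price": 600000, "pax_capacity_qty": 8, "capacity_wt_ston": 1},
]
conditions = {
    "price": ("<", 700000),
    "pax_capacity_qty": (">", 10),
    "capacity_wt_ston": ("<=", 2),
}
answers, final = relax_with_control(
    aircraft, conditions,
    relaxation_order=["price", "capacity_wt_ston"],
    not_relaxable={"pax_capacity_qty"},
)
print([row["nsn"] for row in answers], final)
```

In this sample the price bound is widened once and the search stops, so capacity weight is never relaxed, and X2 is never returned because its passenger capacity fails a condition that must not be relaxed.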

  26. CoGLAD Query with Similar-to Operator: Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity and air mileage. Passenger capacity has a weight of 8, and price and air mileage each have a weight of 1. select nsn from nsn_description where upper(nsn) similar-to '0000IB0000961' based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0)) at-least 4 * '0000IB0000961' is an answer from the previous query

  27. CoGLAD Query with Approximate Operator: Find DLA stock reports with NSN like ‘%8340%’ (FSC for tents and tarpaulins) and on-hand quantity of approximately 150. select nsn, ric from dla_stock_report where nsn like ‘%8340%’ and on_hand_quantity = ~150
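A rough Python illustration of what an approximate condition such as on_hand_quantity = ~150 could mean. The fixed 20% tolerance is an assumption standing in for the range a TAH would supply, and `approx_equal` and the sample rows are hypothetical.

```python
def approx_equal(rows, attr, target, tolerance=0.2):
    """Return rows whose attr lies within target +/- tolerance, nearest first."""
    low, high = target * (1 - tolerance), target * (1 + tolerance)
    hits = [row for row in rows if low <= row[attr] <= high]
    return sorted(hits, key=lambda row: abs(row[attr] - target))


# Hypothetical stock report rows, for illustration only.
stock = [
    {"ric": "R1", "on_hand_quantity": 300},
    {"ric": "R2", "on_hand_quantity": 175},
    {"ric": "R3", "on_hand_quantity": 140},
    {"ric": "R4", "on_hand_quantity": 150},
]
matches = approx_equal(stock, "on_hand_quantity", 150)
print([row["ric"] for row in matches])
```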

  28. Adding Constraints to a Query GLAD query select nsn, ric from dla_stock_report where nsn like ‘%8340%’ and nomenclature like ‘%TARP%’ Query with added constraints select nsn, ric from dla_stock_report where nsn like ‘%8340%’ and nomenclature like ‘%TARP%’ and on_hand_quantity = ~150 and size_in_square_feet = 350

  29. Example of Spatial Relaxation [Flowchart: NSNs and an area selected on the map, with constraint: quantity on hand, go to Query Processing. Satisfy constraints? Yes: return the answers. No: the CoBase Relaxation Manager relaxes the selected area based on the context-sensitive TAHs]

  30. Spatial Relaxation with Relaxation Control • relaxation-order: size, (latitude, longitude) • not-relaxable: price • at-least: • value: size of the tarpaulin • quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
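The at-least control on quantity on hand can be sketched in Python as a loop that enlarges the selected area until the sites inside it hold enough stock. Planar coordinates, a circular area, and a fixed growth factor are all assumptions; the slides relax the area using context-sensitive TAHs, and `spatial_relax` and the site data are hypothetical.

```python
import math


def spatial_relax(sites, center, radius, needed_qty, growth=1.5, max_steps=10):
    """Grow the search radius until the nearby sites hold needed_qty in total.

    Returns (sites inside the final radius, nearest first, and that radius).
    """
    for step in range(max_steps + 1):
        nearby = [
            site for site in sites
            if math.dist((site["x"], site["y"]), center) <= radius
        ]
        enough = sum(site["on_hand"] for site in nearby) >= needed_qty
        if enough or step == max_steps:
            break
        radius *= growth  # relax the selected area and try again
    nearby.sort(key=lambda site: math.dist((site["x"], site["y"]), center))
    return nearby, radius


# Hypothetical storage sites on a flat grid, for illustration only.
storage_sites = [
    {"name": "A", "x": 1, "y": 0, "on_hand": 50},
    {"name": "B", "x": 3, "y": 0, "on_hand": 60},
    {"name": "C", "x": 10, "y": 0, "on_hand": 500},
]
found, final_radius = spatial_relax(storage_sites, (0, 0), radius=2, needed_qty=100)
print([site["name"] for site in found], final_radius)
```

Sorting by distance from the selected location matches the ranking rule stated for spatial relaxation on the CoBase Features slide.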

  31. Scalable and Extensible CoBase Architecture

  32. Mediator Inter-Communications via KQML [Diagram: Mediator A (Module A) and Mediator B (Module B) each sit on a stack of CoBase Ontology, CoBase Content Language, and KQML. Further labels: CoBase Ontology, Module, Objects, APIs, Content Language, Data, Actions]

  33. Query Answers Without CoBase Query: find chemical suits

  34. Electronic Warfare • Identify and locate sources of radiated electromagnetic energy • Determine emitter type based on the operating parameters of observed signals: • Radio Frequency (RF) • Pulse Repetition Frequency (PRF) • Pulse Duration (PD) • Scan Period (SP) • other operating parameters • Determine platform sites near the line of the bearing of an emitter This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ

  35. Performance Improvement by Using CoBase in EW Conventional DB: parameter ranges from emitter specifications CoBase: DB: peak parameters (RF,PRF) and parameter ranges (PD,SP) KB: TAHs based on RF and PRF peak parameters TAHs based on PD and SP parameter ranges Case 1: emitter signals without noise Case 2: add noise - PD & SP (10%), PRF (5%), RF (2.5%) Sample Size: 1000 signals Emitter Types: 75 This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ

  36. Current CoBase Users and Applications

  37. XML Query Relaxation

  38. XML Overview • XML (eXtensible Markup Language) is a format for specifying structured documents and data. • XML is extensible since it allows users to define their own schema (unlike HTML which is a pre-defined markup language).

  39. XML (cont.) • XML is a hierarchical data model. • An XML document consists of two parts: schema and data. • The schema describes the structure of the data. • Example (the DOCTYPE block is the schema; the note element is the data): <?xml version="1.0" encoding="ISO-8859-1"?> <!-- Edited with XML Spy v4.2 --> <!DOCTYPE note [ <!ELEMENT note (to, from, heading, body)> <!ELEMENT to (#PCDATA)> <!ELEMENT from (#PCDATA)> <!ELEMENT heading (#PCDATA)> <!ELEMENT body (#PCDATA)> ]> <note> <to>Tove</to> <from>Jani</from> <heading>Reminder</heading> <body>Don't forget me this weekend!</body> </note>
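To make the hierarchical model concrete, the note document can be loaded with Python's standard `xml.etree.ElementTree` module and walked as a tree. This is only an illustration of reading the data part; ElementTree parses the internal DTD but does not validate the document against it.

```python
import xml.etree.ElementTree as ET

# The note document from the slide: the DOCTYPE block is the schema,
# the <note> element is the data.
NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)

# Children come back in document order, matching the DTD's content model.
children = [(child.tag, child.text) for child in root]
print(root.tag)
for tag, text in children:
    print(" ", tag, "=", text)
```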

  40. XML Query Languages • XML can be represented as an ordered tree with: • Nodes representing elements and attributes • Edges representing inclusion relationships • An XML query can similarly be represented as a tree with edges of two types: • “/” for parent-child relationships • “//” for ancestor-descendant relationships
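The difference between the two edge types can be seen with the path syntax supported by Python's standard `xml.etree.ElementTree`. The sample document is hypothetical; only the "/" and "//" semantics come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with a title at two different depths.
doc = ET.fromstring(
    "<article>"
    "<title>CoBase</title>"
    "<section><title>Relaxation</title><para>TAH</para></section>"
    "</article>"
)

# "/" edge: title must be a direct child of the article element.
child_titles = [t.text for t in doc.findall("./title")]

# "//" edge: title may be any descendant of the article element.
descendant_titles = [t.text for t in doc.findall(".//title")]

print(child_titles)
print(descendant_titles)
```

Replacing a "/" edge with a "//" edge accepts every match of the original query plus matches at deeper levels, so the second query is strictly more permissive than the first.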
