CoBase: Scalable and Extensible Cooperative Information System
Wesley W. Chu
Computer Science Department
University of California, Los Angeles
http://www.cobase.cs.ucla.edu
Conventional Query Answering
• Need to know the detailed database schema
• Cannot get approximate answers
• Cannot answer conceptual queries
Cooperative Query Answering
• Derive approximate answers
• Answer conceptual queries
Cooperative Queries
[Figure: CoBase servers, with domain knowledge, answering cooperative queries over heterogeneous information sources]
Example queries:
• Find a nearby friendly airport that can land F-15
• Find hospitals with facility similar to St. John’s near LAX
• Find a seaport with railway facility in Los Angeles
CoBase provides:
• Relaxation
• Approximation
• Association
• Explanation
Generalization and Specialization
[Figure: specific queries are generalized into conceptual queries and then into more conceptual queries; specialization runs in the opposite direction]
Type Abstraction Hierarchy (TAH)
Provide multi-level knowledge representations
Chemical-Suit Size TAH (a non-numerical TAH)
[Figure: hierarchy rooted at All_Sizes; intermediate nodes Small_Size, Large_Size, Very_Small, Small_to_Medium, Large_to_Extra_Large, Very_Large; leaves XXXS, XXS, S, M, L, XL, XXL]
Type Abstraction Hierarchy (TAH) (Location Example)
[Figure: hierarchy rooted at CA; regions S. CA, C. CA, N. CA; leaves San Diego, LA, Long Beach, Davis, SF, San Jose, Palo Alto, Sacramento]
Relaxation Agent
Use a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchy) to relax the following for matching:
• query conditions
• constraints
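Query Relaxation
[Figure: flowchart. The query is run against the database; if there are answers (Yes), they are displayed; if not (No), an attribute is relaxed using the TAHs, the query is modified, and the modified query is run again]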
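The relaxation loop can be sketched in a few lines of Python. This is only an illustration, not the CoBase implementation: the region names come from the location TAH slide, but the assignment of cities to regions, the function names, and the row format are assumptions made for the example.

```python
# Hypothetical location TAH stored as child -> parent.
# Region names come from the slides; the city-to-region grouping is assumed.
PARENT = {
    "San Diego": "S. CA", "LA": "S. CA", "Long Beach": "S. CA",
    "Davis": "C. CA", "Sacramento": "C. CA",
    "SF": "N. CA", "San Jose": "N. CA", "Palo Alto": "N. CA",
    "S. CA": "CA", "C. CA": "CA", "N. CA": "CA",
}


def leaves_under(node):
    """Specialization: return all leaf values covered by a TAH node."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return {node}
    covered = set()
    for child in children:
        covered |= leaves_under(child)
    return covered


def relaxed_query(rows, attribute, value, min_answers=1):
    """Generalize `value` one TAH level at a time until enough rows match.

    Returns the TAH node that was finally used and the matching rows.
    """
    node = value
    while True:
        allowed = leaves_under(node)
        answers = [row for row in rows if row[attribute] in allowed]
        if len(answers) >= min_answers or node not in PARENT:
            return node, answers
        node = PARENT[node]  # query modification: move up one level
```

With rows such as `{"name": "P1", "city": "Long Beach"}`, a query for `city = "LA"` finds nothing at the leaf level, is generalized to `S. CA`, and then returns `P1`.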
Visualization of Relaxation Process
Query: Find seaports in the given region.
[Figure: map showing the given region and the relaxed region]
Relaxation Control Primitives
• relaxation-order (runway length, location)
• not-relaxable runway-length
• preference-list
• unacceptable-list
• answer-size
• relaxation-level
Relaxation Primitives
• ^ (approximate): ^ 9 am
• between
• near-to (context-sensitive): Airport near-to LAX; Restaurant near-to UCLA
• similar-to: Airport similar-to LAX based-on (traffic, runway)
• within
Similar-to
Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width.
select aport_name, runway_length, runway_width
from runways, countries
where aport_name similar-to ‘Bizerte’ based-on((runway_length 1.0) (runway_width 2.0))
  and country_state_name = ‘Tunisia’
  and countries.glc_cd = runways.glc_cd
Similar-to Result Similar-to module ranks the returned answers according to mean-squared error.
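A minimal sketch of ranking by weighted mean-squared error, assuming the based-on weights scale each attribute's squared difference and that differences are normalized by the attribute's observed range. The normalization, the function names, and the sample rows are assumptions for illustration; the slides do not state how the similar-to module computes the error.

```python
def weighted_mse(candidate, target, weights, ranges):
    """Weighted mean of squared, range-normalized attribute differences."""
    total_weight = sum(weights.values())
    error = 0.0
    for attr, weight in weights.items():
        diff = (candidate[attr] - target[attr]) / ranges[attr]
        error += weight * diff * diff
    return error / total_weight


def similar_to(rows, target, weights):
    """Return rows ordered from most to least similar to `target`."""
    ranges = {}
    for attr in weights:
        values = [row[attr] for row in rows] + [target[attr]]
        # Guard against a zero range when all values are identical.
        ranges[attr] = (max(values) - min(values)) or 1.0
    return sorted(rows, key=lambda row: weighted_mse(row, target, weights, ranges))


# Made-up rows; weights mirror based-on((runway_length 1.0) (runway_width 2.0)).
target = {"runway_length": 8000, "runway_width": 150}
rows = [
    {"name": "A", "runway_length": 8000, "runway_width": 100},
    {"name": "B", "runway_length": 7000, "runway_width": 150},
    {"name": "C", "runway_length": 8100, "runway_width": 148},
]
ranked = similar_to(rows, target, {"runway_length": 1.0, "runway_width": 2.0})
```

Because runway width carries twice the weight, row A (exact length, poor width match) ranks below row B (exact width, poor length match).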
Unacceptable List Operator
Constraint: Avoid Northern Tunisia!
[Figure: the CoBase Relaxation Manager applies the constraint to the Tunisia location TAH (Central Tunisia, NE Tunisia, SW Tunisia, NW Tunisia, ...; leaves Gafsa, El Borma, Bizerte) and produces a trimmed TAH (Central Tunisia, SW Tunisia; leaves El Borma, Gafsa)]
TAH Generation for Numerical Attribute Values
• Relaxation Error
  • Difference between the exact value and the returned approximate value
  • The expected error is weighted by the probability of occurrence of each value
• DISC (Distribution Sensitive Clustering) is based on the attribute values and frequency distribution of the data
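One way to make the relaxation error concrete is sketched below. It assumes the error of a cluster is the expected absolute difference between a requested value and a returned value, each weighted by its relative frequency, and that a cluster is split at the cut that minimizes the frequency-weighted error of the two halves. Both the formula and the splitting rule are assumptions for illustration and are not necessarily the DISC algorithm as published.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x_i - x_j| when x_i is requested and x_j is returned.

    Each value is weighted by its relative frequency within the cluster.
    """
    counts = Counter(values)
    n = len(values)
    return sum(
        (ci / n) * (cj / n) * abs(xi - xj)
        for xi, ci in counts.items()
        for xj, cj in counts.items()
    )


def best_binary_cut(values):
    """Return (cut, error) for the best two-way split, or None if no split exists.

    Values below `cut` go to the left sub-cluster, the rest to the right.
    """
    distinct = sorted(set(values))
    n = len(values)
    best = None
    for cut in distinct[1:]:
        left = [v for v in values if v < cut]
        right = [v for v in values if v >= cut]
        error = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or error < best[1]:
            best = (cut, error)
    return best
```

Applying `best_binary_cut` recursively to each sub-cluster would yield a multi-level hierarchy of value ranges, which is the shape a numerical TAH takes.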
TAH Generation for Non-numerical Attribute Values
Pattern Based Knowledge Induction (PBKI)
• Rule-based approach
• Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships)
• Provides an attribute correlation value (measures how well the rules apply to the database)
Type Abstraction Hierarchy (TAH)
Provide multi-level knowledge representations
[Figure: two TAHs. Location Name: Tunisia; Central Tunisia, NE Tunisia, SW Tunisia, ...; leaves El Borma, Bizerte, Djedeida, Tunis. Runway Length: All; Long, Medium, Short; leaf ranges 0 ... 700, 700 ... 1K, 1K ... 5K]
Associative Query Answering
Provide relevant information not explicitly asked by the user
User Query: List all airports with runway length between 8500 and approximately 10000 feet
[Figure: the query answers, followed by the associated attributes and answers for User Type = Planner and for User Type = Pilot]
CoBase and GLAD Integration
Wesley W. Chu
CoBase Functionality
• Provide approximate matching
  • Find HETs with a capacity of approximately 5 tons
• Provide conceptual query answering
  • Find “Earth Moving” Equipment
• Provide content-sensitive spatial queries
  • Find storage sites near selected location
  • (Integration with MATT map server)
• Provide relaxation control
  • Relaxation order
  • Not-relaxable
  • At-least (answer set, quantity on hand)
Cooperative Operations Added to GLAD
• Implicit Query Relaxation
• Explicit Query Relaxation
  • Approximate operator
  • Similar-to/based-on
  • Spatial relaxation
• Relaxation Control
  • Relaxation-order
  • Not-relaxable
  • At-least (answer-set size, quantity on hand)
CoBase Features Added to GLAD
• Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.)
• Display the query relaxation process
  • modified query conditions (value, spatial)
  • type abstraction hierarchies
• Rank returned answers with similarity measures, e.g., spatial relaxation ranks answers according to their distance from the selected location
CoBase and GLAD TIE
[Figure: architecture diagram with the labels Knowledge Base, Report Collection, Spatial Area Selection, Filter Editor, Display Generator, Query Collection, NSNs, Object Cache, Report, Query Constructor, CoBase Query Editor, CoBase Relaxation Manager, GLAD Data Cache, CoBase Data Source Manager, Databases]
GLAD Query
Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000.
select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)
CoGLAD Query with Relaxation Control Operators
Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. The passenger capacity attribute is not relaxable. Relax price first and then capacity weight.
select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)
not-relaxable pax_capacity_qty
relaxation-order price capacity_wt_ston
CoGLAD Query with Similar-to Operator
Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity and air mileage. Passenger capacity has a weight of 8, and price and air mileage each have a weight of 1.
select nsn
from nsn_description
where upper(nsn) similar-to '0000IB0000961' based-on((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0))
at-least 4
* '0000IB0000961' is an answer from the previous query
CoGLAD Query with Approximate Operator
Find DLA stock report with NSN like ‘%8340%’ (FSC for tents and tarpaulins) and on-hand quantity of approximately 150.
select nsn, ric
from dla_stock_report
where nsn like ‘%8340%’
  and on_hand_quantity = ~150
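The effect of the approximate operator can be illustrated with a small Python sketch. It assumes that `= ~150` is answered by widening a symmetric range around the target in fixed percentage steps until enough rows match. CoBase derives the ranges from TAHs, so the percentage widening, the function name, and its parameters are stand-ins chosen for the example.

```python
def approximate_match(rows, attribute, target, at_least=1,
                      step_percent=10, max_steps=5):
    """Widen a symmetric range around `target` until `at_least` rows match.

    Returns the final (low, high) range and the matching rows,
    ordered by their distance from the target value.
    """
    answers = []
    low = high = target
    for level in range(max_steps + 1):
        tolerance = target * level * step_percent / 100
        low, high = target - tolerance, target + tolerance
        answers = [row for row in rows if low <= row[attribute] <= high]
        if len(answers) >= at_least:
            break
    answers.sort(key=lambda row: abs(row[attribute] - target))
    return (low, high), answers
```

For on-hand quantities of 100, 140, 165 and 300 and `at_least=2`, the exact match fails, the first widening step gives the range 135 to 165, and the rows with 140 and 165 are returned in that order.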
Adding Constraints to a Query
GLAD query
select nsn, ric
from dla_stock_report
where nsn like ‘%8340%’
  and nomenclature like ‘%TARP%’
Query with added constraints
select nsn, ric
from dla_stock_report
where nsn like ‘%8340%’
  and nomenclature like ‘%TARP%’
  and on_hand_quantity = ~150
  and size_in_square_feet = 350
Example of Spatial Relaxation
[Figure: flowchart. The user selects an area on the map and NSNs, with a constraint on quantity on hand. Query processing checks whether the answers satisfy the constraints: if yes, the answers are returned; if no, the CoBase Relaxation Manager relaxes the selected area based on the context-sensitive TAHs]
Spatial Relaxation with Relaxation Control
• relaxation-order: size, (latitude, longitude)
• not-relaxable: price
• at-least:
  • value: size of the tarpaulin
  • quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
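A rough sketch of the at-least control on quantity on hand: the selected area is enlarged until the sites inside it hold enough stock, and the answers are ranked by distance from the selected location. The circular area, the growth factor, and the site records are assumptions for illustration; CoBase relaxes the area using context-sensitive TAHs rather than a fixed growth factor.

```python
import math


def spatial_relax(sites, center, radius, needed_qty, growth=1.5, max_rounds=10):
    """Grow the search radius until the enclosed sites hold `needed_qty`.

    Returns the radius that was finally used and the sites inside it,
    nearest first. Gives up after `max_rounds` enlargements.
    """
    inside = []
    for round_no in range(max_rounds + 1):
        inside = [s for s in sites if math.dist(center, s["pos"]) <= radius]
        enough = sum(s["qty"] for s in inside) >= needed_qty
        if enough or round_no == max_rounds:
            break
        radius *= growth  # relax the selected area
    inside.sort(key=lambda s: math.dist(center, s["pos"]))
    return radius, inside


# Made-up storage sites on a flat grid.
sites = [
    {"name": "A", "pos": (1, 0), "qty": 50},
    {"name": "B", "pos": (3, 0), "qty": 60},
    {"name": "C", "pos": (10, 0), "qty": 500},
]
final_radius, found = spatial_relax(sites, (0, 0), 2, needed_qty=100, growth=2)
```

Here the initial radius of 2 covers only site A (50 units), so the area is doubled once, after which sites A and B together satisfy the requested 100 units.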
Mediator Inter-Communications via KQML
[Figure: Mediator A (Module A) and Mediator B (Module B), each with CoBase Ontology, CoBase Content Language and KQML layers, communicating via KQML. Other labels: CoBase Ontology, Module, Objects, APIs, Content Language, Data, Actions]
Query Answers Without CoBase
Query: find chemical suits
Electronic Warfare
• Identify and locate sources of radiated electromagnetic energy
• Determine emitter type based on the operating parameters of observed signals:
  • Radio Frequency (RF)
  • Pulse Repetition Frequency (PRF)
  • Pulse Duration (PD)
  • Scan Period (SP)
  • other operating parameters
• Determine platform sites near the line of bearing of an emitter
This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ
Performance Improvement by Using CoBase in EW
Conventional
• DB: parameter ranges from emitter specifications
CoBase
• DB: peak parameters (RF, PRF) and parameter ranges (PD, SP)
• KB: TAHs based on RF and PRF peak parameters; TAHs based on PD and SP parameter ranges
Case 1: emitter signals without noise
Case 2: add noise - PD & SP (10%), PRF (5%), RF (2.5%)
Sample Size: 1000 signals
Emitter Types: 75
Conclusions
• Provide user and context sensitive query relaxations (structured and unstructured data)
• Provide additional information (associative query answering) based on past cases
• CoSQL (Cooperative SQL)
  • similar-to, near-to, approximate
  • relaxation control operators
• GUI
  • map server, high-level query formation
CoSent: An Active Database Technology
• Natural language-like rules support conceptual and approximate terms
• Decompose natural language-like rules into low-level rules via the knowledge base (TAH)
• Mimic the human cognitive process, which eases rule specification
• Ease rule maintenance
CoSent: An Active Database Technology
• Trigger with high-level rules containing
  • conceptual terms (e.g., bad, heavy) and
  • approximate operators (e.g., similar-to, near-to, approximate)
• Allow trigger conditions to be specified with fuzzy and conceptual terms
• Mimic human cognitive expression
CoSent monitors temporal composition events and executes rules with conceptual and approximate terms.
Key Features of CoSent
• User-defined rules are transformed into low-level range values via the knowledge base: Type Abstraction Hierarchies (TAHs)
• TAHs are typically generated from data sources automatically
• Leverages the triggering systems of conventional DBMSs (e.g., Oracle, Sybase, Teradata)
• Rule definitions are either specified by a domain expert or derived by data mining technologies
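The transformation of a conceptual term into low-level range values can be sketched as follows. The labels and ranges are taken from the runway-length TAH shown earlier, but the pairing of each label with a range, the half-open intervals, and every function name are assumptions for illustration; this is not CoSent's rule language or a real DBMS trigger API.

```python
# Hypothetical numerical TAH for runway length: conceptual term -> [low, high).
# The pairing of labels with ranges is assumed.
RUNWAY_LENGTH_TAH = {
    "Short": (0, 700),
    "Medium": (700, 1000),
    "Long": (1000, 5000),
}


def compile_rule(attribute, term, tah):
    """Turn a conceptual condition such as 'runway_length is Short'
    into a low-level range predicate over a row."""
    low, high = tah[term]

    def condition(row):
        return low <= row[attribute] < high

    return condition


def fire_triggers(rows, condition, action):
    """Run `action` on every row that satisfies the compiled condition."""
    return [action(row) for row in rows if condition(row)]


is_short = compile_rule("runway_length", "Short", RUNWAY_LENGTH_TAH)
alerts = fire_triggers(
    [{"name": "X", "runway_length": 500}, {"name": "Y", "runway_length": 1200}],
    is_short,
    lambda row: "short runway at " + row["name"],
)
```

In a conventional DBMS the compiled range would become the condition of an ordinary trigger, so the rule author only ever writes the conceptual term.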