1 / 42

Relaxation of XML Documents

Relaxation of XML Documents. Wesley W. Chu Computer Science Department UCLA wwc@ucla.edu September 19, 2000. Outline. Goals Transformation of XML to Relational Tables Relaxation Example queries Future directions. Goals.

conroy
Download Presentation

Relaxation of XML Documents

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Relaxation of XML Documents Wesley W. Chu Computer Science Department UCLA wwc@ucla.edu September 19, 2000

  2. Outline • Goals • Transformation of XML to Relational Tables • Relaxation • Example queries • Future directions

  3. Goals • Given an XML file as sample data, map it into relational database • Generate TAHs on attributes from relational tables • Execute queries and perform relaxation on sample data

  4. Architecture extract DTD from XML file 1 XMLdoc XMLdoc DTD XMLdoc XML Spy XPRESS data map schema map 3 2 Relaxed Answers SQL RDB query processingquery relaxation 5 4 generate TAHs TAH

  5. Extracting DTD From XML • Translation of XML to relational tables requires a DTD that provides schema for XML file • DTD for XML file can be extracted from the file using XML Spy tool

  6. Ato.xml (provided by Stuart Hirshfield of AFRL Stuart.Hirshfield@rl.af.mil) <air_to_cofn> <operation_identification_data setid="OPER"> <operation_codeword> DESERT STORM (DECLASSIFIED) </operation_codeword> </operation_identification_data> <message_identification setid="MSGID"> <message_text_format_identifier> ATOCONF </message_text_format_identifier> <originator> USCENTAF_CMBT PLANS </originator> </message_identification> <effective_day_time_period setid="PERID"> … </effective_day_time_period>

  7. Ato.xml (Cont) <air_tasking_group> <air_tasking setid="AIRTASK"> UNIT TASKING </air_tasking> <task_unit_loc_gp> <task_unit_and_location setid="TASKUNIT"> <tasked_unit_designator> C41 </tasked_unit_designator> </task_unit_and_location> <air_mission_data_group> <air_mission_data setid="MSNDAT"> <mission_number> 7353 </mission_number> <package_identification> ZZN …

  8. Ato.xml (Cont) <air_mission_data_subgroup> <iff_sif_mode_and_code> 32053 </iff_sif_mode_and_code> </air_mission_data_subgroup> </air_mission_data> <mission_location setid="MSNLOC"> <mission_start_day_time> 240100Z </mission_start_day_time> <mission_location_subgroup> <point_name> DAGGER </point_name> </mission_location_subgroup> </mission_location> <amplification setid="AMPN"> <free_text> REMARK IDENTIFIER(S): G …

  9. Ato.dtd (Generated From XML File) <!ELEMENT air_to_cofn (operation_identification_data, message_identification, effective_day_time_period, air_tasking_group)> <!ELEMENT operation_identification_data (operation_codeword)> <!ELEMENT operation_codeword (#PCDATA)> <!ELEMENT message_identification (message_text_format_identifier, originator)> <!ATTLIST message_identification setid CDATA #REQUIRED> <!ELEMENT originator (#PCDATA)> <!ELEMENT message_text_format_identifier (#PCDATA)> <!ELEMENT air_tasking_group (air_tasking, task_unit_loc_gp+)> <!ELEMENT air_tasking (#PCDATA)> <!ELEMENT task_unit_loc_gp (task_unit_and_location, air_mission_data_group+)>

  10. Ato.dtd (Cont) <!ELEMENT air_mission_data_group (air_mission_data+ | mission_location+ | target_location+ | recon_mis_info_gp+ | refueling_information+ | air_ref_info+ | amplification+ | narrative_information+ | remarks+)+> <!ELEMENT air_mission_data (mission_number, package_identification, aircraft_call_sign, num_type_aircraft_model, mission_type, alert_status, primary_configuration_code, secondary_configuration_code, air_mission_data_subgroup+)> <!ELEMENT mission_number (#PCDATA)> … <!ELEMENT mission_location (mission_start_day_time, mission_stop_day_time, mission_location_name, altitude_in_hundreds_of_feet?, air_support_request_number?, mission_location_subgroup?)>

  11. Ato.dtd (Cont) <!ELEMENT amplification (free_text)> <!ATTLIST amplification setid CDATA #REQUIRED> <!ELEMENT target_location (on_target_day_time, off_target_day_time, target_identifier_be_number?, mission_location_name?, target_type, dmpi_lat_long_seconds?, dmpi_lat_long_minutes?, air_support_request_number?, dmpi_lat_long_deciseconds?, target_comments?, air_support_request_number?)> <!ELEMENT recon_mis_info (request_number, mission_priority, on_target_day_time, latest_date_time, recon_mis_type, type_recon_sur_coverage, imagery_type, image_qualifier, coverage_extent_and_mode, recon_target_code, scale_of_prints, recon_mis_info_subgroup)> …

  12. XPRESS • XPRESS (Xml Processing and Relaxation in rElational Storage System) project @ UCLA/CSD • Transfers XML file with DTD to relational tables • Efficiently: handles a large number of XML documents • Accurately: retains structure & constraint • Automatically: does not require human intervention • The sample Ato XML file is 366KB, translating into 15 tables and ~900 tuples. Generated DTD is 6KB • Xpress has been tested with over 60MB of XML data, translating into over half a million tuples

  13. Steps • Schema mapping: CREATE TABLE statements are generated by XPRESS from the DTD • An intermediate Map object is created to provide XPRESS data mapping information • Data mapping: Using the Map object, XPRESS transfers the XML file into the database

  14. Generated Tables for Ato Data • air_to_cofn (1) • task_unit_loc_gp (24) • air_mission_data_group (129) • target_location • narrative_information • amplification • remarks • air_mission_data • air_mission_data_subgroup • recon_mis_info_gp • location_trace_plot • refueling_information • air_ref_info • air_ref_info_subgroup • mission_location

  15. Query Relaxation for XML File • When no answer or not enough answers are found, derive approximate answers by relaxing the query condition • TAH: knowledge structure to aid relaxation • TAHs are generated from translated relational tables

  16. CoBase Architecture

  17. Example Query 1 • Find the id of any task unit location group that has an air refueling subgroup with 25,000 pounds of offload fuel. Also list the amount offload fuel • SELECT DISTINCT TASK_UNIT_LOC_GP.ID, AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB FROM AIR_REF_INFO_SUBGROUP, AIR_REF_INFO, AIR_MISSION_DATA_GROUP, TASK_UNIT_LOC_GP WHERE FK_AIR_REF_INFO = AIR_REF_INFO.ID AND FK_AIR_MISSION_DATA_GROUP = AIR_MISSION_DATA_GROUP.ID AND FK_TASK_UNIT_LOC_GP = TASK_UNIT_LOC_GP.ID AND AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB = 25 • Answer ID TOTAL_OFFLOAD_FUEL_THOU_LB -- -------------------------- 2524116 25

  18. Relax Query 1 • Relax previous query to an offload fuel of approximately 25,000 lbs, and at least 5 answers • SELECT DISTINCT TASK_UNIT_LOC_GP.ID, AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB … ANDAIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB = ~25 AT-LEAST5 • Answer ID TOTAL_OFFLOAD_FUEL_THOU_LB --------------- -------------------------- 2524048 10 2524048 20 2524048 26 2524116 14 2524116 20 2524116 24 2524116 25 --------------- -------------------------- # of answer tuples: 7

  19. TAH total_offload_fuel_thou_lb

  20. Example Query 2 • Find the call sign of all aircraft that has a 10 minute alert status. Also list its tasked unit designator and alert status • SELECTAIRCRAFT_CALL_SIGN, TASKED_UNIT_DESIGNATOR, ALERT_STATUS FROMAIR_MISSION_DATA, AIR_MISSION_DATA_GROUP, TASK_UNIT_LOC_GP WHEREFK_AIR_MISSION_DATA_GROUP = AIR_MISSION_DATA_GROUP.ID AND FK_TASK_UNIT_LOC_GP = TASK_UNIT_LOC_GP.ID ANDAIR_MISSION_DATA.ALERT_STATUS = 10 • Answer CALL_SIGN DESIGNATOR ALERT_STATUS --------- ---------- ------------ VOLVO 51 4TFW 10

  21. Relax Query 2 • Relax previous query to an alert status of approximately 10 minutes, and at least 3 answers • SELECTAIRCRAFT_CALL_SIGN, TASKED_UNIT_DESIGNATOR, ALERT_STATUS … ANDAIR_MISSION_DATA.ALERT_STATUS = ~10 AT-LEAST 3 • Answer CALL_SIGN DESIGNATOR ALERT_STATUS --------------- --------------- --------------- VOLVO 51 4TFW 10 ARCO 05 1TFW 5 ARCO 03 1TFW 5 STANDARD 63 33TFW 5 STANDARD 61 33TFW 5 ARCO 01 1TFW 5 --------------- --------------- --------------- # of answer tuples: 6

  22. TAH alert_status

  23. Example Query 3 • Find the mission location at an altitude of 22,000 ft. Also list any remarks on amplification • SELECTMISSION_LOCATION_NAME, ALTITUDE_IN_HUNDREDS_OF_FEET, FREE_TEXT FROMMISSION_LOCATION, AMPLIFICATION WHEREMISSION_LOCATION.FK_AIR_MISSION_DATA_GROUP = AMPLIFICATION.FK_AIR_MISSION_DATA_GROUP AND ALTITUDE_IN_HUNDREDS_OF_FEET = 220 • Answer MISSION_LOC_NAME ALTITUDE (HUND FT) FREE_TEXT ---------------- ------------------ ---------- no rows selected

  24. Relax Query 3 • Relax previous query to an altitude of approximately 22,000 ft, and at least 7 answers • SELECTMISSION_LOCATION_NAME, ALTITUDE_IN_HUNDREDS_OF_FEET, FREE_TEXT … ANDALTITUDE_IN_HUNDREDS_OF_FEET = ~220 AT-LEAST 7 • Answer MISSION_LOC_NAME ALTITUDE (HUND FT) FREE_TEXT ---------------- ------------------ ---------- ABCCC 4 190 IDENTIFIER(S): E F H ABCCC 5 200 IDENTIFIER(S): A C F G ABCCC 5 200 IDENTIFIER(S): C F G CADDY HIGH 210 IDENTIFIER(S): A E RAISIN PRE 190 IDENTIFIER(S): A W F RAISIN PRE 200 IDENTIFIER(S): A W F RAISIN PRE 210 IDENTIFIER(S): A W F

  25. TAH altitude

  26. Future Directions • More XML test data • Current query relaxation is performed by CoBase with CoSQL • Future research direction • Investigating query relaxation via CoXQL • Generating non-numerical TAHs from semi-structured data

  27. Appendix – Generated Tables

  28. air_to_cofn • CREATE TABLE air_to_cofn ( originator VARCHAR(100), to_day_time VARCHAR(100), id NUMBER, operation_codeword VARCHAR(100), flddesc VARCHAR(100), root_elm VARCHAR(100), from_day_time VARCHAR(100), air_tasking VARCHAR(100), message_text_format_identifier VARCHAR(100), setid VARCHAR(100), PRIMARY KEY (id) );

  29. task_unit_loc_gp • CREATE TABLE task_unit_loc_gp ( fk_air_to_cofn NUMBER, tasked_unit_designator VARCHAR(100), id NUMBER, setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100), FOREIGN KEY (fk_air_to_cofn) REFERENCES air_to_cofn(id), PRIMARY KEY (id) );

  30. air_mission_data_group • CREATE TABLE air_mission_data_group ( fk_task_unit_loc_gp NUMBER, id NUMBER, root_elm VARCHAR(100), FOREIGN KEY (fk_task_unit_loc_gp) REFERENCES task_unit_loc_gp(id), PRIMARY KEY (id) );

  31. target_location • CREATE TABLE target_location ( on_target_day_time VARCHAR(100), id NUMBER, air_support_request_number VARCHAR(100), fk_air_mission_data_group NUMBER, dmpi_lat_long_minutes VARCHAR(100), target_type VARCHAR(100), target_identifier_be_nmber VARCHAR(100), root_elm VARCHAR(100), off_target_day_time VARCHAR(100), mission_location_name VARCHAR(100), target_comments VARCHAR(100), dmpi_lat_long_seconds VARCHAR(100), dmpi_lat_long_deciseconds VARCHAR(100), setid VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));

  32. narrative_information • CREATE TABLE narrative_information ( free_text VARCHAR(100), fk_air_mission_data_group NUMBER, setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );

  33. amplification • CREATE TABLE amplification ( free_text VARCHAR(100), fk_air_mission_data_group NUMBER, setid VARCHAR(100), root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );

  34. remarks • CREATE TABLE remarks ( free_text VARCHAR(100), fk_air_mission_data_group NUMBER, setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );

  35. air_mission_data • CREATE TABLE air_mission_data ( mission_number VARCHAR(100), primary_configuration_code VARCHAR(100), id NUMBER, fk_air_mission_data_group NUMBER, package_identification VARCHAR(100), num_type_aircraft_model VARCHAR(100), mission_type VARCHAR(100), root_elm VARCHAR(100), alert_status NUMBER, aircraft_call_sign VARCHAR(100), secondary_configuration_code VARCHAR(100), setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));

  36. recon_mis_info_gp • CREATE TABLE recon_mis_info_gp ( coverage_extent_and_mode VARCHAR(100), on_target_day_time VARCHAR(100), id NUMBER, imagery_type VARCHAR(100), fk_air_mission_data_group NUMBER, flddesc VARCHAR(100), delivery_address VARCHAR(100), type_recon_sur_coverage VARCHAR(100), recon_target_code VARCHAR(100), scale_of_prints NUMBER, root_elm VARCHAR(100), image_qualifier VARCHAR(100), latest_date_time VARCHAR(100), mission_priority NUMBER, request_number VARCHAR(100), recon_mis_type VARCHAR(100), setid VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));

  37. location_trace_plot • CREATE TABLE location_trace_plot ( location_lat_long_seconds VARCHAR(100), fk_recon_mis_info_gp NUMBER, setid VARCHAR(100), root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_recon_mis_info_gp) REFERENCES recon_mis_info_gp(id), PRIMARY KEY (id) );

  38. refueling_information • CREATE TABLE refueling_information ( id NUMBER, fk_air_mission_data_group NUMBER, total_offload_fuel_thou_lb NUMBER, primary_frequency_designator VARCHAR(100), flddesc VARCHAR(100), root_elm VARCHAR(100), air_ref_cont_pn VARCHAR(100), air_refueling_control_day_time VARCHAR(100), tanker_mission_number VARCHAR(100), primary_frequency_in_megahertz NUMBER, tanker_call_sign VARCHAR(100), altitude_in_hundreds_of_feet NUMBER, setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );

  39. air_mission_data_subgroup • CREATE TABLE air_mission_data_subgroup ( iff_sif_mode_and_code NUMBER, fk_air_mission_data NUMBER, root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_air_mission_data) REFERENCES air_mission_data(id), PRIMARY KEY (id) );

  40. air_ref_info • CREATE TABLE air_ref_info ( fk_air_mission_data_group NUMBER, id NUMBER, setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );

  41. air_ref_info_subgroup • CREATE TABLE air_ref_info_subgroup ( mission_number VARCHAR(100), id NUMBER, tanker_assignment VARCHAR(100), refueling_fuel_type VARCHAR(100), total_offload_fuel_thou_lb NUMBER, num_type_aircraft_model VARCHAR(100), fk_air_ref_info NUMBER, root_elm VARCHAR(100), air_refueling_control_day_time VARCHAR(100), aircraft_call_sign VARCHAR(100), FOREIGN KEY (fk_air_ref_info) REFERENCES air_ref_info(id), PRIMARY KEY (id) );

  42. mission_location • CREATE TABLE mission_location ( mission_stop_day_time VARCHAR(100), area_coor_lat_long_min VARCHAR(100), id NUMBER, air_support_request_number VARCHAR(100), fk_air_mission_data_group NUMBER, point_name VARCHAR(100), flddesc VARCHAR(100), root_elm VARCHAR(100), mission_location_name VARCHAR(100), mission_start_day_time VARCHAR(100), altitude_in_hundreds_of_feet NUMBER, setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));

More Related