420 likes | 590 Views
Relaxation of XML Documents. Wesley W. Chu Computer Science Department UCLA wwc@ucla.edu September 19, 2000. Outline. Goals Transformation of XML to Relational Tables Relaxation Example queries Future directions. Goals.
E N D
Relaxation of XML Documents Wesley W. Chu Computer Science Department UCLA wwc@ucla.edu September 19, 2000
Outline • Goals • Transformation of XML to Relational Tables • Relaxation • Example queries • Future directions
Goals • Given an XML file as sample data, map it into relational database • Generate TAHs on attributes from relational tables • Execute queries and perform relaxation on sample data
Architecture extract DTD from XML file 1 XMLdoc XMLdoc DTD XMLdoc XML Spy XPRESS data map schema map 3 2 Relaxed Answers SQL RDB query processingquery relaxation 5 4 generate TAHs TAH
Extracting DTD From XML • Translation of XML to relational tables requires a DTD that provides schema for XML file • DTD for XML file can be extracted from the file using XML Spy tool
Ato.xml (provided by Stuart Hirshfield of AFRL Stuart.Hirshfield@rl.af.mil) <air_to_cofn> <operation_identification_data setid="OPER"> <operation_codeword> DESERT STORM (DECLASSIFIED) </operation_codeword> </operation_identification_data> <message_identification setid="MSGID"> <message_text_format_identifier> ATOCONF </message_text_format_identifier> <originator> USCENTAF_CMBT PLANS </originator> </message_identification> <effective_day_time_period setid="PERID"> … </effective_day_time_period>
Ato.xml (Cont) <air_tasking_group> <air_tasking setid="AIRTASK"> UNIT TASKING </air_tasking> <task_unit_loc_gp> <task_unit_and_location setid="TASKUNIT"> <tasked_unit_designator> C41 </tasked_unit_designator> </task_unit_and_location> <air_mission_data_group> <air_mission_data setid="MSNDAT"> <mission_number> 7353 </mission_number> <package_identification> ZZN …
Ato.xml (Cont) <air_mission_data_subgroup> <iff_sif_mode_and_code> 32053 </iff_sif_mode_and_code> </air_mission_data_subgroup> </air_mission_data> <mission_location setid="MSNLOC"> <mission_start_day_time> 240100Z </mission_start_day_time> <mission_location_subgroup> <point_name> DAGGER </point_name> </mission_location_subgroup> </mission_location> <amplification setid="AMPN"> <free_text> REMARK IDENTIFIER(S): G …
Ato.dtd (Generated From XML File) <!ELEMENT air_to_cofn (operation_identification_data, message_identification, effective_day_time_period, air_tasking_group)> <!ELEMENT operation_identification_data (operation_codeword)> <!ELEMENT operation_codeword (#PCDATA)> <!ELEMENT message_identification (message_text_format_identifier, originator)> <!ATTLIST message_identification setid CDATA #REQUIRED> <!ELEMENT originator (#PCDATA)> <!ELEMENT message_text_format_identifier (#PCDATA)> <!ELEMENT air_tasking_group (air_tasking, task_unit_loc_gp+)> <!ELEMENT air_tasking (#PCDATA)> <!ELEMENT task_unit_loc_gp (task_unit_and_location, air_mission_data_group+)>
Ato.dtd (Cont) <!ELEMENT air_mission_data_group (air_mission_data+ | mission_location+ | target_location+ | recon_mis_info_gp+ | refueling_information+ | air_ref_info+ | amplification+ | narrative_information+ | remarks+)+> <!ELEMENT air_mission_data (mission_number, package_identification, aircraft_call_sign, num_type_aircraft_model, mission_type, alert_status, primary_configuration_code, secondary_configuration_code, air_mission_data_subgroup+)> <!ELEMENT mission_number (#PCDATA)> … <!ELEMENT mission_location (mission_start_day_time, mission_stop_day_time, mission_location_name, altitude_in_hundreds_of_feet?, air_support_request_number?, mission_location_subgroup?)>
Ato.dtd (Cont) <!ELEMENT amplification (free_text)> <!ATTLIST amplification setid CDATA #REQUIRED> <!ELEMENT target_location (on_target_day_time, off_target_day_time, target_identifier_be_number?, mission_location_name?, target_type, dmpi_lat_long_seconds?, dmpi_lat_long_minutes?, air_support_request_number?, dmpi_lat_long_deciseconds?, target_comments?, air_support_request_number?)> <!ELEMENT recon_mis_info (request_number, mission_priority, on_target_day_time, latest_date_time, recon_mis_type, type_recon_sur_coverage, imagery_type, image_qualifier, coverage_extent_and_mode, recon_target_code, scale_of_prints, recon_mis_info_subgroup)> …
XPRESS • XPRESS (Xml Processing and Relaxation in rElational Storage System) project @ UCLA/CSD • Transfers XML file with DTD to relational tables • Efficiently: handles a large number of XML documents • Accurately: retains structure & constraint • Automatically: does not require human intervention • The sample Ato XML file is 366KB, translating into 15 tables and ~900 tuples. Generated DTD is 6KB • Xpress has been tested with over 60MB of XML data, translating into over half a million tuples
Steps • Schema mapping: CREATE TABLE statements are generated by XPRESS from the DTD • An intermediate Map object is created to provide XPRESS data mapping information • Data mapping: Using the Map object, XPRESS transfers the XML file into the database
Generated Tables for Ato Data • air_to_cofn (1) • task_unit_loc_gp (24) • air_mission_data_group (129) • target_location • narrative_information • amplification • remarks • air_mission_data • air_mission_data_subgroup • recon_mis_info_gp • location_trace_plot • refueling_information • air_ref_info • air_ref_info_subgroup • mission_location
Query Relaxation for XML File • When no answer or not enough answers are found, derive approximate answers by relaxing the query condition • TAH: knowledge structure to aid relaxation • TAHs are generated from translated relational tables
Example Query 1 • Find the id of any task unit location group that has an air refueling subgroup with 25,000 pounds of offload fuel. Also list the amount offload fuel • SELECT DISTINCT TASK_UNIT_LOC_GP.ID, AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB FROM AIR_REF_INFO_SUBGROUP, AIR_REF_INFO, AIR_MISSION_DATA_GROUP, TASK_UNIT_LOC_GP WHERE FK_AIR_REF_INFO = AIR_REF_INFO.ID AND FK_AIR_MISSION_DATA_GROUP = AIR_MISSION_DATA_GROUP.ID AND FK_TASK_UNIT_LOC_GP = TASK_UNIT_LOC_GP.ID AND AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB = 25 • Answer ID TOTAL_OFFLOAD_FUEL_THOU_LB -- -------------------------- 2524116 25
Relax Query 1 • Relax previous query to an offload fuel of approximately 25,000 lbs, and at least 5 answers • SELECT DISTINCT TASK_UNIT_LOC_GP.ID, AIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB … ANDAIR_REF_INFO_SUBGROUP.TOTAL_OFFLOAD_FUEL_THOU_LB = ~25 AT-LEAST5 • Answer ID TOTAL_OFFLOAD_FUEL_THOU_LB --------------- -------------------------- 2524048 10 2524048 20 2524048 26 2524116 14 2524116 20 2524116 24 2524116 25 --------------- -------------------------- # of answer tuples: 7
Example Query 2 • Find the call sign of all aircraft that has a 10 minute alert status. Also list its tasked unit designator and alert status • SELECTAIRCRAFT_CALL_SIGN, TASKED_UNIT_DESIGNATOR, ALERT_STATUS FROMAIR_MISSION_DATA, AIR_MISSION_DATA_GROUP, TASK_UNIT_LOC_GP WHEREFK_AIR_MISSION_DATA_GROUP = AIR_MISSION_DATA_GROUP.ID AND FK_TASK_UNIT_LOC_GP = TASK_UNIT_LOC_GP.ID ANDAIR_MISSION_DATA.ALERT_STATUS = 10 • Answer CALL_SIGN DESIGNATOR ALERT_STATUS --------- ---------- ------------ VOLVO 51 4TFW 10
Relax Query 2 • Relax previous query to an alert status of approximately 10 minutes, and at least 3 answers • SELECTAIRCRAFT_CALL_SIGN, TASKED_UNIT_DESIGNATOR, ALERT_STATUS … ANDAIR_MISSION_DATA.ALERT_STATUS = ~10 AT-LEAST 3 • Answer CALL_SIGN DESIGNATOR ALERT_STATUS --------------- --------------- --------------- VOLVO 51 4TFW 10 ARCO 05 1TFW 5 ARCO 03 1TFW 5 STANDARD 63 33TFW 5 STANDARD 61 33TFW 5 ARCO 01 1TFW 5 --------------- --------------- --------------- # of answer tuples: 6
Example Query 3 • Find the mission location at an altitude of 22,000 ft. Also list any remarks on amplification • SELECTMISSION_LOCATION_NAME, ALTITUDE_IN_HUNDREDS_OF_FEET, FREE_TEXT FROMMISSION_LOCATION, AMPLIFICATION WHEREMISSION_LOCATION.FK_AIR_MISSION_DATA_GROUP = AMPLIFICATION.FK_AIR_MISSION_DATA_GROUP AND ALTITUDE_IN_HUNDREDS_OF_FEET = 220 • Answer MISSION_LOC_NAME ALTITUDE (HUND FT) FREE_TEXT ---------------- ------------------ ---------- no rows selected
Relax Query 3 • Relax previous query to an altitude of approximately 22,000 ft, and at least 7 answers • SELECTMISSION_LOCATION_NAME, ALTITUDE_IN_HUNDREDS_OF_FEET, FREE_TEXT … ANDALTITUDE_IN_HUNDREDS_OF_FEET = ~220 AT-LEAST 7 • Answer MISSION_LOC_NAME ALTITUDE (HUND FT) FREE_TEXT ---------------- ------------------ ---------- ABCCC 4 190 IDENTIFIER(S): E F H ABCCC 5 200 IDENTIFIER(S): A C F G ABCCC 5 200 IDENTIFIER(S): C F G CADDY HIGH 210 IDENTIFIER(S): A E RAISIN PRE 190 IDENTIFIER(S): A W F RAISIN PRE 200 IDENTIFIER(S): A W F RAISIN PRE 210 IDENTIFIER(S): A W F
Future Directions • More XML test data • Current query relaxation is performed by CoBase with CoSQL • Future research direction • Investigating query relaxation via CoXQL • Generating non-numerical TAHs from semi-structured data
air_to_cofn • CREATE TABLE air_to_cofn ( originator VARCHAR(100), to_day_time VARCHAR(100), id NUMBER, operation_codeword VARCHAR(100), flddesc VARCHAR(100), root_elm VARCHAR(100), from_day_time VARCHAR(100), air_tasking VARCHAR(100), message_text_format_identifier VARCHAR(100), setid VARCHAR(100), PRIMARY KEY (id) );
task_unit_loc_gp • CREATE TABLE task_unit_loc_gp ( fk_air_to_cofn NUMBER, tasked_unit_designator VARCHAR(100), id NUMBER, setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100), FOREIGN KEY (fk_air_to_cofn) REFERENCES air_to_cofn(id), PRIMARY KEY (id) );
air_mission_data_group • CREATE TABLE air_mission_data_group ( fk_task_unit_loc_gp NUMBER, id NUMBER, root_elm VARCHAR(100), FOREIGN KEY (fk_task_unit_loc_gp) REFERENCES task_unit_loc_gp(id), PRIMARY KEY (id) );
target_location • CREATE TABLE target_location ( on_target_day_time VARCHAR(100), id NUMBER, air_support_request_number VARCHAR(100), fk_air_mission_data_group NUMBER, dmpi_lat_long_minutes VARCHAR(100), target_type VARCHAR(100), target_identifier_be_nmber VARCHAR(100), root_elm VARCHAR(100), off_target_day_time VARCHAR(100), mission_location_name VARCHAR(100), target_comments VARCHAR(100), dmpi_lat_long_seconds VARCHAR(100), dmpi_lat_long_deciseconds VARCHAR(100), setid VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));
narrative_information • CREATE TABLE narrative_information ( free_text VARCHAR(100), fk_air_mission_data_group NUMBER, setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );
amplification • CREATE TABLE amplification ( free_text VARCHAR(100), fk_air_mission_data_group NUMBER, setid VARCHAR(100), root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );
remarks • CREATE TABLE remarks ( free_text VARCHAR(100), fk_air_mission_data_group NUMBER, setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );
air_mission_data • CREATE TABLE air_mission_data ( mission_number VARCHAR(100), primary_configuration_code VARCHAR(100), id NUMBER, fk_air_mission_data_group NUMBER, package_identification VARCHAR(100), num_type_aircraft_model VARCHAR(100), mission_type VARCHAR(100), root_elm VARCHAR(100), alert_status NUMBER, aircraft_call_sign VARCHAR(100), secondary_configuration_code VARCHAR(100), setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));
recon_mis_info_gp • CREATE TABLE recon_mis_info_gp ( coverage_extent_and_mode VARCHAR(100), on_target_day_time VARCHAR(100), id NUMBER, imagery_type VARCHAR(100), fk_air_mission_data_group NUMBER, flddesc VARCHAR(100), delivery_address VARCHAR(100), type_recon_sur_coverage VARCHAR(100), recon_target_code VARCHAR(100), scale_of_prints NUMBER, root_elm VARCHAR(100), image_qualifier VARCHAR(100), latest_date_time VARCHAR(100), mission_priority NUMBER, request_number VARCHAR(100), recon_mis_type VARCHAR(100), setid VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));
location_trace_plot • CREATE TABLE location_trace_plot ( location_lat_long_seconds VARCHAR(100), fk_recon_mis_info_gp NUMBER, setid VARCHAR(100), root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_recon_mis_info_gp) REFERENCES recon_mis_info_gp(id), PRIMARY KEY (id) );
refueling_information • CREATE TABLE refueling_information ( id NUMBER, fk_air_mission_data_group NUMBER, total_offload_fuel_thou_lb NUMBER, primary_frequency_designator VARCHAR(100), flddesc VARCHAR(100), root_elm VARCHAR(100), air_ref_cont_pn VARCHAR(100), air_refueling_control_day_time VARCHAR(100), tanker_mission_number VARCHAR(100), primary_frequency_in_megahertz NUMBER, tanker_call_sign VARCHAR(100), altitude_in_hundreds_of_feet NUMBER, setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );
air_mission_data_subgroup • CREATE TABLE air_mission_data_subgroup ( iff_sif_mode_and_code NUMBER, fk_air_mission_data NUMBER, root_elm VARCHAR(100), id NUMBER, FOREIGN KEY (fk_air_mission_data) REFERENCES air_mission_data(id), PRIMARY KEY (id) );
air_ref_info • CREATE TABLE air_ref_info ( fk_air_mission_data_group NUMBER, id NUMBER, setid VARCHAR(100) NOT NULL, root_elm VARCHAR(100), FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id) );
air_ref_info_subgroup • CREATE TABLE air_ref_info_subgroup ( mission_number VARCHAR(100), id NUMBER, tanker_assignment VARCHAR(100), refueling_fuel_type VARCHAR(100), total_offload_fuel_thou_lb NUMBER, num_type_aircraft_model VARCHAR(100), fk_air_ref_info NUMBER, root_elm VARCHAR(100), air_refueling_control_day_time VARCHAR(100), aircraft_call_sign VARCHAR(100), FOREIGN KEY (fk_air_ref_info) REFERENCES air_ref_info(id), PRIMARY KEY (id) );
mission_location • CREATE TABLE mission_location ( mission_stop_day_time VARCHAR(100), area_coor_lat_long_min VARCHAR(100), id NUMBER, air_support_request_number VARCHAR(100), fk_air_mission_data_group NUMBER, point_name VARCHAR(100), flddesc VARCHAR(100), root_elm VARCHAR(100), mission_location_name VARCHAR(100), mission_start_day_time VARCHAR(100), altitude_in_hundreds_of_feet NUMBER, setid VARCHAR(100) NOT NULL, FOREIGN KEY (fk_air_mission_data_group) REFERENCES air_mission_data_group(id), PRIMARY KEY (id));