XWRAPComposer: Towards Automating Complex Associative Access to Multiple Bioinformatics Data Sources
Ling Liu, Calton Pu, David Buttler, Wei Han, Henrique Paques, Dan Rocco
Georgia Tech
Outline
• State of the Art
  • Users’ Perspective
  • Technology Perspective
• Why SDM Technology – XWRAP Composer
  • Users’ Perspective
  • Technology Perspective
• Progress Report and Near Term Deliverables
• Related Long Term Research
Why Automating Complex Associative Access
[Figure: two panels, each showing Query 1 through Query 4 issued against large and unorganized document collections on the Web and the Semantic Web.]
• Today: Simple Query-Based Searching. Complex associative access requires experts.
• Tomorrow with SDM Technology: Complex associative access is automated (one-stop shopping).
Why Automating Complex Associative Access
[Figure: two panels, "Today: Simple Query-Based Searching" and "Tomorrow with SDM Technology", each showing Query 1 through Query 4 against the Web and the Semantic Web. Large and unorganized document collections are handled by the operations Characterize, Sort, Partition, Filter, and Summarize.]
Automating Complex Associative Access
XWRAPComposer
• Wrapper Technology
• Workflow Technology
• Semantic Web Technology
  • Service Discovery
  • Service Selection
  • Service Composition
• Research Issues
  • Semantic Data Integration, Interoperability
  • Scalability, High Performance
  • Trusted Computing, Dependable, Survivable
XWRAPComposer
• What is it?
  • A wrapper generation system that semi-automatically generates wrappers (information extraction programs)
  • The generated wrappers can access multiple scientific Web pages in one shot
• What makes it different from other existing XWRAP tools?
  • It generates wrappers that extract information from multiple Web pages connected by URLs (page links) and compose the results into one integrated XML document
  • Extremely useful for automating complex associative access to multiple scientific data sources
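The idea of composing data from linked pages into one XML document can be sketched as follows. This is a minimal illustration, not XWRAPComposer's generated code: the page contents, the `hit:`/`link:` text layout, the accession `AB000001`, the field values, and the function names `wrap_summary`, `wrap_detail`, and `compose` are all assumptions made for the example, and an in-memory dictionary stands in for the Web.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical in-memory "Web": URL -> page text. Invented content, not real NCBI output.
PAGES = {
    "summary": "hit: AA045112 link: detail/AA045112\n"
               "hit: AB000001 link: detail/AB000001",
    "detail/AA045112": "id=AA045112 length=163",
    "detail/AB000001": "id=AB000001 length=420",
}


def wrap_summary(text):
    """Single-page wrapper: extract (accession, link) pairs from the summary page."""
    return re.findall(r"hit: (\S+) link: (\S+)", text)


def wrap_detail(text):
    """Single-page wrapper: extract key=value fields from one detail page."""
    return dict(re.findall(r"(\w+)=(\S+)", text))


def compose(start_url, fetch=PAGES.__getitem__):
    """Follow the page links from the summary page and merge every
    extracted record into one XML tree."""
    root = ET.Element("result", source=start_url)
    for accession, link in wrap_summary(fetch(start_url)):
        hit = ET.SubElement(root, "hit", accession=accession)
        for key, value in sorted(wrap_detail(fetch(link)).items()):
            ET.SubElement(hit, key).text = value
    return root


doc = compose("summary")
print(ET.tostring(doc, encoding="unicode"))
```

Passing `fetch` as a parameter keeps the composition logic independent of how pages are retrieved, so the same sketch could be pointed at a real HTTP client.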
SDM Enabling Technology: XWRAPComposer
Existing Wrapper Technology: Extracting Data from a Single Web Document
[Figure: four queries (Query 1 through Query 4) and four single-page wrappers: Blast Sum Wrapper, Blast Detail Wrapper, Seq. Link Wrapper, and Sequence Wrapper. Labels shown: AA045112, htgs, and the DNA sequence below.]
CACCTGGAGAAACTTCTGCACTGGCACTGTGTTCCNAGAGCTCCTTCTATGCGTCCCTCC CAAGTGATTTAATTTCAGCTGATTGGACTACGAATTCACAAGGCAGAAAAGTCAAGGTCA TTTGGNATCTGGAGACAGGAGAACTCAAGGAACCNAAAGGACT
SDM Enabling Technology: XWRAPComposer
Wrapper Composer Technology: Extracting Data from Multiple Web Documents
[Figure: two queries (Query 1 and Query 2) and two composed wrappers: Blast Wrapper and Full Seq Wrapper. Labels shown: AA045112, htgs, and a DNA query sequence.]
XWRAPComposer: Technical Perspective
Example task: given a sequence, list all matching DNAs.
[Figure: the Blast Wrapper accesses the NCBI Blast site on the Web through five pages: Blast Query Page, Blast Format Page, Blast Delay Page, Blast Summary Page, and Blast Detail Page.]
• Interface/Outerface Specification
• Composer Script
  • Multi-page Control Flow Modeling
  • Data Extraction Workflow
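Multi-page control flow over the five Blast pages might look like the sketch below. It is only an illustration under stated assumptions: the slides do not show the composer script syntax, so the page dictionaries, the `request_id` field, the `make_fake_site` simulator, and `run_blast_flow` are hypothetical, and the simulated site does not reproduce the real NCBI interface. The point is the shape of the flow: submit the query, pass the format page, retry while a delay page comes back, then follow the summary links to detail pages.

```python
import itertools


def make_fake_site(delays=2):
    """Simulated site (an assumption): answers 'format' requests with
    `delays` delay pages before returning the summary page."""
    calls = itertools.count()

    def fetch(page, **params):
        if page == "query":
            return {"type": "format", "request_id": "R1"}
        if page == "format":
            if next(calls) < delays:
                return {"type": "delay", "request_id": params["request_id"]}
            return {"type": "summary", "hits": ["AA045112"]}
        if page == "detail":
            return {"type": "detail", "id": params["id"]}
        raise KeyError(page)

    return fetch


def run_blast_flow(sequence, fetch, max_polls=5):
    """Walk query -> format -> (delay)* -> summary -> detail pages.
    Returns the page types visited and the extracted detail records."""
    trace = []
    page = fetch("query", sequence=sequence)
    trace.append(page["type"])
    page = fetch("format", request_id=page["request_id"])
    trace.append(page["type"])
    polls = 0
    # A real wrapper would wait between polls; the sketch retries immediately.
    while page["type"] == "delay":
        if polls >= max_polls:
            raise TimeoutError("results not ready after %d polls" % max_polls)
        polls += 1
        page = fetch("format", request_id=page["request_id"])
        trace.append(page["type"])
    details = [fetch("detail", id=hit) for hit in page["hits"]]
    trace.extend(d["type"] for d in details)
    return trace, details


trace, details = run_blast_flow("CACCTGGAGAAACTTCTGCACTGG", make_fake_site())
print(trace)
```

Bounding the number of polls with `max_polls` is a design choice of the sketch so that a site that never leaves the delay page produces an error instead of an endless loop.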
SDM Center Data Integration Infrastructure
[Figure: architecture diagram. Recoverable labels are listed below; the grouping is approximate.]
• XWRAP: Human Knowledge, GUI, Extraction Rules, Code Generator; XML Wrappers, one per Data Source
• User side: User (Matt), User Agent, user constraints & parameters, Parameterized Workflow Specification (PWS)
• Workflow Resolution Service (WRS): Source Capabilities (SC), Binding Patterns, Domain Map/Ontology, DB; outcomes "WF feasible" and "WF infeasible: report reason"
• Workflow Instantiation Service (WIS): Executable Workflow Plan ("Matt’s WF")
• Registration: Data Registration, Services Registration, Service registry and brokering, DB
• Data Integration Agent(s): Data Mediation
• Agents: Workflow Agent, Wrapper based Agents, Other I/O Agents, Other Agents (e.g., VIPAR), Agent Communication Protocol Gateway
• Interfaces: External Program, Database Access, Program Interfacing, External Interface
• Data Sources
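The diagram shows the Workflow Resolution Service taking source capabilities and binding patterns and answering either "WF feasible" or "WF infeasible: report reason". The slides do not say how that decision is made, so the following is one possible reading, sketched under assumptions: a binding pattern is modeled as the set of attributes a source needs bound before it can be called and the set it binds afterwards. The class `SourceCapability`, the function `resolve`, and the two example sources are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceCapability:
    """Assumed model of a binding pattern for one wrapped source."""
    name: str
    requires: frozenset  # attributes that must be bound before the call
    provides: frozenset  # attributes that are bound after the call


def resolve(steps, user_params, capabilities):
    """Check a workflow step by step; return (feasible, reason)."""
    bound = set(user_params)
    for step in steps:
        cap = capabilities.get(step)
        if cap is None:
            return False, "no registered source for step '%s'" % step
        missing = cap.requires - bound
        if missing:
            return False, "step '%s' needs unbound inputs: %s" % (
                step, sorted(missing))
        bound |= cap.provides
    return True, "feasible"


# Hypothetical registry with two sources named after the wrappers on the slides.
CAPS = {
    "blast": SourceCapability(
        "blast", frozenset({"sequence"}), frozenset({"accession"})),
    "full_seq": SourceCapability(
        "full_seq", frozenset({"accession"}), frozenset({"full_sequence"})),
}

print(resolve(["blast", "full_seq"], {"sequence"}, CAPS))
print(resolve(["full_seq"], {"sequence"}, CAPS))
```

Returning the reason alongside the verdict mirrors the "report reason" label in the diagram.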
Progress Report
• Status
  • Produced Three Deliverables
    • Composer Interface/Outerface Specification
    • Five Java Wrappers for Pilot Scenario
    • Composer Script Examples for Pilot Scenario
  • XWRAPComposer design and development
• Near Term Plan
  • Finish the design of the XWRAP Composer scripting language (Nov. 2002)
  • Develop the first prototype of the XWRAP Composer system (Jan. 2003)
  • Performance Evaluation (March 2003)
Related Long Term Research
• Semantic Web and Semantic Data Integration
• Service Discovery
  • Dynamic content crawler
• Service Selection
  • Adaptive query routing
• Service Composition
  • Infopipe Technology