Integration of Biological Data (LifeDB). Presented By Md. Shazzad Hosain (shazzad@wayne.edu) Supervised By Dr. Hasan Jamil (jamil@cs.wayne.edu) Wayne State University, Detroit, USA. Outline. Data Integration WebFusion (our previous work) LifeDB (our goal) Research Scopes.
Integration of Biological Data (LifeDB) Presented By Md. Shazzad Hosain (shazzad@wayne.edu) Supervised By Dr. Hasan Jamil (jamil@cs.wayne.edu) Wayne State University, Detroit, USA
Outline • Data Integration • WebFusion (our previous work) • LifeDB (our goal) • Research Scopes
Data Integration Example • Detroit to Bologna air ticket • Alitalia (Italian airline) • Air France • Northwest Airlines • Lufthansa, etc.
Integration Example cont. • [Diagram labels: myAirFare.com; CheapAir.com, Expedia.com, ……; Alitalia, Lufthansa, Air France, Delta]
Integration Approaches • Warehouse Integration • Mediator based Integration • Navigational Integration
Warehouse Integration • Materialize data from all sources into a local warehouse • Emphasizes data translation rather than query translation • Advantages: fewer network bottlenecks, efficient • Disadvantages: data may not be the most up to date; system maintenance
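A minimal sketch of the warehouse idea, using the air ticket example from the earlier slides. The airport codes, fares, field names, and the exchange rate are all invented for illustration; only the airline names come from the slides.

```python
# Hypothetical source records. Each source uses its own schema and currency.
SOURCES = {
    "Alitalia": [{"from": "DTW", "to": "BLQ", "fare_eur": 820}],
    "Lufthansa": [{"from": "DTW", "to": "BLQ", "price_usd": 910}],
}

EUR_TO_USD = 1.1  # assumed fixed rate, for illustration only


def translate(source, row):
    """Data translation: map one source row onto the warehouse schema."""
    if "fare_eur" in row:
        price = round(row["fare_eur"] * EUR_TO_USD, 2)
    else:
        price = float(row["price_usd"])
    return {"airline": source, "origin": row["from"],
            "dest": row["to"], "price_usd": price}


def load_warehouse(sources):
    """Materialize every source row locally, ahead of query time."""
    return [translate(name, row)
            for name, rows in sources.items()
            for row in rows]


def cheapest(warehouse, origin, dest):
    """Queries touch only the local copy; no source is contacted."""
    matches = [r for r in warehouse
               if r["origin"] == origin and r["dest"] == dest]
    return min(matches, key=lambda r: r["price_usd"]) if matches else None


warehouse = load_warehouse(SOURCES)
best = cheapest(warehouse, "DTW", "BLQ")
```

The same sketch shows the stated disadvantage: if a source changes its fare after `load_warehouse` runs, `cheapest` keeps answering from the stale copy until the warehouse is reloaded.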
Mediator-based Integration • Concentrates on query translation • Two approaches: GAV (Global-as-View) and LAV (Local-as-View)
GAV Approach • Query reformulation is easy, but adding or removing sources is difficult • Preferred when sources are known and stable • [Diagram: Mediator Schema over sources S1, S2, S3, S4]
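One way to picture GAV is a mediated relation written directly as a view over the sources, so that answering a query is just unfolding that definition. The sketch below assumes a hypothetical mediated relation `Flight(carrier, origin, dest, price)` and two invented sources `S1` and `S2`; none of these schemas come from the slides.

```python
# Hypothetical source relations with differing schemas.
S1 = [("AZ", "DTW", "BLQ", 902.0)]  # (carrier, origin, dest, price)
S2 = [{"airline": "LH", "route": "DTW-BLQ", "usd": 910.0}]


def flights_view():
    """GAV: the mediated relation Flight is defined as a view
    (here, a union) over the source relations."""
    for carrier, origin, dest, price in S1:
        yield (carrier, origin, dest, price)
    for row in S2:
        origin, dest = row["route"].split("-")
        yield (row["airline"], origin, dest, row["usd"])


def query_flights(origin, dest):
    """Reformulation is unfolding: replace Flight by its definition."""
    return sorted(f for f in flights_view()
                  if f[1] == origin and f[2] == dest)


result = query_flights("DTW", "BLQ")
```

Adding a source S3 here means editing `flights_view` itself, which is the maintenance cost the slide attributes to GAV.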
LAV Approach • Query reformulation is difficult, but adding or removing sources is easy • Appropriate for large-scale ad hoc integration • Examples: ARIADNE, DiscoveryLink, TAMBIS, KIND, etc. • [Diagram: Mediator Schema over sources S1, S2, S3, S4]
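In LAV, each source is instead described as a view over the mediated schema, and the mediator must work out which descriptions can contribute to a query. The sketch below is a heavily simplified illustration under assumed names (`Source`, `register`, `relevant_sources`): each description only says which carriers a source covers. General LAV reformulation has to rewrite queries using views and is considerably harder than this.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Row = Tuple[str, str, str, float]  # (carrier, origin, dest, price)


@dataclass
class Source:
    name: str
    # LAV description: this source holds the Flight rows
    # whose carrier is in `carriers`.
    carriers: FrozenSet[str]
    fetch: Callable[[], List[Row]]


registry: List[Source] = []


def register(source: Source) -> None:
    """Adding a source only adds its description; no view is edited."""
    registry.append(source)


def relevant_sources(carrier: str) -> List[Source]:
    """Reformulation step: keep sources whose description can match."""
    return [s for s in registry if carrier in s.carriers]


def query(carrier: str, origin: str, dest: str) -> List[Row]:
    rows: List[Row] = []
    for s in relevant_sources(carrier):
        rows += [r for r in s.fetch()
                 if r[0] == carrier and r[1] == origin and r[2] == dest]
    return rows


register(Source("S1", frozenset({"AZ"}),
                lambda: [("AZ", "DTW", "BLQ", 902.0)]))
register(Source("S2", frozenset({"LH", "AF"}),
                lambda: [("LH", "DTW", "BLQ", 910.0)]))
```

A third source can be registered later without touching `query`, which mirrors the slide's point that source addition and removal are easy under LAV while the reformulation logic carries the complexity.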
Navigational Integration • Some sources provide information that would be hard or impossible to access without point-and-click navigation
WebFusion Dr. Liangyou Chen
DBGET LinkDB KEGG Pathways • Can these be done electronically for a biologist?
1. Input: 103730 2. Press <Pickup Input> button
2. Press [Go] button 1. Press <Next> button
1. Mark the table 2. Press <Pickup Table> Button
1. Uncheck all boxes except 2~6 2. Press the <Update & Redraw> button
1. Give it a name called: LocusLink 2. Name them as: Link, LocusID, Org, Symbol, Description respectively 3. Select appropriate transformations 4. Press <Update & Redraw> button
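The table pickup, column selection, and naming steps above can be sketched in code as follows. This is not WebFusion's implementation: the row contents are placeholders, "boxes 2~6" is assumed to mean columns 2 through 6 counted from 1, and the upper-casing transform is a hypothetical stand-in for "appropriate transformations".

```python
# Placeholder row standing in for a table picked up from a web page.
raw_table = [
    ["", "link-1", "id-1", "org-1", "sym-1", "desc-1", "extra"],
]

KEEP = range(1, 6)  # columns 2..6 (1-based) are indices 1..5 (0-based)
NAMES = ["Link", "LocusID", "Org", "Symbol", "Description"]
TRANSFORMS = {"Symbol": str.upper}  # hypothetical per-field transformation


def build_table(raw):
    """Keep the selected columns, name them, then apply transformations."""
    table = []
    for row in raw:
        record = dict(zip(NAMES, (row[i] for i in KEEP)))
        for field, fn in TRANSFORMS.items():
            record[field] = fn(record[field])
        table.append(record)
    return table


LocusLink = build_table(raw_table)
```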
DBGET LinkDB KEGG Pathways
1. Select ‘LocusLink’ table 2. Type in ‘LocusLinkQuery’ as a query name 3. Check these fields to display 4. Double click here
1. Select ‘local_gene_ids’ table 2. Select ‘LID’ field 3. Click here (any place)
DBGET LinkDB KEGG Pathways
LifeDB • Resource Discovery • Automatic Schema/Ontology Matching • Query Optimization • WorkFlows • BioFlow (A declarative WorkFlow Language)
Glimpse of BioFlow • DNA sequence repositories: GeneBankURL (GenBank format), FlyBaseURL (EMBL format) • Combine these sequences • University of Minnesota Reading Frame Predictor (input_seq: FASTA format, species) • Output: score and predicted DNA region
BioFlow • workflow open_reading_frame; • useontology BioSystems; • declare found logical, count int; • define data sequences_1 at GeneBankURL as (seq_1 DNA); • define tool orf at URL parameter (seq DNA, target organism) results (score int, predicted_region DNA); • combine sequences_1, sequences_2 into sequences (seqs); • select seqs, orf (seqs, “drosophila”) from sequences; • Goal: develop a formal BioFlow language syntax with compositionality, closure property, and type safety
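To make the data flow of the workflow above concrete, here is a rough Python analogue. It is not BioFlow and not the remote predictor: `orf` is a toy local stand-in that returns the longest ATG-to-stop region in the three forward frames and uses its length as the score, the sequences are invented, and `sequences_2` is assumed to come from the second repository because the slide does not define it.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orf(seq, organism):
    """Toy stand-in for the remote tool. Returns (score, predicted_region);
    the organism argument is accepted but ignored in this sketch."""
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                region = seq[start:i + 3]
                if len(region) > len(best):
                    best = region
                start = None
    return len(best), best


def combine(*sources):
    """Analogue of 'combine ... into sequences (seqs)'."""
    return [s for source in sources for s in source]


# Invented placeholder data; the real workflow would fetch these remotely.
sequences_1 = ["CCATGAAATAGGG"]
sequences_2 = ["ATGTGA"]

sequences = combine(sequences_1, sequences_2)
# Analogue of: select seqs, orf (seqs, "drosophila") from sequences
results = [(seqs, orf(seqs, "drosophila")) for seqs in sequences]
```

Each step takes and returns plain collections of sequences, which is one informal reading of the compositionality and closure goals; the type safety goal has no counterpart in this untyped sketch.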
Research Scope • Resource Discovery • Automatic Schema/Ontology Matching • Query Optimization • WorkFlows • 7-8 PhD positions • 3-5 years funding