Integration of Biological Data (LifeDB)

Presented by Md. Shazzad Hosain (shazzad@wayne.edu). Supervised by Dr. Hasan Jamil (jamil@cs.wayne.edu). Wayne State University, Detroit, USA. Outline: Data Integration; WebFusion (our previous work); LifeDB (our goal); Research Scopes.

Presentation Transcript


  1. Integration of Biological Data (LifeDB) Presented By Md. Shazzad Hosain (shazzad@wayne.edu) Supervised By Dr. Hasan Jamil (jamil@cs.wayne.edu) Wayne State University, Detroit, USA

  2. Outline • Data Integration • WebFusion (our previous work) • LifeDB (our goal) • Research Scopes

  3. Data Integration Example • Detroit to Bologna air ticket • Alitalia (Italian airline) • Air France • Northwest Airlines • Lufthansa, etc.

  4. Integration Example cont. (diagram labels: CheapAir.com / Expedia.com; Alitalia, Lufthansa, Air France, Delta; myAirFare.com; CheapAir.com, Expedia.com, ……)

  5. Integration Approaches • Warehouse integration • Mediator-based integration • Navigational integration

  6. Warehouse Integration • Materializes data from all sources into a local warehouse • Emphasizes data translation rather than query translation • Advantages: low network bottleneck, efficient • Disadvantages: data may not be the most up to date; system maintenance
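
A minimal sketch of the warehouse idea, using the air ticket example from the earlier slides. The two source extracts, their field names, the prices, and the `fares` table are all hypothetical and exist only for illustration. Each source is translated into one common schema and loaded into a local store, so queries run locally without contacting the sources.

```python
import sqlite3

# Hypothetical source extracts; each source has its own field names and units.
SOURCE_A = [{"carrier": "Alitalia", "from": "DTW", "to": "BLQ", "price_usd": 910.0}]
SOURCE_B = [{"airline": "Lufthansa", "route": "DTW-BLQ", "price_cents": 87500}]


def translate_a(row):
    """Data translation for source A: rename fields into the common schema."""
    return (row["carrier"], row["from"], row["to"], row["price_usd"])


def translate_b(row):
    """Data translation for source B: split the route and convert cents to dollars."""
    origin, dest = row["route"].split("-")
    return (row["airline"], origin, dest, row["price_cents"] / 100.0)


def build_warehouse():
    """Materialize all sources into one local (in-memory) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fares (airline TEXT, origin TEXT, dest TEXT, price REAL)"
    )
    rows = [translate_a(r) for r in SOURCE_A] + [translate_b(r) for r in SOURCE_B]
    conn.executemany("INSERT INTO fares VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def cheapest(conn, origin, dest):
    """Answer a query against the warehouse only; no source is contacted."""
    cur = conn.execute(
        "SELECT airline, price FROM fares "
        "WHERE origin = ? AND dest = ? ORDER BY price LIMIT 1",
        (origin, dest),
    )
    return cur.fetchone()
```

The trade-off listed on the slide is visible here: `cheapest` is fast because it reads local data, but its answer is only as fresh as the last call to `build_warehouse`.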

  7. Mediator-based Integration • Concentrates on query translation • GAV (Global-as-View) approach and LAV (Local-as-View) approach

  8. GAV Approach • Query reformulation is easy, but adding or removing sources is difficult • Preferred when sources are known and stable (diagram: Mediator Schema over sources S1, S2, S3, S4)
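
A minimal sketch of why GAV reformulation is easy, assuming two hypothetical sources (`source_s1`, `source_s2`) and a hypothetical mediated relation `Gene`. In GAV each mediated relation is written as a view over the sources, so a query on the mediator schema is answered by unfolding that view.

```python
# Hypothetical sources with different field names (not from the slides).
def source_s1():
    return [{"gene_id": "g1", "symbol": "sym1"}]


def source_s2():
    return [{"id": "g2", "name": "sym2"}]


# GAV: every mediated relation is defined as a view over the sources.
# Adding or removing a source means editing each view that mentions it.
GAV_VIEWS = {
    "Gene": lambda: (
        [(r["gene_id"], r["symbol"]) for r in source_s1()]
        + [(r["id"], r["name"]) for r in source_s2()]
    ),
}


def query(relation, predicate):
    """Reformulate by unfolding the view, then filter the combined rows."""
    rows = GAV_VIEWS[relation]()
    return [row for row in rows if predicate(row)]
```

For example, `query("Gene", lambda row: row[1] == "sym2")` unfolds `Gene` into calls to both sources and keeps the matching row. The cost noted on the slide shows up in `GAV_VIEWS`: every source is hard-wired into the view definitions.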

  9. LAV Approach • Query reformulation is difficult, but adding or removing sources is easy • Appropriate for large-scale ad hoc integration • ARIADNE, Discovery Link, TAMBIS, KIND, etc. (diagram: Mediator Schema over sources S1, S2, S3, S4)
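
A heavily simplified sketch of the LAV idea, assuming a hypothetical registry in which each source describes itself in terms of the mediated schema (here: which relation it contributes to and which organism it covers). Real LAV query rewriting is considerably harder than the lookup shown; the point is only that adding a source is a single registration and no existing definition changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SourceDescription:
    """A source described as a (very restricted) view over the mediated schema."""
    name: str
    relation: str   # mediated relation the source contributes to
    organism: str   # constraint: the source only holds this organism
    fetch: Callable[[], List[Dict[str, str]]]


REGISTRY: List[SourceDescription] = []


def register(desc: SourceDescription) -> None:
    """Adding a source touches nothing but the registry."""
    REGISTRY.append(desc)


def answer(relation: str, organism: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Pick the sources whose description is relevant, then gather their rows."""
    relevant = [
        s for s in REGISTRY if s.relation == relation and s.organism == organism
    ]
    rows: List[Dict[str, str]] = []
    for source in relevant:
        rows.extend(source.fetch())
    return [s.name for s in relevant], rows


# Hypothetical sources S1 and S2.
register(SourceDescription("S1", "Gene", "fly", lambda: [{"id": "g1"}]))
register(SourceDescription("S2", "Gene", "mouse", lambda: [{"id": "g2"}]))
```

Calling `answer("Gene", "fly")` consults only `S1`. Registering a third fly source later makes it available to the same query without editing `S1`, `S2`, or `answer`.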

  10. Navigational Integration • Some sources provide information that would be hard or impossible to access without point-and-click navigation

  11. WebFusion Dr. Liangyou Chen

  12. DBGET LinkDB KEGG Pathways • Can these be done electronically for a biologist?

  13. Go to: http://www.ncbi.nlm.nih.gov/LocusLink/

  14. Click <Register Web Process> menu

  15. 1. Input: 103730 2. Press <Pickup Input> button

  16. 1. Press <Next> button 2. Press [Go] button

  17. 1. Mark the table 2. Press <Pickup Table> Button

  18. Press the <Create> Button

  19. 1. Uncheck all boxes except 2~6 2. Press the <Update & Redraw> button

  20. 1. Name the table: LocusLink 2. Name the fields Link, LocusID, Org, Symbol, Description, respectively 3. Select appropriate transformations 4. Press <Update & Redraw> button

  21. Press <Confirm & Create Table>

  22. LocusLink web process is created

  23. DBGET LinkDB KEGG Pathways

  24. 1. Select ‘LocusLink’ table 2. Type in ‘LocusLinkQuery’ as a query name 3. Check these fields to display 4. Double click here

  25. 1. Select ‘local_gene_ids’ table 2. Select ‘LID’ field 3. Click here (any place)

  26. Click <This Query> button

  27. Press <Execute> button

  28. In-progress results are shown here

  29. LifeDB

  30. DBGET LinkDB KEGG Pathways

  31. LifeDB • Resource Discovery • Automatic Schema/Ontology Matching • Query Optimization • WorkFlows • BioFlow (A declarative WorkFlow Language)

  32. Glimpse of BioFlow • DNA sequence repositories: GeneBankURL (GenBank format), FlyBaseURL (EMBL format) • Combine these sequences • University of Minnesota Reading Frame Predictor (input_seq: FASTA format, species) • Output: score and predicted DNA region

  33. BioFlow • workflow open_reading_frame; • useontology BioSystems; • declare found logical, count int; • define data sequences_1 at GeneBankURL as (seq_1 DNA); • define tool orf at URL parameter (seq DNA, target organism) results (score int, predicted_region DNA); • combine sequences_1, sequences_2 into sequences (seqs); • select seqs, orf (seqs, “drosophila”) from sequences; • Goal: develop a formal BioFlow language syntax with compositionality, the closure property, and type safety
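
A rough Python sketch of what the workflow above describes, not an implementation of BioFlow. The names `combine`, `orf`, and `open_reading_frame` mirror the slide, but everything else is an assumption: the slide's `orf` is a remote tool at a URL, whereas here it is a local stand-in that returns the longest stretch from an ATG start codon to the first in-frame stop codon and uses its length as the score. The `target_organism` argument is accepted but ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf(seq, target_organism):
    """Local stand-in for the remote tool: returns (score, predicted_region).

    Assumption: score is the length of the longest ATG...stop region found;
    target_organism is ignored by this stand-in.
    """
    best = ""
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                region = seq[start:pos + 3]
                if len(region) > len(best):
                    best = region
                break
    return len(best), best


def combine(*collections):
    """Mirror of 'combine ... into sequences': concatenate the collections."""
    merged = []
    for collection in collections:
        merged.extend(collection)
    return merged


def open_reading_frame(sequences_1, sequences_2):
    """Mirror of 'select seqs, orf(seqs, "drosophila") from sequences'."""
    sequences = combine(sequences_1, sequences_2)
    return [(seq, orf(seq, "drosophila")) for seq in sequences]
```

The sketch hard-codes in a general-purpose language what the slide expresses declaratively; the stated goal of BioFlow is to make such pipelines composable and type-checked at the language level.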

  34. Research Scope • Resource Discovery • Automatic Schema/Ontology Matching • Query Optimization • WorkFlows • 7-8 PhD positions • 3-5 years funding

  35. Thanks to all
