
Flexible genome retrieval for supporting in-silico studies of endobacteria-AMFs

S. Montani (1), G. Leonardi (1), S. Ghignone (2), L. Lanfranco (2)
(1) Dipartimento di Informatica, University of Piemonte Orientale, Alessandria, Italy
(2) Dipartimento di Biologia Vegetale, University of Turin, Italy


Presentation Transcript


  1. Flexible genome retrieval for supporting in-silico studies of endobacteria-AMFs
     S. Montani (1), G. Leonardi (1), S. Ghignone (2), L. Lanfranco (2)
     (1) Dipartimento di Informatica, University of Piemonte Orientale, Alessandria, Italy
     (2) Dipartimento di Biologia Vegetale, University of Turin, Italy

  2. Arbuscular mycorrhizal fungi (AMFs)
     • Obligate symbionts in strict association with the roots of land plants
     • In soil: positive impacts on plant health and productivity
     • Often in further symbiosis with bacteria
     • Tripartite system: (i) endobacterium, (ii) AMF, (iii) plant roots
     [Figure labels: AMF spore, AMF hypha, endobacteria]

  3. Studying the tripartite system
     • Potentially strong practical impacts: symbiotic consortia may lead to new metabolic pathways and to the appearance of interesting molecules for sustainable agriculture and (possibly) for industrial biotechnological applications
     • Comparative genomics approach to infer phylogenetic relationships, genome evolution, and the metabolic functions of a given organism (also with few available data)
     • Key part of the study: genomic data of the endobacteria and AMF-endobacteria interaction

  4. A computational environment for AMF-endobacteria interaction
     • Genomic study of the system formed by the AMF Gigaspora margarita (isolate BEG34) and its endobacterium Candidatus Glomeribacter gigasporarum
     • BIOBITS project, Regione Piemonte - Converging Technologies
     • Modular architecture: database; synteny and visualization tools; BIOBITS research tools
     • Generic Model Organism Database (GMOD) project: open source tools for creating and managing genome-scale biological databases

  5. Architecture of the system Flexible retrieval

  6. Data storage
     • CHADO DB: bacterial genomes, known annotations, proteins and metabolic pathways, and newly discovered annotations
     • Manually loaded with genomes of Candidatus Glomeribacter's relatives
     • Import modules and RRE - Queries: information retrieved from the biological databases accessible through the Internet (e.g. GenBank)

  7. Data visualization
     GMOD customizable modules for comparative genomics:
     • CMap allows viewing comparisons of genetic and physical maps
     • GBrowse_syn is a synteny browser that displays multiple genomes, with a central reference species
     • SyBil is a system for comparative genomics visualizations

  8. New applications (BIOBITS research tools)
     • Biomart-based tools: reorganize the information into a data warehouse; analyze the data by means of clustering and data mining techniques
     • Flexible retrieval tool: case-based reasoning paradigm

  9. Case-based retrieval
     • retrieve past cases similar to the current one
     • reuse past successful solutions after, if necessary, properly revising them
     • retain the current case
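The retrieve / reuse / revise / retain cycle listed on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `Case`, `CaseBase`, and `solve` names, the numeric feature vectors, and the mean-absolute-difference distance are all hypothetical and are not taken from the presented tool.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Case:
    """Hypothetical case: a numeric feature vector plus the solution that worked for it."""
    features: Sequence[float]
    solution: str


@dataclass
class CaseBase:
    """Hypothetical in-memory case library."""
    cases: List[Case] = field(default_factory=list)

    def retrieve(self, query: Sequence[float], k: int = 1) -> List[Case]:
        """Return the k stored cases closest to the query (mean absolute difference)."""
        def distance(case: Case) -> float:
            pairs = list(zip(case.features, query))
            return sum(abs(a - b) for a, b in pairs) / len(pairs)
        return sorted(self.cases, key=distance)[:k]

    def retain(self, case: Case) -> None:
        """Store a solved case so that later queries can reuse it."""
        self.cases.append(case)


def solve(base: CaseBase, query: Sequence[float],
          revise: Callable[[str], str]) -> str:
    """Run one pass of the cycle for a new query."""
    nearest = base.retrieve(query, k=1)[0]   # retrieve the most similar past case
    proposed = nearest.solution              # reuse its solution
    final = revise(proposed)                 # revise it if necessary
    base.retain(Case(query, final))          # retain the newly solved case
    return final
```

In the retrieval-only setting described in this presentation, the `retrieve` step is the part that matters; the other three steps are shown only to complete the cycle.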

  10. Case representation
     • Sequence of nucleotides, properly aligned with the same reference organism
     • Percentage of similarity with the aligned nucleotide in the reference organism
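One possible way to obtain a sequence of similarity percentages from an aligned genome is sketched below. The slide does not say how the percentage is computed, so the fixed-size window, the gap character `-`, and the function name `windowed_identity` are assumptions made purely for illustration.

```python
from typing import List


def windowed_identity(aligned: str, reference: str, window: int = 4) -> List[float]:
    """Percent identity per window between an aligned sequence and its reference.

    Assumption: both strings come from the same alignment, so they have equal
    length, and gaps are written as '-'. A gap never counts as a match.
    """
    if len(aligned) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")

    scores: List[float] = []
    for start in range(0, len(aligned), window):
        a = aligned[start:start + window]
        r = reference[start:start + window]
        matches = sum(1 for x, y in zip(a, r) if x == y and x != "-")
        scores.append(100.0 * matches / len(a))
    return scores


# Each case would then be a list of percentages, one value per window.
case = windowed_identity("ACGTACGA", "ACGTTCGA", window=4)
```

Here `case` holds one value per window of four aligned positions; a real pipeline would more likely derive these values from an alignment tool's output than from raw string comparison.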

  11. Case representation

  12. Flexible retrieval
     • Abstracting the data at different levels in a taxonomy
     • “Bird’s eye” view of similarity
     • Example: DCW region (cellular division), about 10 genes; the region is conserved in relatives, while a single gene may not be

  13. Flexible retrieval
     • Abstracting the data at different “states” granularity levels
     • Similar to the (state) Temporal Abstraction technique: from points to intervals sharing a common persistent behavior
     • Each state specialized in further subdivisions
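The move from points to intervals at two granularity levels can be illustrated as follows. The thresholds, the symbols `Hm`, `M`, and `L`, and the two-level taxonomy are hypothetical; only `Hv` and `H` appear in the slides, and their exact definitions are not given there.

```python
from typing import Dict, List, Tuple

# (start index, end index inclusive, state symbol)
Interval = Tuple[int, int, str]

# Hypothetical taxonomy: each fine-grained state maps to its coarser parent.
PARENT: Dict[str, str] = {"Hv": "H", "Hm": "H", "M": "M", "L": "L"}


def fine_state(percent: float) -> str:
    """Map a similarity percentage to a fine-grained state (assumed thresholds)."""
    if percent >= 95:
        return "Hv"   # very high similarity
    if percent >= 80:
        return "Hm"   # moderately high similarity
    if percent >= 50:
        return "M"    # medium similarity
    return "L"        # low similarity


def to_intervals(symbols: List[str]) -> List[Interval]:
    """Merge consecutive points that share a state into one interval."""
    intervals: List[Interval] = []
    for i, symbol in enumerate(symbols):
        if intervals and intervals[-1][2] == symbol:
            start, _, _ = intervals[-1]
            intervals[-1] = (start, i, symbol)   # extend the current interval
        else:
            intervals.append((i, i, symbol))     # open a new interval
    return intervals


def abstract(percentages: List[float], coarse: bool = False) -> List[Interval]:
    """State abstraction at the fine level, or at the coarser parent level."""
    symbols = [fine_state(p) for p in percentages]
    if coarse:
        symbols = [PARENT[s] for s in symbols]
    return to_intervals(symbols)
```

At the coarse level the adjacent `Hv` and `Hm` intervals collapse into a single `H` interval, which is the kind of bird's-eye view of a conserved region that the previous slide describes.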

  14. Efficient retrieval
     • Multi-dimensional index structures
     • Queries at any level of detail
     • Interactivity

  15. Query answering
     • Query: similarity string at any detail level (Hv..Hv)
     • Query generalization to find the index root: Hv..Hv -> H..H -> H
     • Index navigation backwards with respect to the query generalization steps
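The generalize-then-navigate idea can be sketched with a simple tree. This is a deliberately reduced stand-in for the multi-dimensional index structures mentioned in the talk, not a reconstruction of them; the `PARENT` taxonomy, the `Index` class, and the case identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, ...]

# Hypothetical taxonomy; coarse symbols map to themselves so that
# queries can be posed at any level of detail.
PARENT: Dict[str, str] = {"Hv": "H", "Hm": "H", "H": "H", "M": "M", "L": "L"}


def generalization_chain(query: Key) -> List[Key]:
    """Generalize a query step by step, e.g. Hv..Hv -> H..H -> H."""
    lifted = tuple(PARENT[s] for s in query)          # move up the taxonomy
    merged: List[str] = []
    for symbol in lifted:                             # merge runs of equal states
        if not merged or merged[-1] != symbol:
            merged.append(symbol)
    chain: List[Key] = []
    for key in (query, lifted, tuple(merged)):
        if not chain or chain[-1] != key:             # drop steps that change nothing
            chain.append(key)
    return chain


class Node:
    def __init__(self) -> None:
        self.children: Dict[Key, "Node"] = {}
        self.case_ids: List[str] = []


class Index:
    """Tree whose levels mirror the generalization steps, most general first."""

    def __init__(self) -> None:
        self.root = Node()

    def _path(self, states: Key) -> List[Key]:
        # Navigation runs backwards with respect to generalization.
        return list(reversed(generalization_chain(states)))

    def insert(self, case_id: str, states: Key) -> None:
        node = self.root
        for key in self._path(states):
            node = node.children.setdefault(key, Node())
            node.case_ids.append(case_id)

    def search(self, query: Key) -> List[str]:
        node = self.root
        for key in self._path(query):
            if key not in node.children:
                return []
            node = node.children[key]
        return list(node.case_ids)
```

A query posed at a coarse level stops at an inner node and returns every case stored below it, while a fully detailed query descends to a leaf; refining or generalizing a query interactively then amounts to moving one level down or up the same path.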

  16. Computation time
     • Efficient retrieval is particularly critical in very large databases (bacterial genome DBs are growing very fast)
     • Existing implementation in the haemodialysis domain
     • 1475 real haemodialysis patient cases
     • Index-based TA is fast (41 msec on an Intel Core 2 Duo T9400 processor running at 2.53 GHz, equipped with 4 GB of DDR2 RAM)

  17. Conclusions
     • Modular architecture for in-silico comparative genomics studies of AMF-endobacteria interaction
     • Flexible genome retrieval tool: flexible query definition at different levels of abstraction; efficient index-based retrieval; interactive query refinement/generalization

  18. Future work
     • Complete tool implementation
     • Experiments on RefSeq NCBI data
     • Tool usability
     • New applications published as new GMOD modules
