Flexible genome retrieval for supporting in-silico studies of endobacteria-AMFs
S. Montani (1), G. Leonardi (1), S. Ghignone (2), L. Lanfranco (2)
(1) Dipartimento di Informatica, University of Piemonte Orientale, Alessandria, Italy
(2) Dipartimento di Biologia Vegetale, University of Turin, Italy
Arbuscular mycorrhizal fungi (AMFs)
• Obligate symbionts in strict association with the roots of land plants
• In soil: positive impact on plant health and productivity
• Often in further symbiosis with bacteria
• Tripartite system: (i) endobacterium, (ii) AMF, (iii) plant roots
[Figure labels: AMF spore, AMF hypha, endobacteria]
Studying the tripartite system
• Potentially strong practical impacts: symbiotic consortia may lead to
  • new metabolic pathways
  • the appearance of interesting molecules for sustainable agriculture and (possibly) for industrial biotechnological applications
• Comparative genomics approach to infer
  • phylogenetic relationships
  • genome evolution
  • metabolic functions of a given organism (also with few available data)
• Key part of the study: genomic data of the endobacteria and AMF-endobacteria interaction
A computational environment for AMF-endobacteria interaction
• Genomic study of the system formed by the AMF Gigaspora margarita (isolate BEG34) and its endobacterium Candidatus Glomeribacter gigasporarum
• BIOBITS project, Regione Piemonte - Converging Technologies
• Modular architecture
  • Database
  • Synteny and visualization tools
  • BIOBITS research tools
• Generic Model Organism Database (GMOD) project: open source tools for creating and managing genome-scale biological databases
Architecture of the system
[Architecture diagram; labelled component: Flexible retrieval]
Data storage
• CHADO DB
  • Bacterial genomes, known annotations, proteins and metabolic pathways, and newly discovered annotations
  • Manually loaded with the genomes of Candidatus Glomeribacter’s relatives
• Import modules and RRE - Queries
  • Information retrieved from the biological databases accessible through the Internet (e.g. GenBank)
Data visualization
• GMOD customizable modules for comparative genomics
  • CMap displays comparisons of genetic and physical maps
  • GBrowse_syn is a synteny browser that displays multiple genomes around a central reference species
  • SyBil is a system for comparative genomics visualizations
New applications (BIOBITS research tools)
• Biomart-based tools
  • reorganize the information into a data warehouse
  • analyze the data by means of clustering and data mining techniques
• Flexible retrieval tool
  • Case-based reasoning paradigm
Case-based retrieval
• retrieve past cases similar to the current one
• reuse past successful solutions
• revise them properly, if necessary
• retain the current case
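The retrieve and retain steps above can be sketched as follows. This is only an illustration, not the tool's implementation: the case base layout (a dictionary from a case name to a list of similarity percentages), the mean absolute difference used as distance, and all function names are assumptions made for the example.

```python
def distance(a, b):
    """Mean absolute difference between two equal-length similarity series.

    Hypothetical distance chosen for illustration only.
    """
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def retrieve(case_base, query, k=1):
    """Return the names of the k stored cases closest to the query."""
    ranked = sorted(case_base.items(), key=lambda item: distance(item[1], query))
    return [name for name, _ in ranked[:k]]


def retain(case_base, name, series):
    """Store the current case so that later queries can retrieve it."""
    case_base[name] = list(series)


# Hypothetical case base: similarity percentages against one reference organism.
case_base = {"genome_a": [100.0, 50.0], "genome_b": [90.0, 90.0]}
closest = retrieve(case_base, [95.0, 85.0])
retain(case_base, "current", [95.0, 85.0])
```

The reuse and revise steps depend on what a "solution" is in the application and are left out of the sketch.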
Case representation
• Sequence of nucleotides, properly aligned with the same reference organism
• Percentage of similarity with the aligned nucleotide in the reference organism
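One possible way to turn an aligned sequence into a series of similarity percentages is sketched below. The slide does not say how the percentage is computed, so the non-overlapping windows, the window size, and the function name `window_identity` are assumptions for illustration.

```python
def window_identity(aligned, reference, window=10):
    """Percent identity per non-overlapping window of two aligned sequences.

    Gap characters ("-") never count as a match. The windowing scheme is an
    assumption of this sketch, not something stated in the slides.
    """
    if len(aligned) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    series = []
    for start in range(0, len(aligned), window):
        a = aligned[start:start + window]
        r = reference[start:start + window]
        matches = sum(1 for x, y in zip(a, r) if x == y and x != "-")
        series.append(100.0 * matches / len(a))
    return series


# Toy sequences: the first window matches fully, the second matches in 2 of 4 positions.
series = window_identity("ACGTACGT", "ACGTTCGA", window=4)
```

A case is then the resulting list of percentages, one value per window along the reference.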
Flexible retrieval
• Abstracting the data at different levels in a taxonomy
• “Bird’s eye” view of similarity
• Example:
  • DCW region (cellular division)
  • About 10 genes
  • The region is conserved in relatives
  • A single gene may not be
Flexible retrieval
• Abstracting the data at different granularity levels of the “states”
• Similar to the (state) Temporal Abstraction technique: from points to intervals sharing a common persistent behavior
• Each state is specialized into further subdivisions
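The passage from points to intervals can be sketched as follows. The symbols (Lv, Lm, Hm, Hv and their parents L, H), the thresholds, and the function names are hypothetical and chosen only to make the example concrete; the slides do not give the actual taxonomy.

```python
from itertools import groupby

# Hypothetical two-level taxonomy: fine state -> coarser parent state.
PARENT = {"Lv": "L", "Lm": "L", "Hm": "H", "Hv": "H"}


def fine_state(pct):
    """Map a similarity percentage to a fine-grained state (assumed thresholds)."""
    if pct < 25:
        return "Lv"
    if pct < 50:
        return "Lm"
    if pct < 75:
        return "Hm"
    return "Hv"


def to_intervals(symbols):
    """Merge runs of equal symbols into (symbol, start, end) intervals, end inclusive."""
    intervals = []
    pos = 0
    for sym, run in groupby(symbols):
        n = len(list(run))
        intervals.append((sym, pos, pos + n - 1))
        pos += n
    return intervals


def abstract(series, level="fine"):
    """Abstract a series of percentages into state intervals at the given level."""
    symbols = [fine_state(p) for p in series]
    if level == "coarse":
        symbols = [PARENT[s] for s in symbols]
    return to_intervals(symbols)


series = [90, 80, 60, 55, 10]
fine = abstract(series, "fine")
coarse = abstract(series, "coarse")
```

At the coarse level, neighbouring fine intervals that share a parent merge into one longer interval, which is what gives the broader view of the same data.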
Efficient retrieval
• Multi-dimensional index structures
• Queries at any level of detail
• Interactivity
Query answering
• Query: similarity string at any detail level (Hv..Hv)
• Query generalization to find the index root: Hv..Hv -> H..H -> H
• Index navigation proceeds backwards with respect to the query generalization steps
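A minimal sketch of this generalize-then-navigate scheme is given below. It reads Hv..Hv -> H..H as lifting each symbol to its parent in the taxonomy, and H..H -> H as collapsing a run of equal symbols; both readings, the nested-dictionary index, and all names are assumptions of the example, not the actual index structure.

```python
# Hypothetical taxonomy: fine symbol -> coarser parent symbol.
PARENT = {"Lv": "L", "Lm": "L", "Hm": "H", "Hv": "H"}


def generalization_path(query):
    """Return the query followed by its successive generalizations."""
    path = [tuple(query)]
    # Step 1: lift every symbol to its parent (Hv..Hv -> H..H).
    lifted = tuple(PARENT.get(s, s) for s in query)
    if lifted != path[-1]:
        path.append(lifted)
    # Step 2: collapse runs of equal symbols (H..H -> H).
    collapsed = tuple(s for i, s in enumerate(lifted) if i == 0 or s != lifted[i - 1])
    if collapsed != path[-1]:
        path.append(collapsed)
    return path


def lookup(index, query):
    """Find the index root by generalization, then descend the steps in reverse."""
    path = generalization_path(query)
    node = index.get(path[-1])
    for key in reversed(path[:-1]):
        if node is None:
            return []
        node = node["children"].get(key)
    return [] if node is None else node["cases"]


# Toy index with invented case names, one node per detail level.
example_index = {
    ("H",): {
        "cases": ["case_1", "case_7"],
        "children": {
            ("H", "H"): {
                "cases": ["case_7"],
                "children": {
                    ("Hv", "Hv"): {"cases": ["case_7"], "children": {}},
                },
            },
        },
    },
}

hits = lookup(example_index, ["Hv", "Hv"])
```

Because every generalization level is itself a node, a query written at a coarser level, such as `["H", "H"]`, is answered by stopping the descent earlier.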
Computation time
• Efficient retrieval is particularly critical in very large databases (bacterial genome DBs are growing very fast)
• Existing implementation in the haemodialysis domain
• 1475 real haemodialysis patient cases
• Index-based TA is fast (41 msec on an Intel Core 2 Duo T9400 processor running at 2.53 GHz, equipped with 4 GB of DDR2 RAM)
Conclusions
• Modular architecture for in-silico comparative genomics studies of AMF-endobacteria interaction
• Flexible genome retrieval tool
  • Flexible query definition, at different levels of abstraction
  • Efficient index-based retrieval
  • Interactive query refinement/generalization
Future work
• Complete tool implementation
• Experiments on RefSeq NCBI data
• Tool usability
• New applications published as new GMOD modules