
ANALYSIS AND VISUALIZATION OF SINGLE COPY ORTHOLOGS IN ARABIDOPSIS, LETTUCE, SUNFLOWER AND OTHER PLANT SPECIES

Alexander Kozik and Richard W. Michelmore



University of California, Davis, Dept. of Vegetable Crops, Davis, CA 95616, USA

Approximately 3,700 of the genes in the Arabidopsis Col-0 genome are single copy. These genes were used to identify conserved orthologs in several other plant species. Using computational approaches, we identified 1,104 lettuce, 686 sunflower, 1,704 tomato, 2,016 soybean, 1,701 maize and 1,290 rice ESTs that are conserved orthologs of these Arabidopsis genes. Each EST sequence in these sets has an unambiguous single strong BLAST hit to the Arabidopsis genome. Reciprocal BLAST searches (Arabidopsis single copy genes versus EST assemblies) showed that more than 80% of the searches returned only a single strong hit. This indicates that the majority of these conserved orthologs are represented by single genes in multiple plant species.

In total, 2,205 Arabidopsis genes have similarity (BLAST E-value of 1e-20 or better) to at least one of the selected ESTs, which is about 60% of the 3,714 single copy genes in Arabidopsis. Only 248 sequences were common to the EST collections from the different species and the Arabidopsis single copy genes. This can be partially explained by the incomplete representation of genes within each EST collection.

Analysis and visualization of single copy genes along the Arabidopsis chromosomes (http://cgpdb.ucdavis.edu/COS_Arabidopsis/arabidopsis_single_copy_genes_2003.html) revealed that these genes are distributed throughout the genome regardless of large-scale chromosomal duplications. This indicates that deduction of the order of genes in common ancestors is required for informative analyses of synteny.
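The "single strong hit" criterion described above can be illustrated with a minimal sketch. It assumes BLAST results in the common 12-column tab-delimited layout (query ID in column 1, subject ID in column 2, E-value in column 11) and treats a query as having a single strong hit when exactly one distinct subject reaches the 1e-20 cutoff. The function name, the sequence IDs, and this exact reading of "single strong hit" are assumptions for illustration, not the authors' actual scripts.

```python
from collections import defaultdict

E_CUTOFF = 1e-20  # E-value cutoff quoted in the text


def single_strong_hit_queries(lines, e_cutoff=E_CUTOFF):
    """Return {query: subject} for queries with exactly one strong subject.

    `lines` is an iterable of tab-delimited BLAST result lines
    (assumed layout: 12 columns, E-value in column 11).
    """
    strong = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= e_cutoff:
            strong[query].add(subject)
    # keep only queries whose strong hits all point to one subject
    return {q: next(iter(s)) for q, s in strong.items() if len(s) == 1}


# Hypothetical example records (IDs are made up for illustration)
example = [
    "EST_A\tgene_1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180",
    "EST_A\tgene_2\t40.0\t100\t60\t2\t1\t100\t1\t100\t1e-05\t45",
    "EST_B\tgene_3\t80.0\t210\t42\t0\t1\t210\t1\t210\t1e-40\t150",
    "EST_B\tgene_4\t78.0\t205\t45\t0\t1\t205\t1\t205\t1e-35\t140",
]
selected = single_strong_hit_queries(example)
```

In this example EST_A is kept because only gene_1 passes the cutoff, while EST_B is dropped because two different subjects pass it.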
SINGLE COPY ORTHOLOGS SUMMARY

PIPELINE TO IDENTIFY SINGLE COPY ORTHOLOGS

Input data sets:
  Arabidopsis predicted proteins (27,169 seqs)
  lettuce ESTs (68,197 seqs)
  sunflower ESTs (67,180 seqs)
  tomato ESTs (113,932 seqs)
  soybean ESTs (341,564 seqs)
  maize ESTs (362,510 seqs)
  rice ESTs (107,329 seqs)

[step 1] BLAST search of Arabidopsis proteins against themselves and selection of Arabidopsis single copy genes (3,714 seqs).
[step 2] BLAST search of Arabidopsis single copy genes versus full sets of ESTs; selection of ESTs with BLAST hits to the Arabidopsis single copy subset.
[step 3] BLAST search of selected ESTs versus all Arabidopsis predicted proteins and selection of ESTs with a single strong hit to the Arabidopsis genome (Exp cutoff 1e-20).

Raw data and a detailed description of the sequence extraction pipeline are available at: http://cgpdb.ucdavis.edu/COS_Arabidopsis/

PIPELINE TO EXTRACT ALIGNMENTS AT NUCLEOTIDE LEVEL

[step 1] GenBank Parser. Input: GenBank files of the Arabidopsis genome (DNA sequences of entire chromosomes and corresponding annotation). Output: spliced DNA sequences corresponding to ORFs.
[step 2] Translation. Output: translated (protein) sequences [subject].
[step 3] BLASTX search [ESTs vs proteins]. Query: ESTs (unigene) set.
[step 4] BLAST parser (Tcl/Tk script). Output: tab-delimited file with info about BLAST alignments (start points and end points for each sequence in the BLAST report).
[step 5] SeqsExtractorFromBlastX (Python script), the final step of the pipeline: extraction of DNA sequences corresponding to BLAST alignments from the "spliced DNA" (subject) and EST (query) files. The script automatically counts codon usage. Output: spreadsheet with info about codon usage.

http://cgpdb.ucdavis.edu/COS_Arabidopsis/Codon_Usage_Pipeline.html

MULTIPLE ALIGNMENT VISUALIZED WITH TkLife ( http://www.atgc.org/TkLife/ )

Graphical representation of BLAST search of lettuce, sunflower, tomato, soybean, maize and rice ESTs against Arabidopsis genome.
The picture displays potential conserved orthologs (single copy genes in Arabidopsis). Each box (element) is a single copy Arabidopsis gene having homology to selected sets of plant ESTs. Genes are plotted along the five Arabidopsis chromosomes according to their physical positions.

Categories shown in the multiple alignment:
  codon match (and amino acid match)
  codon mismatch and amino acid match (synonymous substitutions)
  codon mismatch and amino acid mismatch (non-synonymous substitutions)

Segmental duplication between Arabidopsis chromosomes 4 and 5 (CHRM 4, CHRM 5)

Color Scheme:
  Black - single copy genes
  Purple - kinases
  Green - cytochrome
  Red - resistance genes
  Yellow - ribosomal proteins
Gray lines connect genes with sequence identity of 40% or greater.

Note: Single copy genes are distributed evenly through both segments of the duplicated region. Image was generated by GenomePixelizer using the “locus zoomer” function. Additional information is available at: http://www.atgc.org/GP_Ref/presentation/

Putative scenario of gene loss after segmental duplication
Because of extensive gene loss after duplication, deduction of gene order in ancestral genomes is required for informative synteny analysis between different genomes.

Patterns of segmental duplications in the Arabidopsis genome (generated by GenomePixelizer http://www.atgc.org/). Regions selected by white boxes are shown in large scale above.

Credits: This work was funded by the USDA IFAFS Plant Genome Program to the Compositae Genome Project.
Questions and comments to Alexander Kozik, email: akozik@atgc.org
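The three codon categories listed for the multiple alignment (codon match, synonymous substitution, non-synonymous substitution) can be illustrated with a short sketch. It assumes two nucleotide sequences of equal length that are already aligned in frame without gaps, and it uses the standard genetic code. The function name and the example sequences are hypothetical; this is not the SeqsExtractorFromBlastX or TkLife code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_pairs(query, subject):
    """Count codon matches, synonymous and non-synonymous differences.

    Assumes both sequences are in frame, ungapped and of equal length.
    Codons with characters outside T, C, A, G are counted as 'skipped'.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    counts = Counter()
    usable = len(query) - len(query) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        q = query[i:i + 3].upper()
        s = subject[i:i + 3].upper()
        if q not in CODON_TABLE or s not in CODON_TABLE:
            counts["skipped"] += 1
        elif q == s:
            counts["codon_match"] += 1
        elif CODON_TABLE[q] == CODON_TABLE[s]:
            counts["synonymous"] += 1
        else:
            counts["non_synonymous"] += 1
    return counts


# Hypothetical aligned fragments: ATG/ATG, GCT/GCC, AAA/AGA
result = classify_codon_pairs("ATGGCTAAA", "ATGGCCAGA")
```

Here ATG/ATG is a codon match, GCT/GCC both encode alanine (synonymous), and AAA/AGA encode lysine and arginine (non-synonymous), so each category is counted once.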
