
Toward a Better Understanding of Cereal Genome Evolution Through Ensembl Compara

Apurva Narechania 1, Joshua Stein 1, William Spooner 1, Sharon Wei 1, Ben Faga 1, Shiran Pasternak 1, and Doreen Ware 1, 2
1 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA


Presentation Transcript


  1. Toward a Better Understanding of Cereal Genome Evolution Through Ensembl Compara

Apurva Narechania 1, Joshua Stein 1, William Spooner 1, Sharon Wei 1, Ben Faga 1, Shiran Pasternak 1, and Doreen Ware 1, 2
1 Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11724, USA
2 USDA-ARS NAA Plant, Soil & Nutrition Laboratory Research Unit, USA

[Panel titles: Blastz-NET Alignment Stats (Maize Accelerated Region); Syntenic Blocks Between Maize, Rice, and Sorghum; Region Statistics]

Summary

The maize genome has been largely shaped by its history of tetraploidization, subsequent rearrangement and duplicate gene loss. Disruption of synteny has also resulted from apparent gene movement in both maize and sorghum relative to rice. Many questions remain concerning the evolution of cereals, including the extent of lineage-specific rearrangements, the selective forces that dictated the retention of duplicate genes, and the extent of conserved non-coding regions. The availability of three nearly complete cereal genomes (maize, rice and sorghum) provides an unprecedented opportunity to use comparative genomics to answer these and other questions in the evolution of plant genomes. As part of the Maize Genome Sequencing Project, we describe the use of the Ensembl Compara whole genome alignment pipeline to construct sequence-based syntenies. The pipeline automates pairwise whole genome analysis by parallelizing the construction of blastz alignments, their subsequent consolidation into chains and nets, and their coalescence into syntenic regions. The algorithms employed identify highly similar regions between two large sequences while allowing for segments without similarity, thus highlighting gene movement or genomic rearrangement within syntenic blocks. The tetraploid nature of maize and its history of whole genome duplications suggest that much of its genome should have at least two blocks that align to the same region of rice.
Preliminary analysis of a pilot 22-megabase maize assembly on maize chromosome 4 shows synteny with a comparably sized region on rice chromosome 2. In agreement with marker-based synteny studies, we show that this rice chromosome has a duplicate homeologue on maize chromosome 5. We address the challenges of applying this pipeline to the maize genome in its partially assembled state.

[Panel titles: Region Alignment Statistics; Blastz-NET coverage by NET Level; Blastz-CHAIN-NET and the Ensembl Hive; Blastz-NET coverage by Rice Chromosome]

[Pipeline diagram; node labels as extracted: CreateAlignmentChainsJobs, SubmitGenome, ChunkAndGroupDNA, AlignmentChains, CreatePairAlignerJobs, UpdateMaxAlignmentLength, Blastz, CreateAlignmentNetsJobs, FilterDuplicates, AlignmentNets]

• Alignable sequence refers to the portion of the maize accelerated region that is of high quality and has not been RepeatMasked.
• Sorghum blastz-NETs align 66% of the alignable maize sequence, while rice blastz-NETs align 35% of the available accelerated region.
• The maize accelerated region contains blocks syntenic to rice chr2 and sorghum chr4.
• Maize: max gap between NETs 100,000 residues; min NET size 5,000 residues.
• Rice and sorghum: max NET gap 50,000 residues; min NET size 2,000 residues.
• Syntenic blocks are defined in two steps. First, NETs are grouped if the distance between them is smaller than twice the max gap parameter and no NETs break the synteny. Second, these groups are arranged into syntenic blocks up to 30 times the max gap parameter, with two synteny-breaking groups allowed.
• The rice assembly (version 5) is courtesy of TIGR; early access to the sorghum assemblies is courtesy of JGI.
• The Blastz-CHAIN-NET pipeline creates long-range gapped pairwise blastz chains and nets from raw blastz alignments, thereby allowing for genomic rearrangements in syntenic regions. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11188-9.
• The Ensembl Hive pipeline parallelizes the generation of blastz alignments and their consolidation into chains and nets using a hive system that creates specific jobs and spawns anonymous, general workers to complete those jobs. Nucleic Acids Res. 2008 Jan;36(Database issue):D707-14.

[Panel titles: Blastz-NET coverage by Sorghum Chromosome; Maize BAC-contigs versus Rice at MaizeSequence.org; Maize Accelerated Region Duplication]

• With the maize genome in its partially assembled state, the longest contiguous regions at maizesequence.org are the BAC contigs.
• Whole genome alignments to rice are available for all BAC contigs and correspond well to FgenesH predictions with similarity to known proteins and maize ESTs.
• Rice Chr2 from positions 29 Mb to 36 Mb aligns to maize chromosomes 4 and 5 in equal measure, indicating a duplication event. Alignments were made to maize BAC contigs and mapped to chromosomes 4 and 5 using the FPC map.
• The majority of Chr4 hits were on FPC ctg182, corresponding to the accelerated region. The majority of NETs on Chr5 were on contigs 250, 251, 253, and 254, in agreement with marker-based studies. PLoS Genet. 2007 Jul 20;3(7):e123.
• The majority of blastz-NETs cluster on rice chromosome 2 and sorghum chromosome 4, in agreement with known marker-based synteny. Proc Natl Acad Sci U S A. 2005 Sep 13;102(37):13206-11.

[Panel titles: Gene Predictions Associated with Blastz-NETs; Distribution of blastz-NET sizes for Rice and Sorghum Alignments]

• 39% of maize genes within syntenic blocks are non-syntenic, suggesting substantial gene movement within maize.
• Almost 50% of rice genes are non-syntenic, possibly due to loss of duplicate genes within maize homeologous regions.

[Panel titles: Rice Stats; Sorghum Stats; Rice and Sorghum Level 1/2 Distributions]

Methods:
• Syntenic blocks were defined from BLASTZ-Chain-Net data using the parameters MaxDist and MinDist as described in the synteny views above.
• Genes (excluding TEs) were counted as syntenic if they overlapped a chain HSP that contributed to the synteny.
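
As a rough illustration of the Methods above, the Python sketch below merges NETs by gap and then flags genes that overlap a contributing HSP. It is not the Ensembl Compara implementation. The `Interval` type, the function names, the `max_gap` and `min_size` parameter names, and all coordinates are hypothetical. Intervals are assumed to be closed and to lie on a single chromosome, and only a simplified form of the first grouping step is shown: the synteny-break check and the second step (blocks up to 30 times the max gap) are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Interval:
    """Closed interval on one chromosome (hypothetical helper type)."""
    start: int
    end: int


def group_by_gap(nets: List[Interval], max_gap: int, min_size: int) -> List[Interval]:
    """Simplified step one: drop NETs smaller than min_size, then merge
    neighbours whose separation is below 2 * max_gap.
    Assumption: synteny-breaking NETs are NOT checked here."""
    kept = sorted(
        (n for n in nets if n.end - n.start + 1 >= min_size),
        key=lambda n: n.start,
    )
    groups: List[Interval] = []
    for net in kept:
        if groups and net.start - groups[-1].end < 2 * max_gap:
            last = groups[-1]
            groups[-1] = Interval(last.start, max(last.end, net.end))
        else:
            groups.append(net)
    return groups


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def classify_genes(genes: Dict[str, Interval], hsps: List[Interval]) -> Dict[str, bool]:
    """Mark a gene as syntenic if it overlaps any contributing HSP."""
    return {name: any(overlaps(g, h) for h in hsps) for name, g in genes.items()}


def non_syntenic_fraction(flags: Dict[str, bool]) -> float:
    """Share of genes that overlap no contributing HSP."""
    if not flags:
        return 0.0
    return sum(1 for syntenic in flags.values() if not syntenic) / len(flags)


# Illustrative coordinates only; thresholds follow the maize settings quoted above.
nets = [Interval(1_000, 8_000), Interval(50_000, 60_000),
        Interval(400_000, 410_000), Interval(420_000, 421_000)]
print(group_by_gap(nets, max_gap=100_000, min_size=5_000))
```

With these made-up coordinates the first two NETs merge into one group, the third starts a new group because it lies more than twice the max gap away, and the fourth is discarded as smaller than the minimum NET size.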
• Blastz-NET lengths are defined as the number of aligning bases in a NET excluding gaps, while blastz-NET spans are the distances from the first to the last base in the NET including gaps.
• Level 1 NETs consistently show the longest length and span across species.
• Sorghum NETs are considerably longer than those found in rice.
• Despite large differences in lengths and spans across levels and species, the overall distributions are similar, highlighting the influence of biologically significant outliers.
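
The distinction between NET length and NET span can be made concrete with a small Python sketch. It assumes a NET is represented as a list of ungapped aligned blocks with 1-based inclusive coordinates that do not overlap; this representation, the `Block` alias, and the function names are illustrative assumptions, not the poster's data model.

```python
from typing import List, Tuple

# One ungapped aligned block as (start, end); 1-based and inclusive (an assumption).
Block = Tuple[int, int]


def net_length(blocks: List[Block]) -> int:
    """Number of aligned bases, excluding the gaps between blocks."""
    return sum(end - start + 1 for start, end in blocks)


def net_span(blocks: List[Block]) -> int:
    """Distance from the first to the last aligned base, gaps included."""
    first = min(start for start, _ in blocks)
    last = max(end for _, end in blocks)
    return last - first + 1


# Hypothetical NET made of three aligned blocks separated by two gaps.
example_net = [(100, 199), (300, 349), (1000, 1099)]
print(net_length(example_net), net_span(example_net))
```

For this made-up NET the length is 250 aligned bases while the span is 1,000 bases, so a NET with long internal gaps can have a large span and a comparatively small length.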
