Exploring and utilizing the genetic variation within the gene pool of modern crop species is a critical step in maintaining and improving crop productivity. Genetic variation, ranging from SNPs to large structural variants, can result in differences in gene content between individuals of the same species. The pan-genome concept was proposed to better capture this variation, because a single reference genome is insufficient to capture the complete genetic diversity of a species.
UNIVERSITY OF AGRICULTURAL SCIENCES, BANGALORE DEPARTMENT OF GENETICS AND PLANT BREEDING I Doctoral Seminar PRESENTED BY- MARUTHI PRASAD B P ID. No. - PAMB 1066
POPULATION INCREASE!!!! CLIMATE CHANGE!!!! PEST AND DISEASE OUTBREAK!!!! • Capturing maximum genetic variation • Time & accuracy • Understanding the genome of the crop
Reference genome - a tool that serves as a base for crop improvement. Numerous sequencing efforts have been undertaken in plants and, as a result, reference genome sequences have become available for several crops, which serve as a base for crop improvement efforts. Tao et al. (2019)
Is a single reference genome adequate? SINGLE-REFERENCE GENOME • Comparative genome analysis is oriented around a single reference genome • What if our reference genome is too incomplete to capture all of the information?
Limitations of using a single reference genome • Dynamic nature of the genome • Incomplete representation of genetic diversity • Biases in the reference genome
BEYOND A SINGLE REFERENCE GENOME !!! The move from a single reference genome to multiple reference genomes will better illuminate the mining of genetic diversity for crop improvement by providing a more precise and comprehensive guiding principle. Bayer et al. (2020)
APPROACHES, APPLICATIONS AND RECENT DEVELOPMENTS IN PANGENOMICS Looking beyond the single reference genome
What is a Pangenome? • Genomic data derived from multiple accessions and cultivars • Full extent of sequence variation within a species • PAN-GENOMIC approach to identify new genes and alleles directly related to phenotype “A pangenome refers to the full complement of genes of a biological clade, such as a species, which can be partitioned into a set of core genes that are shared by all individuals and a set of dispensable genes that are partially shared or individual specific.”
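The core/dispensable partition in the definition above can be illustrated with plain set operations. This is only a minimal sketch: the accession names, gene IDs and the function name `partition_pangenome` are hypothetical, and real studies derive presence/absence calls from annotated assemblies or read mapping rather than from hand-written sets.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    gene_sets: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, dispensable) gene sets.

    gene_sets maps an accession name to the set of genes present in it.
    """
    all_sets = list(gene_sets.values())
    # Pan-genome: every gene seen in at least one accession
    pan = set().union(*all_sets)
    # Core genes: present in every accession
    core = set.intersection(*all_sets) if all_sets else set()
    # Dispensable genes: partially shared or accession-specific
    dispensable = pan - core
    return pan, core, dispensable


# Hypothetical toy data (not from any study cited in these slides)
toy = {
    "acc1": {"g1", "g2", "g3"},
    "acc2": {"g1", "g2", "g4"},
    "acc3": {"g1", "g3", "g5"},
}
pan, core, dispensable = partition_pangenome(toy)
print(sorted(pan), sorted(core), sorted(dispensable))
```

In this toy example only `g1` is shared by all three accessions, so it is the single core gene; everything else is dispensable.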
When and who? • Pangenomes were first introduced by Tettelin et al. in 2005 to describe gene diversity in Streptococcus agalactiae (Herve Tettelin, Duccio Medini) • Pangenomics in plants was first proposed by Morgante et al. (2007) (Michele Morgante)
Timeline of Developments in Pangenomic Research (Golicz et al., 2019). Years on the timeline axis: 2006, 2007, 2008, 2009, 2013, 2014, 2015, 2016, 2017, 2018, 2019. Milestones shown on the figure: • Review of analytical tools and models developed over 10 years of pangenome research (Vernikos et al., 2015) • E. coli pangenome built using 1085 genomes • Rice accessory genome characterized • Human pangenome • Bacterial upper kingdom pangenome • The pangenome introduced by Tettelin • Human pangenome • Pig pangenome • Pangenome of phytoplankton Emiliania huxleyi • Plant pangenome concept proposed by Morgante et al. • Pangenome of bread wheat and stiff brome • Brassica oleracea pangenome • Poplar genome • Soybean and wild relatives pangenome • Maize transcriptome • Rice pangenome built using 3010 accessions • Saccharomyces cerevisiae pangenome built using 10 isolates • Streptococcus pneumoniae pangenome • Escherichia coli pangenome
Timeline of Developments in Pangenomic Research • The first graph-based plant pan-genome was constructed in soybean • “Map-to-pan” strategy
MAJOR DRIVING FORCES FOR SVs UNDERLYING THE VARIABLE SEQUENCES OF PLANT PAN-GENOMES
1. Transposable elements: insertion of TEs in regulatory regions (tb1, vgt1, ZmCCT10, and ZmCCT9)
How is a Pangenome generated? • Li et al. (2022)
Pan-genome workflow 1. Selection of germplasm 2. Identification of genetic variation 3. Genotyping 4. Linking genotypic & phenotypic variation
Sequencing • NGS (next-generation sequencing) technology • TGS (third-generation sequencing) technology
Assembly errors • Missing genes
PAN-GENOME ASSEMBLY METHODS • De novo assembly • Iterative assembly • Graph assembly
DE NOVO ASSEMBLY • Error prone • Costly • Requires high-quality data with high sequencing coverage
De novo genome assembly 1. Short/long reads 2. Contig assembly 3. Scaffold/chromosome assembly 4. Multiple alignment of genomic regions 5. Pan-genome construction
ITERATIVE MAPPING • Less expensive • Requires much less data • Permits the assessment of large numbers of individuals with relatively low sequencing coverage • Changes in the gene order
ITERATIVE MAPPING 1. Reference genome 2. Mapping of the reads to the reference sequence 3. Assembly of the unmapped reads 4. Building the pangenome
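The iterative mapping loop can be sketched in a few lines to show the control flow only. Everything here is a deliberately simplified assumption: "mapping" is exact substring matching, "assembly" just keeps each unique unmapped read as its own contig, and the function name `iterative_pangenome` and all sequences are hypothetical. A real pipeline would use a read aligner and an assembler at these two steps.

```python
from typing import Dict, List


def iterative_pangenome(reference: str, accessions: Dict[str, List[str]]) -> List[str]:
    """Toy iterative mapping: grow a pan-genome from a single reference.

    accessions maps an accession name to its list of reads.
    Returns the reference followed by the novel sequences added to it.
    """
    pan = [reference]
    for name, reads in accessions.items():
        # Step 2: "map" reads (toy criterion: exact substring of any pan sequence)
        unmapped = [r for r in reads if not any(r in seq for seq in pan)]
        # Step 3: "assemble" unmapped reads (toy stand-in: de-duplicate, keep order)
        novel_contigs = list(dict.fromkeys(unmapped))
        # Step 4: add the novel contigs so later accessions map against them too
        pan.extend(novel_contigs)
    return pan


# Hypothetical toy data
reference = "ACGTACGTTT"
accessions = {
    "acc1": ["ACGT", "GGGCCC"],
    "acc2": ["GGGCCC", "TTTAAA", "CGTT"],
}
pan_sequences = iterative_pangenome(reference, accessions)
print(pan_sequences)
```

Because the pan-genome is updated after each accession, the `GGGCCC` read from `acc2` maps to the contig contributed by `acc1` and is not added a second time.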
GRAPH-BASED ASSEMBLY • A graph structure represents the diversity of genomic sequences • Presents variation across multiple genomes as different paths along a graph of sequence or variant nodes
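The idea of genomes as paths through a graph of sequence nodes can be shown with a toy data structure. This is an illustrative sketch, not the format of any particular graph-genome tool: the node IDs, sequences, genome names and helper functions `spell` and `shared_nodes` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy graph: node id -> sequence segment
nodes: Dict[int, str] = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA", 5: "GGG"}

# Each genome is one path (an ordered list of node ids) through the graph
paths: Dict[str, List[int]] = {
    "ref": [1, 2, 4],       # reference allele at the variable site
    "acc1": [1, 3, 4],      # SNP: node 3 replaces node 2
    "acc2": [1, 2, 5, 4],   # insertion: extra node 5
}


def spell(path: List[int], nodes: Dict[int, str]) -> str:
    """Reconstruct a genome sequence by concatenating the nodes on its path."""
    return "".join(nodes[n] for n in path)


def shared_nodes(paths: Dict[str, List[int]]) -> Set[int]:
    """Nodes traversed by every genome (shared sequence)."""
    return set.intersection(*(set(p) for p in paths.values()))


for genome, path in paths.items():
    print(genome, spell(path, nodes))

shared = shared_nodes(paths)
variable = set(nodes) - shared
print("shared:", sorted(shared), "variable:", sorted(variable))
```

Nodes visited by every path correspond to sequence shared by all genomes, while nodes visited by only some paths hold the variation.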
Ratio of core vs dispensable genes: higher ploidy and a higher outcrossing rate provide an extra level of diversity and therefore a larger pangenome with a higher percentage of dispensable genes.
Case study 1 Varshney, R. K. et al. (2021) Objective: To construct a chickpea pan-genome that provides insights into species divergence, the migration of the cultigen (C. arietinum), and the identification of rare allele burden and fitness loss in chickpea.
Results • The chickpea pan-genome (592.58 Mb) was developed using an iterative mapping and assembly approach. • A total of 29,870 genes were identified, of which 1,582 were, to the authors' knowledge, novel compared to previously reported genes. • Gene ontology (GO) annotations identified genes that encode response to oxidative stress, response to stimulus, heat shock protein, cellular response to acidic pH and response to cold, suggesting a possible role in adaptation. • The modeling curve analysis showed that the chickpea pan-genome is closed, that is, adding further accessions is expected to contribute few new genes.
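One common way to judge whether a pan-genome is open or closed is to track how the pan and core gene counts change as accessions are added in random order; a pan curve that plateaus suggests a closed pan-genome. The slides do not say how the chickpea modeling curve was computed, so the following is only a generic sketch on hypothetical toy data, with `accumulation_curves` as an assumed helper name.

```python
import random
from statistics import mean
from typing import Dict, List, Set, Tuple


def accumulation_curves(
    gene_sets: Dict[str, Set[str]], n_orders: int = 100, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Average pan and core gene counts after adding 1..N accessions.

    Accessions are added in n_orders random orders and the counts averaged.
    """
    rng = random.Random(seed)
    names = list(gene_sets)
    pan_sizes: List[List[int]] = [[] for _ in names]
    core_sizes: List[List[int]] = [[] for _ in names]
    for _ in range(n_orders):
        order = names[:]
        rng.shuffle(order)
        pan: Set[str] = set()
        core = None
        for i, name in enumerate(order):
            pan |= gene_sets[name]  # the pan-genome can only grow
            # the core can only shrink as accessions are added
            core = set(gene_sets[name]) if core is None else core & gene_sets[name]
            pan_sizes[i].append(len(pan))
            core_sizes[i].append(len(core))
    return [mean(x) for x in pan_sizes], [mean(x) for x in core_sizes]


# Hypothetical toy data (not chickpea data)
toy = {
    "acc1": {"g1", "g2", "g3"},
    "acc2": {"g1", "g2", "g4"},
    "acc3": {"g1", "g3", "g5"},
}
pan_curve, core_curve = accumulation_curves(toy)
print("pan:", pan_curve)
print("core:", core_curve)
```

With real data, the increments of the pan curve are usually fitted with a power-law or exponential model to decide between an open and a closed pan-genome; that fitting step is omitted here.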
• Cultivated (2,258) and C. reticulatum (22) accessions were analysed to discover structural variations relative to the CDC Frontier genome. • More structural variations were found in the C. reticulatum accessions because of their high divergence from cultivated chickpea. • They further identified 793 gene-gain copy number variants (CNVs) and 209 gene-loss CNVs in cultivated accessions, and 643 gene-gain and 247 gene-loss CNVs in C. reticulatum accessions.
The past history of the effective population size of chickpea was reconstructed from 150 randomly chosen cultivated genotypes using the Markovian coalescent as implemented in SMC++ (Terhorst et al., 2017). 1. Chickpea experienced a strong bottleneck beginning around 10,000 years ago. 2. The population size reached its minimum around 1,000 years ago. 3. This was followed by a very strong expansion of the population within the last 400 years, suggesting a strong recent expansion of chickpea agriculture.
• The neighbour-joining tree indicates a clear out-grouping of wild species accessions from cultivated accessions • The cultivated accessions formed three distinct clusters • One landrace from East Africa (ICC 16369) grouped together with wild species accessions, indicating that it was mislabeled as belonging to cultivated chickpea
Conclusions from this study • They constructed a chickpea pan-genome and identified novel genes not reported earlier • The divergence tree allowed them to estimate the divergence of Cicer over the last 21 million years • They identified selective sweeps of genes under domestication and a bottleneck leading to reduced genetic diversity
Case study 2 (2021) Objective: • To develop a high-quality rice pan-genome of genetically diverse rice accessions through de novo genome assemblies • To demonstrate the impact of structural variation on environmental adaptation and agronomic traits
Materials and methods • PacBio SMRT sequencing • De novo assembly • Assemblies were evaluated for completeness using BUSCO (Benchmarking Universal Single-Copy Orthologs)
Results • They built a pan-genome of cultivated rice comprising 66,636 genes. • Distribution analysis showed that 20,374 genes were categorized as “core genes” and 46,262 genes were categorized as “dispensable genes”, which included 14,609 accession-private genes. • They identified an average of 24,469 SVs per accession relative to Nipponbare.
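The gene counts on this slide can be checked and turned into proportions with a few lines of arithmetic. The numbers are taken from the slide; the variable names and the helper `pct` are just for illustration.

```python
# Gene counts reported on the slide for the rice pan-genome
total_genes = 66_636
core_genes = 20_374
dispensable_genes = 46_262
private_genes = 14_609  # accession-private genes, a subset of the dispensable genes

# Core and dispensable genes should add up to the whole pan-genome
assert core_genes + dispensable_genes == total_genes


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


core_pct = pct(core_genes, total_genes)
dispensable_pct = pct(dispensable_genes, total_genes)
private_pct_of_dispensable = pct(private_genes, dispensable_genes)

print(f"core: {core_pct}% of the pan-genome")
print(f"dispensable: {dispensable_pct}% of the pan-genome")
print(f"accession-private: {private_pct_of_dispensable}% of the dispensable genes")
```

So roughly 30.6% of the genes are core and 69.4% are dispensable, and about a third of the dispensable genes are private to a single accession.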
Contribution of SVs to rice environmental adaptation • The OsWAK112d gene is a known negative regulator of blast resistance • Two independent deletions in the OsWAK112d gene contributed to environmental adaptation by enhancing blast resistance in rice. Fig 1. Schematic illustrating the deletions of OsWAK112d in the LJ and N22 accessions. Fig 2. Distribution of the OsWAK112d deletions in subpopulations of O. sativa and in the wild rice population.
Association of gene CNVs with variation in agronomic traits • In addition to SVs, gCNVs were inferred for 25,549 (38.34%) of the protein-coding genes in the rice pan-genome. • CNV of OsVIL1 is likely associated with flowering time and grain number.
Conclusions from this study • De novo assembly of 31 high-quality genomes for genetically diverse accessions • Pan-genome-scale resources and a graph-based genome reveal hidden SVs and gCNVs • The derived state of O. sativa SVs was inferred using the O. glaberrima assembly • SVs and gCNVs have shaped gene expression profiles and agronomic trait variations
APPLICATIONS OF PAN-GENOMES IN PLANT GENETIC STUDIES AND BREEDING
1. Pangenomics in Utilizing Crop Wild Relatives (CWRs)