610 likes | 627 Views
Understanding genetic analysis steps, from gene mapping to sequencing and molecular tools, to decode the genome architecture of organisms. Discover the role of restriction enzymes, libraries, and DNA amplification in exploring the genetic landscape.
E N D
Steps in Genetic Analysis • Knowing how many genes determine a phenotype (Mendelian and/or QTL analysis), and where the genes are located (linkage mapping) is a first step in understanding the genetic basis of a phenotype • A second step is determining the sequence of the gene (or genes)
Steps in Genetic Analysis • Subsequent steps involve…. • Understanding gene regulation • Understanding the context of the gene in the sequence of the whole genome • Analysis of post-transcriptional events, understanding how the genes fit into metabolic pathways, how these pathways interact with the environment
Genome sequence of a diploid plant (2n = 2x = 14) • 5,300,000,000 base pairs • 165% of human genome • Enough characters for 11,000 large novels • 60,000,000 base pairs of expressed sequence • ~ 1% of total sequence, like humans • 125 large novels
Molecular tools for determining DNA (or RNA) sequence • Genomic DNA (or RNA) extraction • (RNA cDNA) • Manipulate DNA with restriction enzymes to reduce complexity and/or facilitate further manipulation • Manage and/or maintain DNA in vectors and/or libraries • Selecting DNA targets via amplification and/or hybridization • Determining nucleotide sequence of the targeted DNA
Extracting DNA (orRNA) • Genomic DNA (or RNA): • Leaf segments or target tissues • Key considerations are • Concentration • Purity • Fragment size
mRNA to cDNA Reverse transcriptase
Restriction Enzymes • Restriction enzymes make cuts at defined recognition sites in DNA • A defense system for bacteria, where they attack and degrade the DNA of attacking bacteriophages • The restriction enzymes are named for the organism from which they were isolated • Harnessed for the task of systematically breaking up DNA into fragments of tractable size and for various polymorphism detection assays • Each enzyme recognizes a particular DNA sequence and cuts in a specified fashion at the sequence
Restriction Enzymes • Recognition sites and fragment size: a four-base cutter ~ 256 bp (44); more frequently than a six-base cutter, which in turn will cut more often than one with an eight base-cutter • Methylation sensitive restriction enzymes • Target the epigenome
Restriction Enzymes • Palindrome recognition sites – the same sequence is specified when each strand of the double helix is read in the opposite direction. • Sit on a potato pan, Otis • Cigar? Toss it in a can, it is so tragic • UFO. tofu • Golf? No sir, prefer prison flog • Flee to me remote elf • Gnu dung • Lager, Sir, is regal • Tuna nut • CRISPR: Clustered regularly interspaced palindromic repeats
Vectors • Propagate and maintain DNA fragments generated by the restriction digestion • Efficiency and simplicity of inserting and retrieving the inserted DNA fragments • Key feature of the cloning vector: size of the DNA insert • Plasmid ~ 1 kb • BAC ~ 200 kb
Libraries Repositories of DNA fragments cloned in their vectors or attached to platform-specific oligonucleotide adapters Classified in terms of • cloning vector: e.g. plasmid, BAC • in terms of cloned DNA fragment source: e.g. genomic, cDNA • In terms of intended use: e.g. next generation sequencing (NGS)
Genomic DNA libraries • Total genomic DNA digested and the fragments cloned into an appropriate vector or system • Representative sample all the genomic DNA present in the organism, including both coding and non-coding sequences • Enrichment strategies: target specific types of sequences • unique sequences • Methylated sequences
cDNA libraries • Generated from mRNA transcripts, using reverse transcriptase • The cDNA library represents only the genes that are expressed in the tissue and/or developmental stage that was sampled
DNA Amplification: Polymerase Chain Reaction (PCR) • K.B Mullis, 1983 • in vitro amplification of ANY DNA sequence https://www.youtube.com/watch?v=2KoLnIwoZKU
Synthetic DNA: oligonucleotides • Primers, adapters, and more …~$0.010 per bp... • < ~ 100 bases
DNA Amplification: Polymerase Chain Reaction (PCR) Design of two single stranded oligonucleotide primers complementary to motifs on the template DNA.
DNA Amplification - PCR A Polymerase extends the 3’ end of the primer sequence using the DNA strand as a template.
PCR Principles • The PCR reaction consists of: • Buffer • DNA polymerase (thermostable) • Deoxyribonucleotide triphosphates (dNTPs) • Two primers (oligonucleotides) • Template DNA
PCR Principles • Each cycle generates exponential numbers of DNA fragments that are identical copies of the original DNA strand between the two binding sites.
PCR Principles • The choice of what DNA will be amplified by the polymerase is determined by the primers • The DNA between the primers is amplified by the polymerase: in subsequent reactions the original template, plus the newly amplified fragments, serve as templates • Steps in the reaction include denaturing the target DNA to make it single-stranded, addition of the single stranded oligonucleotides, hybridization of the primers to the template, and primer extension
PCR Applications • Amplify a target sequence from a pool of DNA (your favorite gene, forensics, fossil DNA) • Start the process of genome sequencing • Generate abundant markers for linkage map construction molecular markers
DNA Hybridization • Single strand nucleic acids find and pair with other single strand nucleic acids with a complementary sequence • An application of this affinity is to label one single strand and then to use this probe to find complementary sequences in a population of single stranded nucleic acids • For example, if you have a cloned gene – either a cDNA or a genomic clone - you could use this as a probe to look for a homologous sequence in another DNA sample • The microarray concept: • https://www.dnalc.org/resources/3d/26-microarray.html
DNA Hybridization The principle of hybridization can be applied to pairing events involving DNA: DNA; DNA: RNA; and protein: antibody Southern blot Northern blot Western blot
Sanger DNA Sequencing - classic but not obsolete The gold standard for accurate sequencing of short (~ 650 bp) DNA sequences https://www.dnalc.org/view/15479-Sanger-method-of-DNA-sequencing-3D-animation-with-narration.html
Next Generation Sequencing - Illumina A tremendous amount of short read data in a short time, at a good price https://www.youtube.com/watch?annotation_id=annotation_228575861&feature=iv&src_vid=womKfikWlxM&v=fCd6B5HRaZ8
Sequencing - PAC Bio Long reads! https://www.youtube.com/watch?v=v8p4ph2MAvI
Sequencing - Nanopore Potentially, cheap, fast, portable
Sequencing considerations Read length Accuracy Speed Cost Assembly
Genome sizes and whole genome sequencing Credit: Karl Kristensen, Denmark
Fragariavesca Sequencing a plant genome • Herbaceous, perennial • 2n=2x=14 • 240 Mb • Reference species for Rosaceae • Genetic resources Credit: commons.Wikimedia.org Fragaria x ananassa: 2n=8x=56. Domesticated 250 years ago
Sequencing a plant genome • Short reads • No physical reference • De novo assembly • Open source
Sequencing a plant genome • Roche 454, Illumina • X39 coverage (number of reads including a given nucleotide) • Contigs (overlapping reads) assembled into scaffolds (contigs + gaps) • ~ 3,200 scaffolds N50 of 1.3 Mb (weighted average length) • Over 95% (209.8 Mb) of total sequence is represented in 272 scaffolds
Anchoring the genome sequence to the genetic map Sequencing a plant genome • 94% of scaffolds anchored to the diploid Fragariareference linkage map using 390 genetic markers • Pseudochromosomes ~ linkage groups
Sequencing a plant genome Synteny Homologs Orthologs Paralogs Credit: Biology stackexchange.com
Sequencing a plant genome The small genome size (240 Mb) • Absence of large genome duplications • Limited numbers of transposable elements, compared to other angiosperms
Sequencing a plant genome the transcriptome • Fruits and roots – different types of genes
Sequencing a plant genome Gene prediction • 34,809 nuclear genes • flavor, nutritional value, and flowering time • 1,616 transcription factors RNA genes • 569 tRNA, 177 rRNA, 111 spliceosomal RNAs, 168 small nuclear RNAs, • 76 micro RNA and 24 other RNAs Chloroplast genome • 155,691 bp encodes 78 proteins, 30 tRNAs and 4 rRNA genes • Evidence of DNA transfer from plastid genome to the nuclear genome
Genome architecture and evolution • Key considerations: • Genes • Chromosomes • C value paradox • Gene regulation • Epigenetics • Transposable elements
Genome architecture and evolution S. Wessler on Transposable elements: https://www.youtube.com/watch?v=_cJfsWYR42M Review of retroviruses: https://www.youtube.com/watch?v=OxwKuFZbDzc Review of cut and past transposition: https://www.youtube.com/watch?v=XYZHMGUGq6o
Genes DNA ……..……..mRNA……….…….Protein Transcription tRNA rRNA Translation Other RNAs?
Genes (classically..) DNA specifying a protein 200 – 2,000,000 nt (bp) promoter Coding region • Exon • Intron • Exon • Intron Exon • Start • codon • Stop • codon • 3’UTR • 5’UTR • Basal • promoter • +1 • Termination • signal • ORF • mRNA • CDS
Chromosomes F. vesca: 35,000 genes/7 chromosomes = 5,000 genes/chromosome?
Genes, chromosomes, and genomes F. vesca: 2n = 2x = 14 genome = 240 Mb average gene ~ 3kb 79,333 genes? 11,333 genes/chromosome? 35,000 genes….. ~ 5,000 genes/chromosome What’s the rest of the genome???????
C-value paradox “Organisms of similar evolutionary complexity differ vastly in DNA content” Federoff, N. 2012. Science. 338:758-767. 1 pg = 978 Mb
The C-value paradox Fedoroff Science 2012;338:758-767
C-value paradox Junk??????? Shining a Light on the Genome’s ‘Dark Matter’
Gene regulation • Pennisi, E. 2010. Science 330:1614. 40% of all human disease-related SNPs are OUTSIDE of genes • The dark matter is conserved and therefore must have a function • DNA sequences in the dark matter are involved in gene regulation • ~80% of the genome is transcribed but “genes” account for ~2% • RNAs of all shapes and sizes: • RNAi • lncRNA