Ms. Shivani Bhagwat Lecturer, School of Biotechnology DAVV

Next Generation Sequencing and Gene Annotation

Presentation Transcript


  1. Next Generation Sequencing and Gene Annotation. Ms. Shivani Bhagwat, Lecturer, School of Biotechnology, DAVV

  2. DNA SEQUENCING DNA sequencing includes several methods and technologies used to determine the order of the nucleotide bases (adenine, guanine, cytosine, and thymine) in a molecule of DNA. The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography. Maxam–Gilbert sequencing: a DNA sequencing method based on chemical modification of DNA and subsequent cleavage at specific bases.

  3. The method requires radioactive labeling at one 5' end of the DNA (typically by a kinase reaction using gamma-32P ATP) and purification of the DNA fragment to be sequenced.
  • Chemical treatment generates breaks at a small proportion of one or two of the four nucleotide bases in each of four reactions (G, A+G, C, C+T).
  • The purines (A+G) are depurinated using formic acid, the guanines (and to some extent the adenines) are methylated by dimethyl sulfate, and the pyrimidines (C+T) are hydrolysed using hydrazine.
  • The addition of salt (sodium chloride) to the hydrazine reaction inhibits the reaction of thymine, which gives the C-only reaction.
  • The modified DNAs are then cleaved by hot piperidine at the position of the modified base.
  • Thus a series of labeled fragments is generated, each running from the radiolabeled end to the first "cut" site in that molecule.
  • The fragments in the four reactions are electrophoresed side by side in denaturing acrylamide gels for size separation.
  • To visualize the fragments, the gel is exposed to X-ray film for autoradiography, yielding a series of dark bands, each corresponding to a radiolabeled DNA fragment, from which the sequence may be inferred.
  • NOTE: Maxam–Gilbert sequencing has fallen out of favour because its technical complexity prevents its use in standard molecular biology kits, it makes extensive use of hazardous chemicals, and it is difficult to scale up.
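The way a sequence is inferred from the four lanes (G, A+G, C, C+T) can be sketched in a few lines of Python. This is only an illustration of the read-out logic, not a model of the chemistry: the function names `simulate_lanes` and `read_gel` are hypothetical, and the sketch assumes ideal cleavage with a band for every position.

```python
# Minimal sketch of Maxam-Gilbert lane read-out (idealised, hypothetical helper names).

def simulate_lanes(seq):
    """For each of the four reactions, return the set of 1-based positions
    at which the chemistry cleaves the strand."""
    reactions = {"G": "G", "A+G": "AG", "C": "C", "C+T": "CT"}
    return {
        name: {i for i, base in enumerate(seq, start=1) if base in bases}
        for name, bases in reactions.items()
    }

def read_gel(lanes, length):
    """Infer the base at each position from which lanes show a band there."""
    calls = []
    for pos in range(1, length + 1):
        if pos in lanes["G"]:
            calls.append("G")      # band in G (and A+G) lane
        elif pos in lanes["A+G"]:
            calls.append("A")      # band in A+G lane only
        elif pos in lanes["C"]:
            calls.append("C")      # band in C (and C+T) lane
        elif pos in lanes["C+T"]:
            calls.append("T")      # band in C+T lane only
        else:
            calls.append("N")      # no band: base cannot be called
    return "".join(calls)

lanes = simulate_lanes("GATTACA")
print(read_gel(lanes, 7))
```

A band that appears only in the A+G lane is read as A, and a band only in the C+T lane as T, which is why two of the reactions can be base-ambiguous without losing information.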

  4. Chain-termination methods The key principle of the Sanger method is the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators. The classical chain-termination method requires a single-stranded DNA template, a labelled DNA primer, a DNA polymerase, normal deoxynucleotide triphosphates (dNTPs), and modified nucleotides (dideoxyNTPs) that terminate DNA strand elongation. The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP). These are the chain-terminating nucleotides: they lack the 3'-OH group required for the formation of a phosphodiester bond between two nucleotides, so their incorporation terminates DNA strand extension and results in DNA fragments of varying length.

  5. The newly synthesized and labelled DNA fragments are heat denatured and separated by size (with a resolution of just one nucleotide) by gel electrophoresis on a denaturing polyacrylamide-urea gel, with each of the four reactions run in one of four individual lanes (lanes A, T, G, C). The DNA bands are then visualized by autoradiography or UV light, and the DNA sequence can be read directly off the X-ray film or gel image. NOTE: Limitations include non-specific binding of the primer to the DNA, which affects accurate read-out of the DNA sequence, and DNA secondary structures, which affect the fidelity of the sequence.
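The four-reaction scheme can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a model of real reaction kinetics: the function names, the number of molecules, and the termination probability `p_terminate` are all arbitrary choices made for illustration.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_reaction(template, dd_base, n_molecules=2000, p_terminate=0.2, rng=None):
    """Simulate one of the four reactions (one ddNTP type) and return the set
    of fragment lengths observed, i.e. the bands in that lane."""
    rng = rng or random.Random(0)
    # The new strand is the reverse complement of the template (both written 5'->3').
    new_strand = "".join(COMPLEMENT[b] for b in reversed(template))
    lengths = set()
    for _ in range(n_molecules):
        for i, base in enumerate(new_strand, start=1):
            # A ddNTP is occasionally incorporated instead of the normal dNTP,
            # which stops extension at this position.
            if base == dd_base and rng.random() < p_terminate:
                lengths.add(i)
                break
    return lengths

def read_gel(lanes):
    """Read the gel from the shortest fragment to the longest."""
    by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))

template = "GATTACAGGC"
lanes = {base: sanger_reaction(template, base) for base in "ACGT"}
print(read_gel(lanes))  # sequence of the newly synthesized strand
```

Note that the sequence read off the gel is that of the newly synthesized strand, which is complementary to the template.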

  6. Dye-terminator sequencing Dye-terminator sequencing labels the chain-terminating ddNTPs themselves, which permits sequencing in a single reaction rather than the four reactions needed in the labelled-primer method. Each of the four dideoxynucleotide chain terminators is labelled with a different fluorescent dye, each of which emits light at a different wavelength.

  7. Automated DNA-sequencing instruments (DNA sequencers) can sequence up to 384 DNA samples in a single batch (run) and perform up to 24 runs a day. DNA sequencers carry out capillary electrophoresis for size separation, detection and recording of dye fluorescence, and data output as fluorescent peak trace chromatograms.

  8. Base calling software typically gives an estimate of quality to aid in quality trimming.
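As an illustration of how per-base quality estimates support trimming, the sketch below assumes the widely used Phred scale, where a quality score Q corresponds to an error probability P = 10^(-Q/10), and the Phred+33 character encoding found in FASTQ files. The slide does not name a scale, so both are assumptions, and the function names and the threshold of 20 are hypothetical.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the estimated base-call error probability."""
    return 10 ** (-q / 10)

def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]

def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]

quals = decode_phred33("IIII#")        # 'I' -> Q40, '#' -> Q2
print(trim_3prime("ACGTA", quals))     # the low-quality final base is dropped
```

Real trimming tools typically use sliding windows or running-sum methods rather than this simple end scan.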

  9. Massively parallel signature sequencing (MPSS) Developed in the 1990s; the procedure is relatively complex. It is a sequence-based approach that can be used to identify and quantify the mRNA transcripts present in a sample, similar to serial analysis of gene expression (SAGE), although the biochemical manipulation and sequencing approach differ substantially. mRNA transcripts are identified through the generation of a 17-20 bp (base pair) signature sequence adjacent to the 3' end. Each signature sequence is cloned onto one of a million microbeads. The technique ensures that only one type of DNA sequence is on each microbead. The microbeads are then arrayed in a flow cell for sequencing and quantification. Fluorescently labeled encoders are used to decode the sequence.

  10. Pyrosequencing Technology A parallelized version of pyrosequencing was developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics. It is based on emulsion PCR and on detection of pyrophosphate release upon nucleotide incorporation. In the first step, an ssDNA template is hybridized to a sequencing primer and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, and with the substrates adenosine 5´ phosphosulfate (APS) and luciferin. The addition of one of the four deoxynucleotide triphosphates (dNTPs) initiates the second step. DNA polymerase incorporates the dNTP onto the template if it is complementary to the next template base. This incorporation releases pyrophosphate (PPi). ATP sulfurylase quantitatively converts PPi to ATP in the presence of adenosine 5´ phosphosulfate. This ATP drives the luciferase-mediated conversion of luciferin to oxyluciferin, which generates visible light in amounts proportional to the amount of ATP. Unincorporated nucleotides and ATP are degraded by the apyrase, and the reaction can restart with another nucleotide.
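Because the light produced is proportional to the amount of pyrophosphate released, a run of identical bases gives a proportionally larger signal in a single nucleotide addition. The sketch below illustrates this with an idealised, noise-free model; the function names and the fixed dispensing order `ACGT` are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def pyrosequence(template, flow_order="ACGT", max_flows=100):
    """Simulate sequential nucleotide addition.

    `template` lists the template bases in the order the polymerase reads them.
    Returns a list of (nucleotide, signal) pairs, where signal is the number of
    bases incorporated in that flow (proportional to the light emitted)."""
    pos = 0
    flows = []
    for k in range(max_flows):
        if pos >= len(template):
            break
        nt = flow_order[k % len(flow_order)]
        run = 0
        # The polymerase extends through every consecutive template base
        # that is complementary to the dispensed nucleotide.
        while pos < len(template) and COMPLEMENT[template[pos]] == nt:
            run += 1
            pos += 1
        flows.append((nt, run))   # run == 0 means no light in this flow
    return flows

def call_bases(flows):
    """Turn the flow signals back into the synthesized sequence."""
    return "".join(nt * signal for nt, signal in flows)

flows = pyrosequence("TTGCA")
print(flows)
print(call_bases(flows))
```

In practice the signal is noisy, so distinguishing, for example, a run of seven bases from a run of eight is harder than this idealised model suggests.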

  11. Emulsion PCR (ePCR) PCR amplification

  12. Sequential nucleotide addition

  13. Light reaction

  14. Sequencing by Synthesis technology (SBS) • Developed by Solexa; a sequencing technology based on reversible dye-terminators and bridge PCR. • The combination of short inserts and longer reads increases the ability to fully characterize any genome. • DNA molecules are first attached to primers on a slide and amplified so that local clonal colonies are formed (bridge amplification). Four types of reversible terminator bases (RT-bases) are added, and non-incorporated nucleotides are washed away. Unlike in pyrosequencing, the DNA can only be extended one nucleotide at a time. A camera takes images of the fluorescently labelled nucleotides, then the dye, along with the terminal 3' blocker, is chemically removed from the DNA, allowing the next cycle. • Reversible dye terminators: the 3' end carries a protecting group that can be converted back to a hydroxyl group once the nucleotide has been incorporated into the growing DNA chain.

  15. Sequencing by ligation technology Developed by Applied Biosystems (SOLiD). Sequencing by ligation relies upon the sensitivity of DNA ligase to base-pairing mismatches. The target molecule to be sequenced is a single strand of unknown DNA sequence, flanked on at least one end by a known sequence. A short "anchor" strand is brought in to bind the known sequence. A mixed pool of probe oligonucleotides (8 or 9 bases long) is then brought in, labeled (typically with fluorescent dyes) according to the position that will be sequenced. These probes hybridize to the target DNA sequence next to the anchor sequence, and DNA ligase preferentially joins a probe to the anchor when its bases match the unknown DNA sequence. Based on the fluorescence produced by the ligated probe, one can infer the identity of the nucleotide at this position in the unknown sequence.

  16. VisiGen Biotechnologies approach VisiGen Biotechnologies introduced a specially engineered DNA polymerase for use in their sequencing. This polymerase acts as a sensor, carrying a donor fluorescent dye near its active centre. The donor dye acts by FRET (fluorescence resonance energy transfer), inducing fluorescence of the differently labeled nucleotides. This approach allows reads to be performed at the speed at which the polymerase incorporates nucleotides into the sequence (several hundred per second). The nucleotide fluorochrome is released after incorporation into the DNA strand. The expected read lengths in this approach should reach 1000 nucleotides, although this remains to be confirmed.

  17. Nanopore sequencing technology Developed by Oxford Nanopore Technologies. This method is based on reading the electrical signal produced as nucleotides pass through alpha-hemolysin pores covalently bound with cyclodextrin. DNA passing through the nanopore changes the ion current through the pore. This change depends on the shape, size and length of the DNA sequence. Each type of nucleotide blocks the ion flow through the pore for a different period of time. The method has potential for development because it does not require modified nucleotides; however, single-nucleotide resolution is not yet available.

  18. Emulsion PCR The single-stranded DNA fragments or templates are attached to the surface of beads using adaptors or linkers, and one bead is attached to a single DNA fragment from the DNA library. The DNA library is generated through random fragmentation of the genomic DNA. The surface of the beads contains oligonucleotide probes with sequences that are complementary to the adaptors binding the DNA fragments. After that, the beads will be compartmentalized into separate water-oil emulsion droplets. In the aqueous water-oil emulsion, each of the droplets capturing one bead will serve as a PCR microreactor for amplification steps to take place and produce clonally amplified copies of the DNA fragment.

  19. Bridge amplification on solid surface High-density forward and reverse primers are covalently attached to the slide in a flow cell. The ratio of the primers to the template on the support defines the surface density of the amplified clusters. The flowcell is exposed to reagents for polymerase-based extension, and priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary oligo on the surface. Repeated denaturation and extension results in localized amplification of DNA fragments in millions of unique locations across the flow cell surface. Solid-phase amplification can produce 100–200 million spatially separated template clusters (Illumina/Solexa), providing free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction.

  20. Single-molecule templates Some clonal amplification protocols are cumbersome to implement and require a large amount of genomic DNA material (3–20 μg). The preparation of single-molecule templates is more straightforward and requires less starting material (<1 μg). More importantly, these methods do not require PCR, which can introduce mutations in clonally amplified templates that masquerade as sequence variants. AT-rich and GC-rich target sequences may also show amplification bias in product yield, which results in their under-representation in genome alignments and assemblies. Single-molecule templates are usually immobilized on solid supports using one of at least 3 different approaches: 1. Spatially distributed individual primer molecules are covalently attached to the solid support. The template, which is prepared by randomly fragmenting the starting material into small sizes (for example, ~200–250 bp) and adding common adaptors to the fragment ends, is then hybridized to the immobilized primer.

  21. 2. Spatially distributed single-molecule templates are covalently attached to the solid support by priming and extending single-stranded, single-molecule templates from immobilized primers. A common primer is then hybridized to the template. In either approach, DNA polymerase can bind to the immobilized primed template configuration to initiate the NGS reaction. Both of the above approaches are used by Helicos BioSciences. 3. Spatially distributed single polymerase molecules are attached to the solid support, to which a primed template molecule is bound. Larger DNA molecules (up to 10,000 bp) can be used with this technique. This approach is used by Pacific Biosciences.

  22. GENE ANNOTATION

  23. What is Annotation? Extraction, definition, and interpretation of features on the genome sequence, derived by integrating computational tools and biological knowledge.
  DNA Analysis
  • Find the genes: heuristic signals, inherent features, intelligent methods
  • Characterize each gene: compare with other genes, find functional components, predict features

  24. Heuristic Signals DNA contains various recognition sites for internal machinery, such as: • Promoter signals • Transcription start signals • Start codon • Exon-intron boundaries • Transcription termination signals
  Inherent Features DNA exhibits certain biases that can be exploited to locate coding regions: • Uneven distribution of bases • Codon bias • CpG islands • Encoded amino acid sequence • Imperfect periodicity • Other global patterns
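Two of the inherent features listed above, uneven base distribution and CpG islands, are easy to quantify. The sketch below computes GC content and the observed/expected CpG ratio in sliding windows. The thresholds used (window of 200 bp, GC content above 50%, observed/expected ratio above 0.6) follow commonly cited CpG-island criteria and are an assumption here, as are the function names; the slides do not specify any.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

def scan_windows(seq, window=200, step=50, min_gc=0.5, min_oe=0.6):
    """Return (start, end) for each window that passes both thresholds."""
    hits = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        win = seq[start:start + window]
        if gc_content(win) > min_gc and cpg_obs_exp(win) > min_oe:
            hits.append((start, start + len(win)))
    return hits

# Toy sequence: an artificial CpG-rich block flanked by AT-rich sequence.
toy = "AT" * 100 + "CG" * 100 + "AT" * 100
print(scan_windows(toy))
```

Overlapping hits would normally be merged into a single candidate region before being used as evidence for a nearby gene.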

  25. Intelligent Methods Pattern recognition methods weigh inputs and predict gene location: • Content-based methods • Site-based methods • Comparative methods. Two common techniques are: • Neural networks • Hidden Markov models. The term "neural network" was traditionally used to refer to a network or circuit of biological neurons. The modern usage of the term often refers to artificial neural networks, which are composed of artificial neurons or nodes. A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network.
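To make the idea of hidden states concrete, here is a minimal sketch of the Viterbi algorithm, which finds the most probable hidden-state path for an HMM. The two-state model (N for non-coding, C for coding) and all of its probabilities are invented purely for illustration; real gene finders use far richer models and trained parameters.

```python
import math

# Toy model: every number below is a made-up assumption for illustration.
STATES = ("N", "C")                       # N = non-coding, C = coding
START_P = {"N": 0.5, "C": 0.5}
TRANS_P = {"N": {"N": 0.9, "C": 0.1},
           "C": {"N": 0.1, "C": 0.9}}
EMIT_P = {"N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},   # AT-rich
          "C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}   # GC-rich

def viterbi(obs, states=STATES, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most probable state path for a non-empty observation string."""
    # Work in log space to avoid numerical underflow on long sequences.
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            # Best previous state from which to enter state s at time t.
            prev, score = max(
                ((p, v[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            v[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: v[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

print("".join(viterbi("AAAAAAGGGGGG")))
```

The observed bases are the visible output, while the coding/non-coding labels are the hidden states the model is asked to recover.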

  26. Looks at several structural features: • Splice donor/acceptor sites • Putative coding regions • Intronic regions • Linear discriminant analysis to split exon / non-exon classes • Dynamic programming to assemble the best gene structure

  27. Quadratic discriminant analysis: • Exon length • Exon-intron transitions • Splice sites • Branch sites • Exon, strand, frame scores • Detects internal exons
  Strategies • Select by correlation coefficient • Select by review paper • Select by recommendation • Use them all

  28. Internet Resources
  Banbury Cross: http://igs-server.cnrs-mrs.fr/igs/banbury
  FGENEH: http://genomic.sanger.ac.uk/gf/gf.shtml
  GeneID: http://www1.imim.es/geneid.html
  GeneMachine: http://genome.nhgri.nih.gov/genemachine
  GENSCAN: http://genes.mit.edu/GENSCAN.html
  Genotator: http://www.fruitfly.org/_nomi/genotator/
  GRAIL: http://compbio.ornl.gov/tools/index.shtml
  GRAIL-EXP: http://compbio.ornl.gov/grailexp
  MZEF: http://www.cshl.org/genefinder
  PROCRUSTES: http://www-hto.usc.edu/software/procrustes
  RepeatMasker: http://ftp.genome.washington.edu/RM/RepeatMasker.html
  HMMgene: http://www.cbs.dtu.dk/services/HMMgene
  http://www.wiley.com/legacy/products/subject/life/bioinformatics/chapterlinks.html

  29. Characterize a Gene Collect clues for potential function: • Comparison with other known genes and proteins • Predict secondary structure • Fold classification • Gene expression • Gene regulatory networks • Phylogenetic comparisons • Metabolic pathways
