Methods in genome sequencing and SNP finding
Gabor Marth
BI 820, presented by Tony Faber
Sequences used in SNP analysis and genomic sequencing
• Expressed Sequence Tags (ESTs)
• Sequence-tagged site (STS) sequences
• Reduced Representation Libraries (RRL)
• Whole-genome shotgun libraries
• Genome Survey Sequence (GSS)
Expressed Sequence Tags (ESTs)
• Relatively short (200-400 bp) partial cDNA sequences
• Many are single-pass reads from tissue-specific cDNA libraries
• HGP: ESTs aligned to the human reference sequence
• EST quality (SEQREF)
• ESTs are made up of coding sequence and UTRs (and can span multiple exons)
Identifying putative full-ORF cDNA clones
[Flowchart. Recoverable node labels: 5' ESTs; Matches RefSeq? (No / Yes); 5' end aligns at start; Protein comparison; HKScan; GenomeScan comparison; Matches 5' end of predicted gene? (No); Matches amino terminus? (Yes); Select for complete sequencing]
Sequence-Tagged Sites (STSs)
• Used first; advantages include readily available PCR primers
• Used to recover BACs/YACs during the HGP; PCR is much cheaper than BAC/YAC sequencing
• Represent the superposition (i.e. can also be double-pass reads)
• Fingerprint clone contigs are anchored to specific STSs
Whole-genome shotgun
• Random clones from the genomes of many individuals
• Requires several-fold coverage of the genome (e.g. sequencing, SNP discovery)
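As a rough illustration of what "several-fold coverage" means, the sketch below computes average coverage as (number of reads × read length) / genome size, and estimates the fraction of bases hit by at least one read under the common simplifying assumption that reads land uniformly at random (a Poisson model). The function names and the example numbers are illustrative assumptions, not values from the slides.

```python
import math


def shotgun_coverage(num_reads, read_length, genome_size):
    """Average fold coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(coverage):
    """Fraction of bases expected to be hit by at least one read.

    Assumes reads are placed uniformly at random, so the chance that a
    given base is missed by every read is exp(-coverage).
    """
    return 1.0 - math.exp(-coverage)


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative numbers only: a 3 Gb genome and 500 bp reads.
genome = 3_000_000_000
print(shotgun_coverage(6_000_000, 500, genome))
print(reads_needed(5, 500, genome))
print(round(expected_fraction_covered(5), 4))
```

Under this model, one-fold coverage still leaves roughly a third of the genome unsampled, which is one way to see why shotgun projects aim for several-fold coverage.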
Genome Survey Sequence (GSS)
• Used to survey a new genome, or to get a general idea of the genomic make-up of an organism
• Similar to ESTs, except that the DNA is genomic in origin (not mRNA)
• Also single-pass reads
• From cosmid/BAC/YAC ends, exon-trapped genomic sequences, and Alu PCR sequences
• Splicing events
Reduced Representation sequences (RRS)
• Heavy cloning in certain regions
• Contain STSs, many corresponding to genes or ESTs
• One clone per Mb on every chromosome; excellent coverage
• Reproducibly prepared subsets of the genome from several individuals, each containing a manageable number of loci
• This allows re-sampling
• Greater flexibility and efficiency
• Problems: creating reduced representations, finding orthologous matches, accuracy
• Origin of replication
• Binding to a particular protein
• Restriction fragments in a certain size range (size-selected restriction fragments)
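The size-selected restriction fragment idea can be sketched in silico: cut a sequence at every occurrence of a recognition site, then keep only fragments whose length falls in a chosen window. The code below is a minimal sketch on a toy sequence. The function names, the toy sequence and the size window are assumptions for illustration; the only enzyme fact used is that EcoRI recognizes GAATTC and cuts after the G. Real protocols select fragments hundreds of bases long and must also deal with incomplete digestion.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is where the cut falls inside the site
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:
            fragments.append(sequence[start:cut])
            start = cut
        pos = sequence.find(site, pos + 1)
    if start < len(sequence):
        fragments.append(sequence[start:])  # trailing piece after the last cut
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length lies in [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy example (hypothetical sequence, unrealistically small size window).
toy = "AAGAATTCCCCCGAATTCTTTTTTTTGAATTCA"
pieces = digest(toy, "GAATTC", 1)
print([len(p) for p in pieces])
print(size_select(pieces, 8, 12))
```

Because the same enzyme and size window pick out the same loci in every individual (unless a variant disrupts a cut site), the selected subset is reproducible, which is what makes re-sampling across individuals possible.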
SNP context
• Most popular method for obtaining SNPs
• EST alignment
• Major sources of genomic SNPs include sequences from restricted genome representation libraries, random shotgun reads aligned to the genome sequence, and BAC/YAC overlaps
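To make the alignment-based idea concrete, the sketch below scans the columns of a set of already-aligned reads and reports positions where a second base appears at least a minimum number of times. This is a deliberately naive count filter written for illustration, not the method described in the slides: it ignores base quality, paralogous sequence and sequencing error models. The function name, the `min_minor_count` parameter and the toy reads are all assumptions.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Return (position, major_base, minor_base) for candidate SNP columns.

    `aligned_reads` are equal-length strings; any character other than
    A, C, G or T (for example '-') is treated as no data at that position.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    snps = []
    for i in range(length):
        counts = Counter(read[i] for read in aligned_reads if read[i] in "ACGT")
        if len(counts) < 2:
            continue  # every read agrees at this column
        (major, _), (minor, minor_n) = counts.most_common(2)
        if minor_n >= min_minor_count:
            snps.append((i, major, minor))
    return snps


# Toy alignment: position 2 has a G/A difference seen in two reads;
# position 4 has a single T, which is treated as a likely error.
reads = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGT",
    "ACATACGT",
    "ACATACGT",
    "ACGTAC-T",
    "ACGTTCGT",
]
print(candidate_snps(reads))
```

Requiring the minor base to be seen more than once is a crude guard against single-read sequencing errors, which matter especially for single-pass reads such as ESTs.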