Bioinformatics Spring 2007 Ch. 6 - Genomics
Completed genomes
• http://www.genomesonline.org
• Avg. genome = 5 Mb
• Typical sequence coverage = 20X, therefore approx. 100 Mb of DNA sequenced
• Avg. English word size = 5 letters
• Avg. words per page = 250, therefore 1,250 letters per page
• Avg. book size = 200 pages, therefore 250,000 letters per book
• Approximately 400 books per genome
• 958 completed genomes as of January 1, 2009
• Approximately 383,200 books' worth of genomic information
• MSU library holdings: 182,000
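A quick Python check of the arithmetic above. The constants are the slide's own figures; treating one sequenced base as one printed letter is the slide's implicit assumption, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the "books per genome" arithmetic.
# Assumption (implicit on the slide): one sequenced base = one printed letter.

AVG_GENOME_BP = 5_000_000      # avg. genome = 5 Mb
COVERAGE = 20                  # typical sequence coverage = 20X
LETTERS_PER_WORD = 5
WORDS_PER_PAGE = 250
PAGES_PER_BOOK = 200
COMPLETED_GENOMES = 958        # as of January 1, 2009


def letters_per_book() -> int:
    return LETTERS_PER_WORD * WORDS_PER_PAGE * PAGES_PER_BOOK   # 250,000


def books_per_genome() -> int:
    sequenced_bp = AVG_GENOME_BP * COVERAGE                     # 100,000,000 bases
    return sequenced_bp // letters_per_book()                   # 400 books


def books_for_genomes(n_genomes: int = COMPLETED_GENOMES) -> int:
    return books_per_genome() * n_genomes


if __name__ == "__main__":
    print(books_for_genomes())  # 383200
```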
Approaches to Genome Sequencing
• Whole Genome Sequencing
• Shotgun Sequencing
• Expressed Sequence Tags
• Comparative Genomics
• Metagenomics
Overview of Genome Sequencing: Isolate Genomic DNA (genomic DNA) → Create Genomic Library (BAC clones) → Construction of Genome Map → DNA Sequencing and Assembly
Isolating Genomic DNA (à la Qiagen's DNeasy kit)
• Lysis:
  • Proteinase K digestion
  • Lysis by chaotropic salt
• Purification:
  • DNA is negatively charged
  • Bind to a positively charged column
  • Wash away impurities (EtOH)
• Elution:
  • Release DNA from the column
  • Disrupt the ionic interaction with a high-salt buffer
• Preservation:
  • Store at -20°C to -160°C
  • Tris-EDTA (TE) buffer, pH 8.0
Creating a Genomic Library (product: BAC clones)
• Cut Genomic DNA:
  • Partial Restriction Digest (EcoRI & EcoRI methylase)
  • Mechanical Shearing
  • Determine Avg. fragment size
• Clone Fragments into BAC vectors:
  • Properties of BACs
• Transform E. coli:
  • Electroporation
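The restriction-digest step can be sketched in Python: locate EcoRI sites (G^AATTC), cut, and report fragment sizes and the average fragment size. This is a simplified model under stated assumptions: lengths are measured on the top strand only (the 4-nt overhang is ignored), and a partial digest is approximated by cutting each site independently with probability `cut_prob`, a crude stand-in for the competition between EcoRI and EcoRI methylase. The function names are hypothetical.

```python
import random
import re
from typing import List, Optional

ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts G^AATTC: one base into the site (top strand)


def ecori_cut_positions(seq: str) -> List[int]:
    """0-based top-strand cut positions (a cut at p falls between seq[p-1] and seq[p])."""
    return [m.start() + ECORI_CUT_OFFSET for m in re.finditer(ECORI_SITE, seq.upper())]


def digest(seq: str, cut_prob: float = 1.0, seed: Optional[int] = None) -> List[int]:
    """Fragment lengths after EcoRI digestion of a linear molecule.

    cut_prob=1.0 models a complete digest; cut_prob<1.0 cuts each site
    independently with that probability (a crude partial-digest model).
    """
    rng = random.Random(seed)
    cuts = [p for p in ecori_cut_positions(seq) if rng.random() < cut_prob]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def average_fragment_size(fragments: List[int]) -> float:
    return sum(fragments) / len(fragments)


if __name__ == "__main__":
    demo = "AAAGAATTCAAAAGAATTCAA"
    print(digest(demo))                         # [4, 10, 7]
    print(average_fragment_size(digest(demo)))  # 7.0
```

Lowering `cut_prob` leaves more sites uncut, so fragments get longer on average, which is the point of a partial digest when building a large-insert library.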
Bacterial Artificial Chromosome (BAC)
• Derived from the E. coli F plasmid
• Multiple cloning site
• Selectable marker:
  • Antibiotic resistance gene, e.g., cm (chloramphenicol)
• oriS: unidirectional origin of replication
• par genes:
  • partitioning genes
  • maintain a single copy of the BAC
Construction of Genome Map
• Transformed E. coli → Plasmid Miniprep (BAC clones)
• BAC end sequencing
• Identify overlapping BACs
• Subclone BACs into plasmids → DNA Sequencing and Assembly
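Once overlapping BACs have been identified, one common way to decide which clones to subclone and sequence is a minimal tiling path: the fewest overlapping clones that span the mapped region. The Python sketch below is a greedy version of that idea; it assumes each clone already has coordinates on a common map (e.g. from BAC end sequences), and the `Clone` type and example coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical placement on the map, 0-based inclusive
    end: int    # exclusive


def minimal_tiling_path(clones: List[Clone]) -> List[Clone]:
    """Greedily pick the fewest overlapping clones that span the contig.

    Assumes clones are already placed on one coordinate system and that
    consecutive clones in the path must overlap by at least 1 bp.
    """
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: c.start)
    target_end = max(c.end for c in ordered)
    covered = ordered[0].start   # everything left of `covered` is already tiled
    path: List[Clone] = []
    i = 0
    while covered < target_end:
        # The first clone must start at the left edge; later ones must overlap.
        limit = covered if not path else covered - 1
        best = None
        while i < len(ordered) and ordered[i].start <= limit:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]   # candidate reaching furthest to the right
            i += 1
        if best is None or best.end <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best.end
    return path


if __name__ == "__main__":
    bacs = [Clone("A", 0, 150), Clone("B", 100, 260), Clone("C", 120, 200),
            Clone("D", 250, 400), Clone("E", 390, 500)]
    print([c.name for c in minimal_tiling_path(bacs)])
```

Clone C is skipped because B overlaps A and reaches further, so sequencing C would add no new territory.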
Overview of Shotgun Sequencing: Isolate Genomic DNA (genomic DNA) → Create Genomic Library (plasmid clones) → DNA Sequencing and Assembly → Construction of Genome Map
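In shotgun sequencing the map comes after assembly, so the assembly step carries the weight. A toy greedy assembler in Python illustrates the core idea: repeatedly merge the two reads with the longest suffix-prefix overlap. This is a teaching sketch, not how production assemblers work; it assumes error-free reads, all from the same strand, and exact overlaps of at least `min_len` bases. Greedy merging is easily misled by repeats longer than the reads.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of contigs with the largest overlap; return the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: the remaining contigs are separated by gaps
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAG"]))
```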
Overview of EST Sequencing: Isolate mRNA → Create cDNA → Create cDNA Library → DNA Sequencing
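The "Create cDNA" step rests on base pairing: reverse transcriptase builds a first DNA strand complementary to the mRNA, and the second strand then matches the mRNA with T in place of U. The Python sketch below shows only that sequence logic; priming (e.g. oligo-dT on the poly-A tail), incomplete extension and library construction are deliberately left out, and the function names are illustrative.

```python
# mRNA base -> paired DNA base in the first cDNA strand.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA (written 5'->3') for an mRNA given 5'->3'."""
    mrna = mrna.upper()
    if not set(mrna) <= set("ACGU"):
        raise ValueError("mRNA may contain only A, C, G, U")
    # Complement every base, then reverse so the new strand reads 5'->3'.
    return mrna.translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def second_strand_cdna(mrna: str) -> str:
    """Second strand: the mRNA sequence written in DNA (T instead of U)."""
    return mrna.upper().replace("U", "T")


if __name__ == "__main__":
    print(first_strand_cdna("AUGGCCAAA"))   # TTTGGCCAT
    print(second_strand_cdna("AUGGCCAAA"))  # ATGGCCAAA
```

An EST is a single-pass read from one end of such a cDNA clone, so it reports a fragment of the expressed transcript rather than the gene's genomic structure (no introns, no promoter).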
Comparative Genomics: Isolate mRNA and create cDNA → Create Genomic Library (BAC clones) → Construction of Genome Map → DNA Sequencing and Assembly
• Synteny: the same gene order preserved between species
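One simple way to quantify "the same gene order preserved between species" is to count conserved gene adjacencies: neighbouring gene pairs that are neighbours in both genomes. The Python sketch below is an illustration only, not the method used by synteny-detection tools; gene names are hypothetical, and orientation is ignored so that an inverted segment still keeps its internal adjacencies.

```python
from typing import FrozenSet, List, Set

Adjacency = FrozenSet[str]


def adjacencies(gene_order: List[str]) -> Set[Adjacency]:
    """Neighbouring gene pairs, ignoring direction (so inversions keep them)."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def conserved_adjacencies(order_a: List[str], order_b: List[str]) -> Set[Adjacency]:
    return adjacencies(order_a) & adjacencies(order_b)


def fraction_conserved(order_a: List[str], order_b: List[str]) -> float:
    """Share of species A's adjacencies that also occur in species B."""
    adj_a = adjacencies(order_a)
    return len(adj_a & adjacencies(order_b)) / len(adj_a) if adj_a else 0.0


if __name__ == "__main__":
    species_a = ["g1", "g2", "g3", "g4", "g5"]
    species_b = ["g1", "g2", "g3", "g5", "g4"]   # g4-g5 segment inverted
    print(fraction_conserved(species_a, species_b))  # 0.75
```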
Metagenomic analysis
• What is metagenomics?
  • Metagenomics is the genomic analysis of the collective genomes of an assemblage of organisms from a defined environment (Handelsman et al., 2002).
  • a.k.a. community genomics, environmental genomics
  • Derived from tools, techniques and models used in genomics.
• Why do metagenomic analysis?
  • Captures the genomic content of all eukaryotes, bacteria, archaea and viruses in an environment.
  • Provides a picture of the genetic/functional potential of the community.
Preliminary Categorization of 263 ORFs from a Fosmid Library of Subgingival Plaque
Genome Annotation
Genome Assembly and Annotation (RefSeq db)
Caveats
• Finding genes involves computational methods as well as experimental validation
• Computational methods alone are often inadequate and frequently generate erroneous ‘gene’ (false positive) predictions that:
  • are missing exons
  • have incorrect exons
  • overpredict the number of genes
  • are missing the 5’ and 3’ UTRs
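A deliberately naive six-frame ORF finder shows why purely computational gene finding over-predicts: every stretch from an ATG to the next in-frame stop that passes a length cutoff is reported, with no introns, no promoter or UTR context, and no statistical model of coding sequence. This Python sketch is an illustration, not any particular gene-prediction program; the `min_codons` default of 100 is an arbitrary assumption.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Naive six-frame ORF finder: ATG ... first in-frame stop codon.

    Returns (strand, start, end) as 0-based, end-exclusive coordinates on
    the forward sequence; the stop codon is included.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, i + 3))
                        else:  # map reverse-complement coordinates back to the forward strand
                            orfs.append((strand, n - (i + 3), n - start))
                    start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [('+', 2, 14)]
```

Raising `min_codons` removes short spurious ORFs but also discards genuinely short genes, and no cutoff recovers multi-exon structure, which is why experimental validation remains necessary.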
What are we looking to annotate?
• CDS
• mRNA
• Alternative transcripts
• Promoter and poly-A signal
• Pseudogenes
• ncRNA
• Repeat elements
• G+C content
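G+C content is the simplest of these annotations to compute. A minimal Python sketch, assuming that ambiguous bases (N and other IUPAC codes) should be excluded from the denominator; the sliding-window helper and its default window and step sizes are illustrative choices, not a standard.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (N and other codes are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq: str, window: int = 100, step: int = 50) -> List[Tuple[int, float]]:
    """(start, G+C fraction) for each full window; all-N windows are skipped."""
    result = []
    for start in range(0, len(seq) - window + 1, step):
        try:
            result.append((start, gc_content(seq[start:start + window])))
        except ValueError:
            continue
    return result


if __name__ == "__main__":
    print(gc_content("ATGC"))                         # 0.5
    print(gc_windows("AAAAGGGG", window=4, step=4))   # [(0, 0.0), (4, 1.0)]
```

Windowed G+C profiles are useful for spotting isochores, CpG-rich promoter regions, or horizontally transferred segments whose composition differs from the rest of the genome.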
Pseudogenes
• As many as 20-30% of all genomic sequence predictions could be pseudogenes
• Non-functional copy of a gene
• Processed pseudogene:
  • Derived by retrotransposition
  • No 5’ promoter
  • No introns
  • Often includes a poly-A tail
• Non-processed pseudogene:
  • Derived by gene duplication
• Both include events that make the gene non-functional:
  • Frameshifts
  • Premature stop codons
• We assume pseudogenes have no function, but we really don’t know!
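The disabling events listed above can be screened for in a few lines. This Python sketch flags in-frame premature stop codons, a length that is not a multiple of three (a possible frameshift or truncation), and the length of a 3’ run of A's (a hint of a processed pseudogene). It assumes the input starts at the putative start codon and is read in the parent gene's frame; real pseudogene pipelines align the copy to its parent protein instead, and the function names here are hypothetical.

```python
from typing import Dict, List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> List[int]:
    """0-based positions of in-frame stop codons before the final codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [i for i in range(0, (n_codons - 1) * 3, 3) if cds[i:i + 3] in STOP_CODONS]


def poly_a_tail_length(seq: str) -> int:
    """Length of the run of A's at the 3' end of the sequence."""
    return len(seq) - len(seq.upper().rstrip("A"))


def disabling_features(cds: str) -> Dict[str, object]:
    """Flag features that can make a gene copy non-functional."""
    return {
        "premature_stops": premature_stops(cds),
        # Not a multiple of 3: hints at a frameshift indel (or a truncated copy).
        "possible_frameshift": len(cds) % 3 != 0,
    }


if __name__ == "__main__":
    print(disabling_features("ATGAAATAGCCCTAA"))
```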
Noncoding RNA (ncRNA)
• tRNA (transfer RNA): involved in translation
• rRNA (ribosomal RNA): structural component of the ribosome, where translation takes place
• snRNA (small nuclear RNA): functional/catalytic in RNA maturation
• Antisense RNA: gene regulation
• siRNA: gene silencing
Noncoding RNA (ncRNA)
• ncRNAs represent 80-98% of all transcripts in the cell
• ncRNAs have not been taken into account in gene counts based on:
  • cDNA
  • computational ORF prediction
  • comparative genomics of ORFs
• ncRNAs can be:
  • Structural
  • Catalytic
  • Regulatory
GenBank Features
-10_signal -35_signal 3'clip 3'UTR 5'clip 5'UTR attenuator CAAT_signal CDS conflict C_region D-loop D_segment enhancer exon GC_signal gene iDNA intron J_segment LTR mat_peptide misc_binding misc_difference misc_feature misc_recomb misc_RNA misc_signal misc_structure modified_base mRNA N_region old_sequence polyA_signal polyA_site precursor_RNA primer_bind prim_transcript promoter protein_bind RBS repeat_region repeat_unit rep_origin rRNA satellite scRNA sig_peptide snoRNA snRNA S_region stem_loop STS TATA_signal terminator transit_peptide tRNA unsure variation V_region V_segment
LOCUS       NG_005487               1850 bp    DNA     linear   ROD 14-FEB-2006
DEFINITION  Mus musculus ubiquitin-conjugating enzyme E2 variant 2 pseudogene
            (LOC625221) on chromosome 6.
ACCESSION   NG_005487
VERSION     NG_005487.1  GI:87239965
KEYWORDS    .
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 1850)
  AUTHORS   Wilson,R.
  TITLE     Mus musculus BAC clone RP24-201D17 from 6
  JOURNAL   Unpublished (2003)
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from AC121925.2.
FEATURES             Location/Qualifiers
     source          1..1850
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
                     /chromosome="6"
                     /note="AC121925.2 32277..34126"
     gene            101..1750
                     /gene="LOC625221"
                     /pseudo
                     /db_xref="GeneID:625221"
     repeat_region   1792..1827
                     /rpt_family="ID"
ORIGIN
        1 tcttctgcct caattcctca agtgctagta tcatatgccc atgccattat ttttaactcc
       61 cctttttcat gctaagaatt gaacacacgg ccctgcgtgc ggtggtgcgt ctggtagcag
      121 gagaagatgg cggtctccac aggagttaaa gttcctcgta attttcgctt gttggaagaa
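Two pieces of this record are easy to read programmatically: the ORIGIN block (60 bases per line, prefixed by the position of the first base) and simple feature locations such as `101..1750` (1-based, both ends inclusive). The Python sketch below handles only those two cases, using the three ORIGIN lines shown above; it does not handle `complement()`, `join()` or partial `<`/`>` locations, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

ORIGIN_LINES = [
    "        1 tcttctgcct caattcctca agtgctagta tcatatgccc atgccattat ttttaactcc",
    "       61 cctttttcat gctaagaatt gaacacacgg ccctgcgtgc ggtggtgcgt ctggtagcag",
    "      121 gagaagatgg cggtctccac aggagttaaa gttcctcgta attttcgctt gttggaagaa",
]


def parse_origin(lines: List[str]) -> str:
    """Join ORIGIN lines into one sequence, dropping position numbers and spaces."""
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines).lower()


def parse_simple_location(location: str) -> Tuple[int, int]:
    """Parse a plain 'start..end' location (1-based, both ends inclusive)."""
    match = re.fullmatch(r"(\d+)\.\.(\d+)", location.strip())
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(1)), int(match.group(2))


def feature_length(location: str) -> int:
    start, end = parse_simple_location(location)
    return end - start + 1   # inclusive on both ends


if __name__ == "__main__":
    print(len(parse_origin(ORIGIN_LINES)))  # 180 (only the first three ORIGIN lines)
    print(feature_length("101..1750"))      # gene LOC625221: 1650 bp
```

For complete records, a full parser such as Biopython's `SeqIO.read(..., "genbank")` also handles compound locations and qualifiers.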
The ideal annotation of “MyGene”
• All clones
• All SNPs
• Promoter(s)
• All mRNAs
• All proteins
• All protein modifications
• Ontologies
• Interactions (complexes, pathways, networks)
• Expression (where and when, and how much)
• Evolutionary relationships
• All structures