Genome Browsers
• UCSC (Santa Cruz, California): http://genome.ucsc.edu/
• Ensembl (EBI, UK): http://www.ensembl.org/
Eukaryotic Genomes: Not only collections of genes
• Protein coding genes
• RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA)
• Structural DNA (centromeres, telomeres)
• Regulation-related sequences (promoters, enhancers, silencers, insulators)
• Parasite sequences (transposons)
• Pseudogenes (non-functional gene-like sequences)
• Simple sequence repeats
Eukaryotic Genomes: High fraction of non-coding DNA
• Blue: Prokaryotes
• Black: Unicellular eukaryotes
• Other colors: Multicellular eukaryotes (red = vertebrates)
Source: Mattick, NRG, 2004
Human Genome
• 3 billion base pairs (3 Gb)
• 22 pairs of autosomes + the X and Y sex chromosomes
• Chromosome length varies from ~50 Mb to ~250 Mb
• About 22,000 protein-coding genes
• Compare with ~14,000 for the fruit fly and ~19,000 for the nematode C. elegans
Human genome
• Only 1.2% codes for proteins; 3.5-5% is under selection
• Long introns, short exons
• Large spaces between genes
• More than half consists of repetitive DNA
Source: Molecular Biology of the Cell, 4th edition (Alberts et al., 2002)
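To make these percentages concrete, the short sketch below converts them into megabases, assuming the ~3 Gb genome size quoted on the previous slide. The function and variable names are illustrative only.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb, the approximate human genome size

def fraction_to_megabases(fraction, genome_size_bp=GENOME_SIZE_BP):
    """Convert a genome fraction (0..1) into megabases (Mb)."""
    return fraction * genome_size_bp / 1_000_000

coding_mb = fraction_to_megabases(0.012)        # 1.2% protein-coding
selected_low_mb = fraction_to_megabases(0.035)  # lower bound under selection
selected_high_mb = fraction_to_megabases(0.05)  # upper bound under selection

print(f"Protein-coding: ~{coding_mb:.0f} Mb")
print(f"Under selection: ~{selected_low_mb:.0f}-{selected_high_mb:.0f} Mb")
```

Under these assumptions, most of the sequence under selection lies outside the protein-coding part, which is why the non-coding elements listed earlier matter.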
Variation Along the Genome Sequence
• Nucleotide usage varies along chromosomes
• Protein-coding regions tend to have a high GC content
• Genes are not equally distributed across the chromosomes
• Housekeeping genes are generally found in gene-dense areas
• Gene-poor areas tend to contain many tissue-specific genes
Source: Ensembl
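Variation in nucleotide usage along a chromosome is commonly summarised as GC content in sliding windows. The following is a minimal sketch of that idea, not the method used by any particular browser; the window and step sizes and the function names are assumptions chosen for illustration.

```python
def gc_content(seq):
    """Fraction of G/C among the unambiguous bases (A, C, G, T) in seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0  # e.g. a window consisting only of N's
    return sum(base in "GC" for base in counted) / len(counted)

def sliding_gc(seq, window, step):
    """Return (start, gc_fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

# Toy sequence: an AT-rich stretch followed by a GC-rich stretch.
for start, gc in sliding_gc("ATATATTAGCGCGGCC", window=8, step=8):
    print(start, gc)
```

On real chromosomes one would typically use windows of kilobases or more and read the sequence from a FASTA file.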
Chromosome organisation
• DNA is packed in chromatin
• Active genes lie in less dense chromatin (beads-on-a-string)
• Inactive genes are often in densely packed chromatin (30-nm fiber)
• Genes are regulated by changes in chromatin density and by methylation/acetylation of the histones
• Limited availability of chromatin information in genome browsers (post-translational histone modifications are currently under investigation with ChIP-on-chip experiments)
Source: Lodish (4th edition)
Genome browsers UCSC NCBI Ensembl http://genome.ucsc.edu/ http://www.ensembl.org/
Genome Browsing with the UCSC Genome Browser
http://genome.ucsc.edu/
Gene record (4) “best hit”
Genomic elements
• Genome browsers can also be used to examine features other than genes:
• Genomic sequence conservation
• Pseudogenes
• Duplications and deletions of chromosome segments (Copy Number Variations, CNVs)
Genomic Sequence Conservation
• Not only protein coding parts are conserved in evolution
• Conserved non-coding genomic sequences can be involved in gene regulation (enhancers, silencers, insulators)
• With the UCSC browser one can examine genomic conservation
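The idea behind a conservation plot can be sketched as percent identity in fixed windows of a pairwise alignment. This is a deliberately simplified stand-in: the conservation tracks in the UCSC browser are built from multi-species alignments with more sophisticated scoring, and the function name and toy alignment below are assumptions for illustration.

```python
def window_identity(aln_a, aln_b, window):
    """Fraction of identical, non-gap columns per non-overlapping window.

    aln_a and aln_b are two rows of a pairwise alignment of equal length,
    with '-' marking gaps.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    scores = []
    for start in range(0, len(aln_a) - window + 1, window):
        columns = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for a, b in columns if a == b and a != "-")
        scores.append((start, matches / window))
    return scores

# Toy alignment: a conserved block followed by a diverged block.
print(window_identity("ACGTACGT", "ACGTTTTT", window=4))
```

Windows with high identity outside annotated exons are candidates for conserved regulatory sequences such as enhancers.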
Pseudogenes
• Pseudogenes “look” like (are homologous to) protein-coding genes, but are non-functional
• Two types:
• Unprocessed pseudogenes (gene copies that have lost their function)
• Processed pseudogenes (mRNAs that were reverse-transcribed and inserted back into the genome; they lack introns and sometimes have a poly(A) tail)
• The UCSC browser contains various databases of pseudogenes:
• Yale pseudogenes (both types of pseudogenes)
• Vega pseudogenes (both types of pseudogenes)
• Retroposed genes (only processed pseudogenes)
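The distinction between the two types can be written down as a simple rule of thumb. The sketch below is a hypothetical heuristic based only on the two features named above (intron structure and a poly(A) tail); it is not how the Yale, Vega, or retroposed-gene annotations are produced, and it cannot classify copies of genes that have no introns to begin with.

```python
def classify_pseudogene(has_introns, has_polya_tail=False):
    """Return (kind, polya_support) for a pseudogene candidate.

    has_introns:    True if the copy retains the parent gene's introns.
    has_polya_tail: True if a poly(A) stretch follows the copy.
    Both inputs are assumed to come from an upstream comparison with
    the parent gene; that comparison is not implemented here.
    """
    if has_introns:
        # Intron structure retained: consistent with a duplicated gene
        # that later lost its function.
        return ("unprocessed", False)
    # No introns: consistent with a reverse-transcribed mRNA.
    return ("processed", has_polya_tail)

print(classify_pseudogene(has_introns=False, has_polya_tail=True))
print(classify_pseudogene(has_introns=True))
```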
Copy Number Variation
• People do not only vary at the nucleotide level (SNPs); pieces of the genome can also be present in a varying number of copies (Copy Number Polymorphisms (CNPs) or Copy Number Variants (CNVs))
• When there are genes in the CNV areas, this can lead to variation in the number of gene copies between individuals
• With the UCSC browser CNVs can be examined
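Finding which genes fall inside CNV regions is an interval-overlap problem. The sketch below assumes 0-based, half-open coordinates (as in BED files); the gene and CNV names and positions are made up for illustration and do not come from any real annotation.

```python
from collections import namedtuple

# A named genomic interval: [start, end) on a chromosome.
Interval = namedtuple("Interval", "name chrom start end")

def overlaps(a, b):
    """True if two half-open intervals on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end

def genes_in_cnvs(genes, cnvs):
    """Map each CNV name to the names of the genes it overlaps."""
    return {
        cnv.name: [gene.name for gene in genes if overlaps(gene, cnv)]
        for cnv in cnvs
    }

# Hypothetical example data.
genes = [
    Interval("geneA", "chr1", 1_000, 5_000),
    Interval("geneB", "chr1", 20_000, 30_000),
    Interval("geneC", "chr2", 1_000, 5_000),
]
cnvs = [
    Interval("cnv1", "chr1", 4_000, 25_000),
    Interval("cnv2", "chr2", 10_000, 12_000),
]
print(genes_in_cnvs(genes, cnvs))
```

This all-against-all comparison is fine for a handful of regions; genome-wide analyses would normally use a sorted sweep or an interval tree.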
Genome browsers UCSC Ensembl http://genome.ucsc.edu/ http://www.ensembl.org/
Genome Browsing with the Ensembl Genome Browser
http://www.ensembl.org/
Alternative Transcripts
Source: Wikipedia (http://www.wikipedia.org/)
Single Nucleotide Polymorphisms (SNPs)
• Sequence variations within a species
• Similar to mutations, but the alternative alleles are simultaneously present in the population and generally have little effect
• Used as genetic markers (e.g. a genetic disease can be associated with a SNP)
• Ensembl offers a dedicated SNP view
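That the alleles of a SNP are simultaneously present in the population can be quantified with allele frequencies. The sketch below computes them from a list of diploid genotypes; the genotype encoding (two-letter strings such as "AG") and the sample data are assumptions for illustration, not a format used by Ensembl.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotypes such as 'AG'."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele (0.0 if monomorphic)."""
    freqs = sorted(allele_frequencies(genotypes).values())
    return freqs[-2] if len(freqs) >= 2 else 0.0

# Hypothetical genotypes of four individuals at one SNP position.
sample = ["AA", "AG", "GG", "AA"]
print(allele_frequencies(sample))
print(minor_allele_frequency(sample))
```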