Introduction to genomes • Content: the human genome • CNVs • SNPs • Alternative splicing • Genome projects • Celia van Gelder, CMBI, UMC Radboud, June 2009, c.vangelder@cmbi.ru.nl
The human genome • Genome: the entire sequence of DNA in a cell • 3 billion base pairs (3 Gb) • 22 autosome pairs plus the X and Y sex chromosomes • Chromosome length varies from ~50 Mb to ~250 Mb • About 22,000 protein-coding genes • The human genome is 99.9% identical among individuals
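As a rough back-of-the-envelope illustration of the last bullet, the sketch below works out how many positions differ between two individuals if 0.1% of a 3 Gb genome varies. The function name and the use of integer per-mille arithmetic are illustrative choices, not part of the original slides.

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 Gb, as stated on the slide


def differing_positions(genome_size_bp, identical_per_mille=999):
    """Estimate the number of base positions that differ between two genomes.

    identical_per_mille=999 encodes the slide's "99.9% identical" figure;
    integer arithmetic avoids floating-point rounding.
    """
    return genome_size_bp * (1000 - identical_per_mille) // 1000


if __name__ == "__main__":
    print(differing_positions(GENOME_SIZE_BP))
```

Under these figures, two genomes differ at roughly three million positions.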
Eukaryotic Genomes: more than collections of genes • Protein coding genes • RNA genes (rRNA, snRNA, snoRNA, miRNA, tRNA) • Structural DNA (centromeres, telomeres) • Regulation-related sequences (promoters, enhancers, silencers, insulators) • Parasite sequences (transposons) • Pseudogenes (non-functional gene-like sequences) • Simple sequence repeats
Annotating the genome • Genome annotation is the process of attaching biological information to sequences. It consists of two main steps: • identifying elements on the genome, a process called Gene Finding, and • attaching biological information to these elements. • Automatic annotation tools try to perform all this by computer analysis, as opposed to manual annotation which involves human expertise. Ideally, these approaches co-exist and complement each other in the same annotation pipeline.
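To make the "identifying elements" step concrete, here is a minimal sketch of a naive open reading frame (ORF) scan on the forward strand. It is only a toy: real eukaryotic gene finders must also handle introns, splice sites, and both strands, and the function name and `min_codons` parameter are assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF here runs from an ATG to the first in-frame stop codon;
    `end` is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF
    return orfs


if __name__ == "__main__":
    for start, end, frame in find_orfs("CCATGAAATTTTAGGG"):
        print(start, end, frame)
```

The second annotation step, attaching biological information, would then take each reported interval and look it up against known genes or protein families.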
The human genome (continued) From: Molecular Biology of the Cell (4th edition) (Alberts et al., 2002) • Only 1.2% codes for proteins; 3.5-5% is under selection • Long introns, short exons • Large spaces between genes • More than half consists of repetitive DNA
Eukaryotic Genomes: High fraction of non-coding DNA • Figure legend: blue = prokaryotes; black = unicellular eukaryotes; other colors = multicellular eukaryotes (red = vertebrates) From: Mattick, NRG, 2004
Variation along genome sequence • Nucleotide usage varies along chromosomes • Protein-coding regions tend to have high GC levels • Genes are not equally distributed across the chromosomes • Housekeeping genes are generally found in gene-dense areas • Gene-poor areas tend to have many tissue-specific genes From: Ensembl
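The variation in nucleotide usage mentioned above is usually summarized as GC content in windows along a sequence. The following is a minimal sketch; the function names, window size, and step are illustrative assumptions rather than anything prescribed by the slides.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """Return (window_start, gc_fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    for start, gc in windowed_gc("ATATGGCC", window=4, step=4):
        print(start, gc)
```

Plotting such window values along a chromosome gives the kind of GC track that genome browsers display.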
Chromosome organisation (1) From: Lodish (4th edition)
Chromosome organisation (2) From: Lodish (4th edition) • DNA is packed in chromatin • Inactive genes are often in densely packed chromatin (30-nm fiber) • Active genes are in less dense chromatin (beads-on-a-string) • Genes are regulated by changing chromatin density and by methylation/acetylation of the histones • Figure labels: genes that are OFF; genes that are ON
Today’s focus • Copy number variations (CNVs) • Single Nucleotide Polymorphisms (SNPs) • Alternative transcripts
Copy Number Variation • People do not only vary at the nucleotide level (SNPs) • Copy Number Variations (CNVs): duplications and deletions of pieces of chromosome • When there are genes in the CNV areas, this can lead to variations in the number of gene copies between individuals • CNVs may either be inherited or caused by de novo mutation
Why study CNVs? • CNVs are common in cancer and other diseases. • CNVs are also common in normal individuals and contribute to our uniqueness. These changes can also influence the susceptibility to disease. • Since CNVs often encompass genes, they can have important roles both in characterizing human disease and discovering drug response targets. • Understanding the mechanisms of CNV formation may also help us better understand human genome evolution.
CNV & disease, examples • CNVs have been implicated in: • Cancer: higher EGFR copy number in non-small cell lung cancer • Low copy number of FCGR3B can increase susceptibility to SLE and other autoimmune disorders • Autism • Schizophrenia (Dept. of Human Genetics) • Mental retardation (Dept. of Human Genetics)
Single Nucleotide Polymorphisms (SNPs) • SNPs are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. • They are similar to mutations, but the variants coexist in the population and generally have little effect. • They are used as genetic markers (for example, a genetic disease may be associated with a SNP). • Figure: Single Nucleotide Polymorphism (SNP)
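The definition above can be illustrated by comparing two already-aligned sequences of equal length and listing the positions where a single base differs. This is only a sketch: the function name and example sequences are made up, and a difference between two sequences counts as a SNP only if it is common enough in the population.

```python
def single_base_differences(ref, alt):
    """Return (position, ref_base, alt_base) for each mismatch.

    Both sequences must already be aligned and of equal length;
    positions are 0-based.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, r, a)
        for pos, (r, a) in enumerate(zip(ref.upper(), alt.upper()))
        if r != a
    ]


if __name__ == "__main__":
    print(single_base_differences("AAGCCTA", "AAGCTTA"))
```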
SNP fact sheet • For a variation to be considered a SNP, it must occur in at least 1% of the population. • SNPs, which make up about 90% of all human genetic variation, occur every 100 to 300 bases along the 3-billion-base human genome. • Two of every three SNPs involve the replacement of cytosine (C) with thymine (T). • SNPs can occur in coding (gene) and non-coding regions of the genome.
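Two of the figures above lend themselves to a quick calculation: the number of SNPs implied by one SNP every 100 to 300 bases, and the 1% frequency cutoff. The sketch below assumes frequency is measured as the fraction of sampled chromosomes carrying the variant; the function names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000  # 3-billion-base genome, per the slide


def expected_snp_count(genome_size_bp, bases_per_snp):
    """Approximate number of SNPs if one occurs every `bases_per_snp` bases."""
    return genome_size_bp // bases_per_snp


def meets_snp_threshold(variant_count, sampled_chromosomes, threshold=0.01):
    """True if the variant frequency reaches the 1% cutoff (assumed per chromosome)."""
    return variant_count / sampled_chromosomes >= threshold


if __name__ == "__main__":
    print(expected_snp_count(GENOME_SIZE_BP, 300))
    print(expected_snp_count(GENOME_SIZE_BP, 100))
```

With the slide's numbers this gives roughly 10 to 30 million SNP positions genome-wide.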
SNPs & medicine • Although more than 99% of human DNA sequences are the same, variations in DNA sequence can have a major impact on how humans respond to: • disease; • environmental factors such as bacteria, viruses, toxins, and chemicals; • and drugs and other therapies. • This makes SNPs valuable for biomedical research and for developing pharmaceutical products or medical diagnostics. • SNPs are also evolutionarily stable—not changing much from generation to generation—making them easier to follow in population studies.
SNP & disease, example: Alzheimer's disease & apolipoprotein E • ApoE contains two SNPs that result in three possible alleles for this gene: E2, E3, and E4. • E2 and E4 each differ from the common E3 allele by one DNA base, and the corresponding protein products differ by one amino acid. • Each individual inherits one maternal copy of ApoE and one paternal copy of ApoE. • Research has shown that a person who inherits at least one E4 allele has a greater chance of developing Alzheimer's disease.
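Because each person carries two copies of ApoE, the three alleles combine into six possible genotypes. The sketch below enumerates them and flags those carrying at least one E4 allele; the function names and the "E3/E4" string format are illustrative assumptions.

```python
from itertools import combinations_with_replacement

APOE_ALLELES = ("E2", "E3", "E4")


def apoe_genotypes(alleles=APOE_ALLELES):
    """List unordered allele pairs, e.g. 'E3/E4' (parental origin ignored)."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]


def carries_e4(genotype):
    """True if the genotype contains at least one E4 allele."""
    return "E4" in genotype.split("/")


if __name__ == "__main__":
    for genotype in apoe_genotypes():
        print(genotype, "carries E4" if carries_e4(genotype) else "")
```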
HapMap • The HapMap Project is a multi-country effort to identify and catalog genetic similarities and differences in human beings. • Using HapMap, researchers will be able to find genes that affect health, disease, and individual responses to medications and environmental factors. • HapMap is a collaboration among scientists and funding agencies from Japan, the United Kingdom, Canada, China, Nigeria, and the United States. • All of the information generated will be released into the public domain. • www.hapmap.org
Alternative splicing (2) • ~15% of the mutations that cause genetic diseases affect pre-mRNA splicing
Genome projects, a bit of history http://www.genomesonline.org/
Sequenced genomes • 1995 Haemophilus influenzae 1.8 Mb • 1996 Yeast 12 Mb • 1998 C. elegans 100 Mb • 1999 Fruit fly 125 Mb • 2000 Arabidopsis 115 Mb • 2001 Human (draft) • 2002 Mouse 2.6 Gb • 2002 Rice • 2004 Human (“finished”) 3 Gb • 2006 Sea urchin • 2007 Grapevine • 2008 Platypus (draft) • 2009 Cow
Some genome sizes (organism: genome size in base pairs) • Virus, phage Φ-X174: 5,387 (first sequenced genome) • Virus, phage λ: 5×10^4 • Bacterium, Escherichia coli: 4×10^6 • Fungus, Saccharomyces cerevisiae: 2×10^7 • Nematode, Caenorhabditis elegans: 8×10^7 • Insect, Drosophila melanogaster: 2×10^8 • Mammal, Homo sapiens: 3×10^9 • Plant, Fritillaria assyriaca: 13×10^10 (largest known genome)
Genome browsers can be used to examine: • Genomic sequence conservation • Duplications and deletions of pieces of chromosome (Copy Number Variations, CNVs) • Single Nucleotide Polymorphisms (SNPs) • Alternative splicing • And much more... LET’S GO BROWSE GENOMES!
Alternative Transcripts Source: Wikipedia (http://www.wikipedia.org/)