The Human Genome: What’s in it? How do we know?


Presentation Transcript


  1. The Human Genome: What’s in it? How do we know? Gary Benson, Department of Computer Science, Department of Biology, Program in Bioinformatics, Boston University

  2. Outline of Talk • Protein Genes • SNPs • Haplotypes • Finding a Disease Locus

  3. Size of the Genomes [Figure: genome sizes of bacteria, yeast, round worm, fruit fly, and flowering plant]

  4. The Human Genome

  5. What the letters stand for • DNA has four chemical subunits, called nucleotide bases, abbreviated A (adenine), C (cytosine), G (guanine), and T (thymine). • Example: GATTACA http://en.wikipedia.org/wiki/Nucleotide

  6. What’s in the Genome? • Chromosomes: 23 pairs • Genes: protein genes, RNA genes, microRNA genes • Repeats: tandem repeats, inverted repeats, transposons, segmental duplications • Regulatory regions: promoters, transcription factor binding sites

  7. Protein Genes • A protein gene contains the genetic code for a protein. Producing the protein involves transcription (copying DNA into RNA) and translation (using the RNA code to build a protein). http://www.slic2.wsu.edu:82/hurlbert/micro101/images/TransTranscrip.gif

  8. Transcription Translation http://nobelprize.org/medicine/educational/dna/a/translation/polysome_em.html http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/M/Miller_Beatty3.jpg

  9. Finding Protein Genes • Before the sequencing of genomes, protein genes were found experimentally. Now, new genes are predicted computationally using a gene model.

  14. Building a Gene Model • Gene models for prediction are based on the structure of genes in DNA and their messenger RNAs (mRNAs). This includes exons, introns, promoters, and the polyadenylation signal. http://xray.bmc.uu.se/Courses/Bke2/Exercises/Exercise_answers/pre_mRNA_processing.gif

  15. Exons • In this example, EXONS are uppercase and introns are lowercase. Exons contain the code for a protein; introns interrupt the exons. Before translation, the introns are removed from the messenger RNA. • DNA: • …ACTGCTACAGtctattgaGAACAACATAGtcacgaacttaacgtgcaGTTTAACAGCACGtctcgaagggca… • RNA (before removal of introns): • …ACUGCUACAGucuauugaGAACAACAUAGucacgaacuuaacgugcaGUUUAACAGCACGucucgaagggca… • RNA (after removal of introns): • …ACUGCUACAGGAACAACAUAGGUUUAACAGCACG…
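The two steps on this slide, copying DNA into RNA and then removing the introns, can be sketched in a few lines of Python. This is a minimal illustration that relies only on the slide's convention of uppercase exons and lowercase introns (real splicing is directed by sequence signals, not letter case); the function names `transcribe` and `splice` are hypothetical, and the fragment is the slide's DNA with the leading and trailing "…" dropped.

```python
# Slide 15's DNA fragment (coding strand), with the surrounding "..." removed.
# Convention from the slide: exons are uppercase, introns are lowercase.
DNA = "ACTGCTACAGtctattgaGAACAACATAGtcacgaacttaacgtgcaGTTTAACAGCACGtctcgaagggca"


def transcribe(dna: str) -> str:
    """Copy a coding-strand DNA sequence into RNA: every T becomes U (case is kept)."""
    return dna.translate(str.maketrans("Tt", "Uu"))


def splice(pre_mrna: str) -> str:
    """Remove introns, which under the slide's convention are the lowercase bases."""
    return "".join(base for base in pre_mrna if base.isupper())


if __name__ == "__main__":
    pre_mrna = transcribe(DNA)
    print(pre_mrna)          # RNA before removal of introns
    print(splice(pre_mrna))  # RNA after removal of introns
```

The two printed lines reproduce the slide's RNA before and after intron removal.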

  16. Finding Exons • The sequence of an exon contains codons. Each codon is a triplet of nucleotides that codes for a single amino acid. Amino acids are the building blocks of proteins. http://en.wikipedia.org/wiki/Genetic_code

  17. Genetic Code • There are 64 possible codons. Sixty-one of them specify one of the twenty amino acids; the other three are stop codons, which signal the end of translation. http://www.emc.maricopa.edu/faculty/farabee/BIOBK/code.gif
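The codon table can be written down compactly. The sketch below builds the standard genetic code (NCBI translation table 1) from a 64-letter string and uses it to translate an mRNA until the first stop codon. It assumes plain U/C/A/G input with no ambiguity codes and ignores how the start codon is chosen; `CODON_TABLE` and `translate` are illustrative names.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with each of the three positions
# cycling through U, C, A, G (third position fastest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon into one-letter amino acid codes,
    stopping at the first stop codon (which is not included in the output)."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(len(CODON_TABLE))                               # 64 codons
    print(sum(aa == "*" for aa in CODON_TABLE.values()))  # 3 stop codons
    print(translate("AUGGCCUAA"))                         # MA: Met, Ala, then stop
```

Deriving the table from `itertools.product` keeps the 64 entries in a fixed order, so the mapping can be checked against any printed genetic-code chart.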

  18. Open Reading Frame (ORF) • An open reading frame (ORF) is a sequence of codons that does not contain a stop codon. [Figure: codons read as alanine, threonine, glutamic acid, leucine, arginine, serine, STOP!] http://en.wikipedia.org/wiki/Genetic_code

  19. Finding Exons • Sequence: • acggacucuagccuaaugugacgacugacauagguaaauucgcuc • This sequence contains stop codons, but each stop codon is read as a stop in only one of the three reading frames. • frame +1 • acg gac ucu agc cua aug uga cga cug aca uag gua aau ucg cuc • frame +2 • a cgg acu cua gcc uaa ugu gac gac uga cau agg uaa auu cgc uc • frame +3 • ac gga cuc uag ccu aau gug acg acu gac aua ggu aaa uuc gcu c • Very short ORFs are unlikely.
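Splitting a sequence into its three forward reading frames and measuring the open reading frames in each is easy to automate. The sketch below uses the slide's sequence and the slide's definition of an ORF (a run of codons with no stop codon; no start codon is required). It only scans the forward strand, and `codons` and `longest_orf` are hypothetical helper names.

```python
STOP_CODONS = {"uaa", "uag", "uga"}

# Slide 19's example sequence (RNA, lowercase).
SEQ = "acggacucuagccuaaugugacgacugacauagguaaauucgcuc"


def codons(seq: str, frame: int) -> list[str]:
    """Complete codons of seq read from 0-based offset `frame`
    (0, 1, 2 correspond to frames +1, +2, +3); leftover bases are dropped."""
    seq = seq.lower()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def longest_orf(seq: str, frame: int) -> int:
    """Length, in codons, of the longest stop-free run of codons in one frame."""
    best = run = 0
    for codon in codons(seq, frame):
        if codon in STOP_CODONS:
            run = 0  # a stop codon closes the current open reading frame
        else:
            run += 1
            best = max(best, run)
    return best


if __name__ == "__main__":
    for frame in range(3):
        print(f"frame +{frame + 1}:", " ".join(codons(SEQ, frame)),
              "| longest ORF:", longest_orf(SEQ, frame), "codons")
```

On this sequence the longest ORFs are 6, 4, and 11 codons in frames +1, +2, and +3, which is the kind of comparison a gene finder makes when it prefers long ORFs.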

  20. Finding Introns • Introns usually begin with GT and end with AG (GU ... AG in the RNA).
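The GT...AG rule gives a simple way to list candidate introns: every span that starts with GT (GU in RNA) and ends with AG. This is only a filter, a sketch rather than a gene-finding method: most GT...AG spans in real DNA are not introns, and real introns are far longer than the toy `min_len` used here. The function name, the `min_len` parameter, and the test sequence are all hypothetical.

```python
def candidate_introns(seq: str, min_len: int = 4) -> list[tuple[int, int]]:
    """Return every (start, end) half-open span of seq that begins with GT/GU,
    ends with AG, and is at least min_len bases long."""
    s = seq.upper().replace("T", "U")  # accept DNA or RNA letters
    starts = [i for i in range(len(s) - 1) if s[i:i + 2] == "GU"]
    ends = [j + 2 for j in range(len(s) - 1) if s[j:j + 2] == "AG"]
    return [(i, j) for i in starts for j in ends if j - i >= min_len]


if __name__ == "__main__":
    toy = "auggccgucuaaaguuuggg"  # hypothetical sequence
    for start, end in candidate_introns(toy):
        print(start, end, toy[start:end])
```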

  21. Finding Exons • Sequence: • acggacucuagccuaaugugacgacugacauagguaaauucgcuc • A gene can contain open reading frames connected across stop codons by an intron • frame +1 • acg gac ucu agc cua aug uga cga cug aca uag gua aau ucg cuc • frame +3 • ac gga cuc uag ccu aau gug acg acu gac aua ggu aaa uuc gcu c
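The idea that an intron can join open reading frames across a stop codon can be tested directly: cut out each GU...AG span and check whether the remaining sequence reads without a stop codon. The sketch below does this for frame +1 only, on a hypothetical toy sequence built from exon "auggcc", intron "gucuaaag", and exon "uuuggg"; `has_stop` and `joining_introns` are illustrative names, and `min_len` (at least 4) is a toy parameter.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def has_stop(seq: str, frame: int = 0) -> bool:
    """True if any complete codon read from offset `frame` is a stop codon."""
    s = seq.upper()
    return any(s[i:i + 3] in STOP_CODONS for i in range(frame, len(s) - 2, 3))


def joining_introns(seq: str, min_len: int = 4):
    """Yield (start, end, spliced) for each GU...AG span whose removal leaves
    the sequence free of stop codons in frame +1 (offset 0)."""
    s = seq.upper().replace("T", "U")
    for i in range(len(s) - 1):
        if s[i:i + 2] != "GU":
            continue
        # The AG must end at least min_len bases after the start of the GU.
        for j in range(i + min_len - 2, len(s) - 1):
            if s[j:j + 2] == "AG":
                spliced = s[:i] + s[j + 2:]
                if not has_stop(spliced):
                    yield i, j + 2, spliced


if __name__ == "__main__":
    toy = "auggccgucuaaaguuuggg"  # hypothetical: exon + intron + exon
    print(has_stop(toy))           # the unspliced sequence has UAA in frame +1
    print(list(joining_introns(toy)))
```

Here the stop codon UAA lies inside the intron, so removing the intron yields the uninterrupted reading frame AUG GCC UUU GGG.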

  22. How many genes are there? • Estimates • pre-2000: 100,000, based on estimates of the number of genes required to account for human complexity • 2001: 30,000 – 40,000, based on the first draft of the human genome • 2003: 23,000 – 24,500, based on gene prediction computer programs • Why so low? • alternative splicing of exons • complex regulatory mechanisms • inability to predict genes that are unlike those seen before http://www.ornl.gov/sci/techresources/Human_Genome/faq/genenumber.shtml

  23. RNA Genes • RNA genes do not code for proteins. Instead, the RNA molecule itself is functional in the cell. • Examples include: • Ribosomal RNA – these molecules form the major component of the protein-building machinery • Transfer RNA – works with ribosomal RNA to insert the correct amino acids into growing proteins • MicroRNA – a newly discovered class of RNA that helps regulate gene expression.

  24. Ribosome http://www.ncbi.nlm.nih.gov/Class/NAWBIS/Modules/RNA/images/fig_rna12.jpg

  26. RNA Genes • MicroRNAs are short and show little or no conservation of sequence. • Unlike protein genes, RNA genes do not contain codons or open reading frames. However, they do contain inverted repeats.

  27. Inverted Repeats (IRs) • RNA: G A C U U G A U C A A G U C • Two patterns, one the reverse complement (reversed, then complemented) of the other

  28. IR Nomenclature • RNA: G A C U U G A U C A A G U C • Left arm: G A C U U G • Spacer: A U • Right arm: C A A G U C
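Checking that two arms form an inverted repeat amounts to comparing one arm with the reverse complement of the other. The sketch below scans an RNA sequence for exact inverted repeats with a given arm length and a spacer length in a chosen range; it uses Watson-Crick complements only, and `reverse_complement` and `exact_inverted_repeats` are hypothetical names.

```python
# Watson-Crick complements in the RNA alphabet.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Complement each base, then reverse the sequence."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def exact_inverted_repeats(seq: str, arm_len: int, min_spacer: int = 0, max_spacer: int = 10):
    """Return (start, left_arm, spacer, right_arm) for every exact inverted repeat
    with arms of arm_len bases and a spacer of min_spacer..max_spacer bases."""
    seq = seq.upper()
    found = []
    for i in range(len(seq) - 2 * arm_len + 1):
        left = seq[i:i + arm_len]
        target = reverse_complement(left)  # what the right arm must be
        for spacer_len in range(min_spacer, max_spacer + 1):
            j = i + arm_len + spacer_len
            if j + arm_len > len(seq):
                break  # the right arm would run off the end of the sequence
            if seq[j:j + arm_len] == target:
                found.append((i, left, seq[i + arm_len:j], target))
    return found


if __name__ == "__main__":
    print(reverse_complement("GACUUG"))
    print(exact_inverted_repeats("GACUUGAUCAAGUC", 6, min_spacer=2, max_spacer=4))
```

With `min_spacer=0` the same call also reports the spacer-less repeat ACUUGA/UCAAGU, because the slide's 14-base sequence is its own reverse complement; a minimum spacer length is one simple way to keep only repeats that leave room for a loop.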

  29. Stem-Loop Structure • The structure forms by pairing of complementary bases: the left arm pairs with the right arm to form the stem, and the spacer forms the loop. [Figure: stem-loop labeled left arm, right arm, and spacer]
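Dot-bracket notation, a common plain-text way to write RNA secondary structure, shows the stem-loop compactly: each "(" pairs with a matching ")" and each "." is unpaired. The sketch below folds a sequence into a single stem-loop (outermost bases paired first) and checks every stem pair along the way. It assumes Watson-Crick pairs only and deliberately ignores G-U wobble pairs; `stem_loop` is a hypothetical name.

```python
# Watson-Crick base pairs only (G-U wobble pairs are deliberately ignored here).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_loop(seq: str, arm_len: int) -> str:
    """Dot-bracket structure for seq folded as one stem-loop: the first arm_len
    bases pair with the last arm_len bases (outermost pair first), and the
    bases in between form an unpaired loop."""
    seq = seq.upper()
    if 2 * arm_len > len(seq):
        raise ValueError("arms are longer than the sequence")
    for k in range(arm_len):
        if (seq[k], seq[-1 - k]) not in PAIRS:
            raise ValueError(f"bases {k} and {len(seq) - 1 - k} cannot pair")
    loop = len(seq) - 2 * arm_len
    return "(" * arm_len + "." * loop + ")" * arm_len


if __name__ == "__main__":
    print("GACUUGAUCAAGUC")
    print(stem_loop("GACUUGAUCAAGUC", 6))  # left arm, AU spacer as loop, right arm
```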

  30. MicroRNA • MicroRNAs come from a precursor that contains a stem-loop. http://www.ma.uni-heidelberg.de/apps/zmf/argonaute/interface/mirna.jpeg

  31. Detection of Approximate Inverted Repeats Human Chr. 3 ~173,291,101 • AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG GCATTTCCCC CTACGT

  32. Detection of Approximate Inverted Repeats Human Chr. 3 ~173,291,101 • AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG GCATTTCCCC CTACGT • Arms are 72 nt long; the spacer is 42 nt long.

  33. The Problem: Find the Inverted Repeat Human Chr. 3 ~173,291,101 • AAGACTTGAA CAACTTTTAA ACATAAGATC AATTATTTCA AGTAGATTCC CTTTTTCATT CACAATCACA TTCTCACAGA CACAGTCCCA GTTTCTACCT GACTGAGATG CAGTAAGGAA TCTGATTATA ACACTCATTG ATTATAACAC TCATTGAATT TATGGATTCC TTACTGCATC TCATTCAGGT AGAAAAAGGG ACTGTGTCTG TGAGAATGTG ATTGTGAATG AAAAAGATGG AATATGTGTA TTTTTGAGTG TCTATGGAAG AGCTTCTGAC AAGAGAGAGG AAGATTAGGT AAAATGAAAT ATCGCCGTCG GCATTTCCCC CTACGT
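One naive way to pose this problem in code is a brute-force scan that compares every candidate right arm with the reverse complement of every candidate left arm and counts mismatches. This is only an illustrative sketch, not the method behind the slides: it allows substitutions but not insertions or deletions, which real approximate inverted repeats often contain, and its running time grows with sequence length times spacer range times arm length. The function name and parameters are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the sequence."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def approx_inverted_repeats(seq, arm_len, min_spacer, max_spacer, max_mismatches):
    """Brute-force scan: report (left_start, spacer_len, mismatches) for every
    pair of equal-length arms where the right arm differs from the reverse
    complement of the left arm in at most max_mismatches positions."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * arm_len - min_spacer + 1):
        target = reverse_complement(seq[i:i + arm_len])
        for spacer_len in range(min_spacer, max_spacer + 1):
            j = i + arm_len + spacer_len
            right = seq[j:j + arm_len]
            if len(right) < arm_len:
                break  # the right arm would run off the end of the sequence
            mismatches = sum(a != b for a, b in zip(target, right))
            if mismatches <= max_mismatches:
                hits.append((i, spacer_len, mismatches))
    return hits


if __name__ == "__main__":
    # Hypothetical toy: arms GACTTG / CAAGTA (one mismatch) around spacer AAAT.
    toy = "TTGACTTGAAATCAAGTATT"
    print(approx_inverted_repeats(toy, 6, 3, 5, 1))
```

On short arms, chance matches with one mismatch are common, so the toy run reports more than the planted repeat; longer arms such as the 72-nt arms on the slide make chance hits much rarer.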

  34. Single Nucleotide Polymorphisms (SNPs) • A SNP is a single position in the genome (a locus) that is not the same in all people: some people have one nucleotide there, and other people have a different one. Differences in the population at a single locus are called polymorphisms, and the individual types are called alleles. • SNPs are found experimentally. [Figure: a row of SNP alleles, a c g t t a t t a c a t t c c t, labeled SNPs]
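The definition of a SNP translates directly into code once sequences from several people are aligned: a SNP is any column where not everyone has the same nucleotide, and the distinct nucleotides in that column are its alleles. The sketch below illustrates only the definition; it assumes the sequences are already aligned and error-free, which experimental SNP discovery cannot assume, and the function name and data are hypothetical.

```python
def find_snps(aligned: list[str]) -> dict[int, list[str]]:
    """Map each 0-based position where the aligned sequences disagree to its
    sorted list of alleles."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for pos in range(length):
        alleles = {s[pos] for s in aligned}
        if len(alleles) > 1:  # not the same in everyone: a polymorphic locus
            snps[pos] = sorted(alleles)
    return snps


if __name__ == "__main__":
    people = ["acgtta", "acgata", "tcgtta"]  # hypothetical aligned sequences
    print(find_snps(people))
```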

  35. Haplotypes • A haplotype is a collection of SNP alleles on a single chromosome in an individual. • Shown are SNPs on two chromosomes in each individual. [Figure: ten haplotypes of eight SNP alleles each: acattcat, atagtcca, acgttcat, acagtcca, tcgttcat, acattcct, acagatat, acattcaa, tcattcat, acattcct]

  36. Haplotypes • A haplotype is a collection of SNP alleles on a single chromosome in an individual. • Homozygous: the same allele on both chromosomes. [Figure: the same haplotypes, highlighting homozygous SNPs]

  37. Haplotypes • A haplotype is a collection of SNP alleles on a single chromosome in an individual. • Heterozygous: different alleles on the two chromosomes. [Figure: the same haplotypes, highlighting heterozygous SNPs]

  38. Haplotypes • A haplotype is a collection of SNP alleles on a single chromosome in an individual. • Rare alleles [Figure: the same haplotypes, with acagacca in place of acagtcca, highlighting rare alleles]

  39. Haplotypes • A haplotype is a collection of SNP alleles on a single chromosome in an individual. • Strong linkage: alleles that usually occur together. [Figure: the same haplotypes, highlighting strongly linked alleles]
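"Usually occur together" can be measured. Two standard linkage disequilibrium statistics are D, the observed frequency of an allele pair minus the frequency expected if the two SNPs were independent, and r², which rescales D² to lie between 0 and 1. The sketch below computes both, assuming the 80 letters in the slide 35 figure read as ten haplotypes of eight SNPs in the order given (an interpretation of the figure, not something the slide states); `linkage_disequilibrium` is a hypothetical name.

```python
# Hypothetical reading of slide 35's figure: ten haplotypes of eight SNPs each.
HAPLOTYPES = [
    "acattcat", "atagtcca", "acgttcat", "acagtcca", "tcgttcat",
    "acattcct", "acagatat", "acattcaa", "tcattcat", "acattcct",
]


def linkage_disequilibrium(haps: list[str], i: int, j: int) -> tuple[float, float]:
    """Return (D, r^2) for SNPs i and j (0-based) across a list of haplotypes.

    The allele on the first haplotype is taken as the reference allele at each
    SNP; for SNPs with exactly two alleles, r^2 does not depend on that choice."""
    n = len(haps)
    a, b = haps[0][i], haps[0][j]
    p_a = sum(h[i] == a for h in haps) / n
    p_b = sum(h[j] == b for h in haps) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haps) / n
    d = p_ab - p_a * p_b  # observed minus expected frequency of the allele pair
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom else 0.0  # undefined for monomorphic SNPs; report 0
    return d, r2


if __name__ == "__main__":
    for i, j in [(4, 5), (3, 7)]:
        d, r2 = linkage_disequilibrium(HAPLOTYPES, i, j)
        print(f"SNPs {i + 1} and {j + 1}: D = {d:.3f}, r^2 = {r2:.3f}")
```

Under this reading, SNPs 5 and 6 change together on exactly one haplotype (acagatat), which gives r² = 1, the strongest possible linkage.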

  40. Linkage Analysis • SNPs and haplotypes are used to identify regions of the genome that cause disease. The technique is called linkage analysis, and evidence of a connection is called linkage disequilibrium (LD). [Figure: recombination and inheritance. Mom: acattcat / atagtcca; dad: acagatat / tcattcat; child: acagtcca / acagacat]

  41. Linkage Analysis • SNPs and haplotypes are used to identify regions of the genome that cause disease. The technique is called linkage analysis, and evidence of a connection is called linkage disequilibrium (LD). [Figure: the same family, highlighting recombination in the mother’s chromosomes]

  42. Linkage Analysis • SNPs and haplotypes are used to identify regions of the genome that cause disease. The technique is called linkage analysis, and evidence of a connection is called linkage disequilibrium (LD). [Figure: the same family, highlighting recombination in the father’s chromosomes]
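A single crossover can be modeled as taking the SNP alleles before some point from one parental haplotype and the rest from the other. The sketch below reads the figure residue on slides 40 to 42 as mom = acattcat / atagtcca, dad = acagatat / tcattcat, and child = acagtcca / acagacat (an interpretation of the flattened figure), and finds the crossover points that explain each of the child's haplotypes. `crossover` and `crossover_points` are hypothetical names.

```python
def crossover(first: str, second: str, point: int) -> str:
    """Single crossover between two haplotypes of the same length:
    SNP alleles before `point` come from `first`, the rest from `second`."""
    if len(first) != len(second):
        raise ValueError("haplotypes must have the same number of SNPs")
    return first[:point] + second[point:]


def crossover_points(child: str, first: str, second: str) -> list[int]:
    """All points at which a single crossover of first/second yields child.
    Several points can fit when the parent's haplotypes share alleles there."""
    return [p for p in range(len(first) + 1) if crossover(first, second, p) == child]


# Hypothetical reading of the figure residue on slides 40-42.
MOM = ("acattcat", "atagtcca")
DAD = ("acagatat", "tcattcat")
CHILD = ("acagtcca", "acagacat")

if __name__ == "__main__":
    print(crossover_points(CHILD[0], *MOM))  # recombination in the mother's chromosomes
    print(crossover_points(CHILD[1], *DAD))  # recombination in the father's chromosomes
```

The mother's crossover can only be placed to within two positions, because both of her haplotypes carry allele "a" at the third SNP; SNP data localize crossovers only to the interval between informative markers.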

  43. Linkage Analysis • SNPs and haplotypes are used to identify regions of the genome that cause disease. The technique is called linkage analysis, and evidence of a connection is called linkage disequilibrium (LD). • There are two to three crossovers per chromosome per generation. [Figure: the same family: mom, dad, and child]

  44. Linkage Analysis • Key point: Alleles that are physically close together tend to be inherited together because the chance of a crossover between them is small. They exhibit strong linkage. [Figure: the same family: mom, dad, and child]
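The key point can be made quantitative under a deliberately simple model, which is an assumption and not from the slides: each chromosome receives a fixed number of crossovers, placed independently and uniformly along it. Then the chance that at least one lands between two loci a distance d apart on a chromosome of length L is 1 - (1 - d/L)^c for c crossovers. The sketch below computes this, with a Monte Carlo check under the same assumptions, using the slide's two to three crossovers per generation and a hypothetical 100 Mb chromosome.

```python
import random


def p_crossover_between(distance: float, chrom_length: float, crossovers: int) -> float:
    """Probability that at least one of `crossovers` independent, uniformly
    placed crossover points lands between two loci `distance` apart."""
    return 1 - (1 - distance / chrom_length) ** crossovers


def simulate(distance, chrom_length, crossovers, trials=100_000, seed=1):
    """Monte Carlo check of the formula above under the same assumptions."""
    rng = random.Random(seed)
    left = (chrom_length - distance) / 2  # put the two loci mid-chromosome
    right = left + distance
    separated = 0
    for _ in range(trials):
        points = [rng.uniform(0, chrom_length) for _ in range(crossovers)]
        if any(left < x < right for x in points):
            separated += 1
    return separated / trials


if __name__ == "__main__":
    length = 100e6  # hypothetical chromosome length in bases
    for d in (10e3, 1e6, 10e6):
        print(f"{d:>12,.0f} bp apart: {p_crossover_between(d, length, 3):.4f}")
```

Real crossovers are neither uniform nor independent, so the numbers are only indicative, but the trend matches the slide: the closer two loci are, the less likely they are to be separated.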

  45. Finding an Unknown Disease Locus • For many diseases, the location in the genome is unknown. SNPs and haplotypes are being used to search for disease loci using linkage analysis. [Figure: the same family; dad has the disease, and the child has the disease]

  46. Linkage Analysis – Dominant Model • Assume the disease is caused by a dominant allele, meaning one copy is enough to cause the disease. [Figure: the same family, with dad and child affected; highlighted: SNP alleles in the father that are not in the mother]

  47. Linkage Analysis – Dominant Model • Assume the disease is caused by a dominant allele, meaning one copy is enough to cause the disease. [Figure: the same family, with dad and child affected; highlighted: the SNP allele in the child inherited from the father with the disease]

  48. Linkage Analysis – Dominant Model • Assume the disease is caused by a dominant allele, meaning one copy is enough to cause the disease. • The SNP allele and the disease are linked, indicating a possible disease locus. [Figure: the same family, with dad and child affected]
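The dominant-model reasoning on slides 46 to 48 is a per-SNP filter: keep SNPs where the affected parent has an allele the unaffected parent lacks, and the affected child inherited that allele. The sketch below applies it to the family as read from the figure residue (mom = acattcat / atagtcca, dad = acagatat / tcattcat, child = acagtcca / acagacat; an interpretation of the flattened figure). It handles a single family with phased haplotypes, unlike real linkage analysis, which combines evidence statistically over many families; `dominant_candidates` is a hypothetical name.

```python
# Hypothetical reading of the figure residue on slides 45-48.
MOM = ("acattcat", "atagtcca")
DAD = ("acagatat", "tcattcat")    # affected parent
CHILD = ("acagtcca", "acagacat")  # affected child


def dominant_candidates(affected_parent, other_parent, child):
    """0-based SNP positions where the affected parent carries an allele absent
    from the other parent and the affected child inherited that allele."""
    hits = []
    for i in range(len(child[0])):
        parent_only = {h[i] for h in affected_parent} - {h[i] for h in other_parent}
        if parent_only & {h[i] for h in child}:
            hits.append(i)
    return hits


if __name__ == "__main__":
    for i in dominant_candidates(DAD, MOM, CHILD):
        print(f"SNP {i + 1} is linked to the disease in this family")
```

Only the fifth SNP passes: the father carries "a" there, the mother does not, and the child inherited it.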

  49. Linkage Analysis – Recessive Model • Assume the disease is caused by a recessive allele, meaning two copies are required to cause the disease. [Figure: the same family, with dad and child affected; highlighted: homozygous SNP alleles in the father that are heterozygous in the mother]

  50. Linkage Analysis – Recessive Model • Assume the disease is caused by a recessive allele, meaning two copies are required to cause the disease. [Figure: the same family, with dad and child affected; highlighted: a homozygous SNP allele in the child, identical to the father’s]
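The recessive model uses a different filter: the affected parent must be homozygous at the SNP, the unaffected parent heterozygous, and the affected child homozygous for the same allele as the affected parent. The sketch below applies it to the family as read from the figure residue (mom = acattcat / atagtcca, dad = acagatat / tcattcat, child = acagtcca / acagacat; an interpretation of the flattened figure), with the same single-family simplification as a dominant-model filter; `recessive_candidates` is a hypothetical name.

```python
# Hypothetical reading of the figure residue on slides 49-50.
MOM = ("acattcat", "atagtcca")
DAD = ("acagatat", "tcattcat")    # affected parent
CHILD = ("acagtcca", "acagacat")  # affected child


def recessive_candidates(affected_parent, other_parent, child):
    """0-based SNP positions where the affected parent is homozygous, the other
    parent is heterozygous, and the affected child is homozygous for the
    affected parent's allele."""
    hits = []
    for i in range(len(child[0])):
        parent_alleles = {h[i] for h in affected_parent}
        other_alleles = {h[i] for h in other_parent}
        child_alleles = {h[i] for h in child}
        if (len(parent_alleles) == 1 and len(other_alleles) == 2
                and child_alleles == parent_alleles):
            hits.append(i)
    return hits


if __name__ == "__main__":
    for i in recessive_candidates(DAD, MOM, CHILD):
        print(f"SNP {i + 1}: child is homozygous for the father's allele")
```

Three SNPs pass the parental test (the father is homozygous and the mother heterozygous at SNPs 2, 7, and 8), but the child is homozygous for the father's allele only at SNP 2.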
