CS296-5 Genomes, Networks, and Cancer

This course focuses on the study of genomes, networks, and their relationship to cancer. Topics covered include genome assembly, rearrangements, network alignment, and the role of chromosomal aberrations in cancer progression.

Presentation Transcript


  1. CS296-5 Genomes, Networks, and Cancer http://cs.brown.edu/courses/cs296-5/

  2. Course Organization • Seminar style • Grading: 10% participation; 30% paper reviews (3); 20% presentations (1-2); 40% final project • Extra credit: 5% for a report on a visiting seminar (pre-arranged), maximum 2: Feb. 1, 4pm (Beerenwinkel); Feb. 15, 4pm (E. Myers) • CS requirements: PhD, Area B (Algorithms); ScM, Theory or Practice • Meeting time

  3. Biology 101 Central Dogma

  4. Human Genome Sequenced 2000-2003, ??? Now what?

  5. Course Topics Genomes Networks Cancer

  6. Course Topics Genomes Genome Assembly Genome Rearrangements Networks Cancer

  7. Comparative Genomic Architectures: Mouse vs Human Genome • Humans and mice have similar genomes, but their genes are ordered differently • ~245 rearrangements • Reversals • Fusions • Fissions • Translocations

  8. Genome rearrangements • What are the similarity blocks and how to find them? • What is the architecture of the ancestral genome? • What is the evolutionary scenario for transforming one genome into the other? [Figure: Mouse (X chrom.) and Human (X chrom.), unknown ancestor ~75 million years ago]

  9. Reversals • Blocks represent conserved genes. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 [Figure: blocks labeled 1 to 10]

  10. Reversals 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 • Blocks represent conserved genes. • In the course of evolution or in a clinical context, blocks 1, …, 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. [Figure: blocks labeled 1 to 10]

  11. Reversals and Breakpoints 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 The reversal introduced two breakpoints (disruptions in order). [Figure: blocks labeled 1 to 10]
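
The reversal and breakpoint count on this slide can be reproduced with a short sketch. It assumes a signed permutation stored as a Python list, a reversal given by 0-based inclusive positions, and the common convention of framing the permutation with 0 and n + 1 before counting breakpoints; the function names are illustrative and not from the slides.

```python
def apply_reversal(perm, i, j):
    """Reverse the segment perm[i..j] (0-based, inclusive) and flip each sign."""
    flipped = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + flipped + perm[j + 1:]


def count_breakpoints(perm):
    """Count adjacent pairs (a, b) with b != a + 1 after framing the
    signed permutation with 0 on the left and n + 1 on the right."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


identity = list(range(1, 11))
reversed_once = apply_reversal(identity, 3, 7)  # reverse blocks 4..8
print(reversed_once)  # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
print(count_breakpoints(identity), count_breakpoints(reversed_once))  # 0 2
```

The two breakpoints sit at the edges of the reversed segment, between 3 and -8 and between -4 and 9.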

  12. Reversal Distance Problem • Goal: Given two permutations, find the shortest series of reversals that transforms one into the other • Input: Permutations π and σ • Output: A series of reversals ρ1, …, ρt transforming π into σ, such that t is minimum • t - reversal distance between π and σ • d(π, σ) - smallest possible value of t, given π and σ
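
For very short permutations the reversal distance can be found by exhaustive search, which makes the problem statement concrete. The sketch below is a brute-force breadth-first search over signed permutations, not an efficient algorithm from the course; its cost grows explosively with length, and the function names are made up for illustration.

```python
from collections import deque


def neighbors(perm):
    """Yield every signed permutation reachable from perm by one reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            segment = tuple(-x for x in reversed(perm[i:j + 1]))
            yield perm[:i] + segment + perm[j + 1:]


def reversal_distance(source, target):
    """Breadth-first search for the minimum number of reversals.
    Only practical for short permutations."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        for nxt in neighbors(perm):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # only reached if the inputs use different elements


print(reversal_distance([1, -3, -2], [1, 2, 3]))
print(reversal_distance([2, 1], [1, 2]))
```

The first call needs a single reversal, like the ten-block example on the earlier slides. The second shows that, with signs, swapping two adjacent blocks takes three reversals.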

  13. Course Topics Genomes Networks Cancer Network Alignment Network Integration

  14. Regulatory Network

  15. Protein-Protein Interaction (PPI) Network

  16. Protein-Protein Interaction Network? • Proteins are nodes • Interactions are edges • Edges may have weights Yeast PPI network H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)
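
One simple way to hold such a network in memory is a dictionary of dictionaries, with proteins as keys and edge weights as values. The sketch below is only an illustration: the protein names and weights are hypothetical, and the degree count is one basic centrality measure among many.

```python
from collections import defaultdict


def build_network(weighted_edges):
    """Build an undirected weighted graph as a dict of dicts."""
    graph = defaultdict(dict)
    for a, b, weight in weighted_edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def degrees(graph):
    """Number of interaction partners per protein."""
    return {protein: len(partners) for protein, partners in graph.items()}


# Hypothetical proteins and confidence weights, for illustration only.
edges = [("P1", "P2", 0.9), ("P1", "P3", 0.4), ("P1", "P4", 0.7), ("P3", "P4", 0.6)]
network = build_network(edges)
print(degrees(network))
```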

  17. Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

  18. The Network Alignment Problem Given: k different interaction networks belonging to different species, Find: Conserved sub-networks within these networks Conserved defined by protein sequence similarity (node similarity) and interaction similarity (network topology similarity)
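
A heavily simplified version of this problem, for k = 2, is to list the interactions that are present in both networks between sequence-similar proteins. The sketch below assumes the similar pairs are already known (for example from a sequence search) and is not the method of the cited paper; all names and data are hypothetical.

```python
def conserved_interactions(edges_a, edges_b, similar_pairs):
    """Return ((a1, a2), (b1, b2)) pairs where a1 is similar to b1,
    a2 is similar to b2, and both interactions exist."""
    edge_set_b = {frozenset(edge) for edge in edges_b}
    similar = {}
    for a, b in similar_pairs:
        similar.setdefault(a, set()).add(b)
    conserved = []
    for a1, a2 in edges_a:
        for b1 in sorted(similar.get(a1, ())):
            for b2 in sorted(similar.get(a2, ())):
                if b1 != b2 and frozenset((b1, b2)) in edge_set_b:
                    conserved.append(((a1, a2), (b1, b2)))
    return conserved


# Hypothetical yeast (y*) and human (h*) proteins.
yeast_edges = [("y1", "y2"), ("y2", "y3")]
human_edges = [("h2", "h1"), ("h3", "h4")]
similar = [("y1", "h1"), ("y2", "h2"), ("y3", "h3")]
print(conserved_interactions(yeast_edges, human_edges, similar))
```

Real alignment methods go further by scoring whole sub-networks and tolerating missing or spurious edges, which this edge-by-edge check does not attempt.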

  19. Course Topics Genomes Networks Cancer Measuring mutation in cancer genomes Modeling cancer progression

  20. Chromosomal aberrations Structural: translocations, inversions, fissions, fusions. Copy number changes: gain and loss of chromosome arms, segmental duplications/deletions. Tumor Genomes Mutation and selection Compromised genome stability

  21. Tumor Genome Architecture • What are the detailed architectures of tumor genomes? • What sequence of rearrangements produces these architectures?

  22. Array Comparative Genomic Hybridization (aCGH)

  23. Next: Tumor Genome → Phenotype Gene Networks and Pathways Integration of Multiple Data Sources • ESP and copy number • Mutation • Expression (mRNA and miRNA) • Binding (ChIP-chip) • Pathways • Epigenetics activation repression Duplicated genes Deleted genes

  24. Additional Topics: Rearrangements in Genetics • Interaction b/w genome rearrangements and single nucleotide polymorphisms (SNPs) • Detecting Rearrangements Under Selection • Population Substructure

  25. DNA Sequencing

  26. DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…

  27. DNA Sequencing – Overview (timeline from 1975 to 2015) • Gel electrophoresis • Predominant, old technology by F. Sanger • Whole genome strategies • Physical mapping • Walking • Shotgun sequencing • Computational fragment assembly • The future: new sequencing technologies • Pyrosequencing, single molecule methods, … • Assembly techniques • Future variants of sequencing • Resequencing of humans • Microbial and environmental sequencing • Cancer genome sequencing

  28. DNA Sequencing Goal: Find the complete sequence of A, C, G, T’s in DNA Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence ~500 letters at a time

  29. DNA Sequencing – vectors DNA Shake DNA fragments Known location (restriction site) Vector Circular genome (bacterium, plasmid) + =

  30. Different types of vectors

  31. DNA Sequencing – gel electrophoresis • Start at primer (restriction site) • Grow DNA chain • Include dideoxynucleoside (modified a, c, g, t) • Stops reaction at all possible points • Separate products by length, using gel electrophoresis
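
The logic of reading a sequence from chain-terminated fragments can be mimicked in a few lines. This is an idealized sketch: it assumes one lane per base, exactly one fragment for every possible stopping point, and perfect separation by length; the function names are invented for the example.

```python
def chain_termination_lanes(sequence):
    """For each base, list the fragment lengths at which a dideoxy
    version of that base stops synthesis (idealized, no noise)."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from the shortest to the longest fragment."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = chain_termination_lanes("ACGTGACT")
print(lanes)
print(read_gel(lanes))
```

Because every prefix length appears in exactly one lane, sorting the bands by length recovers the synthesized strand.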

  32. Electrophoresis diagrams

  33. Challenging to read answer

  34. Challenging to read answer

  35. Challenging to read answer

  36. Reading an electropherogram • Filtering • Smoothing • Correction for length compressions • A method for calling the letters – PHRED • PHRED – PHil’s Read EDitor (by Phil Green) • Several better methods exist, but labs are reluctant to change

  37. Output of PHRED: a read • A read: 500-700 nucleotides, e.g. A C G A A T C A G … A with quality scores 16 18 21 23 25 15 28 30 32 … 21 • Quality scores: -10 · log10(Prob(Error)) • Reads can be obtained from the leftmost and rightmost ends of the insert • Double-barreled sequencing (1990): both leftmost and rightmost ends are sequenced, and the reads are paired
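
The quality-score formula is easy to invert, which helps when interpreting the numbers under each base. The sketch below converts between error probability and quality score and sums the per-base error probabilities for the ten scores shown on the slide; the function names are illustrative.

```python
import math


def phred_quality(error_probability):
    """Quality score Q = -10 * log10(P(error))."""
    return -10 * math.log10(error_probability)


def error_probability(quality):
    """Inverse of the quality formula: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


slide_scores = [16, 18, 21, 23, 25, 15, 28, 30, 32, 21]
expected_errors = sum(error_probability(q) for q in slide_scores)
print(error_probability(20), error_probability(30))
print(round(expected_errors, 3))
```

A score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 30 to 1 in 1000.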

  38. Method to sequence longer regions genomic segment cut many times at random (Shotgun) Get one or two reads from each segment ~500 bp ~500 bp
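
Shotgun sampling can be simulated by drawing read start positions uniformly at random. The sketch below is a toy model that ignores sequencing errors, strand, and paired ends; the genome string, read length, and seed are arbitrary illustration values.

```python
import random


def shotgun_reads(genome, num_reads, read_length, seed=0):
    """Sample reads of a fixed length from uniformly random positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_length
    starts = [rng.randint(0, last_start) for _ in range(num_reads)]
    return [genome[s:s + read_length] for s in starts]


toy_genome = "ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGTCTAGCTAGACTACG"
reads = shotgun_reads(toy_genome, num_reads=8, read_length=12)
for read in reads:
    print(read)
```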

  39. Reconstructing the Sequence (Fragment Assembly) reads Cover region with ~7-fold redundancy (7X) Overlap reads and extend to reconstruct the original genomic region
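
The "overlap and extend" idea can be sketched as a greedy merge: repeatedly join the two reads with the longest suffix-prefix overlap. This is a toy heuristic for error-free reads, assumed here for illustration only; it is not a production assembler and it can be misled by repeats.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b
    (at least min_length), or 0 if there is none."""
    for k in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge the best-overlapping pair until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_length)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return reads  # remaining contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


print(greedy_assemble(["GCAATCC", "ATGGCGT", "GCGTGCA"]))
```

With these three reads the merges recover the single contig ATGGCGTGCAATCC.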

  40. Definition of Coverage C • Length of genomic segment: L • Number of reads: n • Length of each read: l • Definition: Coverage C = n l / L • How much coverage is enough? Lander-Waterman model: assuming a uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides
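
The figure of about one gap per million nucleotides can be checked numerically. The sketch below uses the simplified Lander-Waterman estimate n · e^(-C) for the expected number of gaps, ignoring the minimum overlap needed to detect that two reads join, and assumes the ~500-letter read length mentioned on the earlier slides.

```python
import math


def coverage(num_reads, read_length, genome_length):
    """Coverage C = n * l / L."""
    return num_reads * read_length / genome_length


def expected_gaps(num_reads, read_length, genome_length):
    """Simplified Lander-Waterman estimate n * exp(-C),
    ignoring the minimum detectable overlap."""
    c = coverage(num_reads, read_length, genome_length)
    return num_reads * math.exp(-c)


# 20,000 reads of 500 letters over 1,000,000 nucleotides gives C = 10.
print(coverage(20_000, 500, 1_000_000))
print(expected_gaps(20_000, 500, 1_000_000))
```

Under these assumptions the estimate comes out just under one gap, in line with the slide's figure.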

  41. Challenges in Fragment Assembly • Repeats: a major problem for fragment assembly • More than 50% of the human genome consists of repeats: - over 1 million Alu repeats (about 300 bp) - about 200,000 LINE repeats (1000 bp and longer) [Figure: three copies of a repeat; green and yellow fragments are interchangeable when assembling repetitive DNA]

  42. Triazzle: A Fun Example The puzzle looks simple BUT there are repeats!!! The repeats make it very difficult. Try it – only $7.99 at www.triazzle.com

  43. Repeat Types • Bacterial genomes: 5% • Mammals: 50% Repeat types: • Low-complexity DNA (e.g. ATATATATACATA…) • Microsatellite repeats (a1…ak)^N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG) • Transposons • SINE (Short Interspersed Nuclear Elements), e.g. ALU: ~300-long, 10^6 copies • LINE (Long Interspersed Nuclear Elements): ~4000-long, 200,000 copies • LTR retroposons (Long Terminal Repeats (~700 bp) at each end): cousins of HIV • Gene families: genes duplicate and then diverge (paralogs) • Recent duplications: ~100,000-long, very similar copies
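
Exact microsatellites of the form (a1…ak)^N can be located with a regular expression that uses a backreference. The sketch below assumes a unit of 3 to 6 letters repeated at least three times with no mismatches; imperfect runs such as the slide's example, which contains substitutions, would need an error-tolerant method instead. The reported unit may be a multiple of the shortest repeating unit.

```python
import re

# A unit of 3-6 letters followed by at least two more exact copies of itself.
MICROSATELLITE = re.compile(r"([ACGT]{3,6})\1{2,}")


def find_microsatellites(sequence):
    """Return (start, unit, copies) for each exact tandem run found."""
    hits = []
    for match in MICROSATELLITE.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


print(find_microsatellites("TTCAGCAGCAGCAGTT"))  # [(2, 'CAG', 4)]
```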

  44. Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT 3x10^9 nucleotides • 50% of human DNA is composed of repeats • Error! Glued together two distant regions

  45. What can we do about repeats? Two main approaches: • Cluster the reads • Link the reads

  46. What can we do about repeats? Two main approaches: • Cluster the reads • Link the reads

  47. What can we do about repeats? Two main approaches: • Cluster the reads • Link the reads

  48. Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT 3x10^9 nucleotides [Figure labels: A R B, D R C] ARB, CRD or ARD, CRB?
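
The ambiguity between "ARB, CRD" and "ARD, CRB" can be demonstrated directly: if no read is long enough to span the repeat R together with a letter on each side, both layouts produce exactly the same set of reads. The sketch below models reads as all substrings of length k, which is an idealization, and uses made-up sequences for A, B, C, D, and R.

```python
def kmers(sequences, k):
    """All substrings of length k from a collection of sequences."""
    return {seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)}


# Hypothetical unique regions and a shared repeat.
A, B, C, D = "ACGTT", "TTAGA", "CATCA", "AGTCT"
R = "GGGCCC"

layout_1 = [A + R + B, C + R + D]
layout_2 = [A + R + D, C + R + B]

for k in (len(R) + 1, len(R) + 2):
    same = kmers(layout_1, k) == kmers(layout_2, k)
    print(f"read length {k}: layouts indistinguishable = {same}")
```

Reads of length len(R) + 1 cannot tell the layouts apart, while reads of length len(R) + 2 can, because they reach unique sequence on both sides of the repeat.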

  49. Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT 3x10^9 nucleotides

  50. Strategies for whole-genome sequencing • (1) Hierarchical – Clone-by-clone: break genome into many long pieces; map each long piece onto the genome; sequence each piece with shotgun. Example: Yeast, Worm, Human, Rat • (2) Online version of (1) – Walking: break genome into many long pieces; start sequencing each piece with shotgun; construct map as you go. Example: Rice genome • (3) Whole genome shotgun: one large shotgun pass on the whole genome. Example: Drosophila, Human (Celera), Neurospora, Mouse, Rat, Dog
