This course focuses on the study of genomes, networks, and their relationship to cancer. Topics covered include genome assembly, rearrangements, network alignment, and the role of chromosomal aberrations in cancer progression.
CS296-5 Genomes, Networks, and Cancer http://cs.brown.edu/courses/cs296-5/
Course Organization • Seminar style • Grading: • 10% participation • 30% paper reviews (3) • 20% presentations (1-2) • 40% final project • Extra credit: 5% for report on visiting seminars (pre-arranged): Maximum 2 • Feb. 1, 4pm (Beerenwinkel), Feb. 15, 4pm (E. Myers) • CS requirements: PhD, Area B (Algorithms), ScM, Theory or Practice • Meeting time
Biology 101 Central Dogma
Human Genome Sequenced 2000-2003, ??? Now what?
Course Topics Genomes Networks Cancer
Course Topics Genomes Genome Assembly Genome Rearrangements Networks Cancer
Comparative Genomic Architectures: Mouse vs Human Genome • Humans and mice have similar genomes, but their genes are ordered differently • ~245 rearrangements • Reversals • Fusions • Fissions • Translocations
Genome rearrangements: Mouse (X chrom.), Human (X chrom.), unknown ancestor ~75 million years ago • What are the similarity blocks and how to find them? • What is the architecture of the ancestral genome? • What is the evolutionary scenario for transforming one genome into the other?
Reversals • Blocks represent conserved genes. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Reversals • Blocks represent conserved genes. • In the course of evolution or in a clinical context, blocks 1, …, 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
Reversals and Breakpoints: 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 • The reversal introduced two breakpoints (disruptions in order).
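The reversal and its two breakpoints can be checked with a short sketch. It assumes signed permutations (a reversal flips both the order and the signs of the blocks, as in the example above) and counts a breakpoint wherever two neighbours in the permutation, extended with 0 and n+1, are not consecutive. The function names are illustrative, not from the course.

```python
def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of each block."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def count_breakpoints(perm):
    """Count adjacent pairs (a, b) with b - a != 1 in 0, perm, n+1."""
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


identity = list(range(1, 11))
after_reversal = apply_reversal(identity, 3, 7)  # reverse blocks 4..8
breakpoints = count_breakpoints(after_reversal)
```

Here `after_reversal` is the permutation shown on the slide, and the two breakpoints sit between 3 and -8 and between -4 and 9.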
Reversal Distance Problem • Goal: Given two permutations, find the shortest series of reversals that transforms one into another • Input: Permutations p and s • Output: A series of reversals r1, …, rt transforming p into s, such that t is minimum • t - reversal distance between p and s • d(p, s) - smallest possible value of t, given p and s
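For very small permutations, d(p, s) can be found by brute force. The sketch below is a plain breadth-first search over signed permutations, assuming every contiguous segment may be reversed. It is meant only to make the problem statement concrete: the search space grows exponentially, so this is not a practical algorithm for real genomes. Function names are illustrative.

```python
from collections import deque


def signed_reversals(perm):
    """Yield every permutation reachable from perm (a tuple) by one signed reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            middle = tuple(-x for x in reversed(perm[i:j + 1]))
            yield perm[:i] + middle + perm[j + 1:]


def reversal_distance(p, s):
    """Smallest number of signed reversals turning p into s (breadth-first search)."""
    start, goal = tuple(p), tuple(s)
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        for nxt in signed_reversals(current):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("s is not reachable from p (different block sets?)")


d_one = reversal_distance([1, 2, -4, -3, 5], [1, 2, 3, 4, 5])
d_swap = reversal_distance([2, 1], [1, 2])
```

Because the search is breadth-first, the first time the target is reached the number of steps is minimal.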
Course Topics Genomes Networks Cancer Network Alignment Network Integration
Protein-Protein Interaction Network? • Proteins are nodes • Interactions are edges • Edges may have weights Yeast PPI network H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)
Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
The Network Alignment Problem Given: k different interaction networks belonging to different species, Find: Conserved sub-networks within these networks Conserved defined by protein sequence similarity (node similarity) and interaction similarity (network topology similarity)
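A minimal sketch of the pairwise case (k = 2) follows. It assumes the sequence-similarity step has already been done and is supplied as a set of similar protein pairs. An interaction counts as conserved when both of its endpoints have similar proteins that also interact in the other network. The protein names and the function are hypothetical, and published alignment methods score much richer sub-networks than single edges.

```python
def conserved_interactions(edges1, edges2, similar):
    """Return pairs ((u, v), (u2, v2)) where edge (u, v) of network 1 maps to
    edge (u2, v2) of network 2 under the given similarity pairs."""
    net2 = {frozenset(edge) for edge in edges2}  # undirected edges
    partners = {}
    for a, b in similar:
        partners.setdefault(a, set()).add(b)
    conserved = set()
    for u, v in edges1:
        for u2 in partners.get(u, ()):
            for v2 in partners.get(v, ()):
                if u2 != v2 and frozenset((u2, v2)) in net2:
                    conserved.add(((u, v), (u2, v2)))
    return conserved


# Toy data: hypothetical proteins Y* in one species, F* in another.
species1_edges = [("Y1", "Y2"), ("Y2", "Y3")]
species2_edges = [("F1", "F2"), ("F3", "F4")]
similar_pairs = {("Y1", "F1"), ("Y2", "F2"), ("Y3", "F4")}
conserved = conserved_interactions(species1_edges, species2_edges, similar_pairs)
```

In the toy data only the Y1-Y2 interaction is conserved, since F2 and F4 do not interact.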
Course Topics Genomes Networks Cancer Measuring mutation in cancer genomes Modeling cancer progression
Chromosomal aberrations Structural: translocations, inversions, fissions, fusions. Copy number changes: gain and loss of chromosome arms, segmental duplications/deletions. Tumor Genomes Mutation and selection Compromised genome stability
Tumor Genome Architecture • What are the detailed architectures of tumor genomes? • What sequence of rearrangements produces these architectures?
Next: Tumor Genome → Phenotype • Gene Networks and Pathways • Integration of Multiple Data Sources • ESP and copy number • Mutation • Expression (mRNA and miRNA) • Binding (ChIP-chip) • Pathways • Epigenetics (Figure legend: activation, repression, duplicated genes, deleted genes)
Additional Topics: Rearrangements in Genetics • Interaction between genome rearrangements and single nucleotide polymorphisms (SNPs) • Detecting Rearrangements Under Selection • Population Substructure
DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…
DNA Sequencing – Overview 1975 • Gel electrophoresis • Predominant, old technology by F. Sanger • Whole genome strategies • Physical mapping • Walking • Shotgun sequencing • Computational fragment assembly • The future—new sequencing technologies • Pyrosequencing, single molecule methods, … • Assembly techniques • Future variants of sequencing • Resequencing of humans • Microbial and environmental sequencing • Cancer genome sequencing 2015
DNA Sequencing Goal: Find the complete sequence of A, C, G, T’s in DNA Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence ~500 letters at a time
DNA Sequencing – vectors DNA Shake DNA fragments Known location (restriction site) Vector Circular genome (bacterium, plasmid) + =
DNA Sequencing – gel electrophoresis • Start at primer (restriction site) • Grow DNA chain • Include dideoxynucleoside (modified a, c, g, t) • Stops reaction at all possible points • Separate products by length, using gel electrophoresis
Reading an electropherogram • Filtering • Smoothing • Correction for length compressions • A method for calling the letters – PHRED • PHRED – PHil’s Read EDitor (by Phil Green) • Several better methods exist, but labs are reluctant to change
Output of PHRED: a read • A read: 500-700 nucleotides • Bases: A C G A A T C A G … A • Quality scores: 16 18 21 23 25 15 28 30 32 … 21, where quality = -10 × log10(Prob(Error)) • Reads can be obtained from the leftmost and rightmost ends of the insert • Double-barreled sequencing (1990): both leftmost and rightmost ends are sequenced, and the reads are paired
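The quality-score formula is easy to invert, which helps when reading the numbers above. This sketch simply evaluates quality = -10 × log10(Prob(Error)) in both directions; the function names are illustrative.

```python
import math


def phred_quality(p_error):
    """Quality score for a base call with error probability p_error."""
    return -10 * math.log10(p_error)


def error_probability(quality):
    """Error probability implied by a quality score."""
    return 10 ** (-quality / 10)


q_for_one_in_thousand = phred_quality(0.001)
p_for_q20 = error_probability(20)
```

A score of 20 therefore corresponds to a 1-in-100 chance that the base call is wrong, and 30 to 1 in 1,000.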
Method to sequence longer regions • Cut the genomic segment many times at random (Shotgun) • Get one or two reads (~500 bp each) from each segment
Reconstructing the Sequence (Fragment Assembly) • Cover region with ~7-fold redundancy (7X) • Overlap reads and extend to reconstruct the original genomic region
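The "overlap reads and extend" step can be sketched as a greedy merge: repeatedly join the two reads with the longest suffix-prefix overlap. This is a toy illustration under strong assumptions (error-free reads, a single strand, no read contained inside another, no repeats), not the method used by production assemblers. The `min_len` parameter and function names are illustrative.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge the best-overlapping pair of reads until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining reads form separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


contigs = greedy_assemble(["GACTGAGG", "ACGTGACT", "GAGGACC"])
```

With repeats longer than the reads, the same greedy rule can join reads from two distant regions, which is why repeats are a major problem for fragment assembly.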
Definition of Coverage C • Length of genomic segment: L • Number of reads: n • Length of each read: l • Definition: Coverage C = n l / L • How much coverage is enough? Lander-Waterman model: assuming uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides
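The figure of about one gap per 1,000,000 nucleotides at C = 10 can be reproduced with a back-of-the-envelope calculation. The sketch below assumes the simplest form of the Lander-Waterman estimate, in which the expected number of gaps is about n × e^(-C) and any required minimum overlap between reads is ignored. The read length of 500 is taken from the earlier slides; the function names are illustrative.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Coverage C = n * l / L."""
    return n_reads * read_len / genome_len


def expected_gaps(n_reads, read_len, genome_len):
    """Approximate expected number of gaps: n * exp(-C) (simplified Lander-Waterman)."""
    c = coverage(n_reads, read_len, genome_len)
    return n_reads * math.exp(-c)


# 20,000 reads of 500 bp over a 1,000,000 bp segment gives C = 10.
c_example = coverage(20_000, 500, 1_000_000)
gaps_example = expected_gaps(20_000, 500, 1_000_000)
```

Under these assumptions the expected number of gaps at C = 10 is a little under one per million nucleotides, in line with the slide.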
Challenges in Fragment Assembly • Repeats: A major problem for fragment assembly • > 50% of the human genome consists of repeats: - over 1 million Alu repeats (about 300 bp) - about 200,000 LINE repeats (1000 bp and longer) • (Figure: Repeat, Repeat, Repeat) Green and yellow fragments are interchangeable when assembling repetitive DNA
Triazzle: A Fun Example The puzzle looks simple BUT there are repeats!!! The repeats make it very difficult. Try it – only $7.99 at www.triazzle.com
Repeat Types • Bacterial genomes: 5% • Mammals: 50% • Repeat types: • Low-Complexity DNA (e.g. ATATATATACATA…) • Microsatellite repeats (a1…ak)^N where k ~ 3-6 (e.g. CAGCAGTAGCAGCACCAG) • Transposons • SINE (Short Interspersed Nuclear Elements) e.g., ALU: ~300-long, 10^6 copies • LINE (Long Interspersed Nuclear Elements) ~4000-long, 200,000 copies • LTR retroposons (Long Terminal Repeats (~700 bp) at each end) cousins of HIV • Gene Families: genes duplicate & then diverge (paralogs) • Recent duplications: ~100,000-long, very similar copies
Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT • 3x10^9 nucleotides • 50% of human DNA is composed of repeats • Error! Glued together two distant regions
What can we do about repeats? Two main approaches: • Cluster the reads • Link the reads
Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT • 3x10^9 nucleotides • A R B D R C: ARB, CRD or ARD, CRB?
Sequencing and Fragment Assembly AGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCTAGTAGCACAGACTACGACGAGACGATCGTGCGAGCGACGGCGTAGTGTGCTGTACTGTCGTGTGTGTGTACTCTCCT • 3x10^9 nucleotides
Strategies for whole-genome sequencing 1. Hierarchical – Clone-by-clone • Break genome into many long pieces • Map each long piece onto the genome • Sequence each piece with shotgun • Example: Yeast, Worm, Human, Rat 2. Online version of (1) – Walking • Break genome into many long pieces • Start sequencing each piece with shotgun • Construct map as you go • Example: Rice genome 3. Whole genome shotgun • One large shotgun pass on the whole genome • Example: Drosophila, Human (Celera), Neurospora, Mouse, Rat, Dog