400 likes | 430 Views
CSCI2950-C Genomes, Networks, and Cancer. http://cs.brown.edu/courses/cs2950-c/. Course Organization. Seminar style Grading: 10% participation 30% paper reviews (~10) 20% presentations (2-3) 40% project. Midterm proposal and final presentations
E N D
CSCI2950-C Genomes, Networks, and Cancer http://cs.brown.edu/courses/cs2950-c/
Course Organization • Seminar style • Grading: • 10% participation • 30% paper reviews (~10) • 20% presentations (2-3) • 40% project. Midterm proposal and final presentations • Extra credit: 5% for report on comp. bio. seminars (pre-arranged): Maximum 2 • CS requirements: PhD: Area B (Algorithms) ScM: Theory or Practice* *depending on project Significant programming*
Biology 101 Central Dogma
What can we measure? Sequencing (expensive) Hybridization (noisy) Sequencing (expensive) Hybridization (noisy) Mass spectrometry (noisy) Hybridization (very noisy!)
Human Genome Sequenced2000-2003, ??? • Comparative Genomics • Genomes of other mammals • 2) Disease genomics • HapMap • 1000 Genomes • 3) Cancer genomics • The Cancer Genome Atlas • Tumor Sequencing Project Now what? Sept. 3 “In the Genome Race, the Sequel Is Personal”
Course Topics Genomes Algorithms Visualization Machine Learning Systems Data Mining Networks Cancer
Course Topics Genomes Networks Cancer
DNA sequencing How we obtain the sequence of nucleotides of a species …ACGTGACTGAGGACCGTG CGACTGAGACTGACTGGGT CTAGCTAGACTACGTTTTA TATATATATACGTCGTCGT ACTGATGACTAGATTACAG ACTGATTTAGATACCTGAC TGATTTTAAAAAAATATT…
DNA Sequencing Goal: Find the complete sequence of A, C, G, T’s in DNA Challenge: There is no machine that takes long DNA as an input, and gives the complete sequence as output Can only sequence ~500 letters at a time
Shotgun Sequencing genomic segment cut many times at random (shotgun) Get one or two reads from each segment ~500 bp ~500 bp
Fragment Assembly reads Cover region with ~7-fold redundancy Overlap reads and extend to reconstruct the original genomic region
Comparative Genomics Compare genome sequences of related species. • Sequence Alignment • Phylogeny: defining ancestral relationships between species, genomes, genes. • human • mouse • Sequences • Evolve via substitutions • Conservation suggests function • EFTPPVQAAYQKVVAG • DFNPNVQAAFQKVVAG
History of Chromosome X Rat Consortium, Nature, 2004
Comparative Genomic Architectures: Mouse vs Human Genome • Humans and mice have similar genomes, but their genes are ordered differently • ~245 rearrangements • Reversals • Fusions • Fissions • Translocation
Genome rearrangements Mouse (X chrom.) • What are the similarity blocks and how to find them? • What is the architecture of the ancestral genome? • What is the evolutionary scenario for transforming one genome into the other? Unknown ancestor~ 75 million years ago Human (X chrom.)
1 2 3 9 10 8 4 7 5 6 Reversals • Blocks represent conserved genes. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Reversals 1 2 3 9 10 8 4 7 5 6 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 • Blocks represent conserved genes. • In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
Reversals and Breakpoints 1 2 3 9 10 8 4 7 5 6 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 The reversion introduced two breakpoints(disruptions in order).
Sorting Permutations by Reversals = 12…n signed permutation (Sankoff et al.1990) Reversal (i,j) [inversion] 1…i-1 -j... -ij+1…n Problem: Given , find a sequence of reversals 1, …, t with such that: . 1 . 2 … t = (1, 2, …, n) andt is minimal. Solution: Analysis of breakpoint graph • Polynomial time algorithms • O(n4) : Hannenhalli and Pevzner, 1995. O(n2) : Kaplan, Shamir, Tarjan, 1997. • O(n) [distance t] : Bader, Moret, and Yan, 2001.O(n3) : Bergeron, 2001.
Course Topics Genomes Networks Cancer Network Alignment Network Integration
Biological Interaction Networks Many types: • Protein-DNA (regulatory) • Protein-metabolite (metabolic) • Protein-protein (signaling) • RNA-RNA (regulatory) • Genetic interactions (gene knockouts)
Metabolic Networks Nodes = reactants Edges = reactions labeled by enzyme (protein) that catalyzes reaction
Protein-Protein Interaction Network? • Proteins are nodes • Interactions are edges • Edges may have weights Yeast PPI network H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)
Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
The Network Alignment Problem Given: k different interaction networks belonging to different species, Find: Conserved sub-networks within these networks Conserved defined by protein sequence similarity (node similarity) and interaction similarity (network topology similarity)
Protein Signaling Networks Art Salomon Biology Department Use machine learning methods (Bayesian networks, etc. to derive network structure.
Computational Problems • Classifying Network Topology • Finding paths, cliques, dense subnetworks, etc. • Comparing Networks Across Species • Using networks to explain data • Dependencies revealed by network topology • Modeling dynamics of networks
Course Topics Genomes Networks Cancer Cancer Genomes Cancer & Networks
Single nucleotide change Cell Division and Mutation Copy number Structural
Cancer Genomes Leukemia Breast
Rearrangements in Tumors Change gene structure, create novel fusion genes • Gleevec (Novartis 2001) targets ABL-BCR fusion
Challenges • What is detailed organization of cancer genomes? • What genes affected? • What sequence of rearrangements produced this genome? • Can we create custom treatments for tumors based on mutational spectrum? (e.g. Gleevec)
Tumor Genome → Phenotype Gene Networks and Pathways Integration of Multiple Data Sources • ESP and copy number • Mutation • Expression • (mRNA and miRNA) • Binding (ChIP-chip) • Pathways • Epigenetics activation repression Duplicated genes Deleted genes
Tumor Genome → Phenotype Gene Networks and Pathways Integration of Multiple Data Sources • ESP and copy number • Mutation • Expression • (mRNA and miRNA) • Binding (ChIP-chip) • Pathways • Epigenetics activation repression Duplicated genes Deleted genes
Course Topics Genomes Algorithms Visualization Machine Learning Systems Data Mining Networks Cancer