PEDTOOL: Gene hunting based on high-throughput computing
Dan Geiger
Computer Science Department, Technion
Searching for genes that predispose to or cause disease
Why search?
1. Prenatal testing for high-risk populations
2. Risk assessment and adapting lifestyle to risk factors
3. Finding the mutated proteins and developing drugs
4. Understanding basic biological processes
How can we search?
1. Find families in which a disease is passed from generation to generation
2. Take a simple blood sample from several affected and healthy individuals
3. Laboratory analysis of the DNA across all chromosomes
4. Analysis by algorithmic means. I will highlight three computational problems.
Usage of our system in Israeli Hospitals
• Rabin Hospital, by Motti Shochat’s group
  • New locus for mental retardation (2003)
  • Infantile bilateral striatal necrosis (2004)
• Soroka Hospital, by Ohad Birk’s group
  • Lethal congenital contractural syndrome (2004)
  • Congenital cataract (2005)
• Rambam Hospital, by Eli Shprecher’s group
  • Congenital recessive ichthyosis (2005)
  • CEDNIK syndrome (2005)
• Galil Ma’aravi Hospital, by Tzipi Falik’s group
  • Familial Onychodysplasia and dysplasia
  • Familial juvenile hypertrophy (2005)
Steps in Gene Hunting
• Linkage analysis (10^6~10^7 bp)
• Identify genes (10^4~10^5 bp)
• Resequencing (100 bp)
Recombination During Meiosis
[Figure: a male or female parent producing recombinant gametes]
Familial Onychodysplasia and dysplasia of distal phalanges (ODP)
[Figure: individuals III-15, IV-10, and IV-7]
Marker Information Added (genetic markers)
[Figure: chromosome pair with markers M1 and M2]
Maximum Likelihood Evaluation: Two Point Analysis (Task 1)
[Figure: loci M1, M2, D1, D2, M3, M4 and θ; genotype labels for III-15 and III-16: 151,159 / 151,155; 202,209 / 202,202; 139,141 / 139,146; 1,2 / 3,3]
The first computational problem: find a value of θ that maximizes Pr(data|θ, Mode-Of-Inheritance).
Here, data means the data for one marker at a time.
LOD score (to quantify how confident we are): Z(θ) = log10[Pr(data|θ) / Pr(data|θ=½)].
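To make the LOD score concrete, here is a minimal sketch for the simplest possible case. It assumes phase-known meioses, so that the likelihood is just θ^R (1-θ)^(N-R) for R recombinants out of N meioses. This is a textbook simplification, not the pedigree likelihood Pedtool computes, and the function names and counts are hypothetical.

```python
import math

def likelihood(theta, recombinants, meioses):
    """Assumed toy model: each meiosis is recombinant with probability theta."""
    return theta ** recombinants * (1.0 - theta) ** (meioses - recombinants)

def lod(theta, recombinants, meioses):
    """Z(theta) = log10[ Pr(data | theta) / Pr(data | theta = 1/2) ]."""
    return math.log10(likelihood(theta, recombinants, meioses)
                      / likelihood(0.5, recombinants, meioses))

def max_lod(recombinants, meioses, steps=500):
    """Grid search over theta in (0, 1/2] for the value maximizing Z(theta)."""
    grid = [0.5 * i / steps for i in range(1, steps + 1)]
    best = max(grid, key=lambda t: lod(t, recombinants, meioses))
    return best, lod(best, recombinants, meioses)

# Hypothetical counts: 2 recombinants observed in 20 informative meioses.
theta_hat, z_max = max_lod(recombinants=2, meioses=20)
print(f"theta_hat = {theta_hat:.3f}, Z = {z_max:.2f}")
```

In this toy case the maximum falls at θ = R/N, and Z(½) is 0 by construction, so a positive score means the data favor linkage over free recombination.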
Maximum Likelihood Evaluation Approach (Task 2)
Most probable haplotype configuration of some or all persons: which alleles came from the mother and which from the father?
The second computational problem: argmax Pr(h_1, h_2, …, h_{2n-1}, h_{2n} | data, θ, MOI)
For each person, there are 2^k possible haplotypes, where k is the number of markers considered.
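The 2^k count can be seen by enumerating every way to assign the two alleles at each marker to the paternal and maternal haplotype. The sketch below is illustrative only: the function name is hypothetical, and the three genotype pairs are borrowed from the numbers on the slides without any claim about which marker they belong to.

```python
from itertools import product

def phase_configurations(genotypes):
    """Yield every (paternal, maternal) haplotype pair consistent with the genotypes.

    genotypes: list of (allele_a, allele_b) pairs, one per marker.
    Each marker can be phased two ways, so k markers give 2**k ordered configurations.
    """
    for flips in product((False, True), repeat=len(genotypes)):
        paternal = tuple(b if flip else a for (a, b), flip in zip(genotypes, flips))
        maternal = tuple(a if flip else b for (a, b), flip in zip(genotypes, flips))
        yield paternal, maternal

# Illustrative genotypes at k = 3 markers (assumed pairing of slide values).
genotypes = [(151, 159), (202, 209), (139, 141)]
configs = list(phase_configurations(genotypes))
print(len(configs))  # 2**3 ordered configurations
```

Homozygous markers would produce duplicate configurations, so 2^k is an upper bound per person. With n persons the joint search space grows as the product of these counts, which is why enumeration alone is not practical.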
Maximum Likelihood Evaluation: Multipoint Analysis (Task 3)
[Figure: loci M1, M2, D1, M3, M4 and θ; genotype labels for III-15 and III-16: 151,159 / 151,155; 202,209 / 202,202; 139,141 / 139,146; 1,2 / 3,3]
The third computational problem: find a value of θ that maximizes Pr(data|θ, MOI).
Data now means considering several markers at once.
Results of Multipoint Analysis
The Computational Task
• Computing P(data|θ) for a specific value of θ:
[Formula: image in the original slide]
This problem is equivalent to finding the best order for sum-product operations on high-dimensional matrices.
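Why the order matters can be shown with a small sketch. It assumes each table is described only by the set of indices it depends on, and it scores an elimination order by the largest intermediate table it creates. The function name, the index names, and the domain sizes are hypothetical, and the cost measure is one common choice, not necessarily the one Pedtool uses.

```python
from math import prod

def elimination_cost(factors, domain, order):
    """Return the largest intermediate table size when summing out indices in `order`.

    factors: iterable of sets of index names (the scope of each table).
    domain:  dict mapping index name -> number of values it can take.
    """
    scopes = [set(f) for f in factors]
    largest = 0
    for var in order:
        touching = [s for s in scopes if var in s]
        # Multiplying all tables that mention `var` yields a table over their union.
        merged = set().union(*touching) if touching else {var}
        largest = max(largest, prod(domain[v] for v in merged))
        # Summing out `var` leaves a table over the remaining indices.
        scopes = [s for s in scopes if var not in s]
        if merged - {var}:
            scopes.append(merged - {var})
    return largest

# Hypothetical example: three tables that all share index "a".
factors = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
domain = {"a": 4, "b": 4, "c": 4, "d": 4}
print(elimination_cost(factors, domain, ["a", "b", "c", "d"]))  # shared index first
print(elimination_cost(factors, domain, ["b", "c", "d", "a"]))  # shared index last
```

Summing out the shared index first forces all three tables to be multiplied into one table over four indices, while leaving it for last never creates a table over more than two.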
Stochastic Greedy Ordering Algorithm(s)
• Iteration i:
  • The three indices yielding minimal table size are found.
  • A coin (biased according to the resulting table size) is flipped to choose between them.
• The algorithm is repeated many times, stopping early if a low-cost elimination sequence is found.
Repeat these steps with several cost functions.
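The steps above can be sketched as follows. Several details are assumptions, since the slide does not specify them: the coin is weighted by the inverse of the resulting table size, the cost of a sequence is its largest intermediate table, and all function and parameter names are hypothetical.

```python
import random
from math import prod

def table_size(scopes, domain, var):
    """Size and scope of the table created by summing out `var`."""
    merged = set().union(*(s for s in scopes if var in s)) | {var}
    return prod(domain[v] for v in merged), merged

def stochastic_greedy_order(factors, domain, rng):
    """Build one elimination order, choosing randomly among the 3 cheapest indices."""
    scopes = [set(f) for f in factors]
    remaining = set(domain)
    order, cost = [], 0
    while remaining:
        # The three candidates that yield the smallest tables.
        scored = sorted((table_size(scopes, domain, v)[0], v) for v in remaining)[:3]
        # Assumed bias: smaller resulting tables are proportionally more likely.
        weights = [1.0 / size for size, _ in scored]
        size, var = rng.choices(scored, weights=weights, k=1)[0]
        _, merged = table_size(scopes, domain, var)
        scopes = [s for s in scopes if var not in s]
        if merged - {var}:
            scopes.append(merged - {var})
        remaining.remove(var)
        order.append(var)
        cost = max(cost, size)  # assumed cost function: largest table seen
    return order, cost

def best_order(factors, domain, restarts=50, good_enough=None, seed=0):
    """Repeat the randomized construction, stopping early on a low-cost order."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        order, cost = stochastic_greedy_order(factors, domain, rng)
        if best is None or cost < best[1]:
            best = (order, cost)
        if good_enough is not None and best[1] <= good_enough:
            break
    return best

factors = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
domain = {"a": 4, "b": 4, "c": 4, "d": 4}
order, cost = best_order(factors, domain, good_enough=16)
print(order, cost)
```

The randomization lets repeated runs escape the single path a deterministic greedy rule would follow, at the price of running the cheap ordering step many times before the expensive likelihood computation starts.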
When intermediate tables become too large for the available RAM, computation virtually halts:
[Formula: image in the original slide]
But we can fix the value of the index m, namely, condition on m’s value, and do each part as a separate job:
[Formula: image in the original slide]
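Conditioning can be illustrated on a toy sum. The sketch below assumes two small hypothetical tables, f[i][m] and g[m][j], that share the index m. It is not the formula from the slide, which was not recovered. Fixing m turns the sum into independent pieces that never need a table involving m, and adding the pieces reproduces the full result.

```python
from itertools import product

def full_sum(f, g, n):
    """Sum over i, m, j of f[i][m] * g[m][j], computed in one piece."""
    return sum(f[i][m] * g[m][j] for i, m, j in product(range(n), repeat=3))

def conditioned_job(f, g, n, m):
    """One independent job: the same sum with index m fixed to a single value."""
    return sum(f[i][m] for i in range(n)) * sum(g[m][j] for j in range(n))

def sum_by_conditioning(f, g, n):
    """Run one job per value of m and add the partial results."""
    return sum(conditioned_job(f, g, n, m) for m in range(n))

# Hypothetical 2x2 tables.
f = [[1, 2], [3, 4]]
g = [[5, 6], [7, 8]]
print(full_sum(f, g, 2), sum_by_conditioning(f, g, 2))
```

Each call to `conditioned_job` depends only on its own value of m, so in a distributed setting the calls could run on different machines and only the final addition needs to be centralized.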
The Pedtool System
• Divides the computation of a single likelihood across hundreds of computers.
• Uses the Condor research pool at UW-Madison.
• Simple user interface, usable by novices.
• Able to analyze a highly inbred pedigree with 250 individuals sent by NIH.
Faster by 1-5 orders of magnitude than other linkage programs.
Running time improvements
bioinfo.cs.technion.ac.il/pedtool
The Main Goals of Future Research
• Efficiency
• Simplicity
• Availability online to all Israeli researchers
• More functionality
Acknowledgements
Students:
• Ma’ayan Fishelson, Ph.D (Graduated 2004)
• Dmitry Rusakov, Ph.D (Graduated 2004)
• Anna Tzemach, M.Sc
• Nickolay Dovgolevsky, B.Sc (Graduated 2004)
• Mark Silberstein, M.Sc
• Julia Stolin
• Edward Vitkin
Collaborators from medical genetics:
• Motti Shochat and Tami Shochat (Rabin)
• Ohad Birk and Rivka Ophir (Soroka)
• Tzipi Falik and Morad Khayat (Galil Ma’aravi)
Collaborators from distributed systems:
• Assaf Schuster
Pedtool is to be hosted by DSL at the CS/Technion and is supported by IBM, ISF, and the Israeli Science Ministry.