An Iterative Relaxation Technique for the NMR Backbone Assignment Problem. Wen-Lian Hsu, Institute of Information Science, Academia Sinica.
Characteristics of Our Method • Model the backbone assignment as a constraint satisfaction problem • Solve it using natural language parsing techniques • Both top-down and bottom-up • An iterative approach • Create spin systems from noisy data • Link spin systems using maximum independent set techniques
Outline • Introduction • Method • Experimental Results • Conclusion
Blind Man’s Elephant • We cannot directly “see” the positions of the atoms (the structure) • But we can measure a set of parameters (with constraints) on these atoms • These parameters help us infer the atoms’ coordinates • Each experiment can determine only a subset of the parameters (with noise) • To combine the parameters from different experiments, we need to stitch them together
The Flow of NMR Experiments • Get protein samples • Collect NMR spectra • Resonance assignment • Structure constraints • Calculation and simulation - Energy minimization - Fitness of structure constraints
Chemical Shift Assignment • Find the chemical shift of each atom • Backbone atoms: Ca, Cb, C’, N, NH • Various experiments: HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA • Side chain: all other atoms (especially CHs) • TOCSY-HSQC, HCCCONH, CCCONH, HCCH-TOCSY [Figure: one amino acid with atoms N, H, Ca, CO, Cb (H2), Cg (H2), Cd (H3)]
Some Relevant Parameters [Figure: chemical shift ranges (ppm) annotated on a backbone fragment -N-C-C-N-C-C- with side-chain CH3 and H-C-H groups]
Three Important Experiments • Backbone: Ca, Cb, C’, N, NH • HSQC, CBCANH, CBCA(CO)NH, HN(CA)CO, HNCO, HN(CO)CA, HNCA • Sequential assignment • Chemical shifts of Ca, Cb, NH [Figure: HSQC spectrum]
Our NMR Spectra • HSQC • CBCA(CO)NH (2 peaks) • HNCACB (4 peaks) [Figure: CBCA(CO)NH and CBCANH spectra]
HSQC Spectra • HSQC peaks (one peak, i.e. one N/NH chemical shift pair, per amino acid)
CBCA(CO)NH Spectra • CBCA(CO)NH peaks (2 chemical shifts for one amino acid)
CBCANH Spectra • CBCANH peaks (4 chemical shifts for one amino acid) • Ca peaks are positive (+), Cb peaks are negative (-)
A Dataset Example • HSQC • HNCACB (4 peaks) • CBCA(CO)NH (2 peaks)
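Each peak in these 3D spectra carries an amide N shift, an amide H shift, a carbon shift and a signed intensity. The sketch below stores peaks that way and greedily groups those sharing one amide (N, H) position, the first step toward a spin system. The `Peak` class, `group_by_amide`, and the tolerances `n_tol`/`h_tol` are hypothetical; the three peak values are copied from the false-negative slides, and a real pipeline would more likely anchor groups on HSQC peaks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    """One 3D peak: amide N and H shifts, a carbon shift, and the intensity."""
    n: float          # 15N shift (ppm)
    h: float          # amide 1H shift (ppm)
    c: float          # 13C shift (ppm), Ca or Cb
    intensity: float  # signed; in CBCANH, Ca > 0 and Cb < 0


def group_by_amide(peaks, n_tol=0.3, h_tol=0.03):
    """Greedily group peaks that share an amide (N, H) position.

    A peak joins the first group whose reference (N, H) lies within both
    tolerances; otherwise it starts a new group. Tolerances are hypothetical.
    """
    groups = []  # list of (ref_n, ref_h, member_peaks)
    for p in peaks:
        for ref_n, ref_h, members in groups:
            if abs(p.n - ref_n) <= n_tol and abs(p.h - ref_h) <= h_tol:
                members.append(p)
                break
        else:
            groups.append((p.n, p.h, [p]))
    return groups


peaks = [
    Peak(119.857, 8.435, 28.166, 3.36293e7),
    Peak(119.857, 8.435, 59.419, 1.56434e8),
    Peak(115.481, 9.604, 60.044, 1.30407e8),
]
groups = group_by_amide(peaks)
```

The two peaks at (119.857, 8.435) fall into one group and the peak at (115.481, 9.604) starts a second.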
Backbone Assignment • Goal • Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone. • General approaches • Generate spin systems • A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb). • Link spin systems
Ambiguities • The 4-peak patterns (CBCANH) of all residues are mixed together in one spectrum • The 2-peak patterns (CBCA(CO)NH) of all residues are mixed together in one spectrum • Each spin system can be mapped to several amino acids in the protein sequence • False positives and false negatives
Previous Approaches • Constrained bipartite matching problem [Figure: legal matching vs. illegal matching under constraints] • The spin systems might be ambiguous • Cannot deal with ambiguous links
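Natural Language Processing: Signal or Noise? • Speech recognition: homophone selection • Input: 台北市一位小孩走失了 (“A child in Taipei City went missing”) • Candidate words: 台北市 (Taipei City), 台北 (Taipei), 小孩 (child), 走失 (to go missing), 適宜 (suitable), 事宜 (matters), 一位 (one [person]), 一味 (blindly), 移位 (displacement) • Homophones such as 適宜/事宜 and 一位/一味/移位 compete for the same sounds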
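Hierarchical Analysis • 句意模版 (sentence-meaning templates) • 句型模版 (sentence-pattern templates) • 片語模版 (phrase templates) • 字詞模版 (word templates)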
Perfect Group • Each spin system group contains 6 points, of which • 4 points come from the first experiment (CBCANH) • 2 points come from the second experiment (CBCA(CO)NH) [Figure: two consecutive residues, i-1 and i, with atoms H, N, Ca, Cb, C, O]
A Perfect Spin System Group [Figure: CBCANH peaks Ca(i), Cb(i), Ca(i-1), Cb(i-1) aligned with the CBCA(CO)NH peaks Ca(i-1), Cb(i-1)]
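A perfect group can be resolved mechanically: the two CBCA(CO)NH shifts identify which two CBCANH peaks belong to residue i-1, the other two belong to residue i, and the sign (Ca +, Cb -) separates Ca from Cb. The sketch below is a minimal version of that check, assuming a hypothetical carbon tolerance `c_tol` and ignoring glycine (no Cb); `SpinSystem`, `split_by_sign`, and `perfect_spin_system` are illustrative names, and the example intensities are made up.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class SpinSystem:
    ca: float       # Ca(i)
    cb: float       # Cb(i)
    ca_prev: float  # Ca(i-1)
    cb_prev: float  # Cb(i-1)


def split_by_sign(pair):
    """Return (Ca, Cb) if one peak is positive (Ca) and one negative (Cb)."""
    pos = [c for c, inten in pair if inten > 0]
    neg = [c for c, inten in pair if inten < 0]
    return (pos[0], neg[0]) if len(pos) == 1 and len(neg) == 1 else None


def perfect_spin_system(cbcanh, cbcaconh, c_tol=0.2):
    """cbcanh: 4 (carbon_shift, intensity) pairs at one amide position.
    cbcaconh: the 2 carbon shifts of CBCA(CO)NH at the same position.
    Two CBCANH peaks must match the CBCA(CO)NH shifts (residue i-1);
    the other two are residue i. Returns None if no consistent split."""
    if len(cbcanh) != 4 or len(cbcaconh) != 2:
        return None
    for i, j in permutations(range(4), 2):
        if (abs(cbcanh[i][0] - cbcaconh[0]) <= c_tol
                and abs(cbcanh[j][0] - cbcaconh[1]) <= c_tol):
            prev = split_by_sign([cbcanh[i], cbcanh[j]])
            intra = split_by_sign([cbcanh[k] for k in range(4) if k not in (i, j)])
            if prev and intra:
                return SpinSystem(intra[0], intra[1], prev[0], prev[1])
    return None


# Toy CBCANH peaks (shift, signed intensity) and CBCA(CO)NH shifts.
ss = perfect_spin_system(
    [(60.044, 1.0), (37.541, -1.0), (55.36, 0.5), (29.78, -0.5)],
    [55.356, 29.782],
)
```

Here `ss` has Ca(i) = 60.044, Cb(i) = 37.541 and Ca(i-1)/Cb(i-1) taken from the matching CBCANH peaks.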
False Positives and False Negatives • False positives • Noise peaks with high intensity • Produce fake spin systems • False negatives • Peaks with low intensity • Missing peaks • In real wet-lab data, nearly 50% of the peaks are noise (false positives).
[Figure: a perfect spin system group, a false negative, and a false positive (axes: H, N)]
Outline • Introduction • Method • Experimental Results • Conclusion
Main Idea • Handle false negatives during spin system generation. • Eliminate false positives during spin system linking. • Perform spin system generation and linking iteratively.
Spin System Group Generation • Three types of spin system groups are generated, depending on the quality of the CBCANH data: • Perfect • Weak false negative • Severe false negative
Perfect Spin Systems • A spin system is determined without adding any pseudo peak. [Figure: CBCANH peaks Ca(i), Cb(i), Ca(i-1), Cb(i-1) aligned with the CBCA(CO)NH peaks Ca(i-1), Cb(i-1)]
Weak False Negative Spin System Group • A spin system is determined with one added pseudo peak. [Figure: CBCANH and CBCA(CO)NH peaks for residues i and i-1 with one CBCANH peak missing; example peak (N, H, C, intensity): 115.481, 9.604, 60.044, 1.30407e+008]
Severe False Negative Spin System Group • A spin system is determined with two added pseudo peaks. [Figure: CBCANH and CBCA(CO)NH peaks for residues i and i-1 with two CBCANH peaks missing; example peaks (N, H, C, intensity): 119.857, 8.435, 28.166, 3.36293e+007 and 119.857, 8.435, 59.419, 1.56434e+008] • Note: it is also possible that Ca(i-1) = 28.166 and Cb(i-1) = 59.419.
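The three group types differ only in how many pseudo peaks must be added to complete the CBCANH pattern. The sketch below assumes, as a hypothesis, that a pseudo peak simply copies a CBCA(CO)NH (i-1) shift that has no CBCANH counterpart within a hypothetical tolerance `c_tol`; it does not resolve the Ca/Cb ambiguity noted on the severe slide. `fill_pseudo_peaks` and `CATEGORY` are illustrative names.

```python
CATEGORY = {0: "perfect", 1: "weak false negative", 2: "severe false negative"}


def fill_pseudo_peaks(cbcanh_shifts, cbcaconh_shifts, c_tol=0.2):
    """Add a pseudo CBCANH peak for each CBCA(CO)NH (i-1) shift that has
    no CBCANH counterpart, and classify the group by how many were added."""
    completed = list(cbcanh_shifts)
    added = 0
    for s in cbcaconh_shifts:
        if not any(abs(s - c) <= c_tol for c in cbcanh_shifts):
            completed.append(s)  # pseudo peak copied from CBCA(CO)NH
            added += 1
    return completed, CATEGORY[added]


# Two CBCANH peaks, neither matching the CBCA(CO)NH shifts 28.166 / 59.419.
completed, kind = fill_pseudo_peaks([60.044, 37.541], [28.166, 59.419])
```

With both CBCA(CO)NH shifts unmatched, `kind` is `"severe false negative"` and `completed` holds four shifts.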
A Note on Spin System Generation • To generate *ALL* possible spin systems, a peak can be included in more than one spin system. • False positives are eliminated during spin system linking. • False negatives are handled by adding pseudo peaks. • A rule-based mechanism filters out incompatible spin systems (false positives). • A maximum weight independent set algorithm is adopted.
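Generating *all* candidates means enumerating every 4-peak subset at an amide position, so one peak may appear in several candidate spin systems. The sketch below shows that enumeration with a simple sign rule (two positive Ca peaks, two negative Cb peaks) standing in for the rule-based filter; the actual rules are not given on the slide, so this filter and the name `candidate_spin_systems` are hypothetical.

```python
from itertools import combinations


def candidate_spin_systems(peaks):
    """peaks: (carbon_shift, intensity) CBCANH peaks at one amide position.

    Every 4-peak subset is a candidate, so one peak can belong to several
    candidates. A simple sign rule (two Ca peaks > 0, two Cb peaks < 0)
    stands in for the rule-based filter.
    """
    candidates = []
    for subset in combinations(peaks, 4):
        n_pos = sum(1 for _, inten in subset if inten > 0)
        n_neg = sum(1 for _, inten in subset if inten < 0)
        if n_pos == 2 and n_neg == 2:
            candidates.append(subset)
    return candidates


# Five peaks (one extra positive peak, e.g. noise) at the same amide position.
cands = candidate_spin_systems(
    [(60.04, 1.0), (55.36, 0.6), (52.10, 0.2), (37.54, -1.0), (29.78, -0.6)]
)
```

Three candidates survive, and both negative peaks are shared by all of them; choosing among such overlapping candidates is left to later steps.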
Spin System Linking • Goal • Link spin systems into segments that are as long as possible. • Constraints • Each spin system is uniquely assigned to a position of the target protein sequence. • Two spin systems are linked only if the differences between the inter-residue (i-1) chemical shifts of one and the intra-residue (i) chemical shifts of the other are below predefined thresholds.
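The linking constraint compares the (i-1) shifts of the later spin system with the (i) shifts of the earlier one. This minimal sketch assumes hypothetical thresholds `ca_tol`/`cb_tol` and reads the four numbers of each spin system on the positioning slide as (Ca(i-1), Cb(i-1), Ca(i), Cb(i)), with 0 for glycine's missing Cb; both are assumptions.

```python
from typing import NamedTuple


class SpinSystem(NamedTuple):
    ca_prev: float  # Ca(i-1), ppm
    cb_prev: float  # Cb(i-1), ppm (0 when residue i-1 is glycine)
    ca: float       # Ca(i), ppm
    cb: float       # Cb(i), ppm (0 for glycine)


def can_link(a, b, ca_tol=0.4, cb_tol=0.4):
    """True if b may directly follow a: b's inter-residue (i-1) shifts must
    agree with a's intra-residue (i) shifts within the thresholds."""
    return abs(b.ca_prev - a.ca) < ca_tol and abs(b.cb_prev - a.cb) < cb_tol


dg = SpinSystem(55.266, 38.675, 44.555, 0.0)  # codes 50 10 (D -> G)
gr = SpinSystem(44.417, 0.0, 55.043, 30.04)   # codes 10 40 (G -> R)
```

`can_link(dg, gr)` holds (Ca differs by 0.138 ppm, both glycine Cb values are 0), while `can_link(gr, dg)` fails on Cb.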
A Peculiar Parking Lot (Valet Parking) • Information you have: the make of your car and, approximately, the car parked in front of you. • Together with the other drivers, try to identify as many cars as possible in the right order (maximizing the overall satisfaction).
Backbone Assignment DGRIGEIKGRKTLATPAVRRLAMENNIKLS
Spin System Positioning • We assign spin system groups to positions in the protein sequence according to their codes. • Codes: D = 50, G = 10, R = 40, I = 50|51 • Spin systems (four chemical shifts, ppm) and their codes (residue i-1, residue i): • 55.266, 38.675, 44.555, 0 => 50 10 • 44.417, 0, 55.043, 30.04 => 10 40 • 44.417, 0, 30.665, 28.72 => 10 40 • 55.356, 29.782, 60.044, 37.541 => 40 50
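A spin system with codes (i-1, i) can sit only where two consecutive residues carry those codes. The sketch below finds such positions using just the four codes shown on the slide (D, G, R, I); residues without a listed code never match, and a full code table is not given here, so `CODES` and `candidate_positions` are hypothetical.

```python
CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}  # codes from the slide
SEQ = "DGRIGEIKGRKTLATPAVRRLAMENNIKLS"


def candidate_positions(sequence, prev_code, cur_code, codes=CODES):
    """0-based positions i (i >= 1) where residue i-1 can carry prev_code
    and residue i can carry cur_code."""
    return [
        i for i in range(1, len(sequence))
        if prev_code in codes.get(sequence[i - 1], set())
        and cur_code in codes.get(sequence[i], set())
    ]
```

On `"DGRI"` each of the slide's code pairs fits exactly one position, but on the full sequence the pair 50 10 already fits twice (D-G and I-G), which is the ambiguity the linking step has to resolve.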
Link Spin System Groups [Figure: spin system groups placed on D G R I and linked into Segments 1, 2, and 3]
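Iterative Concatenation [Figure: starting from individual spin systems (1, 2, …, 56, …), Steps 1 through n repeatedly concatenate spin systems into longer segments (Segment 1, Segment 2, Segment 31, …, Segment 78, Segment 79, …, Segment 99) along the sequence DGRI….FKJJREKL]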
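Iterative concatenation can be read as growing every partial segment by one spin system per step until nothing can be appended. In the sketch below, `succ` (which spin systems may follow which, e.g. from a linking test) and `pos` (allowed sequence positions, e.g. from the codes) are hypothetical inputs, and a segment is a `(start, ids)` pair; spin systems are not reused within a segment, so the loop terminates.

```python
def concatenate(succ, pos, min_len=3):
    """succ: id -> ids that may follow it (from the linking test).
    pos: id -> set of 0-based sequence positions allowed by its codes.
    Returns every segment (start, ids) with >= min_len spin systems."""
    frontier = [(p, (s,)) for s, ps in pos.items() for p in ps]
    result = []
    while frontier:
        nxt = []
        for start, ids in frontier:
            at = start + len(ids)  # position the next spin system must occupy
            for b in succ.get(ids[-1], ()):
                if b not in ids and at in pos.get(b, ()):
                    nxt.append((start, ids + (b,)))
        result.extend(seg for seg in nxt if len(seg[1]) >= min_len)
        frontier = nxt
    return result


segments = concatenate({1: [2], 2: [3], 3: []}, {1: {1}, 2: {2}, 3: {3}})
```

With a simple chain 1 -> 2 -> 3 at positions 1-3, only the full segment `(1, (1, 2, 3))` reaches the minimum length.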
Conflict Segments [Figure: Segments 71, 78, 79, 97, 98, and 99 placed along DGRIGEIKGRKTLATPAVRRLAMENNIKLS] • Two kinds of conflicting segments • Overlap: the segments cover overlapping sequence positions (e.g., Segment 71 and Segment 99) • Shared spin system: the segments use the same spin system (e.g., Segments 78 and 79 both contain spin system 1)
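Both kinds of conflict reduce to a short test on two segments. This sketch assumes a segment is represented as a hypothetical `(start, ids)` pair: start position in the sequence plus the tuple of spin system ids it uses.

```python
def conflict(seg_a, seg_b):
    """seg = (start, ids). Overlap: the covered position ranges intersect.
    Shared spin system: some spin system id appears in both segments."""
    (sa, ia), (sb, ib) = seg_a, seg_b
    overlap = sa < sb + len(ib) and sb < sa + len(ia)
    shared = bool(set(ia) & set(ib))
    return overlap or shared
```

For example, a segment at positions 0-2 conflicts with one starting at 2 (overlap) and with any segment elsewhere that reuses one of its spin systems.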
A Graph Model for Spin System Linking • G = (V, E) • V: a set of nodes (segments) • E: edges (u, v) with u, v ∈ V, where u and v conflict • Goal • Assign as many non-conflicting segments as possible => find a maximum independent set of G.
An Example of G • Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE • Segment1: SP12->SP13->SP14 • Segment2: SP9->SP13->SP20->SP4 • Segment3: SP8->SP15->SP21 • Segment4: SP7->SP1->SP15->SP3 [Figure: graph G on nodes Seg1 to Seg4; Seg1 and Seg2 are joined because both use SP13, Seg3 and Seg4 because both use SP15, and further edges are labeled “Overlap”]
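The shared-spin-system edges of this example can be recomputed directly from the four segments. The sketch covers only that rule; the overlap edges depend on sequence positions that the slide shows only graphically, so they are omitted. `shared_spin_system_edges` is an illustrative name.

```python
from itertools import combinations

segments = {
    "Seg1": ["SP12", "SP13", "SP14"],
    "Seg2": ["SP9", "SP13", "SP20", "SP4"],
    "Seg3": ["SP8", "SP15", "SP21"],
    "Seg4": ["SP7", "SP1", "SP15", "SP3"],
}


def shared_spin_system_edges(segs):
    """Edges of G from the 'same spin system' rule only."""
    return {
        frozenset((u, v))
        for u, v in combinations(segs, 2)
        if set(segs[u]) & set(segs[v])
    }


edges = shared_spin_system_edges(segments)
```

This yields the two edges Seg1-Seg2 (via SP13) and Seg3-Seg4 (via SP15).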
Segment Weight • The longer a segment is, the higher its weight. • The less frequently a segment occurs, the higher its weight.
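The slide gives only the direction of each effect, not a formula. As one hypothetical reading, the sketch below counts how often the same chain of spin systems occurs among the candidate segments (for example, placed at different positions) and divides the length by that count; both the interpretation of "frequency" and the formula are assumptions.

```python
from collections import Counter


def segment_weights(segments):
    """segments: list of (start, ids). A chain of spin systems that can be
    placed at several positions is ambiguous, so its weight is divided
    by how often the chain occurs. (Hypothetical formula.)"""
    freq = Counter(ids for _, ids in segments)
    return [len(ids) / freq[ids] for _, ids in segments]


weights = segment_weights([(0, (1, 2, 3)), (5, (1, 2, 3)), (9, (4, 5, 6, 7))])
```

The chain (1, 2, 3) fits two places, so each placement gets 3 / 2 = 1.5, while the unique 4-long chain keeps weight 4.0.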
Find a Maximum Weight Independent Set of G • Boppana, R. and M. M. Halldórsson, “Approximating Maximum Independent Sets by Excluding Subgraphs,” BIT, 1992, 32(2).
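For small graphs, a simple greedy heuristic already shows what the selection step does. The sketch below is not the cited Boppana-Halldórsson algorithm; it swaps in the common GWMIN-style greedy rule (repeatedly take the node maximizing weight / (degree + 1) among remaining nodes, then delete it and its neighbours). The example reuses the four segments from the example graph, weighted by their lengths as an assumption.

```python
def greedy_mwis(weights, edges):
    """weights: node -> weight; edges: iterable of (u, v) pairs.
    GWMIN-style greedy heuristic (not the Boppana-Halldorsson algorithm):
    pick the node maximizing w / (deg + 1), remove it and its neighbours."""
    adj = {v: set() for v in weights}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive = set(weights)
    chosen = []
    while alive:
        # sorted() makes tie-breaking deterministic (first maximum wins)
        best = max(sorted(alive), key=lambda v: weights[v] / (len(adj[v] & alive) + 1))
        chosen.append(best)
        alive -= adj[best] | {best}
    return chosen


chosen = greedy_mwis(
    {"Seg1": 3, "Seg2": 4, "Seg3": 3, "Seg4": 4},
    [("Seg1", "Seg2"), ("Seg3", "Seg4")],
)
```

The heuristic keeps the two 4-spin-system segments, Seg2 and Seg4; unlike the cited algorithm, it carries no approximation guarantee on general graphs.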
An Iterative Approach • We perform spin system generation and linking iteratively. • Three stages.
First Stage • Generate perfect spin systems; • Concatenate the newly generated perfect spin systems into segments; • Retain segments that contain at least 3 spin systems; • Perform MaxIndSet on the segments; • Drop the spin systems (and related peaks) used in the resulting segments.
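The first stage is a fixed pipeline around two subroutines, concatenation and MaxIndSet. The sketch below wires those steps together with the two subroutines passed in as hypothetical callbacks; it tracks spin system ids only (not the related peaks) and omits the segment extension that the later stages add.

```python
def run_stage(new_spin_systems, unused, concatenate, max_ind_set, min_len=3):
    """One stage: concatenate, keep long segments, select a conflict-free
    subset, and return (selected segments, spin systems still unused).

    concatenate(pool) -> list of (start, ids); max_ind_set(segments) ->
    selected segments. Both are hypothetical callbacks.
    """
    pool = list(unused) + list(new_spin_systems)
    segments = [s for s in concatenate(pool) if len(s[1]) >= min_len]
    selected = max_ind_set(segments)
    used = {sid for _, ids in selected for sid in ids}
    return selected, [sid for sid in pool if sid not in used]


selected, rest = run_stage(
    ["c", "d"], ["a", "b"],
    concatenate=lambda pool: [(0, tuple(pool[:3])), (5, tuple(pool[:2]))],
    max_ind_set=lambda segs: segs,
)
```

The 2-long segment is discarded, the 3-long one is selected, and only spin system `"d"` is carried into the next stage.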
Second Stage • Generate weak false negative spin systems; • Extend the segments resulting from the first stage (using unused perfect and newly generated weak false negative spin systems); • Concatenate the unused spin systems (perfect + weak false negative) into longer segments; • Retain segments that contain at least 3 spin systems; • Perform MaxIndSet on the segments; • Drop the spin systems (and related peaks) used in the resulting segments.
Third Stage • Generate severe false negative spin systems; • Extend the segments resulting from the second stage (using unused perfect and weak false negative spin systems, as well as newly generated severe false negative ones); • Concatenate the unused spin systems (perfect + weak false negative + severe false negative) into longer segments; • Retain segments that contain at least 3 spin systems; • Perform MaxIndSet on the segments.
Segment Extension [Figure: an existing segment on ….FKJJREKL…. is extended with new spin systems (1, 2, …, 45, …), e.g., spin systems 12, 29, and 109]
Segment Extension [Figure: segments 71, 77, 78, 97, and 99 on DGRGEKGRKTLATPAVRRLAMENNIKLS are extended into 97' and 99' using additional spin systems, and MaxIndSet then selects among the extended segments]
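Segment extension appends unused spin systems to the end of an existing segment when they link and fit the next sequence position. The sketch below extends to the right only and, as an assumption not stated on the slides, stops as soon as the next step is ambiguous (more than one candidate); `succ` and `pos` are the same hypothetical link and position maps used for concatenation.

```python
def extend_right(segment, unused, succ, pos):
    """segment: (start, ids). Greedily append unused spin systems that
    link to the current last one and may occupy the next position."""
    start, ids = segment
    ids = list(ids)
    pool = set(unused)
    while True:
        at = start + len(ids)
        nxt = [b for b in succ.get(ids[-1], ()) if b in pool and at in pos.get(b, ())]
        if len(nxt) != 1:  # stop when there is no candidate or it is ambiguous
            break
        ids.append(nxt[0])
        pool.discard(nxt[0])
    return start, tuple(ids)


extended = extend_right((0, (1, 2, 3)), [4, 5], {3: [4], 4: [5]}, {4: {3}, 5: {4}})
```

Here the segment grows from three to five spin systems; a left-hand extension would mirror this using predecessors instead of successors.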
Outline • Introduction • Method • Experimental Results • Conclusion