Jia-Ming Chang 0509 Graph Algorithms and Their Applications to Bioinformatics Graph algorithm in NMR backbone assignment
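Determine Protein Structure
• X-ray
  • Wavelength of about 1 Å, close to the distance between atoms
  • Studies the behavior of molecules in the crystalline state
  • Determines crystal structures, including protein structures
  • X-ray and structural biology: X-ray diffraction is used to locate every group and atom of a highly purified, crystallized protein in space.
• Nuclear magnetic resonance (NMR)
  • NMR involves absorption by atomic nuclei. Certain nuclei have spin and a magnetic moment, so when they are exposed to a strong magnetic field they absorb electromagnetic radiation; this results from the energy-level splitting induced by the field. Scientists also found that the molecular environment affects how nuclei in a magnetic field absorb radio waves, and this property is used to analyze molecular structure.
[Figure: AVANCE 800 AV, IBMS, Sinica]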
Chemical Shift Assignment (1/2)
Find out the chemical shift for each atom
• Backbone: Ca, Cb, C', N, NH
• HSQC, CBCANH, CBCA(CO)NH
[Figure: one amino acid, with atoms labelled N, H, Ca, Cb (H2), Cg (H2), Cd (H3), CO]
Chemical Shift Assignment (2/2)
[Figure: a backbone chain -N-C-C-N-C-C-N-C-C-N-C-C- with its side-chain groups (CH3, H-C-H), annotated with chemical shift ranges in ppm: 18-23, 55-60, 17-23, 30-35, 16-20, 31-34, 19-24]
HSQC Spectra HSQC peaks (1 chemical shift for one amino acid)
CBCA(CO)NH Spectra CBCA(CO)NH peaks (2 chemical shifts for one amino acid)
CBCANH Spectra CBCANH peaks (4 chemical shifts for one amino acid): Ca peaks are positive (+), Cb peaks are negative (-)
A Dataset Example
• HSQC
• HNCACB
• CBCA(CO)NH
[Figure: peaks from the three spectra plotted on H and N axes]
A Perfect Spin System Group
• CBCA(CO)NH: the Ca and Cb peaks of residue i-1
• CBCANH: the Ca and Cb peaks of residue i and of residue i-1
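A minimal sketch of how a perfect group could be assembled from its peaks. It assumes the peaks have already been grouped by their N and H shifts, that CBCANH carbons which reappear in CBCA(CO)NH belong to residue i-1, and that the CBCANH sign separates Ca (+) from Cb (-). The function name, the dictionary keys, the tolerance `tol`, and the N/H values in the example are hypothetical.

```python
def build_spin_system(n, h, cbcaconh, cbcanh, tol=0.3):
    """Combine one HSQC peak, two CBCA(CO)NH peaks and four CBCANH peaks.

    cbcaconh: two carbon shifts (ppm), both belonging to residue i-1.
    cbcanh:   four (shift, sign) pairs; sign > 0 means Ca, sign < 0 means Cb.
    tol:      assumed matching tolerance in ppm (a placeholder value).
    """
    if len(cbcaconh) != 2 or len(cbcanh) != 4:
        raise ValueError("not a perfect group: expected 2 and 4 carbon peaks")
    group = {"N": n, "H": h}
    for shift, sign in cbcanh:
        atom = "CA" if sign > 0 else "CB"
        # A CBCANH peak that also shows up in CBCA(CO)NH belongs to residue i-1.
        from_previous = any(abs(shift - c) <= tol for c in cbcaconh)
        key = atom + ("_prev" if from_previous else "")
        if key in group:
            raise ValueError(f"ambiguous group: two candidates for {key}")
        group[key] = shift
    return group


example = build_spin_system(
    n=120.5, h=8.2,                      # hypothetical N and H shifts
    cbcaconh=[55.36, 29.80],             # residue i-1 only
    cbcanh=[(60.044, +1), (37.541, -1), (55.356, +1), (29.782, -1)],
)
```

A group with missing or extra peaks raises an error here; handling such groups is the subject of the false positive and false negative slides.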
Coding • Translate the target protein sequence and spin systems into coding sequences based on the following table. Atreya, H.S., K.V.R. Chary, and G. Govil, Automated NMR assignments of proteins for high throughput structure determination: TATAPRO II. Current Science, 2002. 83(11): p. 1372-1376.
Backbone Assignment
• Goal
  • Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.
• General approaches
  • Generate spin systems
    • A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
  • Link spin systems
Backbone Assignment DGRIGEIKGRKTLATPAVRRLAMENNIKLS
Blind Men’s Elephant
• We cannot directly “see” the positions of these atoms (the 3D structure).
• But we can measure a set of parameters (with constraints) on these atoms, which can help us infer their coordinates.
• Each experiment can only determine a subset of the parameters (with noise).
• To combine the parameters of different experiments, we need to stitch them together.
A Peculiar Parking Lot (valet parking)
• Information you have: the make of your car and (approximately) the make of the car parked in front of you.
• Together with others, try to identify as many cars as possible (maximizing the overall satisfaction).
Ambiguities • All 4 point experiments are mixed together • All 2 point experiments are mixed together • Each spin system can be mapped to several amino acids in the protein sequence • False positives, false negatives
Multiple Candidates
• One spin system may be assigned to many positions in a protein sequence.
• Protein sequence: AKFERQHMDSSTSRNLTKDR
[Figure: one spin system (SS) marked at four possible positions along the sequence]
False Positives and False Negatives
• False positives
  • Noise with high intensity
  • Produce fake spin systems
• False negatives
  • Peaks with low intensity
  • Missing peaks
• In real wet-lab data, nearly 50% of the data is noise (false positives).
False Positive & False Negative
[Figure: HSQC, HNCACB, and CBCA(CO)NH peaks on H and N axes, with groups labelled Perfect, False Negative, and False Positive]
Ambiguous Spin System Two possible spin systems
Spin System Group
• Nearest neighboring (TATAPRO, RIBRA, GASA)
[Figure: HSQC, HNCACB, and CBCA(CO)NH peaks on H and N axes]
Spin System Linking
• Goal
  • Link spin systems into chains that are as long as possible.
• Constraints
  • Each spin system is uniquely assigned to a position of the target protein sequence.
  • Two spin systems are linked only if the differences between their intra-residue and inter-residue chemical shifts are less than the predefined thresholds.
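The linking constraint can be sketched as a simple threshold test. This is an illustration under stated assumptions, not the deck's implementation: the `SpinSystem` fields, the function `can_link`, and the 0.5 ppm thresholds are hypothetical. The example values come from the Spin System Positioning slide, read as (Ca of i-1, Cb of i-1, Ca of i, Cb of i) with 0 standing for an absent Cb; that column order is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinSystem:
    ca_prev: float  # inter-residue Ca shift (residue i-1), ppm
    cb_prev: float  # inter-residue Cb shift (residue i-1), ppm
    ca: float       # intra-residue Ca shift (residue i), ppm
    cb: float       # intra-residue Cb shift (residue i), ppm


def can_link(first, second, tol_ca=0.5, tol_cb=0.5):
    """True if `second` may directly follow `first` in the sequence.

    The intra-residue shifts of `first` must agree with the inter-residue
    shifts of `second` within the (assumed) thresholds.
    """
    return (abs(first.ca - second.ca_prev) <= tol_ca
            and abs(first.cb - second.cb_prev) <= tol_cb)


# Values from the positioning slide; column order is an assumption.
ss_g = SpinSystem(55.266, 38.675, 44.555, 0.0)
ss_r = SpinSystem(44.417, 0.0, 55.043, 30.04)
ss_i = SpinSystem(55.356, 29.782, 60.044, 37.541)

chain_ok = can_link(ss_g, ss_r) and can_link(ss_r, ss_i)
```

Tighter thresholds give fewer but more reliable links; looser ones create more ambiguous links that later stages have to resolve.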
Previous Approaches
• Constrained bipartite matching problem*
• Cannot deal with ambiguous links
[Figure: a legal matching and an illegal matching under constraints]
*Xu Y, Xu D, Kim D, Olman V, Razumovskaya J, Jiang T. Automated assignment of backbone NMR peaks using constrained bipartite matching. Computing in Science & Engineering 2002;4(1):50-62.
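Natural Language Processing: Noise or Ambiguity?
• Speech recognition: homophone selection
• Utterance: 台北市一位小孩走失了 (A child has gone missing in Taipei City)
• Candidate words for the same sounds: 台北市 (Taipei City), 台北 (Taipei), 適宜 (suitable), 事宜 (matters), 一位 (one person), 一味 (blindly), 移位 (to shift position), 小孩 (child), 走失 (to go missing)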
Spin System Positioning
• We assign spin system groups to a protein sequence according to their codes.
Sequence codes: D = 50, G = 10, R = 40, I = 50|51
Spin system (chemical shifts) => codes
55.266 38.675 44.555 0 => 50 10
44.417 0 55.043 30.04 => 10 40
44.417 0 30.665 28.72 => 10 40
55.356 29.782 60.044 37.541 => 40 50
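A minimal sketch of positioning by codes, using only the four residue codes visible on this slide. It assumes that the two codes of a spin system describe the preceding residue and the residue itself, in that order, and that "50|51" means either code is acceptable. The names `SEQUENCE_CODES` and `candidate_positions` are hypothetical.

```python
# Codes shown on the slide; the full coding table is not reproduced here.
SEQUENCE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}


def candidate_positions(sequence, spin_codes, table=SEQUENCE_CODES):
    """Return the 0-based indices where a spin system could be placed.

    spin_codes is (code of residue i-1, code of residue i); this ordering
    is an assumption about how the slide's code pairs are read.
    """
    prev_code, own_code = spin_codes
    positions = []
    for idx in range(1, len(sequence)):
        own_ok = own_code in table[sequence[idx]]
        prev_ok = prev_code in table[sequence[idx - 1]]
        if own_ok and prev_ok:
            positions.append(idx)
    return positions


placements = {codes: candidate_positions("DGRI", codes)
              for codes in [(50, 10), (10, 40), (40, 50)]}
```

On a full-length sequence the same code pair typically matches several positions, which is the multiple-candidates ambiguity described earlier.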
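Link Spin System Groups
[Figure: spin system groups placed along the residues D G R I and linked into Segment 1, Segment 2, and Segment 3]
Spin systems shown:
44.417 0 30.665 28.72
55.266 38.675 44.555 0
44.417 0 55.043 30.04
55.356 29.782 60.044 37.541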
Iterative Concatenation
[Figure: spin systems 1, 2, …, 56, … are placed on the sequence DGRI….FKJJREKL; from Step 1 to Step n, segments are concatenated step by step (segments shown include Segment 1, Segment 2, Segment 31, Segment 78, Segment 79, and Segment 99)]
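One way to picture the step-by-step concatenation is the sketch below. It assumes the admissible links are already known and given as a mapping `can_follow` from a spin system id to the ids that may come directly after it; that mapping, the function name, and the tuple representation of a segment are hypothetical, and sequence positions are ignored for brevity.

```python
def concatenate(spin_ids, can_follow):
    """Grow segments one spin system per step until none can be extended.

    spin_ids:   iterable of spin system ids.
    can_follow: dict mapping an id to the ids allowed to come right after it.
    Returns every segment that could not be extended further.
    """
    segments = [(s,) for s in spin_ids]   # Step 1: single spin systems
    finished = []
    while segments:                       # Step 2 ... Step n
        extended = []
        for seg in segments:
            # A spin system may appear only once inside a segment.
            nexts = [t for t in can_follow.get(seg[-1], ()) if t not in seg]
            if nexts:
                extended.extend(seg + (t,) for t in nexts)
            else:
                finished.append(seg)
        segments = extended
    return finished


result = concatenate([1, 2, 56], {1: [2], 2: [56]})
longest = max(result, key=len)
```

The result deliberately contains segments that conflict with each other (for instance a segment and its own suffix); choosing among them is left to the conflict-resolution step.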
Conflict Segments
DGRIGEIKGRKTLATPAVRRLAMENNIKLS
[Figure: Segments 71, 78, 79, 97, 98, and 99 placed along the sequence]
• Two kinds of conflict segments
  • Overlap (e.g. segment 71, segment 99)
  • Use the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)
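The two kinds of conflict can be written as a small predicate. This is a sketch under the assumption that a segment is described by its start position on the sequence and the ordered spin systems it contains, one per residue; the `Segment` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int            # 0-based position of the first residue covered
    spin_systems: tuple   # spin system ids, one per covered residue

    @property
    def end(self):
        """Index one past the last residue covered by the segment."""
        return self.start + len(self.spin_systems)


def overlaps(a, b):
    """True if the two segments cover at least one common residue."""
    return a.start < b.end and b.start < a.end


def shares_spin_system(a, b):
    """True if some spin system is used by both segments."""
    return bool(set(a.spin_systems) & set(b.spin_systems))


def in_conflict(a, b):
    return overlaps(a, b) or shares_spin_system(a, b)


seg_a = Segment(start=0, spin_systems=(1, 2, 3))
seg_b = Segment(start=2, spin_systems=(4, 5))    # overlaps seg_a at residue 2
seg_c = Segment(start=10, spin_systems=(1, 7))   # reuses spin system 1
seg_d = Segment(start=5, spin_systems=(8, 9))    # compatible with seg_a
```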
Independent Set Subset S of vertices such that no two vertices in S are connected www.cs.rochester.edu/~stefanko/Teaching/06CS282/06-CSC282-17.ppt
A Graph Model for Spin System Linking
• G(V, E)
  • V: a set of nodes (segments).
  • E: the set of pairs (u, v) with u, v ∈ V such that u and v are in conflict.
• Goal: assign as many non-conflicting segments as possible => find the maximum independent set of G.
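A minimal sketch of the graph model: build G from a conflict test, then search for a maximum independent set by brute force. The exhaustive search is only meant for tiny illustrative graphs (the problem is NP-hard in general) and is not the method used in the deck. The function names are hypothetical, and the example counts only shared spin systems as conflicts because segment positions are not given.

```python
from itertools import combinations


def build_conflict_graph(segments, in_conflict):
    """segments: dict name -> segment; returns adjacency sets of G."""
    adj = {name: set() for name in segments}
    for a, b in combinations(segments, 2):
        if in_conflict(segments[a], segments[b]):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def max_independent_set(adj):
    """Exhaustive search, largest subsets first (exponential time)."""
    names = sorted(adj)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(b not in adj[a] for a, b in combinations(subset, 2)):
                return set(subset)
    return set()


# Segments as lists of spin system ids; values mirror the example slide.
segments = {
    "Seg1": [12, 13, 14],
    "Seg2": [9, 13, 20, 4],
    "Seg3": [8, 15, 21],
    "Seg4": [7, 1, 15, 3],
}
adj = build_conflict_graph(segments, lambda a, b: bool(set(a) & set(b)))
chosen = max_independent_set(adj)
```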
An Example of G
• Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE
Segment1: SP12->SP13->SP14
Segment2: SP9->SP13->SP20->SP4
Segment3: SP8->SP15->SP21
Segment4: SP7->SP1->SP15->SP3
[Figure: graph G with nodes Seg1, Seg2, Seg3, and Seg4; edges are labelled with the reason for the conflict: a shared spin system (SP13, SP15) or Overlap]
Segment weight
• The longer a segment is, the higher its weight.
• The less frequently a segment occurs, the higher its weight.
Find Maximum Weight Independent Set of G (1/2)
Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
[Figure: a vertex v, N(v), and Head_N(v)]
Find Maximum Weight Independent Set of G (2/2)
Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
[Figure: a vertex v and the independent sets I1 and I2]
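A sketch in the spirit of the recursive Ramsey procedure from the cited paper: pick a vertex v, recurse on its neighbors N(v) and on its non-neighbors, and keep the better of I1 and I2 ∪ {v}. Comparing candidate sets by total weight rather than by size is an assumption made here to match the slide title, and the weights in the example are simply segment lengths. This is a heuristic, so the returned set is not guaranteed to be optimal.

```python
def ramsey(vertices, adj, weight):
    """Return (clique, independent_set) for the subgraph induced by vertices.

    vertices: set of vertex names; adj: dict name -> set of neighbors;
    weight:   dict name -> number (weighted comparison is an assumption).
    """
    if not vertices:
        return set(), set()
    v = min(vertices)                          # deterministic pivot choice
    neighbors = vertices & adj[v]
    non_neighbors = vertices - adj[v] - {v}
    c1, i1 = ramsey(neighbors, adj, weight)
    c2, i2 = ramsey(non_neighbors, adj, weight)

    def total(s):
        return sum(weight[x] for x in s)

    clique = max(c1 | {v}, c2, key=total)      # v is adjacent to all of c1
    independent = max(i1, i2 | {v}, key=total) # v is adjacent to none of i2
    return clique, independent


# Conflict graph restricted to the shared-spin-system edges of the example.
adj = {"Seg1": {"Seg2"}, "Seg2": {"Seg1"}, "Seg3": {"Seg4"}, "Seg4": {"Seg3"}}
weight = {"Seg1": 3, "Seg2": 4, "Seg3": 3, "Seg4": 4}
_, selected = ramsey(set(adj), adj, weight)
```

On this small graph the sketch selects Seg1 and Seg4 (weight 7), whereas Seg2 and Seg4 (weight 8) would be better, which shows why the result should be read as an approximation.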
An Iterative Approach • We perform spin system generation and linking iteratively. • Three stages. • Perfect spin systems • Weak false negative spin systems • Severe false negative spin systems
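Segment Extension
DGRGEKGRKTLATPAVRRLAMENNIKLS
[Figure: segments 97, 78, 77, 99, and 71 and the extended segments 97' and 99' placed along the sequence; numbered items 97, 23, 99, 24, 26, 45, 28, 27, 31, 28, 29, 32, 33; label: MaxIndSet]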