Threading, RAPTOR, and CRF
In-Ho Lee, KRISS / In Silico Protein Science
Yangpyeong Hanwha Resort, August 24, 2009
Threading?
• Also known as fold recognition
• A method of protein structure prediction
• Fits a target sequence to a known structure
• Uses statistical knowledge of the relationship between sequence and structure
• Places and aligns ("threads") each amino acid of the target sequence onto a position in the template structure
• Evaluates how well the target fits the template
• Homology modeling (easy targets) vs. protein threading (hard targets)
• Fold-level homology
• Sequence-sequence alignment, sequence identity (25 %)
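The sequence-identity figure on this slide can be made concrete with a short Python sketch. It assumes that identity is counted over aligned, gap-free columns (one of several conventions in use) and that 25 % acts as a rough cutoff between homology-modeling targets and threading targets; that reading of the slide, and the helper names `sequence_identity` and `suggested_method`, are assumptions for illustration only.

```python
def sequence_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Fraction of gap-free alignment columns whose two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns containing a gap are not counted
        aligned += 1
        if a == b:
            identical += 1
    return identical / aligned if aligned else 0.0


def suggested_method(identity: float, threshold: float = 0.25) -> str:
    """Hypothetical rule of thumb: the 0.25 cutoff is an assumed reading of the slide."""
    return "homology modeling" if identity >= threshold else "threading"


if __name__ == "__main__":
    identity = sequence_identity("ACDE-F", "ACQEGF")
    print(identity, suggested_method(identity))
```

In practice the cutoff is soft, so a rule like this would only be a first filter before trying both approaches.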
Homology modeling
• Also known as comparative modeling
• Uses one or more known protein structures as templates
• Alignment: maps residues in the query sequence to residues in the template sequence
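The residue mapping that an alignment provides can be sketched in a few lines of Python. The function name `residue_map`, the use of `-` as the gap character, and the 0-based indexing are assumptions made for this illustration.

```python
from typing import Dict


def residue_map(aligned_query: str, aligned_template: str, gap: str = "-") -> Dict[int, int]:
    """Map 0-based query residue indices to 0-based template residue indices.

    Only columns where both sequences have a residue produce an entry;
    query residues aligned to a gap have no template partner.
    """
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    qi = ti = 0  # running residue counters for query and template
    for q, t in zip(aligned_query, aligned_template):
        if q != gap and t != gap:
            mapping[qi] = ti
        if q != gap:
            qi += 1
        if t != gap:
            ti += 1
    return mapping


if __name__ == "__main__":
    print(residue_map("AC-DE", "A-GDE"))
```

A model builder would copy backbone coordinates for the mapped residues and has to construct the unmapped ones (insertions) separately.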
• Bowie, Lüthy, Eisenberg, 1991
• Threading: Jones, Taylor, Thornton, 1992
• Fold recognition ~ threading
• X-ray, NMR: 70-80 %, similar fold
• 1100 different protein folds are known
• New folds are still being discovered
• Many different algorithms have been proposed
• 1-D profile (derived from the 3-D structure) vs. target sequence
• Full 3-D structure of the template
• Global and local alignments
• Insertions/deletions are less likely to occur in secondary-structure (SS) regions and solvent-inaccessible regions
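One way to encode the last point is a position-dependent gap penalty that is harsher inside secondary-structure elements and buried positions. The Python sketch below is only an illustration: the base penalty, both scaling factors, the `H`/`E` codes for helix and strand, and the function name are all hypothetical, and penalties are negative on the assumption that the alignment score is being maximized.

```python
from typing import List


def gap_open_penalties(
    secondary_structure: str,
    buried: List[bool],
    base: float = -2.0,          # hypothetical penalty for an exposed loop position
    ss_factor: float = 2.0,      # hypothetical multiplier inside helices/strands
    buried_factor: float = 1.5,  # hypothetical multiplier for buried positions
) -> List[float]:
    """Return one gap-opening penalty per template position."""
    if len(secondary_structure) != len(buried):
        raise ValueError("inputs must describe the same template positions")
    penalties = []
    for ss, is_buried in zip(secondary_structure, buried):
        penalty = base
        if ss in "HE":  # H = helix, E = strand; anything else is treated as loop
            penalty *= ss_factor
        if is_buried:
            penalty *= buried_factor
        penalties.append(penalty)
    return penalties


if __name__ == "__main__":
    print(gap_open_penalties("HHCC", [True, False, False, True]))
```

Per-position arrays of this kind are what a dynamic-programming aligner would consult when it opens or extends a gap at a given template position.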
RAPTOR (software)
• Threading software for protein structure prediction (PSP)
• Globally optimizes a scoring function that includes a pairwise contact potential and produces a globally optimal alignment
• Ranked list of templates
http://en.wikipedia.org/wiki/RAPTOR_(software)
RAPTOR (software)
• Threading engines: NoCore, NPCore, IP
• Integer programming
• If the scoring function includes a pairwise contact potential, dynamic programming cannot globally optimize it and instead generates only a locally optimal alignment
• FR (fold recognition) targets
http://en.wikipedia.org/wiki/RAPTOR_(software)
RAPTOR (software)
• Minimize F(x1, x2, ...) subject to g(x1, x2, ...) > b
• 3D modeling: OWL, MODELLER
• Loops, backbone, side chains, packed up
• Cyclic coordinate descent algorithm
• Tree decomposition algorithm
• PSI-BLAST, Jmol
http://en.wikipedia.org/wiki/RAPTOR_(software)
RAPTOR (software)
• CASP5, 2002
• Cores, loops
• Cores: {c1, c2, c3, ..., cM}
• c_i = (head_i, tail_i)
• Core c_i is aligned to s_j
• (s_j, s_j + len_i - 1)
• (c_i1, s_j1), (c_i2, s_j2)
• (loc_i1 - loc_i2) × (loc_i2 - loc_i1 + s_j1 - s_j2)
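The core notation can be illustrated with a brute-force Python sketch; this is not RAPTOR's integer-programming engine. It assumes that s_j is the 0-based sequence position where a core starts, that cores keep their template order and may not overlap, and that the total score is a sum of per-core (singleton) terms and pairwise terms between placed cores. The function names and the toy scoring functions are hypothetical. The sketch maximizes a score, which is equivalent to minimizing its negative.

```python
from itertools import combinations
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def valid_placements(core_lengths: Sequence[int], seq_len: int) -> Iterator[Tuple[int, ...]]:
    """Yield every order-preserving, non-overlapping tuple of core start positions."""

    def extend(i: int, earliest: int, prefix: List[int]) -> Iterator[Tuple[int, ...]]:
        if i == len(core_lengths):
            yield tuple(prefix)
            return
        remaining = sum(core_lengths[i:])  # residues still needed for cores i..M-1
        for start in range(earliest, seq_len - remaining + 1):
            yield from extend(i + 1, start + core_lengths[i], prefix + [start])

    yield from extend(0, 0, [])


def best_threading(
    core_lengths: Sequence[int],
    seq_len: int,
    singleton: Callable[[int, int], float],
    pairwise: Callable[[int, int, int, int], float],
) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Exhaustively score all placements; feasible only for tiny examples."""
    best_score, best = float("-inf"), None
    for starts in valid_placements(core_lengths, seq_len):
        score = sum(singleton(i, s) for i, s in enumerate(starts))
        score += sum(
            pairwise(i1, starts[i1], i2, starts[i2])
            for i1, i2 in combinations(range(len(starts)), 2)
        )
        if score > best_score:
            best_score, best = score, starts
    return best_score, best


if __name__ == "__main__":
    # Toy scores: each core prefers early positions, but the pair is rewarded
    # when the two cores start exactly three residues apart.
    singleton = lambda i, s: -float(s)
    pairwise = lambda i1, s1, i2, s2: 3.0 if s2 - s1 == 3 else 0.0
    print(best_threading([2, 2], 5, singleton, pairwise))
```

In the toy example the singleton terms alone would favor starts (0, 2), while the pairwise term makes (0, 3) the best placement. Because the pairwise term couples the positions of different cores, the score of a placement no longer splits into independent per-core decisions, and the number of placements grows combinatorially with the number of cores.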
c
c
c     build alignment mask between template and probe proteins
c
      do i = 1, nres
         do j = 1, nseq
            mask(j,i) = template(seqtyp(j),i)
         end do
      end do
c
c     zero the borders; this loop starts at 0 so that mask(0,0),
c     which is read when i = j = 1, is also initialized
c
      do i = 0, nres
         mask(0,i) = 0
      end do
      do j = 1, nseq
         mask(j,0) = 0
      end do
c
c     complete the mask by the successive summation procedure
c
      do i = 1, nres
         do j = 1, nseq
            prior = mask(j-1,i-1)
            jtrace = j - 1
            itrace = i - 1
            do k = 1, j-2
               gap = open(i-1) + (j-1-k)*widen(i-1)
               value = mask(k,i-1) + gap
               if (value .gt. prior) then
                  prior = value
                  jtrace = k
                  itrace = i - 1
               end if
            end do
            do k = 1, i-2
               gap = open(i-1) + (i-1-k)*widen(i-1)
               value = mask(j-1,k) + gap
               if (value .gt. prior) then
                  prior = value
                  jtrace = j - 1
                  itrace = k
               end if
            end do
            mask(j,i) = mask(j,i) + prior
            trace(1,j,i) = jtrace
            trace(2,j,i) = itrace
         end do
      end do
c
c     find the end of the best alignment in last row or column
c
      score = -1000000
      do i = 1, nres
         if (mask(nseq,i) .gt. score) then
            score = mask(nseq,i)
            ires = i
            jseq = nseq
         end if
      end do
      do j = 1, nseq
         if (mask(j,nres) .gt. score) then
            score = mask(j,nres)
            ires = nres
            jseq = j
         end if
      end do
c
c     backtrace to locate the remainder of the optimal alignment;
c     stop when either index reaches the zero border, because
c     trace is only filled for j >= 1 and i >= 1
c
      do i = 1, nres
         match(i) = 0
      end do
      match(ires) = jseq
      do while (ires .ne. 0 .and. jseq .ne. 0)
         match(ires) = jseq
         jtmp = trace(1,jseq,ires)
         itmp = trace(2,jseq,ires)
         jseq = jtmp
         ires = itmp
      end do
Conditional random fields
• x: data (observation) sequence; y: label sequence
• Probabilistic model of p(y | x)
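A linear-chain CRF, the most common special case, gives a label sequence y the probability exp(score(x, y)) / Z(x), where Z(x) sums exp(score) over all label sequences. The Python sketch below assumes that the features of x have already been reduced to two hypothetical score tables: `emit[t][k]` (label k at position t) and `trans[p][k]` (label p followed by label k). It computes log Z(x) with the forward recursion and cross-checks it by enumeration.

```python
import math
from itertools import product
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emit: List[List[float]], trans: List[List[float]], labels: Sequence[int]) -> float:
    """Unnormalized log-score of one label sequence."""
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def log_partition(emit: List[List[float]], trans: List[List[float]]) -> float:
    """log Z(x) via the forward recursion, O(T * K^2) instead of O(K^T)."""
    n_labels = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][k] + logsumexp([alpha[p] + trans[p][k] for p in range(n_labels)])
            for k in range(n_labels)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(emit: List[List[float]], trans: List[List[float]]) -> float:
    """Same quantity by enumerating every label sequence (tiny inputs only)."""
    n_labels = len(trans)
    return logsumexp(
        [sequence_score(emit, trans, y) for y in product(range(n_labels), repeat=len(emit))]
    )


def conditional_probability(emit: List[List[float]], trans: List[List[float]], labels: Sequence[int]) -> float:
    """p(y | x) for one label sequence."""
    return math.exp(sequence_score(emit, trans, labels) - log_partition(emit, trans))


if __name__ == "__main__":
    emit = [[1.0, 0.0], [0.5, 1.5], [0.0, 2.0]]  # 3 positions, 2 labels (toy values)
    trans = [[0.5, -0.5], [-1.0, 1.0]]
    print(conditional_probability(emit, trans, (0, 1, 1)))
```

The normalization is over whole label sequences rather than per position, which is what distinguishes a CRF from a chain of independent classifiers.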