Optimatization of a New Score Function for the Detection of Remote Homologs

Optimatization of a New Score Function for the Detection of Remote Homologs Kann et al

Introduction • New method to calculate a score function, aiming to optimize the ability to discriminate between homologs and non-homologs • Existing software uses the following to compute an alignment score:

Number of times AA i is aligned with AA j Number of gaps in alignment Number of residues in each gap beyond one Score function / Substitution matrix Contribution to score for AA match/mismatch Contribution to score for gap initialization Contribution to score for gap extension

Current Methods to Calculate Homology • p(Sr > x): probability that a random pair of proteins of the same length would have that score • E: expected number of random proteins in the db that would have at least that score • P: probability that there is at least one random pair with a higher score • As p(Sr > x), E, P increase, the likelihood that the given pair is homologous decreases

Current Score Matrices • PAM (percent accepted mutations) – Dayhoff • GCB, JTT: used to apply to larger sequence datasets • BLOSUM62 – Henikoff & Henikoff, constructed using a dataset of aligned sequence blocks • STR – protein sequences aligned based on their observed structures

Limitations of Current Score Functions • Current score functions assume independent evolution of each location, overlooking correlations • Score functions derived from a db of properly aligned proteins, not on alignments between random sequences • Gap penalty a priori

Theory Z score for alignment: • Characterize the significance of alignment score by calculating the likelihood that this score or higher would be obtained by a random match • Account for variations in E with the length of the proteins

Theory • Score function optimized by maximizing the confidence <C> over the training set • Avoids dependence on extreme E values (easily detected or overly distant homologies) • Eliminates contribution of falsely identified homologies (overly distant)

Database Preparation • Use set of known homologs whose homology cannot be reliably determined with standard pairwise comparison, in order to optimize score function for detection of distant homologs • Training set: 900 pairs of protein in same COG with < 25% sequence identity

Optimization of Score Function • Align using BLOSOM62 matrix • Calculate Z and C for each pair of homologs, then averaged over pairs in training set to yield <C> • Generate initial alignments using gap penalties that yielded highest C values • ~10 cycles of optimization and realignments until score function converged

Results • Small changes in gap penalties: most of the improvement cones from refinements of • OPTIMA: resulting score function • has significantly improved average confidence <C> value compared with other score matrices • <p(Sr > x)>, <P> significantly decreased

Summary • Aim: optimize score matrix to discriminate between homologs and non-homologs • OPTIMA score function: more successful at discriminating between homologs and non-homologs compared with standard score matrices • Gap penalties treated as additional parameters to be optimized

Optimatization of a New Score Function for the Detection of Remote Homologs

Optimatization of a New Score Function for the Detection of Remote Homologs

Presentation Transcript

The Function of

Polarization for Remote Chemical Detection

New forms of remote for the living room

A proposal for the function of canonical microcircuits

The alignment of one pair of homologs is independent of any other.

Detection rates for a new waveform

Evaluation of a new trigger function for shallow cumulus convection

Score Function for Data Mining Algorithms

Remote Measurement station for fishes detection

Remote homology detection

Derivative of a function

Remote Function Call

Remote detection of HABs in the Gulf of Mexico in 2005

A new technique for remote sensing of Trichodesmium

The New Listing Score

Pulsed photonuclear technology for remote detection of nuclear materials

Determine whether the inverse of a function is a function.

Characterization of new crystals for X-rays detection

The Limit of a Function

Derivative of a Function

A polynomial function is a function of the form