
Speaker: (Peter) Xiaoyong Wu, Bioinformatics, 4/28/03. Topic: Including Biological Literature Improves Homology Search, by Jeffrey T. Chang, Soumya Raychaudhuri, and Russ B. Altman (Paper Source: http://www.jeffchang.com/).


Presentation Transcript


  1. Speaker: • (Peter) Xiaoyong Wu • Bioinformatics • 4/28/03

  2. Topic: Including Biological Literature Improves Homology Search, by Jeffrey T. Chang, Soumya Raychaudhuri, and Russ B. Altman (Paper Source: http://www.jeffchang.com/)

  3. Problem • Goal of bio-sequence studies: annotate the vast amount of sequence data on the basis of accurate homology recognition (e.g., reveal the possible function and relationships of sequences for medical research) • Current approach: sequence similarity search, such as PSI-BLAST • Problem: sequence similarity is not the same as sequence homology

  4. Idea of this paper • How do expert biologists solve this problem? • Supplement sequence similarity with information from the biomedical literature • Modify PSI-BLAST so that, in each iteration, literature similarity restricts the sequence search to a sensible scope
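
The per-iteration filtering idea can be pictured with a minimal sketch. It is not the paper's implementation: the `Hit` record, the `filter_hits` helper, and both cutoff values are hypothetical, and the real method operates inside PSI-BLAST's own iterations rather than on a plain list.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    seq_id: str
    e_value: float         # sequence-similarity score reported by the search
    lit_similarity: float  # literature similarity to the query, in [0, 1]


def filter_hits(hits, e_cutoff=1e-3, lit_cutoff=0.2):
    """Keep hits that pass both the e-value and the literature cutoff.

    Both cutoffs are illustrative assumptions, not values from the paper.
    """
    return [h for h in hits
            if h.e_value <= e_cutoff and h.lit_similarity >= lit_cutoff]


hits = [
    Hit("seqA", 1e-10, 0.65),  # similar sequence, related literature: kept
    Hit("seqB", 1e-5, 0.02),   # similar sequence, unrelated literature: dropped
    Hit("seqC", 5.0, 0.80),    # weak sequence match: dropped
]
kept = filter_hits(hits)
print([h.seq_id for h in kept])
```

Only hits that survive the filter would be carried into the next iteration, which is how literature similarity bounds the scope of the search.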

  5. Methodology

  6. Methodology • Concatenate the sequence information and its associated literature into one document and remove the so-called “stop words” • Calculate document similarity (Wilbur and Yang) as the cosine between A and B, the word vectors of two documents: cos(A, B) == 1 means similar documents, cos(A, B) == 0 means different documents.
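
For reference, the standard definition of cosine similarity between two word vectors is given below. It is assumed here to be what cos(A, B) denotes; the slide does not spell out the formula or any term weighting. Here a_i and b_i are the components of A and B over m shared attributes.

```latex
\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
           = \frac{\sum_{i=1}^{m} a_i b_i}
                  {\sqrt{\sum_{i=1}^{m} a_i^{2}} \; \sqrt{\sum_{i=1}^{m} b_i^{2}}}
```

With non-negative components such as word counts, the value lies between 0 (no shared words) and 1 (identical direction).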

  7. Methodology • Construct the word vectors A and B of two documents: A = (a1, a2, a3, …, am), B = (b1, b2, b3, …, bm) • Components at the same position (e.g., am and bm) represent the same attribute (word) • The full attribute set is the union of the words in documents A and B
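
The vector construction described on these slides can be sketched as follows. This is a minimal illustration under stated assumptions: raw word counts serve as the components, and `STOP_WORDS` is a tiny made-up list. The weighting used by Wilbur and Yang may differ.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; a real one would be much larger.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}


def word_counts(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the word-count vectors of two documents."""
    counts_a, counts_b = word_counts(doc_a), word_counts(doc_b)
    vocab = sorted(set(counts_a) | set(counts_b))  # union of words in A and B
    a = [counts_a[w] for w in vocab]  # a Counter returns 0 for absent words
    b = [counts_b[w] for w in vocab]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as sharing nothing
    return dot / (norm_a * norm_b)


same = cosine_similarity("kinase binds ATP", "the kinase binds ATP")
disjoint = cosine_similarity("kinase binds ATP", "ribosome translates mRNA")
```

Because "the" is removed as a stop word, the first pair yields a similarity of about 1, while the second pair shares no words and yields 0.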

  8. Methodology: validation & test • Superfamilies of proteins: SCOP (http://scop.berkeley.edu/) defines over 1000 protein superfamilies; proteins in one superfamily share the same function, but one protein may belong to multiple superfamilies. • Gold Standard: every protein belongs to exactly one superfamily; all proteins with multiple functions are removed.

  9. Results

  10. Results • Recall: number of homologous sequences that pass a fixed e-value cutoff (sequences in the Gold Standard retrieved by the modified PSI-BLAST) / total number of homologous sequences (in the Gold Standard) • Precision: number of homologous sequences detected / total number of sequences detected (reported by PSI-BLAST)
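
The two measures can be computed as in the sketch below. The function name and the sequence identifiers are made up for illustration; `retrieved` stands for the sequences that pass the e-value cutoff, and `gold_homologs` for the true homologs in the Gold Standard.

```python
def recall_precision(retrieved, gold_homologs):
    """Return (recall, precision) for one query.

    retrieved:     sequences reported by the search at the chosen cutoff
    gold_homologs: true homologs of the query in the gold standard
    """
    retrieved = set(retrieved)
    gold_homologs = set(gold_homologs)
    true_hits = retrieved & gold_homologs  # homologs that were detected
    recall = len(true_hits) / len(gold_homologs) if gold_homologs else 0.0
    precision = len(true_hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


gold = {"p1", "p2", "p3", "p4"}   # hypothetical homologs of the query
retrieved = {"p1", "p2", "x9"}    # hypothetical search output
recall, precision = recall_precision(retrieved, gold)
print(recall, precision)
```

In this example two of the four true homologs are found (recall 0.5), and two of the three reported sequences are true homologs (precision 2/3).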

  11. Results

  12. Questions? Thanks!
