Machine Learning & Bioinformatics

Presentation Transcript


  1. Machine Learning & Bioinformatics Tien-Hao Chang (Darby Chang)

  2. PPI: Protein-Protein Interaction

  3. http://www.biomol.de/details/RL/Akt-Signaling-Pathway.jpg

  4. Notes on the Akt signaling pathway
     • Akt is a kinase
     • A kinase acts on specific molecules (usually other proteins)
       • a type of enzyme, thus a type of protein
     • An enzyme catalyzes a reaction but is not changed by it (it is neither reactant nor product)
       • like a molecular machine/tool
       • a type of protein
     • Cytokines carry signals between cells
       • a type of protein
     • Protein is a class of molecules with a specific chemical structure
       • the same naming strategy is widely used for other classes, such as carbohydrates and lipids

  5. Various PPIs
     • By contact type
       • physical interaction (complex, transient touch, …)
       • genetic association (co-functional, co-expressed, …)
     • By role
       • co-work
       • work individually (mutually redundant)
       • regulate (activate, repress, …)
       • act on (catalyze, inhibit, …)
       • participate in the same pathway (downstream, upstream, …)

  6. Gene?

  7. http://www.uic.edu/classes/bios/bios100/lectures/geneticsignaling.jpg

  8. Notes on gene expression
     • DNA → RNA → protein
       • DNA is the blueprint, hard to damage and thus hard to manipulate
       • RNA is the transcript, very similar to DNA and more active
       • protein is the final product
     • A gene is a DNA sequence that can perform specific functions
       • usually becomes functional after being translated into protein
     • These terms are sometimes used interchangeably
       • some PPIs are defined by the interactions among the corresponding DNAs/RNAs
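
The DNA → RNA → protein flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example sequence, and the deliberately partial codon table are hypothetical, and the input is assumed to be the coding strand, so transcription reduces to replacing T with U.

```python
# Partial codon table for illustration only (the real genetic code has 64 codons).
CODON_TABLE = {
    "AUG": "M",   # methionine, also the start codon
    "UUU": "F",   # phenylalanine
    "GGC": "G",   # glycine
    "GCU": "A",   # alanine
    "UAA": None,  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGTTTGGCGCTTAA"      # hypothetical toy gene
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGUUUGGCGCUUAA
print(protein)  # MFGA
```

A real pipeline would use the full codon table and handle reading frames and unknown codons; the sketch only shows the direction of information flow.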

  9. Experimental Techniques: Since there are various PPIs…

  10. Shoemaker and Panchenko, 2007

  11. Notes on experimental techniques
      • (A) yeast two-hybrid (Y2H) detects interactions between proteins X and Y, where X is linked to a BD domain that binds to the upstream activating sequence (UAS) of a promoter
      • (B) mass spectrometry (MS) identifies polypeptide sequences
      • (C) tandem affinity purification (TAP) purifies protein complexes and removes contaminant molecules
      • (D) gene co-expression analysis produces a correlation matrix, where the dark areas show high correlation between the expression levels of the corresponding genes
      • (E) protein microarrays (protein chips) can detect interactions between actual proteins rather than genes: target proteins immobilized on a solid support are probed with a fluorescently labeled protein
      • (F) the synthetic lethality method describes the genetic interaction in which two individually nonlethal mutations result in lethality when combined (a-b-)
      • (all of these are high-throughput)
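
The correlation matrix in (D) can be illustrated with a minimal sketch. The gene names and expression values below are made up, and Pearson correlation is an assumption here; the slides do not say which correlation measure is used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression levels of three genes across four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
}

genes = sorted(expression)
# corr[g][h] is the correlation between the expression profiles of g and h.
corr = {
    g: {h: pearson(expression[g], expression[h]) for h in genes}
    for g in genes
}

for g in genes:
    print(g, [round(corr[g][h], 2) for h in genes])
```

Gene pairs with correlation close to 1 (geneA and geneB in this toy data) would be the "dark areas" of the matrix and are candidates for a genetic association.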

  12. We can “see” the interaction http://www.informaworld.com/ampp/image?path=/713599661/793610806/tfac_a_300921_o_f0001g.png

  13. Computational Approaches: What we can do, and will do

  14. Shoemaker and Panchenko, 2007

  15. Notes on computational approaches
      • (A) gene cluster and gene neighborhood methods, with different boxes showing different genes
      • (B) phylogenetic profile method, showing the presence/absence of four proteins in three genomes
      • (C) Rosetta Stone method
      • (D) sequence co-evolution method, which looks for similarity between two phylogenetic trees/distance matrices
      • (E) classification methods, shown with the example of the random forest decision (RFD) method, where five different features/domains are used and each interacting protein pair is encoded as a string of 0, 1 and 2
        • the decision trees are constructed from a training set of interacting protein pairs, and a decision is made on whether the proteins in question interact (“yes” for interacting, “no” for non-interacting)
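
As a rough illustration of the phylogenetic profile method in (B), the sketch below compares presence/absence profiles of four proteins across three genomes. The protein names and profile values are hypothetical, and requiring identical profiles (Hamming distance 0) is a simplifying assumption; practical methods typically use many more genomes and a softer similarity score.

```python
from itertools import combinations

# Hypothetical phylogenetic profiles: 1 = protein present in that genome, 0 = absent.
# Columns correspond to three genomes (genome 1, genome 2, genome 3).
profiles = {
    "P1": (1, 0, 1),
    "P2": (1, 0, 1),
    "P3": (0, 1, 1),
    "P4": (1, 1, 0),
}


def hamming(a, b):
    """Number of genomes in which the two profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


# Proteins that are always present or absent together are predicted to be
# functionally linked.
predicted = [
    (p, q)
    for p, q in combinations(sorted(profiles), 2)
    if hamming(profiles[p], profiles[q]) == 0
]
print(predicted)  # [('P1', 'P2')]
```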

  16. A small variation this year http://www.grin.com/object/external_document.250856/959745461da5e2c263045729f234e1b6_LARGE.png

  17. Classification approaches
      • Also called machine learning-based approaches
        • classification is so-called “supervised learning”
      • The most critical step is
        • to encode a protein pair as a vector
        • (to extract appropriate features)
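
One possible way to encode a protein pair as a vector is a domain-count scheme with values 0, 1 and 2. The sketch below assumes that each feature is a domain whose value is the number of proteins in the pair that contain it (0 = neither, 1 = one, 2 = both); the domain names, protein names, and annotations are all hypothetical.

```python
# Hypothetical feature list: five domains, one feature per domain.
DOMAINS = ["D1", "D2", "D3", "D4", "D5"]

# Hypothetical domain annotations for three proteins.
protein_domains = {
    "X": {"D1", "D3"},
    "Y": {"D3", "D5"},
    "Z": {"D2"},
}


def encode_pair(a, b):
    """Encode a protein pair as a vector with one value per domain:
    0 if neither protein has the domain, 1 if exactly one does, 2 if both do."""
    return [
        int(d in protein_domains[a]) + int(d in protein_domains[b])
        for d in DOMAINS
    ]


vector_xy = encode_pair("X", "Y")
print(vector_xy)                     # [1, 0, 2, 0, 1]
print("".join(map(str, vector_xy)))  # 10201
```

Vectors like these, labeled as interacting or non-interacting, would form the training set for a supervised classifier. The encoding is symmetric, so `encode_pair("X", "Y")` and `encode_pair("Y", "X")` give the same vector.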

  18. How do you recognize man and woman? http://www.sagennext.com/wp-content/uploads/2010/02/Business-Man-and-Woman1.jpg

  19. Notes on feature encoding
      • Know the problem (domain knowledge)
      • You may not know which feature is important (e.g. hair length vs. eyesight)
      • You may not have the key feature
        • e.g. no height when given only mug shots
        • e.g. collecting body fat data is much more difficult
        • carefully define the problem and what materials are available
      • You (usually) may not know the key feature
        • e.g. suppose that the sex chromosomes are unknown
        • depicting the mechanism is much more important than just predicting
      • The key features may change (e.g. hair length)
      • There are always exceptions (e.g. bisexual)

  20. Materials that we can support
      • Biological process
      • Cellular compartment
      • DNA sequence
      • Domain
      • Expression
      • Genomic location
      • No. of references
      • Molecular function
      • Orthologous information
      • Pathway
      • Protein sequence
      • TATA box
      • Transcription boundaries
      • TF binding
      • TFBS
      • TF knockout expression
