
Butte Lab Journal Club

Butte Lab Journal Club. 10/25/2010. Boltzmann machines are able to solve difficult combinatorial problems. Estimating the density function of multivariate binary data is typically done with mixture models or factor models.

carlow

Presentation Transcript


  1. Butte Lab Journal Club 10/25/2010

  2. Boltzmann machines are able to solve difficult combinatorial problems
     • Estimating the density function of multivariate binary data is typically done with mixture models or factor models
     • Problem: too computationally expensive for many multivariate binary density modeling problems
     • Solution: the authors describe a generalization of the restricted Boltzmann machine (RBM), the restricted Boltzmann forest (RBForest)
        • replaces the binary hidden variables of the RBM with groups of tree-structured binary variables
        • by varying the size of the trees, the number of model parameters can be increased while keeping computation of the density function tractable
        • basically, “structured” binning of variables
     • Example application: automated diagnosis involving a large number of feature types
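The tractability point above can be illustrated with a plain binary RBM (not the RBForest itself, whose tree construction is not reproduced here). The unnormalized density exp(-F(v)) is cheap to evaluate, but the normalizer Z sums over every joint configuration. The sketch below computes Z exactly by enumerating all hidden configurations, which only works while the hidden layer is small because the cost grows as 2^(number of hidden units). The parameter values are arbitrary, and the function names are illustrative.

```python
import itertools
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy(v, W, b, c):
    """F(v) with p(v) = exp(-F(v)) / Z for a binary RBM.

    W[i][j] couples visible unit i to hidden unit j; b and c are the
    visible and hidden biases. Summing out the hidden units is cheap
    because they are conditionally independent given v.
    """
    visible_term = sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = sum(
        softplus(cj + sum(W[i][j] * v[i] for i in range(len(v))))
        for j, cj in enumerate(c)
    )
    return -visible_term - hidden_term


def log_partition(W, b, c):
    """Exact log Z, enumerating all 2**len(c) hidden configurations."""
    log_terms = []
    for h in itertools.product((0, 1), repeat=len(c)):
        hidden_term = sum(cj * hj for cj, hj in zip(c, h))
        visible_term = sum(
            softplus(bi + sum(W[i][j] * h[j] for j in range(len(c))))
            for i, bi in enumerate(b)
        )
        log_terms.append(hidden_term + visible_term)
    top = max(log_terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def log_prob(v, W, b, c):
    """Exact log p(v); tractable only for a small hidden layer."""
    return -free_energy(v, W, b, c) - log_partition(W, b, c)


# Tiny example: 3 visible units, 2 hidden units (arbitrary parameters).
W = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b = [0.1, -0.2, 0.0]
c = [0.3, -0.5]
total = sum(math.exp(log_prob(v, W, b, c))
            for v in itertools.product((0, 1), repeat=len(b)))
print(total)  # 1.0 up to floating-point error
```

Enumerating the hidden side rather than the visible side is the usual choice when the hidden layer is the smaller of the two; per the slide, the RBForest's contribution is a way to add parameters without giving up this kind of exact computation.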

  3. Computational pipelines are essential, yet there is a paucity of “good” tools for designing them
     • eHive has many design features for robustness and scalability:
        • Fault tolerance
        • Agents (“bees”)
        • Graph-based
        • Cloud/GRID-friendly
        • Generic infrastructure: Perl, MySQL
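The agent ("bee") and fault-tolerance features listed above can be sketched as workers that claim jobs from a shared job table, run them, and put failed jobs back for a bounded number of retries. This is a minimal sketch under stated assumptions: SQLite stands in for MySQL, and the table layout, status names, MAX_RETRIES, claim_job and run_worker are all hypothetical, not eHive's actual schema or API.

```python
import sqlite3

# Hypothetical job table; SQLite stands in for eHive's MySQL database.
SCHEMA = """
CREATE TABLE job (
    job_id      INTEGER PRIMARY KEY,
    payload     INTEGER NOT NULL,
    status      TEXT    NOT NULL DEFAULT 'READY',
    retry_count INTEGER NOT NULL DEFAULT 0
)
"""
MAX_RETRIES = 2  # hypothetical: a job is attempted at most 1 + MAX_RETRIES times


def claim_job(conn):
    """Claim the oldest READY job, or return None if there is none.

    The conditional UPDATE acts as a compare-and-set, so if several workers
    race for the same row only one of them sees rowcount == 1.
    """
    while True:
        row = conn.execute(
            "SELECT job_id, payload FROM job WHERE status = 'READY' "
            "ORDER BY job_id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            cur = conn.execute(
                "UPDATE job SET status = 'CLAIMED' "
                "WHERE job_id = ? AND status = 'READY'",
                (row[0],),
            )
        if cur.rowcount == 1:
            return row


def run_worker(conn, task):
    """A 'bee': claim and run jobs until no READY job is left."""
    while (claimed := claim_job(conn)) is not None:
        job_id, payload = claimed
        try:
            task(payload)
        except Exception:
            # Failure: put the job back for another attempt, or give up.
            # SET expressions see the row's pre-update retry_count.
            with conn:
                conn.execute(
                    "UPDATE job SET retry_count = retry_count + 1, "
                    "status = CASE WHEN retry_count + 1 > ? "
                    "THEN 'FAILED' ELSE 'READY' END WHERE job_id = ?",
                    (MAX_RETRIES, job_id),
                )
        else:
            with conn:
                conn.execute("UPDATE job SET status = 'DONE' WHERE job_id = ?",
                             (job_id,))


# Demo: six jobs; payload 3 fails once, payload 5 always fails.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
with conn:
    conn.executemany("INSERT INTO job (payload) VALUES (?)",
                     [(n,) for n in range(1, 7)])

attempts = {}


def flaky_task(n):
    attempts[n] = attempts.get(n, 0) + 1
    if n == 5 or (n == 3 and attempts[n] == 1):
        raise RuntimeError(f"payload {n} failed")


run_worker(conn, flaky_task)
rows = conn.execute(
    "SELECT payload, status, retry_count FROM job ORDER BY job_id").fetchall()
print(rows)
```

Keeping all job state in the database rather than in the worker process means a crashed worker loses nothing but its current job, and any other worker can pick up the remaining READY rows.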

  4. [Figure labels: vorinostat, trichostatin A, asthma drugs, antifungal drugs, calmodulin inhibitors, anti-neoplastic drugs]
     • Normalization scheme enables better detection of drug signals
     • Less susceptible to known confounders

  5. Emtree = EMBASE’s equivalent of MeSH; much more comprehensive in certain areas, e.g., pharmacology
     • Caveat: SCOPUS is not EMBASE
        • SCOPUS does not support the kinds of complex Emtree queries EMBASE supports, nor other features such as thesaurus explosion
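Thesaurus explosion means that a query on a broad index term also retrieves records indexed only with its narrower terms. A minimal sketch with a made-up hierarchy follows; the terms, NARROWER, explode and search are hypothetical illustrations, not Emtree content and not EMBASE's or SCOPUS's query syntax.

```python
# Toy thesaurus mapping each term to its narrower terms.
# Hypothetical hierarchy for illustration; not taken from Emtree or MeSH.
NARROWER = {
    "antineoplastic agent": ["histone deacetylase inhibitor", "alkylating agent"],
    "histone deacetylase inhibitor": ["vorinostat", "trichostatin A"],
    "alkylating agent": ["cyclophosphamide"],
}


def explode(term):
    """Return the term together with every narrower term beneath it."""
    terms = {term}
    for child in NARROWER.get(term, []):
        terms |= explode(child)
    return terms


def search(index, term, explode_term=True):
    """IDs of records indexed with the term (or, if exploded, any narrower term)."""
    wanted = explode(term) if explode_term else {term}
    return sorted(rec_id for rec_id, terms in index.items() if terms & wanted)


# Each record lists the thesaurus terms it was indexed with.
records = {
    1: {"vorinostat"},
    2: {"cyclophosphamide"},
    3: {"antineoplastic agent"},
    4: {"antifungal agent"},
}
print(search(records, "antineoplastic agent"))                     # [1, 2, 3]
print(search(records, "antineoplastic agent", explode_term=False))  # [3]
```

Without explosion the query misses records 1 and 2, which is the kind of recall lost when a search engine offers no thesaurus explosion.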

  6. CenterWatch Databases

  7. Example reports…

  8. Example Pipeline for Multiplying Large Numbers
     • Pipeline defined in 4 files:
        • Start.pm splits a multiplication job into sub-tasks and creates the corresponding jobs
        • PartMultiply.pm performs a partial multiplication and stores the intermediate result in a table
        • AddTogether.pm waits for the partial multiplication results to be computed and adds them together into the final result
        • LongMult_conf.pm, the pipeline configuration module that links the previous Runnables into one pipeline

  9. Features Used in Example Pipeline
     • A pipeline can have multiple analyses (e.g., 'start', 'part_multiply' and 'add_together').
     • A job of one analysis can create jobs of other analyses by 'flowing the data' down branches. These branches are then assigned specific analysis names in the pipeline configuration file.
        • One 'start' job flows partial multiplication subtasks down branch #2, and a task of adding them together down branch #1.
     • Execution of one analysis can be blocked until all jobs of another analysis have been successfully completed ('add_together' is blocked by 'part_multiply').
     • eHive processes store intermediate and final results in a database (in this pipeline, the 'intermediate_result' and 'final_result' tables are used).
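The example pipeline above can be re-created in a single Python process to show the dataflow: 'start' sends one 'add_together' job down branch #1 and one 'part_multiply' job per digit of the second number down branch #2, and 'add_together' is requeued until no 'part_multiply' jobs remain. This is a hypothetical sketch, not eHive's Perl API: FLOWS, BLOCKED_BY, RUNNABLES, run_pipeline and the one-job-per-digit split are assumptions, and plain Python containers stand in for the 'intermediate_result' and 'final_result' tables.

```python
from collections import deque

# Hypothetical configuration: for each analysis, which analysis receives
# the jobs it flows down each numbered branch.
FLOWS = {"start": {1: "add_together", 2: "part_multiply"}}
# 'add_together' may run only after every 'part_multiply' job has finished.
BLOCKED_BY = {"add_together": {"part_multiply"}}


def start(params, db, emit):
    """Split a * b into one partial multiplication per digit of b."""
    a, b = params["a"], params["b"]
    emit(1, {"a": a, "b": b})  # the summing task, on branch #1
    for shift, digit in enumerate(reversed(b)):
        emit(2, {"a": a, "digit": int(digit), "shift": shift})  # subtasks, branch #2


def part_multiply(params, db, emit):
    """Multiply a by one digit of b, shift it into place, and store it."""
    partial = int(params["a"]) * params["digit"] * 10 ** params["shift"]
    db["intermediate_result"].append(partial)


def add_together(params, db, emit):
    """Sum the stored partial products into the final result."""
    key = (params["a"], params["b"])
    db["final_result"][key] = sum(db["intermediate_result"])


RUNNABLES = {"start": start, "part_multiply": part_multiply,
             "add_together": add_together}


def run_pipeline(a, b):
    """Run one multiplication through the pipeline (one a * b per run)."""
    db = {"intermediate_result": [], "final_result": {}}
    pending = deque([("start", {"a": a, "b": b})])
    while pending:
        analysis, params = pending.popleft()
        blockers = BLOCKED_BY.get(analysis, set())
        if any(name in blockers for name, _ in pending):
            pending.append((analysis, params))  # blocked: requeue, retry later
            continue

        def emit(branch, new_params):
            # Route the new job to whichever analysis this branch maps to.
            pending.append((FLOWS[analysis][branch], new_params))

        RUNNABLES[analysis](params, db, emit)
    return db


db = run_pipeline("123456789", "987654321")
print(db["final_result"])
```

Because 'start' emits the branch #1 job first, 'add_together' reaches the front of the queue while partial products are still pending, so the blocking rule is actually exercised before the sum is computed.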

  10. Other Worthy Features
      • eHive performs well for jobs that run for a very short time but are repeated millions of times
      • This is the converse of typical job scheduling systems, which have high latency
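A back-of-the-envelope model makes the latency point concrete. Assume, hypothetically, N jobs of run time t each, a per-job scheduling latency L when every job is submitted to a conventional batch scheduler, and a small per-job claim overhead q ≪ L for a long-lived worker that pays L once at start-up and then pulls jobs from a database. Running serially:

```latex
T_{\text{batch}} \approx N\,(L + t), \qquad T_{\text{worker}} \approx L + N\,(t + q)
```

With illustrative values N = 10^6, t = 0.01 s, L = 30 s and q = 0.001 s, T_batch ≈ 10^6 × 30.01 s ≈ 3.0 × 10^7 s (about 347 days), while T_worker ≈ 30 s + 10^6 × 0.011 s ≈ 1.1 × 10^4 s (about 3 hours). These numbers are assumptions chosen only to show the scaling, not eHive measurements.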
