
Estimating variable structure and dependence in multi-task learning via gradients


Presentation Transcript


  1. Estimating variable structure and dependence in multi-task learning via gradients By: Justin Guinney, Qiang Wu and Sayan Mukherjee Presented by: John Paisley

  2. Outline • Outline of presentation • General problem • Review of single-task solution • Extension to multi-task • Experiments

  3. General problem
  • We have a small number of high-dimensional data points, x, each with a corresponding response variable, y (fully supervised)
  • We want to simultaneously build a classification or regression function and learn which features are important, as well as the correlations between features (to know whether two features are important in the same way)
  • Xuejun presented their single-task solution; this paper extends it to the multi-task setting

  4. Single-Task Solution (classification)
  • By a first-order Taylor expansion, estimate the classification function near a point x_0 as f(x) ≈ f(x_0) + ∇f(x_0)·(x − x_0)
  • Seek to minimize the expected error E(f, ∇f) = (1/n²) Σ_{i,j} w_{ij} φ( y_j [ f(x_i) + ∇f(x_i)·(x_j − x_i) ] ), where w_{ij} = w_s(x_i − x_j) is a weight function that localizes the expansion (e.g., a Gaussian) and φ is a convex loss function (e.g., the hinge loss)
  • To solve this, regularize in an RKHS, adding penalty terms λ_1 ||f||²_K + λ_2 ||∇f||²_K
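A minimal numpy sketch of the empirical version of this objective, assuming a Gaussian weight function and the hinge loss for φ. The function names and the idea of passing current estimates of f(x_i) and ∇f(x_i) directly are illustrative only, not the paper's RKHS solver:

```python
import numpy as np

def gaussian_weights(X, s):
    """w_ij = exp(-||x_i - x_j||^2 / (2 s^2)): the weight that localizes the expansion."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * s ** 2))

def empirical_classification_risk(X, y, f_vals, grads, s=1.0):
    """(1/n^2) * sum_ij w_ij * hinge( y_j * [ f(x_i) + grad_f(x_i) . (x_j - x_i) ] ).

    X:      (n, p) inputs
    y:      (n,)   labels in {-1, +1}
    f_vals: (n,)   current estimates of f(x_i)
    grads:  (n, p) current estimates of grad f(x_i)
    """
    n = X.shape[0]
    W = gaussian_weights(X, s)
    diff = X[None, :, :] - X[:, None, :]            # diff[i, j] = x_j - x_i
    expansion = f_vals[:, None] + np.einsum('ip,ijp->ij', grads, diff)
    margins = y[None, :] * expansion                # y_j times the expansion at (i, j)
    hinge = np.maximum(0.0, 1.0 - margins)          # convex loss phi (hinge)
    return np.sum(W * hinge) / n ** 2
```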

  5. Single-Task (regression)
  • Use the response variable for each input in place of the function value: since y_i ≈ f(x_i), the expansion becomes y_j ≈ y_i + ∇f(x_i)·(x_j − x_i), so only the gradient needs to be learned
  • Minimize the weighted squared error (1/n²) Σ_{i,j} w_{ij} ( y_i − y_j + ∇f(x_i)·(x_j − x_i) )², again regularizing ∇f in an RKHS
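The same kind of sketch for the regression case, just evaluating the weighted squared-error term above for a candidate set of gradient estimates; the RKHS penalty λ ||∇f||²_K would be added on top when actually fitting:

```python
import numpy as np

def empirical_regression_risk(X, y, grads, s=1.0):
    """(1/n^2) * sum_ij w_ij * ( y_i - y_j + grad_f(x_i) . (x_j - x_i) )^2.

    Because y_i stands in for f(x_i), only the gradient estimates are needed.
    X: (n, p) inputs, y: (n,) responses, grads: (n, p) gradient estimates.
    """
    n = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * s ** 2))          # Gaussian weights w_ij
    diff = X[None, :, :] - X[:, None, :]            # diff[i, j] = x_j - x_i
    resid = y[:, None] - y[None, :] + np.einsum('ip,ijp->ij', grads, diff)
    return np.sum(W * resid ** 2) / n ** 2
```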

  6. Single-Task (solution and value of interest)
  • By the representer theorem, the solution has the form ∇f(x) = Σ_i c_i K(x, x_i), with one coefficient vector c_i ∈ R^p per data point
  • The gradient outer product (GOP), Γ = E[ ∇f ∇fᵀ ], is the matrix that carries all of the feature information. It is approximated empirically as Γ̂ = (1/n) Σ_i ∇f(x_i) ∇f(x_i)ᵀ (the slide compares how this approximation is written in this paper, in Xuejun's paper, and as computed in Matlab)
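A short sketch of how the GOP estimate could be assembled from the representer-form solution, assuming the coefficient vectors are stacked into a p x n matrix; the names C, K, and F are illustrative, not the paper's notation:

```python
import numpy as np

def gradient_outer_product(C, K):
    """Empirical GOP:  Gamma_hat = (1/n) * sum_i grad_f(x_i) grad_f(x_i)^T.

    C: (p, n) coefficient vectors c_i from the representer-theorem expansion
    K: (n, n) kernel matrix with K[i, j] = K(x_i, x_j)
    Under grad_f(x) = sum_j c_j K(x, x_j), the gradients at the sample points
    stack into F = C @ K, so Gamma_hat = (1/n) * F @ F.T  (a p x p matrix).
    """
    n = K.shape[0]
    F = C @ K                   # F[:, i] = grad_f(x_i)
    return (F @ F.T) / n
```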

  7. GOP
  • This matrix is central to the paper because it encodes all of the information about the importance of each feature. The diagonal can be used to rank each feature's importance, and the off-diagonal entries tell how features are correlated (so if two features are important in the same way, only one needs to be selected).
  • My confusion: I take this to mean that the different ways of writing the approximation should agree, which would resolve the discrepancy on the previous page. However, constructing a discrete Gaussian kernel in Matlab, this isn't true (and it makes no sense to me why it should be true).
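A sketch of how the GOP might be read off in practice, assuming the usage described in the first bullet (diagonal for ranking, off-diagonals for dependence); the cosine-style normalization is one reasonable choice, not something specified on the slide:

```python
import numpy as np

def rank_and_correlate(gop):
    """Use the GOP diagonal to rank features and its off-diagonals for dependence.

    gop: (p, p) estimated gradient outer product.
    Returns (ranking, corr): feature indices sorted by importance, and a
    normalized matrix whose off-diagonal entries show which features covary.
    """
    importance = np.diag(gop)
    ranking = np.argsort(importance)[::-1]            # most important feature first
    d = np.sqrt(np.clip(importance, 1e-12, None))     # guard against zero diagonals
    corr = gop / np.outer(d, d)                       # cosine-style normalization
    return ranking, corr
```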

  8. Extension to multi-task
  • A very natural extension: each task's function is modeled as a shared base function plus a task-specific correction, f_t = f_0 + g_t (and correspondingly ∇f_t = ∇f_0 + ∇g_t for the gradients)
  • Classification: the single-task classification objective is summed over tasks, with RKHS regularization applied to both the shared base and the task-specific corrections
  • Regression: the weighted squared-error objective is likewise summed over tasks, with the same RKHS regularization structure
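To make the decomposition concrete, here is a sketch of how the base-plus-correction idea plugs into the single-task regression term from slide 5, summed over tasks. This is only an illustration under that assumption, not the paper's formulation, and the RKHS penalties on the base and correction terms are omitted:

```python
import numpy as np

def multitask_regression_risk(tasks, base_grads, corr_grads, s=1.0):
    """Sum of per-task weighted squared errors with grad f_t = grad f_0 + grad g_t.

    tasks:      list of (X_t, y_t) pairs, X_t: (n_t, p), y_t: (n_t,)
    base_grads: list of (n_t, p) shared-base gradient estimates at each task's points
    corr_grads: list of (n_t, p) task-specific correction gradients
    """
    total = 0.0
    for (X, y), g0, gt in zip(tasks, base_grads, corr_grads):
        n = X.shape[0]
        grads = g0 + gt                               # task gradient = base + correction
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        W = np.exp(-sq_dists / (2.0 * s ** 2))        # Gaussian weights within the task
        diff = X[None, :, :] - X[:, None, :]          # diff[i, j] = x_j - x_i
        resid = y[:, None] - y[None, :] + np.einsum('ip,ijp->ij', grads, diff)
        total += np.sum(W * resid ** 2) / n ** 2
    return total
```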

  9. Experiments:
