1 / 20

Learning Relational Probability Trees

Learning Relational Probability Trees. Jennifer Neville David Jensen Lisa Friedland Michael Hay. Presented by Andrew Tjang. About the authors. David Jensen – Director of KDL, research asst. prof @ umass

ranae
Download Presentation

Learning Relational Probability Trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Learning Relational Probability Trees Jennifer Neville David Jensen Lisa Friedland Michael Hay Presented by Andrew Tjang

  2. About the authors • David Jensen – Director of KDL, research asst. prof @ umass • Research focuses on the statistical aspects and architecture of data mining systems and the evaluation of those systems for government and business applications

  3. Authors, cont’d • Lisa Friedland – graduate student, KDL, interest in DB/data mining • Michael Hay – graduate student, KDL • Jennifer Neville – • masters student @ Umass in Knowledge Discovery Lab (KDL investigates how to find useful patterns in large and complex databases.)

  4. Learning Relational Probability Trees (RPTs) • Classification trees used in data mining: easily interpretable • Conventional: homogeneous data, statistically independent • Much research gone into classification trees -> relational data, so variances can be safely “ignored” and relational effects can be on the front burner

  5. Why is relational data so complicated? • Traditional assumption of instance independence is violated. • 3 characteristics of relational data • Concentrated linkage • Degree disparity • Relational autocorrelation • All 3 potentially produce overly complex models with excess structure • This leads to factually incorrect relations and more space/computation requirements.

  6. What did they do? • Conventional techniques/modeling assume independence • RPT and class of algorithms to adjust for relational characteristics. RPTs the only statistical model to adjust for these relational characteristics • Applied these algorithms to a set of databases (IMDb, Cora, WebKB, human genome) • Then evaluated accuracies of their RPT applications

  7. Examples of relational data • Social networks, genomic data, data on interrelated people, places, things/events from text documents. • First part was to apply RPT to WebKB • (a collection of websites classified into categories – course, faculty staff, student, research project) • Appx 4000 pages, 8000 hyperlinks • DB contains attributes associated with each page/hyperlink (path/domain, direction of hyperlink)

  8. Probability (sub) tree of WebKB

  9. RBTs • Estimate probability distros over att values • These algs differ from regular – they consider the effect of related object • Movie example • Use Aggregation used to “proportionalize” relational data. • E.g. some movies have 10 actors, some have 1000 • Aggregate as avg age. (either preprocess and or dynamically during learning process

  10. The Algorithm • RPT algorithm takes collection of subgraphs as input (single target object ot be classified + other objects/links in relat’l neighborhood • Construct a probability estimation tree (like previous fig) to predict target class label given: • The atts of target object • The atts of other objects in neighborhood • The degree atts counting object and neighborhood objects

  11. Algorithm (cont’d) • Searches over space of binary relat’l features to “split data” • MODE(actor.gender)=female • Constructs features based on atts of different objects in subgraphs and aggregation of values. • Feature scores calculated from class after split using chi squared analysis

  12. Algorithm Part trois • The p-value from the chi squared will reveal which features are significantly correlated. The max of these is included in tree • Does this alg recursively greedily until class distros don’t change significantly • “Bonferroni adjustment” + Laplace correction to improve probability estimates

  13. Relational features • Traditional features • Genre=comedy • ID att, operator, value • Relational features • Movie(x),Y = {y|ActedIn(x,y)}:Max(Age(Y))>65 • ID relation that links an object x to a set of other object Y.

  14. Feature Selection Biases • Concentrated linkage and autocorrelation combine to reduce effective sample size and increase variance. • Degree disparity can cause sprious elements and miss useful elements.

  15. Randomization Test • Used on relational data sets where hypothesis testing not as useful. • Used to account for biases • Computationally intensive • Pseudo samples generated by permuting variables • In this case permute att values before aggregation • This new permuted vector preserves intrinsic links/relations but removes correlations btw atts and class label

  16. Tests • Use IMDb to predict $2million box office receipts on opening weekend • 1) IMDb with class labels only correlated with degree features. • 2)Use structure of IMDb as well as attributes • 3)From Cora : if paper is of topic “neural networks” • 4) Genome project: if gene is in nucleus • 5) WebKB: is student web page

  17. Tests (cont’d) • Each of these tasks a set of classification techniques were applied: • RPT w/ Chi squared (CT) • RPT with randomized testing (RT) • RPT (pseudo) w/o randomized testing (C4.5) • Baseline Relat’l Bayesian Classifier (RBC) • * entries statistically significant differences

  18. Results

  19. Results explained • None except for random showed that their algorithm performed any better than other algorithms • For random they show: • RPTs can adjust for linkage and autocorrelation. • RTs perfrom better than other 3 models • Biased models select random attributes that hinder performance.

  20. Conclusions (their’s and mine) • Possible to extend conventional methods for estimation tree algorithms • RPT w/randomization and RPT w/o randomization perform about the same (smaller trees in the former) • Randomization adjusts for biases (but is this so important if they perform the same?) • Non selective don’t have biases, but lose interpretablilty. • Future work enrichments to RPT • Better ways of calc’ing ranking scores

More Related