Finding Tribes: Identifying Close-Knit Individuals from Employment Patterns Lisa Friedland and David Jensen Presented by Nick Mattei
Introduction • Tribes: groups of individuals with similar traits within a large graph • Goal: distinguish people who work together and move between jobs together intentionally
Relationship Knowledge Discovery • Exploit connections among individuals to identify patterns and make predictions • Discover underlying dependencies • Links must be inferred
Graph Mining • Discover Hidden Group Structures • Animal Herds, Webpages, Employees • Time Series Analysis • Co-integration (Economics) • Security and Intrusion Detection • Dynamic Networks
Motivation • National Association of Securities Dealers (NASD) • Goals: detect fraud and collusion • 4.8 million records • 2.5 million reps at 560,000 firms • 100 years of data
Complications • Jobs are not necessarily in order, and are not always held one at a time • 20% of employees hold more than one job at a time • 10% begin multiple jobs (up to 16) on the same day • Reps leave gaps between jobs • Mergers and acquisitions
Finding Anomalously Related Entities • Input: • Bipartite graph G = (R ∪ A, E) • Entities: R = {r1, r2, …, rn} (people) • Attributes: A = {a1, a2, …, am} (organizations) • Each entity should connect to several attributes • Model the co-occurrence rates of pairs of attributes
Simple Model Measures • JOBS = number of jobs two reps share in their job sequences • YEARS = number of years their shared employment overlaps
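The two simple measures can be sketched for a single pair of reps as follows. This is a minimal illustration, not the authors' code: the record layout (branch_id, start_year, end_year with an exclusive end), the helper names jobs_score and years_score, and the reading of "shared jobs" as distinct shared branches are all assumptions.

```python
from typing import List, Tuple

# Hypothetical record: (branch_id, start_year, end_year), with end_year exclusive.
Job = Tuple[str, int, int]


def jobs_score(a: List[Job], b: List[Job]) -> int:
    """JOBS (assumed reading): distinct branches that appear in both reps' job sequences."""
    return len({branch for branch, _, _ in a} & {branch for branch, _, _ in b})


def years_score(a: List[Job], b: List[Job]) -> int:
    """YEARS (assumed reading): total years the two reps were at the same branch at once."""
    total = 0
    for branch_a, start_a, end_a in a:
        for branch_b, start_b, end_b in b:
            if branch_a == branch_b:
                # Overlap of the half-open intervals [start_a, end_a) and [start_b, end_b).
                total += max(0, min(end_a, end_b) - max(start_a, start_b))
    return total


alice = [("A", 1990, 1995), ("B", 1995, 2000)]
bob = [("A", 1992, 1996), ("B", 1996, 2001), ("C", 2001, 2003)]
print(jobs_score(alice, bob), years_score(alice, bob))  # 2 7
```

Here the pair shares branches A and B, overlapping 3 years at A and 4 at B.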
Probabilistic Model • P(BrA → BrB → BrC → BrD) = pA · tAB · tBC · tCD • Estimates: • pi = P(start at branch i) = (# reps ever at i) / (# reps in database) • tij = P(move from i to j | ever at i) = (# reps who leave i to go to j) / (# reps ever at i)
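The estimates above amount to a first-order Markov chain over branches. The sketch below computes pi and tij from ordered branch sequences and scores a path; the input format (one ordered list of branch IDs per rep) and the function names are hypothetical, and each direct move is counted at most once per rep to match "# reps who leave i to go to j".

```python
from collections import Counter
from typing import Dict, List, Tuple


def estimate_prob(careers: List[List[str]]) -> Tuple[Dict[str, float], Dict[Tuple[str, str], float]]:
    """Estimate p_i and t_ij from ordered branch sequences (one list per rep)."""
    n_reps = len(careers)
    ever_at: Counter = Counter()
    moved: Counter = Counter()
    for seq in careers:
        ever_at.update(set(seq))  # each rep counts once per branch
        # Each direct move i -> j counts at most once per rep.
        moved.update({(seq[k], seq[k + 1]) for k in range(len(seq) - 1)})
    p = {i: c / n_reps for i, c in ever_at.items()}
    t = {(i, j): c / ever_at[i] for (i, j), c in moved.items()}
    return p, t


def path_prob(path: List[str], p: Dict[str, float], t: Dict[Tuple[str, str], float]) -> float:
    """P(path) = p_first * product of t over consecutive moves."""
    prob = p.get(path[0], 0.0)
    for i, j in zip(path, path[1:]):
        prob *= t.get((i, j), 0.0)
    return prob


careers = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["B", "C"]]
p, t = estimate_prob(careers)
print(path_prob(["A", "B", "C"], p, t))  # approximately 0.75 * (2/3) * (2/3) = 1/3
```

Under the null hypothesis of independent movement, a path that several reps share but that this model rates as unlikely is a candidate sign of coordinated movement.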
Probabilistic Model • Null hypothesis: reps move independently of one another • Movement is not random • Split and merge • Markov chains
Probabilistic Model (Different Paths) • tij becomes vij • vij = P(move to branch j at any point after branch i | currently at i) = (# reps who go to branch j at any point after working at i) / (# reps ever at i) • Now each vij ≥ tij, and the probabilities out of a branch no longer sum to 1
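To see why vij ≥ tij and why the values out of a branch can exceed 1 in total, here is a small sketch that counts every later branch rather than only the next one. The input format and the name estimate_v are hypothetical; repeat visits to the same branch are ignored as an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple


def estimate_v(careers: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """v_ij = (# reps at j at any point after i) / (# reps ever at i)."""
    ever_at: Counter = Counter()
    later: Counter = Counter()
    for seq in careers:
        ever_at.update(set(seq))
        # All ordered (earlier, later) pairs, not just consecutive ones; once per rep.
        later.update({(seq[a], seq[b])
                      for a in range(len(seq))
                      for b in range(a + 1, len(seq))
                      if seq[a] != seq[b]})
    return {(i, j): c / ever_at[i] for (i, j), c in later.items()}


careers = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["B", "C"]]
v = estimate_v(careers)
print(v[("A", "C")])  # about 0.667, versus t_AC = 1/3 for the same data
print(sum(val for (i, _), val in v.items() if i == "A"))  # about 1.667, i.e. more than 1
```

The rep with path A → B → C now contributes to vAC even though A and C are not adjacent in that career.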
Probabilistic Model (Different Paths) • vij becomes wij • wij = P(move to branch j at the same time as or after branch i | currently at i) = (# reps who start at j at the same time as or after starting at i) / (# reps ever at i) • Less precise with respect to direct transitions, but more general
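Because wij compares start dates with "on or after", jobs begun on the same day count in both directions, which is one way the measure copes with the simultaneous starts listed under Complications. A minimal sketch, assuming a hypothetical (branch_id, start_date) record per job and a hypothetical function name:

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical record: (branch_id, start_date) for each job a rep holds.
Start = Tuple[str, date]


def estimate_w(careers: List[List[Start]]) -> Dict[Tuple[str, str], float]:
    """w_ij = (# reps who start at j on or after their start at i) / (# reps ever at i)."""
    ever_at: Counter = Counter()
    counts: Counter = Counter()
    for jobs in careers:
        ever_at.update({branch for branch, _ in jobs})
        # Same-day starts satisfy sj >= si both ways; each pair counts once per rep.
        counts.update({(bi, bj)
                       for bi, si in jobs
                       for bj, sj in jobs
                       if bi != bj and sj >= si})
    return {(i, j): c / ever_at[i] for (i, j), c in counts.items()}


careers = [
    [("A", date(2000, 1, 3)), ("B", date(2000, 1, 3))],  # two jobs begun the same day
    [("A", date(1999, 5, 1)), ("C", date(2001, 2, 1))],
]
w = estimate_w(careers)
print(w[("A", "B")], w[("B", "A")], w[("A", "C")])  # 0.5 1.0 0.5
```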
PROB-TIMEBINS • Bins of 1 year or more • 10 people worked at each branch in a bin period • piX = (# reps ever at i during period X) / (# reps in database) • yiXjY = (# reps ever at i during period X and at j during period Y, where Y ≥ X) / (# reps ever at i during period X)
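The time-binned estimates can be sketched by mapping each career onto (branch, bin) cells. This is an illustration under assumptions: the record layout (branch_id, start_year, end_year with an inclusive end), the fixed-width binning year // bin_years, excluding pairs where i and j are the same branch, and the function names are all hypothetical, and the minimum of 10 people per branch per bin is not enforced here.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical record: (branch_id, start_year, end_year), with end_year inclusive.
Job = Tuple[str, int, int]


def to_cells(jobs: List[Job], bin_years: int) -> Set[Tuple[str, int]]:
    """Map a career onto (branch, bin) cells, with bin = year // bin_years."""
    return {(branch, year // bin_years)
            for branch, start, end in jobs
            for year in range(start, end + 1)}


def estimate_timebins(careers: List[List[Job]], bin_years: int = 1):
    """Return p[(i, X)] and y[(i, X, j, Y)] as defined on the slide."""
    n_reps = len(careers)
    at: Counter = Counter()
    pairs: Counter = Counter()
    for jobs in careers:
        cells = to_cells(jobs, bin_years)
        at.update(cells)
        # Assumption: only pairs of different branches, with Y >= X.
        pairs.update({(i, x, j, y) for i, x in cells for j, y in cells if y >= x and i != j})
    p = {(i, x): c / n_reps for (i, x), c in at.items()}
    y_prob = {(i, x, j, y): c / at[(i, x)] for (i, x, j, y), c in pairs.items()}
    return p, y_prob


careers = [[("A", 2000, 2000), ("B", 2001, 2001)], [("A", 2000, 2001)]]
p, y_prob = estimate_timebins(careers)
print(p[("A", 2000)], y_prob[("A", 2000, "B", 2001)])  # 1.0 0.5
```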
PROB-NOTIME • Ignores the order of job moves • Uses the original pi • zij = raw number of reps who work at both branches i and j during their careers (so zij = zji) • Transition probability from i to j = zij / (# reps ever at i) • In general this ≠ zij / (# reps ever at j), the transition probability from j to i
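The asymmetry comes entirely from the denominators, since zij itself is symmetric. A minimal sketch with hypothetical names, treating each career as an unordered set of branches:

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def estimate_notime(careers: List[List[str]]) -> Tuple[Counter, Counter]:
    """Return (ever_at, z), where z[(i, j)] with i < j counts reps at both i and j."""
    ever_at: Counter = Counter()
    z: Counter = Counter()
    for branches in careers:
        unique = sorted(set(branches))  # order of moves is ignored
        ever_at.update(unique)
        z.update(combinations(unique, 2))  # unordered pairs, stored as (smaller, larger)
    return ever_at, z


def transition(i: str, j: str, ever_at: Counter, z: Counter) -> float:
    """PROB-NOTIME transition probability from i to j: z_ij / (# reps ever at i)."""
    if ever_at[i] == 0:
        return 0.0
    return z[(min(i, j), max(i, j))] / ever_at[i]


careers = [["A", "B"], ["B", "A", "C"], ["B"]]
ever_at, z = estimate_notime(careers)
print(transition("A", "B", ever_at, z), transition("B", "A", ever_at, z))  # 1.0 0.6666666666666666
```

Here zAB = 2, but 2 reps were ever at A and 3 were ever at B, so the two directions differ.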
Discussion • JOBS, PROB, PROB-TIMEBINS, and PROB-NOTIME produce tribes with higher-than-average disclosure scores • PROB finds more tribes that span multiple zip codes • PROB-TIMEBINS has a higher phi-squared than all the others • PROB favors large firms
Discussion • JOBS and YEARS produce larger connected components • JOBS and PROB find the same number of tribes but select different groups as tribes
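The slides do not spell out how tribes are assembled from pairwise scores. One common construction, assumed here rather than taken from the talk, is to link every pair of reps whose score passes a cutoff and read tribes off as connected components, for example with union-find. The scores, threshold, and function name below are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def tribes_from_pairs(scored_pairs: Iterable[Tuple[Hashable, Hashable, float]],
                      threshold: float) -> List[Set[Hashable]]:
    """Connected components of the graph whose edges are pairs scoring >= threshold."""
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # union the two components

    groups: Dict[Hashable, Set[Hashable]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=len, reverse=True)


pairs = [("r1", "r2", 3), ("r2", "r3", 2), ("r4", "r5", 1), ("r6", "r7", 4)]
print(tribes_from_pairs(pairs, threshold=2))  # r1-r3 form one tribe, r6-r7 another
print(tribes_from_pairs(pairs, threshold=1))  # the lower cutoff also links r4 and r5
```

Under this construction, a measure that links more pairs tends to merge groups into larger components, which is one possible way JOBS and YEARS could end up with larger components than PROB.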
Conclusions • Without explicit knowledge of these factors, the models discover: • Job transitions • Geography • Career tracks
Conclusions • Still needed: models that handle • Ongoing processes • Multiple affiliations • Arbitrary times • Time is a paradox in this domain
Thanks! • Time for: • Questions • Comments • Smart Remarks