Adapting and Visualizing Association Rule Mining Systems for Law Enforcement Purposes T.K. Cocx, tcocx@liacs.nl W. Kosters
Research Area • Area • Criminal Career • Goals • Practice • Structure • Difficulties • Algorithm • Results • Problems • [Diagram labels: Computer Science, Sociology, Criminal Career Study, Psychology, Criminology, Law]
Criminal Careers
Analysis Analysis Goal
Different angles to solution • Analyze criminal records to predict careers OR • Understand crime better by (automatically) analyzing its characteristics. • Both solutions work on the same database • National crime record database • First angle: done by de Bruin et al. (ICDM 2006) • Second angle: focus of this talk
Focus • Database contains both crime AND demographic data • Relations between occurrences of certain crimes within individual careers • Teaches which crimes predict others in the more general case • Alarms • Relations between crimes and demographic data within individual careers • Discovers problematic ‘configurations’ within demographic areas • Helps deploy the police workforce better
Approach • Employ association rule mining: • Use standard Apriori methods • ‘Common’ means statistically over-present • Find common subsets of attributes • A subset can only be common if all of its own subsets are common as well • Create a tree containing all rules • Visualize the tree for the police end user • Is the data suitable for this method?
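The Apriori idea on this slide can be illustrated with a minimal sketch. It is not the system from the talk: the function name `apriori`, the toy `records`, and the use of an absolute count as `min_support` are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every common itemset.

    `min_support` is an absolute count here (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single attributes.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    common = dict(current)

    k = 2
    while current:
        # Join step: merge common (k-1)-sets into candidate k-sets.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: a set can only be common if all of its
            # (k-1)-subsets are common as well.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count the surviving candidates against the database.
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_support:
                current[cand] = count
        common.update(current)
        k += 1
    return common


# Hypothetical toy records, not taken from the crime database.
records = [
    {"male", "theft", "weapon"},
    {"male", "theft"},
    {"female", "drugs"},
    {"male", "theft", "weapon"},
]
common_sets = apriori(records, min_support=2)
```

With these toy records, `female` and `drugs` fall below the threshold at the first level, so no larger set containing them is ever counted.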
Problems • Data is not boolean. • The nature of the database (or of crime itself) is responsible for a large number of over- or under-present attributes. • Male (80%) • Dutch (90%) • Addiction indication (4%) • Inherent relations between certain attributes pollute the outcome. • Semi-aggregated attributes (first age / last age for one-timers) • Descent / born
Approach (Cont.) • Fivefold method • Database refit • Attribute ban • Semantic Split • Interestingness • Tree visualization
Database refit • Standard methods rely on boolean databases • Attributes are present, or not. • Numerical attributes are discretized • Age: 10-year intervals • Nominal attributes are split into all available categories • Resulting database is boolean • Very large (all different ethnicities) • Very sparse (only one of them true)
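A minimal sketch of such a refit, assuming a record is a plain dictionary. The field names (`age`, `gender`, `descent`, `addiction`) and the `name=value` labelling are hypothetical, not the schema of the real database. Storing only the attributes that are true, as a set, is one way to cope with the sparseness.

```python
def booleanize(record, age_bin=10):
    """Turn one mixed-type record into the set of its true boolean attributes."""
    items = set()
    for name, value in record.items():
        if name == "age":
            # Numerical attribute: discretize into fixed-width intervals.
            low = (value // age_bin) * age_bin
            items.add(f"age={low}-{low + age_bin - 1}")
        elif isinstance(value, bool):
            # Already boolean: keep it only when it is set to true.
            if value:
                items.add(name)
        else:
            # Nominal attribute: one boolean attribute per category.
            items.add(f"{name}={value}")
    return items


# Hypothetical record for illustration only.
record = {"age": 27, "gender": "male", "descent": "Dutch", "addiction": False}
items = booleanize(record)
```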
Attribute ban • The database contains many over-present attributes • These often lack descriptive value (shopping bags) • Example: deceased • To cope with these, analysts can handpick disruptive attributes • These are left out when searching
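In code, a ban of this kind could be as simple as removing the handpicked attributes from every record before the search starts. The sketch below assumes records are sets of attribute names; `apply_ban` and the sample data are hypothetical.

```python
def apply_ban(transactions, banned):
    """Return copies of the records without the handpicked banned attributes."""
    banned = set(banned)
    return [set(t) - banned for t in transactions]


# Hypothetical records; "deceased" is the example attribute from the slide.
records = [{"male", "deceased", "theft"}, {"female", "drugs"}]
cleaned = apply_ban(records, banned={"deceased"})
```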
Semantic Split • The criminal record database has two parts that are clearly semantically different • Demographic data (also known for non-criminals) • Crime data • They are strictly separated number-wise. • The analyst can handpick an x that marks the beginning of the second half. • 1:N / N:1 • Lower and upper half • Pick at most 1 item from the lower half. • For example: born / ethnicity.
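One way to read the split is as a filter on candidate itemsets. The sketch assumes attributes are identified by integer column numbers and that every number below the split point x belongs to the lower half; the function name and the sample numbers are hypothetical.

```python
def respects_split(itemset, x):
    """True if at most one attribute of the itemset lies in the lower half.

    Attributes are assumed to be integer column numbers, with every
    number below x belonging to the lower half.
    """
    lower = [item for item in itemset if item < x]
    return len(lower) <= 1


# Hypothetical layout: columns 0-7 form the lower half, 8 and up the upper half.
candidates = [{2, 10, 11}, {2, 3, 10}, {10, 11}]
allowed = [c for c in candidates if respects_split(c, x=8)]
```

Here `{2, 3, 10}` is rejected because it combines two lower-half attributes, for example born and ethnicity, which are related by nature.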
Subset Search • Support: number of occurrences (set to true) of an attribute. • When the support reaches a threshold, the attribute is considered common. • It is not commonness one wants, but interestingness • Example: Female • Confidence: conditional probability of a certain itemset given another itemset. • If a certain itemset that is ‘interesting’ implies another, the combination is also interesting
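Both measures are easy to state in code. The sketch below assumes records are sets of attribute names; the toy data is hypothetical.

```python
def support(transactions, itemset):
    """Number of records in which every attribute of the itemset is true."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from support counts."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # no record contains the antecedent
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical toy records, not taken from the crime database.
records = [
    {"male", "theft", "weapon"},
    {"male", "theft"},
    {"female", "drugs"},
    {"male", "theft", "weapon"},
]
weapon_to_male = confidence(records, {"weapon"}, {"male"})
male_to_weapon = confidence(records, {"male"}, {"weapon"})
```

In this toy data every record with `weapon` also has `male`, but only two of the three `male` records have `weapon`, so the two directions differ.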
Subset Search (Cont.) • Confidence can be seen as: • max(C(a,b), C(b,a)) • avg(C(a,b), C(b,a)) • The latter is stronger because it demands an implication in both directions. • An itemset will certainly be interesting if its occurrence is much higher than one would expect based upon the occurrence of its individual member itemsets: • Lift • The relation between expected occurrence and actual occurrence.
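A small sketch of the two symmetric confidence variants and of lift. It uses the textbook definition of lift, the observed joint frequency divided by the frequency expected if the two itemsets were independent; whether the talk's system computes it exactly this way is an assumption. The function names and toy records are hypothetical.

```python
def support(transactions, itemset):
    """Number of records that contain every attribute of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def confidence(transactions, a, b):
    """C(a,b): estimate of P(b | a)."""
    base = support(transactions, a)
    return support(transactions, set(a) | set(b)) / base if base else 0.0


def symmetric_confidence(transactions, a, b, mode="avg"):
    """Combine C(a,b) and C(b,a) by taking the maximum or the average."""
    forward = confidence(transactions, a, b)
    backward = confidence(transactions, b, a)
    if mode == "max":
        return max(forward, backward)
    return (forward + backward) / 2


def lift(transactions, a, b):
    """Actual joint occurrence relative to the occurrence expected
    if a and b were independent; values above 1 mean over-presence."""
    n = len(transactions)
    sa, sb = support(transactions, a), support(transactions, b)
    if sa == 0 or sb == 0:
        return 0.0
    return support(transactions, set(a) | set(b)) * n / (sa * sb)


# Hypothetical toy records, not taken from the crime database.
records = [
    {"male", "theft", "weapon"},
    {"male", "theft"},
    {"female", "drugs"},
    {"male", "theft", "weapon"},
]
best = symmetric_confidence(records, {"male"}, {"weapon"}, mode="max")
mean = symmetric_confidence(records, {"male"}, {"weapon"}, mode="avg")
rare_pair = lift(records, {"female"}, {"drugs"})
```

The pair `female` and `drugs` occurs only once in the toy data, so it would never count as common, yet its lift is high because the two always occur together.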
Visualization • All interesting itemsets are put in a giant tree that is presented to the user • In this tree each node is part of an interesting subset of the database together with all its parents. • NOT with its siblings • Interesting sets: 10 – 1; 10 – 1 – 2; 10 – 4; 5 – 7 • [Figure: tree with roots 10 and 5; node 10 has children 1 and 4, node 1 has child 2, node 5 has child 7]
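The example tree can be rebuilt from the listed sets with a simple prefix tree. This is a sketch assuming each interesting set is stored as an ordered path; nested dictionaries stand in for whatever tree structure the real system uses.

```python
def build_tree(itemsets):
    """Insert each ordered itemset as a path in a nested-dict prefix tree."""
    root = {}
    for path in itemsets:
        node = root
        for item in path:
            # Reuse the child if the prefix already exists, else create it.
            node = node.setdefault(item, {})
    return root


def paths(tree, prefix=()):
    """Yield every node together with all its ancestors, depth first."""
    for item, child in tree.items():
        current = prefix + (item,)
        yield current
        yield from paths(child, current)


# The interesting sets from the slide, written as ordered paths.
interesting = [(10, 1), (10, 1, 2), (10, 4), (5, 7)]
tree = build_tree(interesting)
all_paths = list(paths(tree))
```

Reading a set off the tree means following one node upward through its ancestors. Siblings such as 1 and 4 never appear in the same path, which matches the "NOT siblings" remark.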
Visualization (Cont.) • It is common practice to represent the tree with all its member subsets • Impractical, especially for police analysts • Use known paradigm • Resulting dataset
Some notable results • Joyriding ↔ Violation of Work Circumstances ↔ Alcohol Addiction • Drug Smuggling ↔ Drug Addiction • Manslaughter ↔ Discrimination • Male ↔ Theft with Violence ↔ Possession (of weapon) • Female ↔ Drug Abuse • African Descent ↔ Public Safety • Rural Areas ↔ Traffic Felonies
Conclusion & Future work • The nature of the criminal record database calls for a custom set of specific solutions • Attribute ban, semantic split and visualization contribute largely to the results of the performed queries. • Semantic bond between single attributes • Search for most uncommon itemsets • Inherently uncommon couples (semantic bond) • Comparison with social sciences
Interrogation