An Overview of Domain-Driven Data Mining: Toward Actionable Knowledge Discovery (AKD) Longbing Cao, Faculty of Engineering and Information Technology, University of Technology, Sydney, Australia
Outline • Why Do We Need D3M • What Is D3M • The D3M Framework • D3M Theoretical Underpinnings • D3M Research Issues • D3M Applications • D3M References Cao, L: D3M at DDDM2008 Joint with ICDM2008
Why Do We Need D3M • A common scenario in deploying data mining algorithms • I found something interesting! • “Many patterns are found” • “They satisfy the technical metric thresholds well” • What do business people say? • “So what?” • “They are just commonsense” • “I don’t care about them” • “I don’t understand them” • “How can I use them?” • “Am I wrong? What can I do better for my business mate?”
Why Do We Need D3M • Where does it go wrong? • Gaps: • academic objectives vs. business goals • technical outputs vs. business expectations • Macro-level methodological and fundamental issues • Academia: technical interest; innovative algorithms and patterns • Practitioners: social, environmental and organizational factors and their impact; getting a problem solved properly • Micro-level technical and engineering issues • System dynamics, system environment, and interaction within a system • Business processes, organizational factors, and constraints • Involvement of human and domain knowledge
An example: Problem with association mining • Existing association rule mining algorithms are designed to find strong patterns with high predictive accuracy or correlation; • Yet frequent patterns are often regarded as commonsense knowledge, whereas business users are eager to discover new and hidden patterns in databases; • Many patterns are found; • How can business people take over these associations seamlessly and turn them into operational actions?
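To make this scenario concrete, the following is a minimal Python sketch of support and confidence, the usual technical interestingness measures for association rules. It assumes a small hypothetical basket dataset and enumerates only two-item rules by brute force rather than using Apriori or FP-growth. The rules it returns pass the technical thresholds, but nothing in the computation says whether they are new or useful to the business.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def two_item_rules(transactions, min_support, min_confidence):
    """Brute-force rules X -> Y over single items X, Y (illustration only, not Apriori).

    Returns (antecedent, consequent, support, confidence) tuples that satisfy
    both purely technical thresholds.
    """
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s == 0 or s < min_support:
            continue
        # Check both directions of the pair; support({x}) > 0 because s > 0.
        for x, y in ((a, b), (b, a)):
            conf = s / support({x}, transactions)
            if conf >= min_confidence:
                found.append((x, y, s, conf))
    return found


if __name__ == "__main__":
    # Hypothetical baskets, not data from the talk.
    baskets = [frozenset({"bread", "milk"}), frozenset({"bread", "milk"}),
               frozenset({"bread", "butter"}), frozenset({"milk", "eggs"})]
    for rule in two_item_rules(baskets, min_support=0.5, min_confidence=0.6):
        print(rule)
```

With these four hypothetical baskets, only bread → milk and milk → bread pass min_support = 0.5 and min_confidence = 0.6, each with confidence 2/3. A rule of this kind is technically strong yet is exactly the sort of "commonsense" pattern the slide describes.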
What Is D3M • Next-generation data mining methodologies, frameworks, algorithms, evaluation systems, tools and decision support that • Cater for the business environment • Satisfy business needs • Deliver business-friendly decision-making rules and actions of solid technical and business significance • Can be understood and taken over by business people to make decisions • Aims to promote the paradigm shift from data-centered hidden pattern mining to domain-driven actionable knowledge discovery (AKD)
Involve and synthesize Ubiquitous Intelligence • human intelligence, • domain intelligence, • data intelligence, • network intelligence, • organizational and social intelligence, and • meta-synthesis of the above ubiquitous intelligence
The D3M Framework • AKD-based problem-solving
Interestingness & actionability
Conflicts & tradeoff
A framework for AKD • Post-analysis-based AKD
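One way to read post-analysis-based AKD is as a two-step pipeline: patterns are first mined and filtered by technical interestingness, and the survivors are then post-analyzed against business interestingness. The Python sketch below assumes that reading. The `Pattern` fields, the `net_benefit` business score, and all thresholds are hypothetical placeholders, not the framework's actual definitions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Pattern:
    """A mined pattern with hypothetical technical and business attributes."""
    name: str
    support: float      # technical (objective) interestingness
    confidence: float   # technical (objective) interestingness
    benefit: float      # business: e.g. debt amount prevented (hypothetical unit)
    cost: float         # business: e.g. cost of acting on the pattern


def net_benefit(p: Pattern) -> float:
    """A hypothetical business-interestingness score: benefit minus cost."""
    return p.benefit - p.cost


def post_analysis_akd(patterns: List[Pattern], min_support: float,
                      min_confidence: float,
                      business_score: Callable[[Pattern], float] = net_benefit,
                      min_business: float = 0.0) -> List[Pattern]:
    # Step 1: conventional mining output, kept only if technically interesting.
    technical = [p for p in patterns
                 if p.support >= min_support and p.confidence >= min_confidence]
    # Step 2: post-analysis keeps patterns that are also business-interesting,
    # ranked from most to least valuable to the business.
    actionable = [p for p in technical if business_score(p) >= min_business]
    return sorted(actionable, key=business_score, reverse=True)
```

In this sketch, a pattern with high support and confidence but negative net benefit is dropped in step 2, and a pattern with a large benefit but very low support never reaches step 2. Swapping `net_benefit` for another callable changes only the post-analysis; the technical mining step stays as it is.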
D3M Theoretical Underpinnings • artificial intelligence and intelligent systems, • behavior informatics and analytics, • business modeling, • business process management, • cognitive sciences, • data integration, • human-machine interaction, • human-centered computing, • knowledge representation and management, • machine learning, • ontological engineering, • organizational and social computing, • project management methodology, • social network analysis, • statistics, • system simulation, and so on.
D3M Research Issues • Data Intelligence: • deep knowledge in complex data structures; mining in-depth data patterns, and mining structured and informative knowledge in complex data • Domain Intelligence: • domain and prior knowledge, business processes/logics/workflow, constraints, and business interestingness; their representation, modeling and involvement in KDD • Network Intelligence: • network-based data, knowledge, communities and resources; information retrieval, text mining, web mining, semantic web, ontological engineering techniques, and web knowledge management • Human Intelligence: • empirical and implicit knowledge, expert knowledge and thoughts, group/collective intelligence; human-machine interaction, and the representation and involvement of human intelligence • Social Intelligence: • organizational/social factors, laws/policies/protocols, trust/utility/benefit-cost; collective intelligence, social network analysis, and social cognition interaction • Intelligence metasynthesis: • synthesizing ubiquitous intelligence in KDD; metasynthetic interaction (m-interaction) as the working mechanism, and metasynthetic space (m-space) as an AKD-based problem-solving system
How to reach an interest tradeoff • Balance between technical and business interests • Suppose there are multiple metrics for each aspect
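When each pattern carries several technical metrics and several business metrics, two simple ways to reach a tradeoff are to keep only the non-dominated (Pareto-optimal) patterns, or to normalize every metric and combine them with weights that encode business priorities. The Python sketch below shows both. The metric vectors and weights are hypothetical, every metric is assumed to be "higher is better", and neither rule is claimed to be the one used in D3M.

```python
def dominates(a, b):
    """True if vector a is at least as good as b on every metric and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Names of patterns whose metric vectors are not dominated by any other pattern."""
    return [name for name, v in scores.items()
            if not any(dominates(w, v) for other, w in scores.items() if other != name)]


def weighted_tradeoff(scores, weights):
    """Min-max normalize each metric across patterns, then return the best weighted sum."""
    names = list(scores)
    normalized = {n: [] for n in names}
    for column in zip(*scores.values()):
        lo, hi = min(column), max(column)
        span = hi - lo
        for n, x in zip(names, column):
            # A constant metric carries no information, so it contributes 0.
            normalized[n].append((x - lo) / span if span else 0.0)
    totals = {n: sum(w * x for w, x in zip(weights, vals))
              for n, vals in normalized.items()}
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Hypothetical vectors: (confidence, lift-like score, profit) per pattern.
    scores = {"p1": (0.9, 0.8, 10.0), "p2": (0.6, 0.5, 50.0),
              "p3": (0.5, 0.4, 40.0), "p4": (0.95, 0.9, 5.0)}
    print(pareto_front(scores))
    print(weighted_tradeoff(scores, (0.25, 0.25, 0.5)))
```

The Pareto front leaves the final choice to business people, which suits the conflict between technical and business interests. A weighted sum forces a single answer, but the answer depends directly on the weights: weighting profit heavily favors p2, while weighting only the technical metrics favors p4.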
Actionable knowledge discovery through m-spaces • acquiring and representing unstructured, ill-structured and uncertain domain/human knowledge • supporting the dynamic involvement of business experts and their knowledge/intelligence • acquiring and representing expert thinking, such as imaginary and creative thinking in group heuristic discussions during KDD modeling • acquiring and representing group/collective interaction behavior and impact emergence • building infrastructure that supports the involvement and synthesis of ubiquitous intelligence
D3M Applications • Real-world data mining • Our recent case studies • Capital markets • actionable trading agents • actionable trading strategies • Social security • activity mining • combined mining
Actionable Trading Evidence for Brokerage Firms • Trading strategy/evidence • Actionable trading evidence
Domain factors
Business interest
Developing in-depth trading strategy
Activity mining for Australian Commonwealth Government Debt Prevention • Impact-targeted activity mining
Impact-targeted activity mining • Frequent impact-targeted activity sequences • Impact-contrasted activity sequences • Impact-reversed activity sequences • Impact-targeted combined association clusters
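The first two pattern types can be illustrated by measuring how often an ordered activity sequence occurs among customers whose activities led to the target impact (here, a debt), compared with those whose activities did not. The Python sketch below uses hypothetical activity names and treats a pattern as an ordered subsequence with gaps allowed. It uses a plain support difference as the contrast score, and the measures in the original work may be defined differently. Computing support separately within each impact group also keeps rare debt-related patterns from being swamped by the much larger no-debt population when the impact classes are imbalanced.

```python
def is_subsequence(pattern, sequence):
    """True if the activities in `pattern` appear in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `x in iterator` consumes the iterator up to the first match, preserving order.
    return all(activity in remaining for activity in pattern)


def group_support(pattern, sequences):
    """Fraction of `sequences` that contain `pattern` as an ordered subsequence."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def impact_contrast(pattern, labelled):
    """Support among impact (e.g. debt) sequences, among non-impact ones, and their difference.

    `labelled` is a list of (activity_sequence, has_impact) pairs.
    """
    impact = [s for s, has_impact in labelled if has_impact]
    no_impact = [s for s, has_impact in labelled if not has_impact]
    s_impact = group_support(pattern, impact)
    s_none = group_support(pattern, no_impact)
    return s_impact, s_none, s_impact - s_none
```

A high first value marks a frequent impact-targeted sequence. A large third value marks an impact-contrasted sequence: one that is common before debts but rare otherwise.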
Data intelligence • Activity data • Itemset imbalance • Impact imbalance • Seasonal effect • Demographic data • Transactional data • Itemset/tuple selection/construction
Domain intelligence • Business process/event for activity selection • Domain knowledge • Feature selection • Sequence construction • Impact target • Positive impact • Negative impact • Multi-level impacts • Feature/attribute selection • Interestingness definition • New pattern structures
Organizational/social factors • Operational/intervention activities • Seasonal business requirement/interaction changes • Business cost (debt amount/duration) • Business benefit (saving/preventing debt amount or reducing debt duration) • Deliverable format
Impact-reversed pattern pair • Underlying pattern 1: • Derivative pattern 2: • Impact-targeted combined association clusters
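As an assumption for illustration, an impact-reversed pair can be read as an underlying pattern P associated with the target impact T, together with a derivative pattern P ∪ Q whose dominant impact flips. The Python sketch below checks that condition on hypothetical activity sets. For brevity it ignores activity order and uses a simple threshold on the impact confidence; both are illustrative choices, not the original definitions.

```python
def impact_confidence(antecedent, records):
    """Share of records containing `antecedent` that carry the target impact.

    `records` is a list of (set_of_activities, has_impact) pairs.
    Returns None when no record contains the antecedent.
    """
    impacts = [impact for items, impact in records if antecedent <= items]
    return sum(impacts) / len(impacts) if impacts else None


def is_impact_reversed(underlying, extra, records, threshold=0.5):
    """True if adding `extra` to `underlying` moves the impact confidence across `threshold`."""
    c_underlying = impact_confidence(underlying, records)
    c_derivative = impact_confidence(underlying | extra, records)
    if c_underlying is None or c_derivative is None:
        return False
    return (c_underlying > threshold) != (c_derivative > threshold)


if __name__ == "__main__":
    # Hypothetical activity sets; True means the customer incurred a debt.
    records = [({"letter", "phone"}, True), ({"letter", "phone"}, True),
               ({"letter"}, True), ({"letter", "arrangement"}, False),
               ({"letter", "arrangement", "phone"}, False)]
    print(is_impact_reversed({"letter"}, {"arrangement"}, records))
```

In the hypothetical data, {letter} is followed by debt in 3 of 5 records, while {letter, arrangement} is followed by debt in none. Adding the extra activity therefore reverses the impact, which is the kind of evidence that could suggest an intervention.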
Conditional impact ratio (Cir) • Conditional Piatetsky-Shapiro’s (P-S) ratio (Cps)
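These two measures can be read by analogy with lift and with Piatetsky-Shapiro's leverage P(QT) − P(Q)P(T). They are the same quantities computed only over the records that already match an underlying pattern P: Cir = P(QT | P) / (P(Q | P) · P(T | P)) and Cps = P(QT | P) − P(Q | P) · P(T | P). This conditional form is an assumption made for illustration, not a quotation of the original definitions. Under it, Cir > 1 (equivalently Cps > 0) means that, among records matching P, the additional pattern Q co-occurs with impact T more often than independence would predict. The Python sketch below computes both on hypothetical records.

```python
def conditional_measures(p, q, records, target=True):
    """Return (Cir, Cps) for Q -> T conditioned on underlying pattern P.

    Hypothetical definitions, by analogy with lift and Piatetsky-Shapiro leverage:
      Cir = P(QT|P) / (P(Q|P) * P(T|P))
      Cps = P(QT|P) - P(Q|P) * P(T|P)
    `records` are (set_of_activities, impact) pairs; T means impact == target.
    """
    base = [(items, impact) for items, impact in records if p <= items]
    if not base:
        raise ValueError("no record matches the underlying pattern P")
    n = len(base)
    prob_q = sum(q <= items for items, _ in base) / n
    prob_t = sum(impact == target for _, impact in base) / n
    prob_qt = sum(q <= items and impact == target for items, impact in base) / n
    if prob_q == 0 or prob_t == 0:
        raise ValueError("Cir is undefined when P(Q|P) or P(T|P) is zero")
    return prob_qt / (prob_q * prob_t), prob_qt - prob_q * prob_t
```

In the hypothetical records used to test this sketch, adding an "arrangement" activity to a "letter" pattern gives Cir = 2.5 and Cps = 0.24 for the no-debt outcome. For the debt outcome it gives Cir = 0 and Cps = −0.24, so the two measures agree on direction while reporting it on different scales.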
Interestingness: tech & biz
The process
Impact-reversed sequential activity patterns
Demographic + transactional combined pattern
D3M References Books: • Cao, L., Yu, P.S., Zhang, C., Zhao, Y.: Domain Driven Data Mining, Springer, 2009. • Cao, L., Yu, P.S., Zhang, C., Zhang, H. (eds.): Data Mining for Business Applications, Springer, 2008. Workshops: • Domain-Driven Data Mining 2008, joint with ICDM2008. • Domain-Driven Data Mining 2007, joint with SIGKDD2007. Special issues: • Domain-Driven Data Mining, IEEE Trans. Knowledge and Data Engineering, 2009. • Domain-Driven, Actionable Knowledge Discovery, IEEE Intelligent Systems, Department, 22(4): 78-89, 2007. Some relevant papers: • Cao, L., Zhao, Y., Zhang, H., Luo, D., Zhang, C.: Flexible Frameworks for Actionable Knowledge Discovery, submitted to IEEE Trans. Knowledge and Data Engineering. • Cao, L., Zhang, H., Zhao, Y., Zhang, C.: Combined Mining: Discovering More Informative Knowledge in e-Government Services, submitted to ACM TKDD, 2008. • Cao, L., Dai, R., Zhou, M.: Metasynthesis, M-Space and M-Interaction for Open Complex Giant Systems, technical report, 2008. • Cao, L., Ou, Y.: Market Microstructure Patterns Powering Trading and Surveillance Agents, Journal of Universal Computer Science, 2008 (to appear). • Cao, L., He, T.: Developing Actionable Trading Agents, Knowledge and Information Systems: An International Journal, 2008. • Cao, L.: Developing Actionable Trading Strategies, in: Intelligent Agents in the Evolution of Web and Applications, Springer, 2008.
Some relevant papers (continued): • Cao, L., Zhao, Y., Zhang, C.: Mining Impact-Targeted Activity Patterns in Imbalanced Data, IEEE Trans. Knowledge and Data Engineering, 20(8): 1053-1066, 2008. • Cao, L., Yu, P., Zhang, C., Zhao, Y., Williams, G.: DDDM2007: Domain Driven Data Mining, ACM SIGKDD Explorations Newsletter, 9(2): 84-86, 2007. • Cao, L., Zhang, C.: Knowledge Actionability: Satisfying Technical and Business Interestingness, International Journal of Business Intelligence and Data Mining, 2(4): 496-514, 2007. • Cao, L., Zhang, C.: The Evolution of KDD: Towards Domain-Driven Data Mining, International Journal of Pattern Recognition and Artificial Intelligence, 21(4): 677-692, 2007. • Cao, L.: Domain-Driven Actionable Knowledge Discovery, IEEE Intelligent Systems, 22(4): 78-89, 2007. • Cao, L., Zhang, C.: Domain-Driven Data Mining: A Practical Methodology, International Journal of Data Warehousing and Mining (IJDWM), IGI Global, 2(4): 49-65, 2006.
Thank you! Longbing CAO, Faculty of Engineering and IT, University of Technology, Sydney, Australia • Tel: 61-2-9514 4477 • Fax: 61-2-9514 1807 • Email: lbcao@it.uts.edu.au • Homepage: www-staff.it.uts.edu.au/~lbcao/ • The Smart Lab: datamining.it.uts.edu.au