350 likes | 470 Views
Process Mining An index to the state of the art and an outline of open research challenges at DIIAG. Claudio Di Ciccio , Massimo Mecella. Seminars in Software and Services for the Information Society Rome, 2012, May the 7 th. Process Mining. Definition.
E N D
Process MiningAn index to the state of the art and an outline of open research challenges at DIIAG Claudio Di Ciccio, Massimo Mecella Seminars in Software and Services for the Information Society Rome, 2012, May the 7th
Process Mining Definition • Process Mining [Aalst2011.book], also referred to as Workflow Mining, is the set of techniques that allow the extraction of process descriptions, stemming from a set of recorded real executions (logs). • ProM [AalstEtAl2009] is one of the most used plug-in based software environment for implementing workflow mining (and more) techniques. • The new version 6.0 is available for download at www.processmining.org Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Process Mining Definition • Process Mining involves: • Process discovery • Control flow mining, organizational mining, decision mining; • workflow • Conformance checking • Operational support • We will focus on the control flow mining • Many control flow mining algorithms proposed • α [AalstEtAl2004] and α++ [WenEtAl2007] • Fuzzy [GüntherAalst2007] • Heuristic [WeijtersEtAl2001] • Genetic [MedeirosEtAl2007] • Two-step [AalstEtAl2010] • … Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Process Mining Further reading • The rest of the lesson is based on the following material: • Van der Aalst, W. M. P.: “Process Discovery: An Introduction” • Available athttp://www.processmining.org/_media/processminingbook/process_mining_chapter_05_process_discovery.pdf • From the teaching material for [Aalst2011.book] • De Medeiros, A. K. A.: “Process Mining: Control-Flow Mining Algorithms” • Available athttp://www.processmining.org/_media/courses/processmining/lecture3_controlflowminingalgorithms.ppt Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
A different context (1) Artful processes and knowledge workers • Artful processes [HillEtAl06] • informal processes typically carried out by those people whose work is mental rather than physical (managers, professors, researchers, engineers, etc.) • “knowledge workers”[ACTIVE09] • Knowledge workers create artful processes “on the fly” • Though artful processes are frequently repeated, they are not exactly reproducible, even by their originators, nor can they be easily shared. Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
A different context (2) Email conversations • In collaborative contexts, knowledge workers share their information and outcomes with other knowledge workers • E.g., a software development mgr. • Typically, by means of several email conversations • Email conversations are actual traces of running processes that knowledge workers adhere to Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
A different context (3) Processes from email conversations • From the collection of email messages, you can extract the processes that lay behind • Related e-mail conversations are traces of their runs • Valuable advantages for users • Automated discovery of formal representations • with no effort for knowledge workers • Tidy organization for naïve best practices kept only in mind • Opportunity to share and compare the knowledge on methodologies • Automated discovery of bottlenecks, delays, structural defects • from the analysis of previous runs • Email conversations are a kind of semi-structured text • this approach is not tailored to the electronic mail • it can be extended to the analysis of other semi-structured texts Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
A different context (4) Some areas of applicability • Personal information management (PIM) • how to organize one’s own activities, contacts, etc. through the usage of software • Information warfare • in supporting anti-crime intelligence agencies • Enterprise engineering • for knowledge-heavy industries, where preserving documents making up product data is not enough • eHealth • for the automatic discovery of medical treatment procedures on top of patient health records Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
MailOfMine What is MailOfMine? • MailOfMineis the approach and the implementation of a collection of techniques, the aim of which is to is to automatically build, on top of a collection of email messages, a set of workflow models that represent the artful processes laying behind the knowledge workers’ activities. • [DiCiccioEtAl11] • [DiCiccioMecella12] • [DiCiccioMecella/TR12] Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the visualization of processes The imperative model • Represents the whole process at once • The most used notation is based on a subclass of Petri Nets (namely, the Workflow Nets) Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the visualization of processes If A is performed, B must be perfomed, no matter before or afterwards (responded existence) The declarative model • Rather thanusing a procedural language for expressing the allowed sequence of activities, it is based on the description of workflows through the usage of constraints • the idea is that every task can be performed, except the ones which do not respect such constraints • this technique fits with processes that are highly flexible and subject to changes, such as artful processes Whenever B is performed, C must be performed afterwards and B can not be repeated until C is done (alternate response) The notation here is based on [AalstEtAl06,MaggiEtAl11] (DecSerFlow, Declare) Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the visualization of processes Imperative vS declarative Declarative Declarative models work better in presence of a partial specification of the process scheme Imperative Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Declare constraint templates Existence templates Existence(n, A)Activity A occurs at least n times in the process instanceBCAAC✓ BCAAAC✓ BCAC✗ (for n = 2) Absence(A)Activity A does not occur in the process instance BCC✓ BCAC✗ Absence(n+1, A)Activity A occurs at most n+1 times in the process instance BCAAC✗ BCAC✓ BCC✓ (for n = 2) Exactly(n, A)Activity A occurs exactly n times in the process instance BCAAC✗ BCAAAC✗ BCAC✗ (for n = 2) Init(A)Activity A is the first to occur in each process instance BCAAC✗ ACAAAC ✓BCC✗ Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Declare constraint templates Relation templates RespondedExistence(A, B)If A occurs in the process instance, then B occurs as wellCAC✗ CAACB✓ BCAC✓ BCC✓ Response(A, B)If A occurs in the process instance, then B occurs after ABCAAC✗ CAACB✓ CAC✗ BCC✓ AlternateResponse(A, B)Each time A occurs in the process instance, then B occurs afterwards, before A recurs BCAAC✗ CAACB✗ CACB✓ CABCA✗ BCC✓ CACBBAB✓ ChainResponse(A, B)Each time A occurs in the process instance, then B occurs immediately afterwards BCAAC✗ BCAABC✗ BCABABC✓ Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Declare constraint templates Relation templates RespondedExistence(B, A)If B occurs in the process instance, then A occurs as wellCAC✓ CAACB✓ BCAC✓ BCC✗ Precedence(A, B)B occurs in the process instance only if preceded by ABCAAC✗ CAACB✓ CAC✓ BCC✓ AlternatePrecedence(A, B)Each time B occurs in the process instance, it is preceded by A and no other B can recur in between BCAAC✗ CAACB✓ CACB✓ CABCA✓ BCC✗ CACBAB✓ ChainPrecedence(A, B)Each time B occurs in the process instance, then B occurs immediately beforehand BCAAC✗ BCAABC✗ CABABCA✓ Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Declare constraint templates Relation templates CoExistence(A, B)If B occurs in the process instance, then A occurs, and viceversaCAC✗ CAACB✓ BCAC✓ BCC✗ Succession(A, B)A occurs if and only if it is followed by B in the process instanceBCAAC✗ CAACB✓ CAC✗ BCC✗ AlternateSuccession(A, B)A and B occur in the process instance if and only if the latter follows the former, and they alternate each other in the trace BCAAC✗ CAACB✗ CACB✓ CABCA✗ BCC✗ CACBAB✓ ChainSuccession(A, B)A and B occur in the process instance if and only if the latter immediately follows the former BCAAC✗ BCAABC✗ CABABC✓ Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Declare constraint templates Negative relation templates NotCoExistence(A, B)A and B never occur together in the process instanceCAC✓ CAACB✗ BCAC✗ BCC✓ NotSuccession(A, B)A can never occur before B in the process instanceBCAAC✓ CAACB✗ CAC✓ BCC✓ NotChainSuccession(A, B)A and B occur in the process instance if and only if the latter does not immediately follows the former BCAAC✓ BCAABC✗ CBACBA✓ Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Relation constraint templates subsumption Constraint templates are not independent of each other Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Relation constraint templates subsumption Constraint templates are not independent of each other • E.g., • A trace like ABABCABCC satisfies (w.r.t. A and B): • RespondedExistence(A, B), RespondedExistence(B, A),CoExistence(A, B), CoExistence(B, A), Response(A, B), AlternateResponse(A, B), ChainResponse(A, B), Precedence(A, B), AlternatePrecedence(A, B), ChainPrecedence(A, B),Succession(A, B), AlternateSuccession(A, B),ChainSuccession(A, B) • The mining algorithm would show the most strict constraint only (ChainSuccession(A, B)) • MINERful,the mining algorithm of MailOfMine, faces this unresolved issue Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
MINERful The declarative workflow mining algorithm of MailOfMine • Key idea: building a knowledge base with local and global statistics on the mutual order of appearance of events for further fast querying • Performances: the algorithm is proven to be fast (over 12m events processed in less than 170 secs.) • Asymptotically: • linear in the number of the traces • quadratic in the number of events per trace • i.e., polynomial in the input size • linear in the number of constraint templates • See [DiCiccioMecella/TR12] for further reading Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
LTL semantics • ConDec, DecSerFlow and Declare adopt Linear-time Temporal Logic (LTL) for expressing the semantics of the constraint templates. • See [AalstEtAl06,MaggiEtAl11] for further reading • Van der Aalst, W. M. P. : “Auditing 2.0 Using Process Mining to Support Tomorrow's Auditor” • Available athttp://www.processmining.org/_media/presentations/process-mining-and-auditing-siks-course-2010-wvda.pdf • See slides 61-67 Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the representation of artful process schemata Regular grammars expressing declarative workflows • In MailOfMine, each constraint in the set which can be used to define an artful mined process is expressible through regular grammars, where: • activities are terminal characters, building blocks of constraints on tasks; • constraints are regular expressions, equivalent to regular grammars; • the process scheme is the intersection of constraints defined on top of activities. • The process scheme defines a Process Describing Grammar (PDG) Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the usage of regular grammars The rationale: why not LTL for declarative workflows? • Temporal logic is a formalism for describing sequences of transitions between states in a reactive system • Linear Temporal Logic (LTL, [Pnueli77]) describes events along a single computation path • LTL formulæ are verified over semi-infinite runs • defined over Kripke structures • They are good for automatically checking the correct work of circuits or server programs • Not for human processes • which have both a starting point and an end • “In the long run, we are all dead’' (John Maynard Keynes) • Regular grammars are verified by Finite State Automata • working with less complex algorithms, in terms of computational effort • A PDG describes the language spoken by collaborative organisms in terms of activities Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the usage of regular grammars Constraint templates as regular expressions Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the visualization of processes An example of DecSerFlow [VanDerAalstEtAl06] notation • You might want to run a legal trace like this: • 〈a3, a3, a3, a2, a2, a3, a4, a5, a6, a7, a6, a5 〉 • What we want to state here is that such a notation is probably not quite intuitive You could even start from here No, it is not the initial action Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the visualization of processes Our proposal • We do not consider a static graph-based global representation alone the best suitable solution. • A graphical representation, easy to understand at a first glimpse, must be used. • Idea: • when presenting the process schema (static view): • a local view on tasks/activities, showing related constraints only; • a global view on the process, either: • basic (less information, less symbols), or • extended (more information, more symbols, extending (a)); • (2) can work as a kind of navigation map for (1) • when presenting the running instance (dynamic view): • a dynamic interactive trace representation diagram, based on the local static view notation. • See [DiCiccioEtAl2011] for further reading Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the visualization of processes Introducing the new local view: the rationale Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the visualization of constraints The static local view: some examples Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the representation of processes The static global view Basic Extended Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
A GUI sketch Local and global views together Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
On the representation of constraints Dynamic view Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
Open challenges Your contribution is welcome • Event frequency handling in MINERful • Error injection and robustness testing/improving • Auto-thresholding • Definition of a basis for declarative processes • Graphical model for declarative processes in MailOfMine • Implementation and usability testing • Auto-refactoring of the dynamic view • Refactoring in case of user-driven deviations from the process model Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
References Cited articles and resources, in order of appearance • [Aalst2011.book] van der Aalst, W.M.P.: Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer (2011). • [AalstEtAl2009] van der Aalst, W.M.P., van Dongen, B.F., Güther, C.W., Rozinat, A., Verbeek, E., Weijters, T.: Prom: The process mining toolkit. In de Medeiros, A.K.A., Weber, B., eds.: BPM (Demos). Volume 489 of CEUR Workshop Proceedings., CEUR-WS.org (2009) • [AalstEtAl2004] van der Aalst, W.M.P., Weijters, T., Maruster, L.: Workflow mining: Discovering process models from event logs. IEEE Trans. Knowl. Data Eng. 16(9) (2004) 1128–1142. • [WenEtAl2007] Wen, L., van der Aalst, W.M.P., Wang, J., Sun, J.: Mining process models with non-free-choice constructs. Data Min. Knowl. Discov. 15(2) (2007) 145–180. • [GüntherEtAl2007] Günther, C.W., van der Aalst, W.M.P.: Fuzzy Mining - Adaptive Process Simplification Based on Multi-perspective Metrics. BPM 2007: 328-343. • [WeijtersEtAl2001] Weijters, A., van der Aalst, W.: Rediscovering workflow models from event-based data using little thumb. Integrated Computer-Aided Engineering 10 (2001) 2003. • [MedeirosEtAl2007] Medeiros, A.K., Weijters, A.J., Aalst, W.M.: Genetic process mining: an experi- mental evaluation. Data Min. Knowl. Discov. 14(2) (2007) 245–304. • [AalstEtAl2010] van der Aalst, W., Rubin, V., Verbeek, H., van Dongen, B., Kindler, E., Gnther, C.: Process mining: a two-step approach to balance between underfitting and overfitting. Software and Systems Modeling 9 (2010) 87–111 10.1007/s10270-008- 0106-z. Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)
References Cited articles and resources, in order of appearance • [HillEtAl06] Hill, C., Yates, R., Jones, C., Kogan, S.L.: Beyond predictable workflows: Enhancing productivity in artful business processes. IBM Systems Journal 45(4), 663–682 (2006) • [ACTIVE09] Warren, P., Kings, N., et al.: Improving knowledge worker productivity - the active integrated approach. BT Technology Journal 26(2), 165–176 (2009) • [DiCiccioEtAl11] Di Ciccio, C., Mecella, M., Catarci, T.: Representing and Visualizing Mined Artful Processes in MailOfMine. USAB 2011:83-94 • [DiCiccioMecella12] Di Ciccio, C., Mecella,M.: Mining constraints for artful processes. In W. Abramowicz, D. Kriksciuniene, V.S., ed.: 15th International Conference on Business Information Systems. Volume 117 of Lecture Notes in Business Information Processing., Springer (2012) (to appear). • [DiCiccioMecella/TR12] Di Ciccio, C., Mecella, M.: MINERful, a mining algorithm for declarative process constraints in MailOfMine. Technical report, Dipartimento di Ingegneria Infor- matica, Automatica e Gestionale “Antonio Ruberti” – SAPIENZA, Universita` di Roma (2012). • [AalstEtAl06] van der Aalst, W.M.P., Pesic, M.: Decserflow: Towards a truly declarative service flow language. Proc. WS-FM 2006 • [MaggiEtAl11] Maggi, F.M., Mooij, A.J., van der Aalst, W.M.P.: User-guided discovery of declar- ative process models. In: CIDM, IEEE (2011) 192–199 • [Pnueli77] Pnueli, A.: The Temporal Logic of Programs. Proc. 18th Annual Symposium on Foundations of Software Technology and Theoretical Computer Science, 1977 Process Mining Claudio Di Ciccio (DIIAG, SAPIENZA – Università di Roma)