Process-oriented System Analysis. Process Mining. BPM Lifecycle. Motivation. Up until now: designed or pre-defined models, with the assumption that they are appropriate. Process Mining: consideration of information from the execution of processes. This is covered in log data. Logs
Process-oriented System Analysis: Process Mining
Motivation • Up until now: • Designed or pre-defined models • Assumption that they are appropriate • Process Mining • Consideration of information from the execution of processes • This is covered in log data • Logs • Sequence of log entries, which capture events in a company that relate to processes
Log entries • Examples of log entries • Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57 • Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24 • Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18 • Function ContactCustomer(c1987, PromoMailing) completed on 12.11.2010 at 9:24:10 • Function StoreCustomerData(„Miller“, c1988, „Osnabrück“) completed on 12.11.2010 at 9:26:08 • Check Invoice for Invoice No. 4568 completed on 12.11.2010 at 9:26:38 • Function ContactCustomer(c1988, PromoMailing) completed on 12.11.2010 at 9:27:32
Logs bear valuable information • Logs help to answer questions like: • When and how many process instances have been executed? • Are there recurring patterns in the execution of activities? • Can process models be derived from the data? • Which paths of execution are used how often in the process models? • Are there paths which are never taken?
Process Discovery • Process Discovery is a technique for deriving a process model from log data • Input: execution logs as ordered lists of activities with time stamp and case id • Output: process model which could have generated the execution logs • The case id is often not directly covered in the data, and needs to be generated in pre-processing
Process Conformance • Process Conformance is a technique to analyze the relationship between log data and process models • Input: Logs and process model • Output: information on the relationship, e.g. fitness
Execution Logs • Assumption • The execution log defines a complete order of events, which can all be related to process activities • All events in the execution log relate to process instances of the considered process • Hint • Log entries often refer to different process models • This requires filtering activities • Abstraction • Techniques often work on an abstraction of logs • Focus on case id and activities
Execution Log Format • Log format • (caseID, activity) • Example • Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57 • Function StoreCustomerData(„Müller“, c1987, „Bad Bentheim“) completed on 12.11.2010 at 9:22:24 • Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18 • Resulting Log • (4567, Check Invoice), (c1987, StoreCustomerData), (4567, Send Invoice), etc.
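The abstraction from raw entries to (caseID, activity) pairs can be sketched as follows. This is only an illustration: the two regular expressions and the function name `to_event` are assumptions made for the two entry shapes shown in the example, and treating the `c<digits>` argument as the case id is likewise an assumption, not something the slides prescribe.

```python
import re

# Hypothetical patterns for the two entry shapes in the example log.
INVOICE = re.compile(r"^(?P<activity>.+?) for Invoice No\. (?P<case>\d+) completed on ")
FUNCTION = re.compile(r"^Function (?P<activity>\w+)\((?P<args>.*)\) completed on ")


def to_event(entry):
    """Map one raw log entry to a (caseID, activity) pair, or None if unknown."""
    m = INVOICE.match(entry)
    if m:
        return (m.group("case"), m.group("activity"))
    m = FUNCTION.match(entry)
    if m:
        # Assumption: the case id is the argument of the form c<digits>.
        case = re.search(r"\bc\d+\b", m.group("args"))
        if case:
            return (case.group(0), m.group("activity"))
    return None


entries = [
    "Check Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:19:57",
    'Function StoreCustomerData("Müller", c1987, "Bad Bentheim") completed on 12.11.2010 at 9:22:24',
    "Send Invoice for Invoice No. 4567 completed on 12.11.2010 at 9:23:18",
]
resulting_log = [to_event(e) for e in entries]
```

Entries that match neither pattern yield `None`, which is one simple way to filter out events that belong to other processes.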
Execution Log • Further abstraction • A's and B's • (case id, task id) • Additional information • Event type, time, resource, data • Not considered here • Assumption • Activity execution captured by one event • No intermediate activities • Example log: case 1 : task A; case 2 : task A; case 3 : task A; case 3 : task B; case 1 : task B; case 1 : task C; case 2 : task C; case 4 : task A; case 2 : task B; case 2 : task D; case 5 : task E; case 4 : task C; case 1 : task D; case 3 : task C; case 3 : task D; case 4 : task B; case 5 : task F; case 4 : task D
Process Discovery Algorithms • Simplest algorithm: the α-algorithm • Relatively simple, some properties can be proven • Affected by noise, therefore not the first choice in practice • Noise refers to incomplete or erroneous logs • Furthermore, the α+ and α++ algorithms • α+ and α++ are extensions to the α-algorithm for recognizing more fine-granular structure in the process model • Also affected by noise • Finally, techniques for dealing with noise
Definitions • Let T be a set of activities (tasks) and T* the set of all sequences of arbitrary length over T; then: • σ ∈ T* is called an execution sequence if all activities in σ belong to the same process instance • W ⊆ T* is called an execution log (workflow log) • Assumptions • In each process model, each activity appears at most once • Each direct neighbor relation between activities is represented at least once
Execution Logs • case 1 : task A; case 2 : task A; case 3 : task A; case 3 : task B; case 1 : task B; case 1 : task C; case 2 : task C; case 4 : task A; case 2 : task B; case 2 : task D; case 5 : task E; case 4 : task C; case 1 : task D; case 3 : task C; case 3 : task D; case 4 : task B; case 5 : task F; case 4 : task D
Execution Logs • Execution sequences: Case 1: ABCD; Case 2: ACBD; Case 3: ABCD; Case 4: ACBD; Case 5: EF • Resulting workflow log: W = {ABCD, ACBD, EF}
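Grouping the interleaved (case id, task id) events by case gives the execution sequences and the workflow log W. A minimal sketch, assuming single-letter task ids so that a sequence can be written as a string; the names `events` and `execution_sequences` are illustrative only.

```python
from collections import defaultdict

# The example log as (case id, task id) pairs, in recorded order.
events = [
    (1, "A"), (2, "A"), (3, "A"), (3, "B"), (1, "B"), (1, "C"),
    (2, "C"), (4, "A"), (2, "B"), (2, "D"), (5, "E"), (4, "C"),
    (1, "D"), (3, "C"), (3, "D"), (4, "B"), (5, "F"), (4, "D"),
]


def execution_sequences(events):
    """Collect the tasks of each case in the order they were logged."""
    traces = defaultdict(list)
    for case_id, task in events:
        traces[case_id].append(task)
    return {case_id: "".join(tasks) for case_id, tasks in traces.items()}


sequences = execution_sequences(events)
# The workflow log is a set, so identical sequences collapse into one entry.
workflow_log = set(sequences.values())
```

Because W is a set, the four cases with sequences ABCD and ACBD contribute only two distinct entries.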
Order relations • Log-based order relations for pairs of activities a, b ∈ T in a workflow log W: • Direct successor: a >W b, i.e. b directly follows a in some execution sequence • Causality: a →W b, i.e. a >W b and not b >W a • Concurrency: a ||W b, i.e. a >W b and b >W a • Exclusiveness: a #W b, i.e. not a >W b and not b >W a • Activity pairs which never succeed each other
Execution log analysis • W = {ABCD, ACBD, EF} • 1) Direct successor: A>B, A>C, B>C, B>D, C>B, C>D, E>F • 2) Causality: A→B, A→C, B→D, C→D, E→F • 3) Concurrency: B||C, C||B
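The four order relations can be computed directly from the direct-successor pairs. The following is a sketch under the assumption that each execution sequence is a string of single-letter tasks; the function name `order_relations` is hypothetical.

```python
from itertools import product


def order_relations(log):
    """Derive the log-based order relations from a set of execution sequences."""
    tasks = sorted({t for trace in log for t in trace})
    # a > b: b directly follows a in at least one execution sequence.
    direct = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    # a -> b: a > b holds, but b > a does not.
    causal = {(a, b) for a, b in direct if (b, a) not in direct}
    # a || b: both a > b and b > a hold.
    concurrent = {(a, b) for a, b in direct if (b, a) in direct}
    # a # b: neither a > b nor b > a holds.
    exclusive = {(a, b) for a, b in product(tasks, repeat=2)
                 if (a, b) not in direct and (b, a) not in direct}
    return direct, causal, concurrent, exclusive


direct, causal, concurrent, exclusive = order_relations({"ABCD", "ACBD", "EF"})
```

Only the direct-successor relation is read from the log; the other three are derived from it.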
α-Algorithm • The idea is to utilize order relations for deriving a workflow net that is compliant with these relations • Precisely, each order relation results in a Petri net fragment, which imposes the respective relationship
α-Algorithm • Idea (a): a → b
α-Algorithm • Idea (b): a → b, a → c and b # c
α-Algorithm • Idea (c): b → d, c → d and b # c
α-Algorithm • Idea (d): a → b, a → c and b || c
α-Algorithm • Idea (e): b → d, c → d and b || c
The Alpha-Algorithm (simplified) • 1. Identify the set of all tasks in the log as TL. • 2. Identify the set of all tasks that have been observed as the first task in some case as TI. • 3. Identify the set of all tasks that have been observed as the last task in some case as TO. • 4. Identify the set of all connections to be potentially represented in the process model as a set XL. Add the following elements to XL: • a. Pattern (a): all pairs for which a→b holds. • b. Pattern (b): all triples for which a→(b#c) holds. • c. Pattern (c): all triples for which (b#c)→d holds. • Note that triples for which Pattern (d) a→(b||c) or Pattern (e) (b||c)→d holds are not included in XL.
The Alpha-Algorithm (cont.) • 5. Construct the set YL as a subset of XL by: • a. Eliminating a→b and a→c if there exists some a→(b#c). • b. Eliminating b→d and c→d if there exists some (b#c)→d. • 6. Connect start and end events in the following way: • a. If there are multiple tasks in the set TI of first tasks, then draw a start event leading to an XOR-split, which connects to every task in TI. Otherwise, directly connect the start event with the only first task. • b. For each task in the set TO of last tasks, add an end event and draw an arc from the task to the end event.
The Alpha-Algorithm (cont.) • 7. Construct the flow arcs in the following way: • a. Pattern (a): For each a→b in YL, draw an arc from a to b. • b. Pattern (b): For each a→(b#c) in YL, draw an arc from a to an XOR-split, and from there to b and c. • c. Pattern (c): For each (b#c)→d in YL, draw an arc from b and c to an XOR-join, and from there to d. • d. Pattern (d) and (e): If a task in the so constructed process model has multiple incoming or multiple outgoing arcs, bundle these arcs with an AND-split or AND-join, respectively. • 8. Return the newly constructed process model.
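Steps 1 to 5 of the simplified algorithm can be sketched as set computations. This is an illustrative sketch, not a full implementation: it stops before the drawing steps 6 to 8, assumes non-empty execution sequences given as strings of single-letter tasks, and only builds the triples named in patterns (b) and (c). The function name `alpha_sets` and the dictionary layout are assumptions.

```python
from itertools import combinations


def alpha_sets(log):
    """Compute TL, TI, TO, XL and YL for a set of execution sequences."""
    TL = {t for trace in log for t in trace}   # step 1: all tasks
    TI = {trace[0] for trace in log}           # step 2: first tasks
    TO = {trace[-1] for trace in log}          # step 3: last tasks

    direct = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    causal = {(a, b) for a, b in direct if (b, a) not in direct}

    def exclusive(b, c):
        return (b, c) not in direct and (c, b) not in direct

    # Candidate pairs b # c, each pair listed once in sorted order.
    xor_pairs = [bc for bc in combinations(sorted(TL), 2) if exclusive(*bc)]
    # Step 4, pattern (b): a -> (b # c).
    splits = {(a, (b, c)) for a in TL for b, c in xor_pairs
              if (a, b) in causal and (a, c) in causal}
    # Step 4, pattern (c): (b # c) -> d.
    joins = {((b, c), d) for d in TL for b, c in xor_pairs
             if (b, d) in causal and (c, d) in causal}

    # Step 5: drop the plain pairs already covered by a triple.
    covered = set()
    for a, (b, c) in splits:
        covered |= {(a, b), (a, c)}
    for (b, c), d in joins:
        covered |= {(b, d), (c, d)}

    XL = {"pairs": causal, "splits": splits, "joins": joins}
    YL = {"pairs": causal - covered, "splits": splits, "joins": joins}
    return {"TL": TL, "TI": TI, "TO": TO, "XL": XL, "YL": YL}


parallel = alpha_sets({"ABCD", "ACBD", "EF"})   # B and C are concurrent
choice = alpha_sets({"ABD", "ACD"})             # B and C are exclusive
```

In the first log B and C are concurrent, so no triple is formed and all causal pairs stay in YL. In the second log B and C never follow each other, so the pairs are replaced by one split triple and one join triple.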
α-Algorithm Example • Input: the example execution log with W = {ABCD, ACBD, EF} • Output: α(W) [figure: process model derived by the α-algorithm]
Log Completeness • Level of completeness required for a log • Assume that the log entries for the execution sequence EF are missing • Then the correct process model cannot be derived • Basic assumption: each execution sequence must be part of the log • Consequence: the complete behaviour is visible • Problem: the number of required instances grows dramatically • Example: • 10 activities are executed in parallel • Number of potential execution sequences: 10! = 3,628,800
Log Completeness • Result • For the α-algorithm it is sufficient to have completeness in terms of the successor relationship (>W) • Reason • All other relations are derived from direct successorship • Interpretation • Each time two activities may succeed each other, this must be visible in at least one execution sequence • Hint • In case of highly concurrent process models, this reduces the number of required execution sequences dramatically
Summary • Execution Logs • Process Mining using the Alpha-Algorithm
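The gap between the two notions of completeness can be made concrete for n fully parallel activities: full completeness needs every interleaving, while successor completeness only needs every ordered pair of distinct activities to appear as direct neighbors at least once. A small sketch; the function name `completeness_requirements` is illustrative.

```python
import math


def completeness_requirements(n):
    """Compare the two completeness notions for n fully parallel activities."""
    # Every ordering of the n activities is a possible execution sequence.
    interleavings = math.factorial(n)
    # Every ordered pair (a, b) with a != b must be seen as a > b once.
    successor_pairs = n * (n - 1)
    return interleavings, successor_pairs


interleavings, successor_pairs = completeness_requirements(10)
```

For 10 parallel activities this contrasts 3,628,800 possible execution sequences with 90 direct-successor pairs that have to be observed.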