
Distributed Genetic Process Mining using Sampling

Carmen Bratosin, Natalia Sidorova, Wil van der Aalst


Presentation Transcript


  1. Distributed Genetic Process Mining using Sampling
     Carmen Bratosin, Natalia Sidorova, Wil van der Aalst

  2. Process Mining

  3. Process Mining: Process Model Discovery from Event Logs

  4. A small example
     [Figure: example process model with activities A, B, C, D, E, F, G]
     Input condition: C can be executed if B OR G OR F has already been executed.
     Output condition: after D is executed, both B AND G will be executed.
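The input and output conditions on this slide mix OR and AND. A minimal sketch, assuming a hypothetical encoding in which a condition is a list of OR-sets that must all be satisfied (an AND of ORs), shows how such conditions can be checked against the activities executed so far. The names `Condition` and `condition_holds` are illustrative, not from the slides.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical encoding: a condition is a list of OR-sets that are ANDed together.
Condition = List[FrozenSet[str]]


def condition_holds(condition: Condition, executed: Iterable[str]) -> bool:
    """True if every OR-set contains at least one already executed activity."""
    done = set(executed)
    # An empty condition (e.g. for a start activity) is trivially satisfied.
    return all(done & or_set for or_set in condition)


# C can be executed if B OR G OR F has already been executed.
input_c: Condition = [frozenset({"B", "G", "F"})]
# After D, both B AND G must follow: two singleton OR-sets, ANDed.
output_d: Condition = [frozenset({"B"}), frozenset({"G"})]
```

For example, `condition_holds(input_c, ["A", "D", "G"])` holds because G alone satisfies the OR, while `condition_holds(output_d, ["B"])` does not hold until G has also occurred.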

  5. Context: Genetic-based Process Mining Algorithm
     • Drawbacks of heuristics-based process mining algorithms:
       • Fail to discover complex process structures
       • Not robust to noise or infrequent behavior

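  6. Genetic Miner
     Goal: find a model $\mathit{Model}^{*} = \arg\max_{\mathit{Model} \in \mathcal{M}} \mathit{fitness}(\mathit{Log}, \mathit{Model})$, where $\mathcal{M}$ is the space of all models.
     Flowchart: Build Initial Population → Compute Fitness → Evaluate Stop Condition; if NO, Create New Population (Elitism, Mutation, Crossover) and return to Compute Fitness; if YES, Stop.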
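The flowchart is a standard generational genetic-algorithm loop. The following is a minimal sketch of that loop, assuming bitstring individuals and a toy fitness function as stand-ins for process models and fitness(Log, Model); the real Genetic Miner uses its own model encoding and operators, so everything here beyond the loop structure (function name, parameters, one-point crossover, bit-flip mutation) is an assumption.

```python
import random
from typing import Callable, List, Tuple

Individual = List[int]  # hypothetical stand-in for an encoded process model


def genetic_miner_loop(
    fitness: Callable[[Individual], float],
    genome_length: int,
    population_size: int = 30,
    elite_count: int = 2,
    mutation_rate: float = 0.05,
    max_generations: int = 100,
    target_fitness: float = float("inf"),
    seed: int = 0,
) -> Tuple[Individual, List[float]]:
    rng = random.Random(seed)
    # Build Initial Population: random bitstrings
    population = [
        [rng.randint(0, 1) for _ in range(genome_length)]
        for _ in range(population_size)
    ]
    history: List[float] = []  # best fitness seen in each generation
    for _ in range(max_generations):
        # Compute Fitness: rank individuals, best first
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Evaluate Stop Condition
        if history[-1] >= target_fitness:
            break
        # Create New Population: elitism copies the top individuals unchanged
        new_population = [ind[:] for ind in ranked[:elite_count]]
        parents = ranked[: max(2, population_size // 2)]
        while len(new_population) < population_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_population.append(child)
        population = new_population
    return max(population, key=fitness), history


# Toy run: fitness = number of 1-bits, stop once all 20 bits are 1.
best, history = genetic_miner_loop(fitness=sum, genome_length=20, target_fitness=20)
```

Because elitism carries the best individuals forward unchanged, the best fitness per generation never decreases; the cost of each iteration is dominated by the fitness calls, which is where the slides locate the time problem.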

  7. Fitness Computation: Main Ideas
     • Execution time depends linearly on the number of traces
     • Execution time depends on the quality of the solution
     • The more complex the process model to be discovered, the more time is needed
     Each individual is assessed against each trace.
     For each trace, rewards and penalties are given depending on whether activities can or cannot be replayed.
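To make "each individual is assessed against each trace" concrete, here is a simplified, hypothetical replay score: each event earns a reward if its input condition (encoded as an AND of OR-sets) is satisfied by the activities already seen in the trace, and a penalty otherwise. This is an illustrative stand-in, not the Genetic Miner's actual fitness function; the function name and the score formula are assumptions.

```python
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical encoding: activity -> input condition (list of OR-sets, ANDed).
InputCondition = List[FrozenSet[str]]


def replay_fitness(model: Dict[str, InputCondition], log: Sequence[Sequence[str]]) -> float:
    """Score in [-1, 1]: (rewards - penalties) / number of events."""
    rewards = penalties = 0
    for trace in log:  # one pass per trace: cost grows linearly with the log size
        executed = set()
        for activity in trace:
            condition = model.get(activity)
            # Reward if the activity is known and its input condition is met.
            if condition is not None and all(executed & or_set for or_set in condition):
                rewards += 1
            else:
                penalties += 1
            executed.add(activity)
    total = rewards + penalties
    return 0.0 if total == 0 else (rewards - penalties) / total


model = {"A": [], "B": [frozenset({"A"})], "C": [frozenset({"B"})]}
log = [["A", "B", "C"], ["A", "C"]]
score = replay_fitness(model, log)  # C in the second trace cannot be replayed
```

Here the first trace replays fully (3 rewards) and the second trace yields 1 reward and 1 penalty, giving (4 - 1) / 5 = 0.6.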

  8. Genetic Miner
     Disadvantages:
     • Time consumption:
       • the time needed to compute the fitness
       • the large number of fitness evaluations needed
     The goal: use distribution techniques to reduce the time consumption.
     Advantages:
     • Discovers non-trivial process structures (e.g., non-free-choice routings)
     • Robustness to noise

  9. Distributed Genetic Process Mining
     [Figure labels: Event Log, Distribution, Coordinator]

  10. Event log redundancy
     • Process structure = composition of multiple control-flow patterns (choice, parallel, iteration)
     • Different instances are formed by, e.g., different combinations of the choices made or different interleavings of events
     • Different execution traces may represent the “same” behavior => event log redundancy
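Interleaving alone already produces redundancy: activities that run in parallel can appear in any order, so k parallel activities yield k! distinct traces that all reflect the same structure. A small sketch, loosely modeled on the slide 4 example (A, D, then B and G in parallel, then C); the function name and the choice of example are illustrative assumptions.

```python
from itertools import permutations
from math import factorial
from typing import List, Sequence, Tuple


def parallel_interleavings(
    prefix: Sequence[str], parallel: Sequence[str], suffix: Sequence[str]
) -> List[Tuple[str, ...]]:
    """All traces of a process: prefix, then the parallel activities in any order, then suffix."""
    return [tuple(prefix) + order + tuple(suffix) for order in permutations(parallel)]


traces = parallel_interleavings(["A", "D"], ["B", "G"], ["C"])
# [('A', 'D', 'B', 'G', 'C'), ('A', 'D', 'G', 'B', 'C')]
```

With four parallel activities there are already `factorial(4) == 24` distinct traces, even though a model needs only one AND-split and one AND-join to describe them, which is why a sample of the log can carry most of its information.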

  11. Basic idea behind the algorithm
     [Figure: example process model with activities A, B, C, D, E, F, G]

  12. Island Algorithm
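The slide gives only the title "Island Algorithm". As a generic illustration of the island model (not necessarily the authors' variant), this sketch evolves several subpopulations independently and periodically migrates each island's best individual to the next island in a ring. Bitstring individuals, the toy fitness, the ring topology, the migration interval and all function and parameter names are assumptions.

```python
import random
from typing import Callable, List

Individual = List[int]  # hypothetical stand-in for an encoded process model
Fitness = Callable[[Individual], float]


def evolve_one_generation(
    pop: List[Individual], fitness: Fitness, rng: random.Random, mutation_rate: float = 0.05
) -> List[Individual]:
    ranked = sorted(pop, key=fitness, reverse=True)
    new_pop = [ranked[0][:]]  # elitism: keep the island's best individual
    parents = ranked[: max(2, len(pop) // 2)]
    while len(new_pop) < len(pop):
        p1, p2 = rng.sample(parents, 2)
        cut = rng.randrange(1, len(p1))  # one-point crossover
        child = p1[:cut] + p2[cut:]
        new_pop.append([1 - g if rng.random() < mutation_rate else g for g in child])
    return new_pop


def island_ga(
    fitness: Fitness,
    genome_length: int,
    n_islands: int = 4,
    island_size: int = 20,
    generations: int = 50,
    migration_interval: int = 5,
    seed: int = 0,
) -> Individual:
    rng = random.Random(seed)
    islands = [
        [[rng.randint(0, 1) for _ in range(genome_length)] for _ in range(island_size)]
        for _ in range(n_islands)
    ]
    for gen in range(1, generations + 1):
        # Islands evolve independently; in a distributed setting each could run on its own machine.
        islands = [evolve_one_generation(pop, fitness, rng) for pop in islands]
        if gen % migration_interval == 0:
            # Ring migration: island i receives the best individual of island i-1,
            # which replaces island i's worst individual.
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                migrant = bests[(i - 1) % n_islands]
                worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = migrant[:]
    return max((ind for pop in islands for ind in pop), key=fitness)


best = island_ga(sum, genome_length=16)  # toy fitness: number of 1-bits
```

Keeping islands isolated between migrations limits communication to a few individuals every `migration_interval` generations, which is the usual reason island models suit distributed execution.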

  13. Evaluation: three different logs

  14. Experiment design
     • Vary the sample size from 10 traces to the full log
     • Vary the stop condition
     • Vary the population size
     • Use islands with the same set-up (processor, memory, OS, etc.)
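The slides do not say how samples are drawn. One simple way to vary the sample size "from 10 traces to the full log" is a uniform random sample without replacement, capped at the log size. This is a minimal sketch under that assumption; the function name, the seed parameter and the example sizes are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Trace = Tuple[str, ...]


def draw_sample(log: Sequence[Trace], sample_size: int, seed: int = 0) -> List[Trace]:
    """Uniform random sample of traces without replacement, capped at the full log."""
    rng = random.Random(seed)
    size = min(sample_size, len(log))
    return rng.sample(list(log), size)


# Hypothetical log of 100 distinct traces and a few example sample sizes.
log = [("A", "B", str(i)) for i in range(100)]
samples = {size: draw_sample(log, size) for size in (10, 50, len(log))}
```

Fixing the seed makes each sample reproducible across runs, so different stop conditions and population sizes can be compared on the same subset of traces.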

  15. Experimental Results
     Same quality achieved

  16. Experimental Results
     • PS: population size
     • ISS: sample size
     • MUNT: mean number of traces used
     • MFC: mean number of fitness computations
     • MET: mean execution time
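The "mean" columns are presumably averages over repeated runs of the same configuration. Assuming each run records the number of traces it used, the number of fitness computations it performed and its execution time (hypothetical field names), the three means could be computed as in this sketch.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Run:
    # Hypothetical per-run measurements for one configuration (PS, ISS).
    used_traces: int
    fitness_computations: int
    execution_time_s: float


def summarize(runs: List[Run]) -> Dict[str, float]:
    """Mean values over repeated runs, keyed by the table's abbreviations."""
    return {
        "MUNT": mean(r.used_traces for r in runs),
        "MFC": mean(r.fitness_computations for r in runs),
        "MET": mean(r.execution_time_s for r in runs),
    }


summary = summarize([Run(10, 100, 2.0), Run(30, 300, 4.0)])
```

For the two runs above the means are 20 traces, 200 fitness computations and 3.0 seconds.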

  17. Conclusions
     • A new distributed genetic algorithm for process mining using sampling
     • The evaluation confirmed that our approach reduces the overall computation time
     • The sample size is strongly correlated with the logs' characteristics and their level of difficulty from the mining point of view
     • Future work:
       • Use smart sampling techniques to reduce the execution time
