
Email Analysis for Business Process Discovery 



Presentation Transcript


1. Email Analysis for Business Process Discovery. Nassim LAGA (1), Marwa ELLEUCH (1,2), Walid GAALOUL (2), Oumaima ALAOUI ISMAILI (1). (1) Orange Labs, France; (2) Télécom SudParis, Paris-Saclay University, France.

2. Introduction. Classical process discovery = process model generation from structured event logs, using miners such as the Fuzzy Miner, the Heuristic Miner or the Alpha Algorithm. These techniques rely on two hypotheses about the event logs: Hyp1: they have a structured format; Hyp2: they contain the trace of all BP tasks. Informal methods satisfy neither hypothesis, which raises the questions of process, activity and instance recognition and of conversion into a structured format.

3. Introduction. The question addressed here: how to go from an email log to structured event logs, i.e., how to recognise processes, activities and instances in emails and convert them into a structured format.

4. Propositions. To go from an email log to structured event logs (process, activity and instance recognition + conversion into a structured format), we propose to:
• Automatically identify one activity, one process and one instance related to each email, using supervised learning and clustering techniques
• Minimize human intervention by using:
  • A progressive learning approach that generates two types of predictive models, for predicting process and activity names
  • A collaborative approach to build the learning dataset for training these predictive models => minimizes the individual human effort
  • A non-parametric clustering algorithm (HDBSCAN) for identifying process instances

5. Propositions: the overall pipeline. Unstructured email logs → Step 1: Activity and Process Labels generation (a progressive learning approach trains the process and activity predictive models) → per-process and per-activity email lists → Step 2: Process Instance Detection (clustering) → per-process-instance lists → Step 3: Event Logs generation (conversion block into a structured format) → structured event logs.
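
Read end to end, the pipeline chains three transformations. The sketch below only fixes that skeleton; the Email record and the three step functions are hypothetical placeholders standing in for the components detailed on the following slides, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Email:
    """Hypothetical minimal email record reused in the sketches that follow."""
    subject: str
    body: str
    sender: str
    recipients: List[str]
    timestamp: float                    # Unix epoch, seconds
    process: Optional[str] = None       # filled in by Step 1
    activity: Optional[str] = None      # filled in by Step 1
    instance_id: Optional[int] = None   # filled in by Step 2

def label_emails(emails):       # Step 1 placeholder (see the sketches after slides 7 and 8)
    return emails

def detect_instances(emails):   # Step 2 placeholder (see the sketch after slide 10)
    return emails

def build_event_log(emails):    # Step 3 placeholder (see the sketch after slide 15)
    return [(e.process, e.activity, e.instance_id, e.timestamp) for e in emails]

def discover_from_emails(emails):
    """Unstructured email log -> structured event log, following the three-step pipeline."""
    return build_event_log(detect_instances(label_emails(emails)))
```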

6. Propositions (recap of the pipeline of slide 5; the next slides detail Step 1: Activity and Process Labels generation).

7. Propositions, Step 1: Activity and Process Labels Generation. Two kinds of predictive models (a process predictive model and activity predictive models) are built by mini-batch learning with collaborative annotation: labeled data is used for training, and the models then predict labels for the unlabeled data.
Features:
• For predicting processes: subject & entities of the email interlocutors
• For predicting activities: subject & content & entities of the email interlocutors & exchange history (short emails)
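
To make the two feature sets concrete, here is a minimal sketch of training the process and activity predictive models on different views of the same emails. TF-IDF over 1-grams/2-grams plus a linear classifier is used as a stand-in; the view functions, the Email fields and the classifier choice are illustrative assumptions, not the authors' implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

def process_view(email):
    """Features for predicting the process: subject + entities of the email interlocutors."""
    return " ".join([email.subject, email.sender] + email.recipients)

def activity_view(email, history=""):
    """Features for predicting the activity: subject + content + interlocutors + exchange history."""
    return " ".join([email.subject, email.body, email.sender] + email.recipients + [history])

def train_label_model(texts, labels):
    """Train one predictive model: TF-IDF (1-gram, 2-gram) features + linear SGD classifier."""
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          SGDClassifier(loss="log_loss"))
    model.fit(texts, labels)
    return model

# Two separate models are kept, trained on a collaboratively annotated batch:
# process_model  = train_label_model([process_view(e) for e in labeled_emails], process_labels)
# activity_model = train_label_model([activity_view(e) for e in labeled_emails], activity_labels)
```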

8. Propositions: the mini-batch learning approach. For each new batch of emails (..., Email3, Email2, Email1):
• Preprocessing and feature selection:
  • Detect and replace particular expressions by a tag
  • Remove stop words and person names
  • Lemmatize terms
  • Generate a 1-gram / 2-gram vocabulary
  • Update word counters across the whole email dataset => generate TF-IDF values
  • Generate entity-interaction values
• The process and activity predictive models predict labels for the preprocessed batch
• The predicted labels are manually corrected, and an error rate is computed from the number of manual corrections
• If error_rate > threshold, the batch is added to the whole dataset (all existing annotated and preprocessed emails) and the models are re-trained; otherwise, the loop ends.
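
The decision loop on this slide can be read as: preprocess the new batch, predict, collect manual corrections, and keep re-training only while the error rate stays above the threshold. A hedged sketch follows, with a deliberately simplified preprocessing step (tag replacement and stop-word removal only, no lemmatisation or entity features); the threshold value, regex patterns and helper names are illustrative assumptions.

```python
import re

STOPWORDS = {"the", "a", "an", "to", "of", "and"}                  # illustrative subset only
TAG_PATTERNS = [(re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "<DATE>"),  # detect particular expressions...
                (re.compile(r"\b[A-Z]{2,}-?\d+\b"), "<REF>")]      # ...and replace them by a tag

def preprocess(text):
    """Simplified preprocessing: tag replacement + stop-word removal (no lemmatisation here)."""
    for pattern, tag in TAG_PATTERNS:
        text = pattern.sub(tag, text)
    return " ".join(tok for tok in text.lower().split() if tok not in STOPWORDS)

def minibatch_loop(batches, model, ask_corrections, threshold=0.1):
    """Mini-batch learning: re-train while the manual-correction error rate exceeds the threshold.

    `model` is assumed to be already fitted (e.g., the pipeline from the previous sketch), and
    `ask_corrections(batch, predicted)` returns the human-corrected labels for the batch.
    """
    dataset_texts, dataset_labels = [], []
    for batch in batches:
        texts = [preprocess(e.subject + " " + e.body) for e in batch]
        predicted = model.predict(texts)
        corrected = ask_corrections(batch, predicted)
        error_rate = sum(p != c for p, c in zip(predicted, corrected)) / len(batch)
        if error_rate > threshold:
            dataset_texts += texts                      # add the batch to the whole dataset
            dataset_labels += list(corrected)
            model.fit(dataset_texts, dataset_labels)    # re-train on all annotated emails
        else:
            break                                       # error rate low enough: stop the loop
    return model
```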

9. Propositions (recap of the pipeline of slide 5; the next slide details Step 2: Process Instance Detection).

10. Propositions, Step 2: Process Instance Detection. Similarity function S between two emails E1 and E2, used for clustering emails into process instances:

S(E1, E2) = W0 * d_C(E1, E2) + W1 * (1 - d_t(E1, E2)) + W2 * d_NE(E1, E2)

where:
• d_C is the Jaccard distance between the entity sets of the emails' interlocutors, C(E1) and C(E2)
• d_t is the time distance between E1 and E2 (ts = timestamp)
• d_NE is the Jaccard distance related to the named entities and references present in the textual data of E1 and E2
• W0, W1 and W2 are tuned by users according to the type of the process (e.g., whether or not it has time constraints, such as the accounting closing process)
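
A direct translation of the formula as reconstructed above. The normalisation of the time distance to [0, 1] by a user-chosen horizon, and the crude named-entity extraction, are assumptions made here for illustration; they are not specified on the slide.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical sets, 1 for disjoint sets."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def interlocutor_entities(email) -> set:
    """Entity set C(E) of an email's interlocutors (here simply sender + recipients)."""
    return {email.sender, *email.recipients}

def named_entities(email) -> set:
    """Very rough stand-in for the named entities / references in the textual data."""
    return {tok for tok in (email.subject + " " + email.body).split() if tok[:1].isupper()}

def similarity(e1, e2, w0, w1, w2, time_horizon=30 * 24 * 3600):
    """S(E1, E2) as on slide 10; time_horizon (seconds) normalises the time distance (assumption)."""
    d_c = jaccard_distance(interlocutor_entities(e1), interlocutor_entities(e2))
    d_t = min(abs(e1.timestamp - e2.timestamp) / time_horizon, 1.0)
    d_ne = jaccard_distance(named_entities(e1), named_entities(e2))
    return w0 * d_c + w1 * (1.0 - d_t) + w2 * d_ne
```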

11. Propositions (recap of the pipeline of slide 5; last step: Step 3, Event Logs generation, converting the per-process-instance lists into structured event logs).

12. Evaluation. Each step of the pipeline is evaluated: Step 1 (Activity and Process Labels generation, producing the per-process and per-activity email lists), Step 2 (Process Instance Detection, producing the per-process-instance lists) and Step 3 (Event Logs generation).

13. Evaluation of Step 1 (Activity and Process Labels generation), using the F1-score.
Evaluation dataset:
• Number of emails: 1024
• Number of activities: 116
• Number of processes: 13 (hiring, patent application, command, conference participation, travel expense refund, etc.)
Three predictive algorithms are tested:
• Random Forest (RF)
• Logistic Regression (LR) with a Stochastic Gradient Descent (SGD) optimiser
• Support Vector Machine (SVM)
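
A minimal sketch of this comparison with scikit-learn, scoring each candidate with a weighted F1-score; the train/test split, the hyperparameters and the use of LinearSVC and SGDClassifier as the SVM and LR+SGD candidates are assumptions about the protocol, not the paper's exact setup.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def compare_classifiers(texts, labels):
    """Compare RF, LR(+SGD) and SVM on the same labeled emails, reporting the F1-score."""
    X_train, X_test, y_train, y_test = train_test_split(texts, labels,
                                                        test_size=0.2, random_state=0)
    candidates = {
        "RF": RandomForestClassifier(n_estimators=200),
        "LR+SGD": SGDClassifier(loss="log_loss"),
        "SVM": LinearSVC(),
    }
    scores = {}
    for name, clf in candidates.items():
        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
        model.fit(X_train, y_train)
        scores[name] = f1_score(y_test, model.predict(X_test), average="weighted")
    return scores
```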

14. Evaluation of Step 2 (Process Instance Detection).
• Clustering algorithm: HDBSCAN
• Evaluation metric: Adjusted Mutual Information (AMI); the returned value tends to 1 when the real and HDBSCAN partitions are strongly matched, and to 0 when they are weakly matched
Evaluation dataset:
• Number of emails: 180
• Process name: hiring process
• Number of instance clusters: 11 (real partition: instance clusters manually defined)
Evaluation result: AMI = 0.86, close to 1 => the real and HDBSCAN partitions are strongly matched.
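
A sketch of this evaluation step, assuming the hdbscan package, scikit-learn's AMI implementation, and a pairwise similarity S in [0, 1] (e.g., the function after slide 10) turned into a precomputed distance matrix; the min_cluster_size value and the function names are illustrative.

```python
import numpy as np
import hdbscan
from sklearn.metrics import adjusted_mutual_info_score

def cluster_instances(emails, sim, true_instance_ids=None):
    """Cluster the emails of one process into instances with HDBSCAN; optionally score with AMI."""
    n = len(emails)
    # Turn the pairwise similarity S into a distance matrix for HDBSCAN (metric="precomputed").
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - sim(emails[i], emails[j])
            dist[i, j] = dist[j, i] = d
    labels = hdbscan.HDBSCAN(metric="precomputed", min_cluster_size=2).fit_predict(dist)
    if true_instance_ids is not None:
        # AMI tends to 1 when the HDBSCAN partition strongly matches the manually defined one.
        print("AMI:", adjusted_mutual_info_score(true_instance_ids, labels))
    return labels
```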

15. Evaluation of Step 3 (Event Logs generation). The same dataset (180 emails, hiring process, 11 instance clusters) is converted into a structured format and the Heuristic Miner algorithm is applied to it; the discovered hiring model is then compared with the real model.
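
For the model-generation side, a minimal sketch using pm4py, a common open-source implementation of the Heuristic Miner (the deck does not say which implementation was used); the column names assume the standard case/activity/timestamp event-log layout as the output of the conversion block.

```python
import pandas as pd
import pm4py

def discover_model(rows):
    """rows: (process instance id, activity label, timestamp) triples produced by Step 3."""
    df = pd.DataFrame(rows, columns=["case:concept:name", "concept:name", "time:timestamp"])
    df["time:timestamp"] = pd.to_datetime(df["time:timestamp"], unit="s")
    log = pm4py.format_dataframe(df, case_id="case:concept:name",
                                 activity_key="concept:name", timestamp_key="time:timestamp")
    heu_net = pm4py.discover_heuristics_net(log)   # Heuristic Miner
    pm4py.view_heuristics_net(heu_net)             # render the discovered model
    return heu_net
```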

16. Evaluation: real hiring process model vs. discovered hiring model (the comparison annotates looping behavior, additional behavior and not-allowed behavior, together with the occurrence number of each behavior over all process instances).
• The discovered model is almost in conformity with the real model
• Two discrepancy types are detected, at low frequency: (1) unfitting model behavior; (2) additional model behavior
• Discrepancy causes: errors accumulated through the log-building system, the log-mining technique, or a real difference between the process as observed in the emails and the related theoretical BP model

17. Conclusion.
Propositions & advantages:
• A solution for mining business processes from emails
• (+) Less human intervention is needed at the individual level, compared to related works, thanks to the collaborative learning approach and the non-parametric clustering algorithm
• (+) Good performance obtained after testing the overall approach on a real dataset
Limitations & perspectives:
• One email can talk about more than one activity and more than one instance
• The current approach still requires human involvement => further automate the BP discovery pipeline
• Semantic similarity is not used => employ semantic similarity measures for constructing learning features based on email contents

  18. Thanks for your attention
