Pattern mining in system logs: opportunities for process improvement

Dolev Mezebovsky, Pnina Soffer, and Ilan Shimshoni. BPMDS, Amsterdam, June 2009

Background • The implementation of enterprise systems is often a driver for business process change. • System implementation as an opportunity for redesigning business processes • Changes motivated by the need to adapt the enterprise to the system rather than the other way around • “Vanilla” implementations: • Implement basic functionality without modifications and make improvements afterwards • Cases of partial support for existing processes – people are forced to use workarounds and work inefficiently for the process to achieve its goal.

Example • Process: change a student’s study program

The problem addressed • Many such cases may exist in an organization • At first: all users complain • With time, some users may get used to the inefficient way of working • The question: How to identify the inefficient processes and prioritize their improvement?

Solution approach • The cases we are looking for include some repetition of a set of operations, as part of one “logical” task • These situations should be reflected in the event log of the system • Solution approach: mine for recurrent patterns of operations

Reflection in an event log

Graphical representation of log entries

Defining a pattern: basic concepts • Log entry=<User, Timestamp, Operation, ORSO> • ORSO: an ordered set of operands • Example: YPRESS, 13:50, Detach course, Fredrick, Linear Algebra, CS Minor. • For two entries in a log: • Invariant set: set of entry elements whose values are equal for the two entries • Variant set: set of entry elements whose values are different for the two entries
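The invariant and variant sets defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LogEntry` type, the `OPERAND_TYPES` names (Student, Course, Program) and the helper functions are hypothetical and not taken from the presented algorithm; the two entries are the "Attach course" entries used in the slides.

```python
from typing import NamedTuple, Tuple


class LogEntry(NamedTuple):
    user: str
    timestamp: str          # kept as text here; a real log would parse it
    operation: str
    orso: Tuple[str, ...]   # ordered set of operands


# Assumption: operands are named by position with these operand types.
OPERAND_TYPES = ("Student", "Course", "Program")


def elements(entry):
    """Flatten an entry into a {element name: value} mapping."""
    flat = {
        "User": entry.user,
        "Timestamp": entry.timestamp,
        "Operation": entry.operation,
    }
    flat.update(zip(OPERAND_TYPES, entry.orso))
    return flat


def invariant_variant(a, b):
    """Return (invariant set, variant set) of element names for two entries."""
    ea, eb = elements(a), elements(b)
    invariant = {name for name in ea if ea[name] == eb.get(name)}
    variant = set(ea) - invariant
    return invariant, variant


e1 = LogEntry("YPRESS", "13.45.52", "Attach course",
              ("Fredrick", "Linear Algebra", "MIS Major"))
e2 = LogEntry("YPRESS", "13.46.26", "Attach course",
              ("Fredrick", "Algorithms", "MIS Major"))
inv, var = invariant_variant(e1, e2)
```

For these two entries the user, operation, student and program are equal, so they land in the invariant set, while the timestamp and the course differ and land in the variant set.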

Pattern identification • Two entries are potentially in the same pattern if: • User ∈ {Invariant} • Timestamp ∈ {Variant}; |TS(1) - TS(2)| < Timeframe • {Operation, ORSO} ∩ {Invariant} ≠ ∅ • {Operation, ORSO} ∩ {Variant} ≠ ∅ • Potential pattern entry: <User, TimeRange, Operations, ORSOs> • The algorithm dynamically aggregates entries into potential pattern entries, seeking the largest possible patterns.

Example • [(1),(2)] = [(1): < YPRESS, 13.45.52, Attach course, Fredrick, Linear Algebra, MIS Major>, (2): < YPRESS, 13.46.26, Attach course, Fredrick, Algorithms, MIS Major>] • (1, 2) : < YPRESS, (13.45.52, 13.46.26), Attach course, Fredrick, (Linear Algebra, Algorithms), MIS Major> • Second iteration: • [(1, 2), (3)] = [(1, 2) : < YPRESS, (13.45.52, 13.46.26), Attach course, Fredrick, (Linear Algebra, Algorithms), MIS Major>, (3): < YPRESS, 13.47.44, Attach course, Fredrick, Data Structures, MIS Major>] • (1, 2, 3): < YPRESS, (13.45.52, 13.47.44), Attach course, Fredrick, (Linear Algebra, Algorithms, Data Structures), MIS Major>
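The iterations above can be reproduced with a small greedy sketch. It is not the authors' algorithm (which the slides describe as not yet finalized), only one possible reading of it. Assumptions: entries are plain tuples `(user, timestamp, operation, operands)`, timestamps use the `hh.mm.ss` form shown in the example, the timeframe is measured against the previous entry of the pattern, the default of 120 seconds is arbitrary, and a pattern must keep the same variant positions throughout. All function names are hypothetical.

```python
def to_seconds(ts):
    """Convert 'hh.mm.ss' into seconds since midnight."""
    h, m, s = (int(part) for part in ts.split("."))
    return h * 3600 + m * 60 + s


def variant_positions(a, b):
    """Positions in (operation, *operands) where two entries differ."""
    ea, eb = (a[2],) + a[3], (b[2],) + b[3]
    if len(ea) != len(eb):
        return None
    return frozenset(i for i, (x, y) in enumerate(zip(ea, eb)) if x != y)


def aggregate(entries, timeframe=120):
    """Greedily group entries into potential patterns (lists of entries)."""
    patterns, current, current_var = [], [], None
    for entry in sorted(entries, key=lambda e: (e[0], to_seconds(e[1]))):
        if current:
            last = current[-1]
            var = variant_positions(last, entry)
            gap = to_seconds(entry[1]) - to_seconds(last[1])
            joins = (
                entry[0] == last[0]                    # same user
                and 0 < gap < timeframe                # timestamps differ, within timeframe
                and var is not None
                and 0 < len(var) < 1 + len(entry[3])   # some elements vary, some do not
                and current_var in (None, var)         # same variant positions as before
            )
            if joins:
                current.append(entry)
                current_var = var
                continue
            if len(current) > 1:
                patterns.append(current)
        current, current_var = [entry], None
    if len(current) > 1:
        patterns.append(current)
    return patterns


def merge(values):
    """Collapse equal values to one value, keep distinct values as a tuple."""
    distinct = tuple(dict.fromkeys(values))
    return distinct[0] if len(distinct) == 1 else distinct


def summarize(pattern):
    """Build <User, TimeRange, Operations, ORSOs> for one potential pattern."""
    first, last = pattern[0], pattern[-1]
    operations = merge(e[2] for e in pattern)
    orsos = tuple(merge(col) for col in zip(*(e[3] for e in pattern)))
    return (first[0], (first[1], last[1]), operations, orsos)


log = [
    ("YPRESS", "13.45.52", "Attach course", ("Fredrick", "Linear Algebra", "MIS Major")),
    ("YPRESS", "13.46.26", "Attach course", ("Fredrick", "Algorithms", "MIS Major")),
    ("YPRESS", "13.47.44", "Attach course", ("Fredrick", "Data Structures", "MIS Major")),
]
patterns = aggregate(log)
summary = summarize(patterns[0])
```

With the three entries from the example, the sketch yields one potential pattern covering 13.45.52 to 13.47.44 with the three courses merged. Shrinking `timeframe` to 60 seconds leaves only entries (1) and (2) in the pattern, which hints at the sensitivity to this parameter.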

From potential pattern to pattern type • Pattern type definition: <I, V>. • I: a set of invariant element types (Operation, operand type) • V: a set of variant element types (Operation, operand type) • Example: • I = {Operation, Student, Program} • V = {Course}
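As a rough illustration of how a pattern type <I, V> could be derived from the entries of one pattern, the sketch below classifies each element type by whether its value stays constant across the pattern. The function name `pattern_type` and the operand-type names are assumptions made for this example.

```python
def pattern_type(pattern, operand_types):
    """Derive (I, V) from a pattern given as (operation, operands) pairs.

    An element type is invariant if it has a single value across the whole
    pattern, and variant otherwise.
    """
    names = ("Operation",) + tuple(operand_types)
    columns = zip(*[(op,) + tuple(operands) for op, operands in pattern])
    invariant, variant = set(), set()
    for name, column in zip(names, columns):
        (invariant if len(set(column)) == 1 else variant).add(name)
    return frozenset(invariant), frozenset(variant)


attach_pattern = [
    ("Attach course", ("Fredrick", "Linear Algebra", "MIS Major")),
    ("Attach course", ("Fredrick", "Algorithms", "MIS Major")),
    ("Attach course", ("Fredrick", "Data Structures", "MIS Major")),
]
I, V = pattern_type(attach_pattern, ("Student", "Course", "Program"))
```

For the "Attach course" pattern this gives I = {Operation, Student, Program} and V = {Course}, matching the example on the slide.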

Pattern metrics • The count $C_P$ of a pattern type P: the number of patterns of this type in the log file. • The average size $AS_P$ of a pattern type P: the average number of entries in patterns of type P. Let P occur $C_P$ times in a log file, where occurrence i includes $n_i$ entries. Then: $AS_P = \frac{1}{C_P} \sum_{i=1}^{C_P} n_i$ • The average time $AT_P$ of a pattern type P: the average time range (difference between the maximal and minimal timestamps) in patterns of type P.
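The three metrics can be computed in a single pass over the detected patterns. The sketch below assumes each pattern occurrence has already been reduced to a triple `(pattern_type, n_entries, time_range_seconds)`; that representation, the function name, and the sample numbers are illustrative only (the first triple corresponds to the three-entry "Attach course" example, whose time range is 112 seconds).

```python
from collections import defaultdict


def pattern_metrics(occurrences):
    """Return {pattern type: {"C": count, "AS": avg size, "AT": avg time}}."""
    sizes, times = defaultdict(list), defaultdict(list)
    for ptype, n_entries, time_range in occurrences:
        sizes[ptype].append(n_entries)
        times[ptype].append(time_range)
    return {
        ptype: {
            "C": len(ns),                       # count of occurrences
            "AS": sum(ns) / len(ns),            # average number of entries
            "AT": sum(times[ptype]) / len(ns),  # average time range in seconds
        }
        for ptype, ns in sizes.items()
    }


# Made-up occurrences for illustration.
occurrences = [
    ("attach", 3, 112),
    ("attach", 5, 200),
    ("detach", 2, 30),
]
metrics = pattern_metrics(occurrences)
```

Here the "attach" type occurs twice with sizes 3 and 5, so its average size is 4 entries and its average time is 156 seconds.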

Identifying and prioritizing process improvement requirements • Find out which of the identified patterns reflect inefficient processes • By interviewing users • Prioritize patterns to be automated • By size-weighted count: $SC_P = AS_P \cdot C_P$ • By time-weighted count: $TC_P = AT_P \cdot C_P$
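A minimal sketch of the two rankings follows, assuming the per-type metrics are available as a dictionary with keys `"C"`, `"AS"` and `"AT"`. The function name and the numbers are invented for illustration and do not come from any real log.

```python
def prioritize(metrics, by="size"):
    """Rank pattern types by size-weighted or time-weighted count, highest first."""
    if by not in ("size", "time"):
        raise ValueError("by must be 'size' or 'time'")
    key = "AS" if by == "size" else "AT"
    scored = {ptype: m[key] * m["C"] for ptype, m in metrics.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up metrics: a rare but long pattern and a frequent but short one.
sample_metrics = {
    "attach": {"C": 2, "AS": 4.0, "AT": 156.0},
    "detach": {"C": 10, "AS": 2.0, "AT": 20.0},
}
by_size = prioritize(sample_metrics, by="size")
by_time = prioritize(sample_metrics, by="time")
```

In this invented case the two weightings disagree: the frequent short pattern ranks first by size-weighted count, while the rarer but slower pattern ranks first by time-weighted count. Since the size-weighted count is the average size times the count, it equals the total number of entries spent on the pattern type; the time-weighted count likewise equals the total time.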

Conclusions • We address a situation where technology drives processes in an undesirable way • We utilize mining technology to identify and prioritize requirements for automating inefficient processes. • Our solution identifies recurrent patterns in the system log and provides metrics for prioritization.

Future research • Finalize the overall algorithm • Experiment with the university log to evaluate the proposed method • Is it capable of identifying patterns that are known a priori? • Ratio of real problems identified vs. patterns that reflect “normal” processes • Sensitivity to the timeframe parameter • Experiment with logs from other domains
