-
Challenges of Anomaly Detection in the Object-Centric Setting: Dimensions and the Role of Domain Knowledge
Authors:
Alessandro Berti,
Urszula Jessen,
Wil M. P. van der Aalst,
Dirk Fahland
Abstract:
Object-centric event logs, allowing events related to different objects of different object types, represent naturally the execution of business processes, such as ERP (O2C and P2P) and CRM. However, modeling such complex information requires novel process mining techniques and might result in complex sets of constraints. Object-centric anomaly detection exploits both the lifecycle and the interactions between the different objects. Therefore, anomalous patterns are proposed to the user without requiring the definition of object-centric process models. This paper proposes different methodologies for object-centric anomaly detection and discusses the role of domain knowledge for these methodologies. We discuss the advantages and limitations of Large Language Models (LLMs) in the provision of such domain knowledge. Following our experience in a real-life P2P process, we also discuss the role of algorithms (dimensionality reduction+anomaly detection), suggest some pre-processing steps, and discuss the role of feature propagation.
Submitted 12 July, 2024;
originally announced July 2024.
-
High-Level Event Mining: Overview and Future Work
Authors:
Bianka Bakullari,
Wil M. P. van der Aalst
Abstract:
Process mining traditionally relies on input consisting of low-level events that capture individual activities, such as filling out a form or processing a product. However, many of the complex problems inherent in processes, such as bottlenecks and compliance issues, extend beyond the scope of individual events and process instances. Consider congestion, for instance: it can involve and impact numerous cases, much like how a traffic jam affects many cars simultaneously. High-level event mining seeks to address such phenomena using the regular event data available. This report offers a comprehensive overview of existing work and of the challenges encountered when lifting the perspective from individual events and cases to system-level events.
Submitted 23 May, 2024;
originally announced May 2024.
-
Process-Aware Analysis of Treatment Paths in Heart Failure Patients: A Case Study
Authors:
Harry H. Beyel,
Marlo Verket,
Viki Peeva,
Christian Rennert,
Marco Pegoraro,
Katharina Schütt,
Wil M. P. van der Aalst,
Nikolaus Marx
Abstract:
Process mining in healthcare presents a range of challenges when working with different types of data within the healthcare domain. There is high diversity considering the variety of data collected from healthcare processes: operational processes given by claims data, a collection of events during surgery, data related to pre-operative and post-operative care, and high-level data collections based on regular ambulant visits with no apparent events. In this case study, a data set from the last category is analyzed. We apply process-mining techniques on sparse patient heart failure data and investigate whether an information gain towards several research questions is achievable. Here, available data are transformed into an event log format, and process discovery and conformance checking are applied. Additionally, patients are split into different cohorts based on comorbidities, such as diabetes and chronic kidney disease, and multiple statistics are compared between the cohorts. Finally, we apply decision mining to determine whether a patient will have a cardiovascular outcome and whether a patient will die.
Submitted 11 March, 2024;
originally announced March 2024.
-
Process Modeling With Large Language Models
Authors:
Humam Kourani,
Alessandro Berti,
Daniel Schuster,
Wil M. P. van der Aalst
Abstract:
In the realm of Business Process Management (BPM), process modeling plays a crucial role in translating complex process dynamics into comprehensible visual representations, facilitating the understanding, analysis, improvement, and automation of organizational processes. Traditional process modeling methods often require extensive expertise and can be time-consuming. This paper explores the integration of Large Language Models (LLMs) into process modeling to enhance the accessibility of process modeling, offering a more intuitive entry point for non-experts while augmenting the efficiency of experts. We propose a framework that leverages LLMs for the automated generation and iterative refinement of process models starting from textual descriptions. Our framework involves innovative prompting strategies for effective LLM utilization, along with a secure model generation protocol and an error-handling mechanism. Moreover, we instantiate a concrete system extending our framework. This system provides robust quality guarantees on the models generated and supports exporting them in standard modeling notations, such as the Business Process Modeling Notation (BPMN) and Petri nets. Preliminary results demonstrate the framework's ability to streamline process modeling tasks, underscoring the transformative potential of generative AI in the BPM field.
Submitted 8 April, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
ProMoAI: Process Modeling with Generative AI
Authors:
Humam Kourani,
Alessandro Berti,
Daniel Schuster,
Wil M. P. van der Aalst
Abstract:
ProMoAI is a novel tool that leverages Large Language Models (LLMs) to automatically generate process models from textual descriptions, incorporating advanced prompt engineering, error handling, and code generation techniques. Beyond automating the generation of complex process models, ProMoAI also supports process model optimization. Users can interact with the tool by providing feedback on the generated model, which is then used for refining the process model. ProMoAI utilizes the capabilities of LLMs to offer a novel, AI-driven approach to process modeling, significantly reducing the barrier to entry for users without deep technical knowledge in process modeling.
Submitted 29 April, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
OCEL (Object-Centric Event Log) 2.0 Specification
Authors:
Alessandro Berti,
Istvan Koren,
Jan Niklas Adams,
Gyunam Park,
Benedikt Knopp,
Nina Graves,
Majid Rafiei,
Lukas Liß,
Leah Tacke Genannt Unterberg,
Yisong Zhang,
Christopher Schwanen,
Marco Pegoraro,
Wil M. P. van der Aalst
Abstract:
Object-Centric Event Logs (OCELs) form the basis for Object-Centric Process Mining (OCPM). OCEL 1.0 was first released in 2020 and triggered the development of a range of OCPM techniques. OCEL 2.0 forms the new, more expressive standard, allowing for more extensive process analyses while remaining in an easily exchangeable format. In contrast to the first OCEL standard, it can depict changes in objects, provide information on object relationships, and qualify these relationships to other objects or specific events. Compared to XES, it is more expressive, less complicated, and more readable. OCEL 2.0 offers three exchange formats: a relational database (SQLite), XML, and JSON. This OCEL 2.0 specification document provides an introduction to the standard, its metamodel, and its exchange formats, aimed at practitioners and researchers alike.
Submitted 4 March, 2024;
originally announced March 2024.
-
Developing a High-Performance Process Mining Library with Java and Python Bindings in Rust
Authors:
Aaron Küsters,
Wil M. P. van der Aalst
Abstract:
The most commonly used open-source process mining software tools today are ProM and PM4Py, written in Java and Python, respectively. Such high-level, often interpreted, programming languages trade off performance with memory safety and ease-of-use. In contrast, traditional compiled languages, like C or C++, can achieve top performance but often suffer from instability related to unsafe memory management. Lately, Rust emerged as a highly performant, compiled programming language with inherent memory safety. In this paper, we describe our approach to developing a shared process mining library in Rust with bindings to both Java and Python, allowing full integration into the existing ecosystems, like ProM and PM4Py. By facilitating interoperability, our methodology enables researchers or industry to develop novel algorithms in Rust once and make them accessible to the entire community while also achieving superior performance.
Submitted 25 January, 2024;
originally announced January 2024.
-
Advancements and Challenges in Object-Centric Process Mining: A Systematic Literature Review
Authors:
Alessandro Berti,
Marco Montali,
Wil M. P. van der Aalst
Abstract:
Recent years have seen the emergence of object-centric process mining techniques. Born as a response to the limitations of traditional process mining in analyzing event data from prevalent information systems like CRM and ERP, these techniques aim to tackle the deficiency, convergence, and divergence issues seen in traditional event logs. Despite the promise, the adoption in real-world process mining analyses remains limited. This paper embarks on a comprehensive literature review of object-centric process mining, providing insights into the current status of the discipline and its historical trajectory.
Submitted 15 November, 2023;
originally announced November 2023.
-
Grouping Local Process Models
Authors:
Viki Peeva,
Wil M. P. van der Aalst
Abstract:
In recent years, process mining emerged as a proven technology to analyze and improve operational processes. An expanding range of organizations using process mining in their daily operation brings a broader spectrum of processes to be analyzed. Some of these processes are highly unstructured, making it difficult for traditional process discovery approaches to discover a start-to-end model describing the entire process. Therefore, the subdiscipline of Local Process Model (LPM) discovery tries to build a set of LPMs, i.e., smaller models that explain sub-behaviors of the process. However, like other pattern mining approaches, LPM discovery algorithms also face the problems of model explosion and model repetition, i.e., the algorithms may create hundreds if not thousands of models, and subsets of them are close in structure or behavior. This work proposes a three-step pipeline for grouping similar LPMs using various process model similarity measures. We demonstrate the usefulness of grouping through a real-life case study, and analyze the impact of different measures, the gravity of repetition in the discovered LPMs, and how it improves after grouping on multiple real event logs.
Submitted 6 November, 2023;
originally announced November 2023.
-
Discovering High-Quality Process Models Despite Data Scarcity
Authors:
Jan Niklas Adams,
Jari Peeperkorn,
Tobias Brockhoff,
Isabelle Terrier,
Heiko Göhner,
Merih Seran Uysal,
Seppe vanden Broucke,
Jochen De Weerdt,
Wil M. P. van der Aalst
Abstract:
Process discovery algorithms learn process models from executed activity sequences, describing concurrency, causality, and conflict. Concurrent activities require observing multiple permutations, increasing data requirements, especially for processes with concurrent subprocesses such as hierarchical, composite, or distributed processes. While process discovery algorithms traditionally use sequences of activities as input, recently introduced object-centric process discovery algorithms can use graphs of activities as input, encoding partial orders between activities. As such, they contain the concurrency information of many sequences in a single graph. In this paper, we address the research question of reducing process discovery data requirements when using object-centric event logs for process discovery. We classify different real-life processes according to the control-flow complexity within and between subprocesses and introduce an evaluation framework to assess process discovery algorithm quality of traditional and object-centric process discovery based on the sample size. We complement this with a large-scale production process case study. Our results show reduced data requirements, enabling the discovery of large, concurrent processes such as manufacturing with little data, previously infeasible with traditional process discovery. Our findings suggest that object-centric process mining could revolutionize process discovery in various sectors, including manufacturing and supply chains.
Submitted 17 October, 2023;
originally announced October 2023.
-
Analyzing An After-Sales Service Process Using Object-Centric Process Mining: A Case Study
Authors:
Gyunam Park,
Sevde Aydin,
Cuneyt Ugur,
Wil M. P. van der Aalst
Abstract:
Process mining, a technique turning event data into business process insights, has traditionally operated on the assumption that each event corresponds to a singular case or object. However, many real-world processes are intertwined with multiple objects, making them object-centric. This paper focuses on the emerging domain of object-centric process mining, highlighting its potential yet underexplored benefits in actual operational scenarios. Through an in-depth case study of Borusan Cat's after-sales service process, this study emphasizes the capability of object-centric process mining to capture entangled business process details. Utilizing an event log of approximately 65,000 events, our analysis underscores the importance of embracing this paradigm for richer business insights and enhanced operational improvements.
Submitted 16 October, 2023;
originally announced October 2023.
-
Extracting Rules from Event Data for Study Planning
Authors:
Majid Rafiei,
Duygu Bayrak,
Mahsa Pourbafrani,
Gyunam Park,
Hayyan Helal,
Gerhard Lakemeyer,
Wil M. P. van der Aalst
Abstract:
In this study, we examine how event data from campus management systems can be used to analyze the study paths of higher education students. The main goal is to offer valuable guidance for their study planning. We employ process and data mining techniques to explore the impact of sequences of taken courses on academic success. Through the use of decision tree models, we generate data-driven recommendations in the form of rules for study planning and compare them to the recommended study plan. The evaluation focuses on RWTH Aachen University computer science bachelor program students and demonstrates that the proposed course sequence features effectively explain academic performance measures. Furthermore, the findings suggest avenues for developing more adaptable study plans.
Submitted 4 October, 2023;
originally announced October 2023.
-
The Interplay Between High-Level Problems and The Process Instances That Give Rise To Them
Authors:
Bianka Bakullari,
Jules van Thoor,
Dirk Fahland,
Wil M. P. van der Aalst
Abstract:
Business processes may face a variety of problems due to the number of tasks that need to be handled within short time periods, resources' workload and working patterns, as well as bottlenecks. These problems may arise locally and be short-lived, but as the process is forced to operate outside its standard capacity, the effect on the underlying process instances can be costly. We use the term high-level behavior to cover all process behavior which cannot be captured in terms of the individual process instances. Whenever such behavior emerges, we call the cases which are involved in it participating cases. The natural question arises as to how the characteristics of cases relate to the high-level behavior they give rise to. In this work, we first show how to detect and correlate observations of high-level problems, as well as determine the corresponding (non-)participating cases. Then we show how to assess the connection between any case-level characteristic and any given detected sequence of high-level problems. Applying our method on the event data of a real loan application process revealed which specific combinations of delays, batching and busy resources at which particular parts of the process correlate with an application's duration and chance of a positive outcome.
Submitted 4 September, 2023;
originally announced September 2023.
-
Applying Process Mining on Scientific Workflows: a Case Study
Authors:
Zahra Sadeghibogar,
Alessandro Berti,
Marco Pegoraro,
Wil M. P. van der Aalst
Abstract:
Computer-based scientific experiments are becoming increasingly data-intensive. High-Performance Computing (HPC) clusters are ideal for executing large scientific experiment workflows. Executing large scientific workflows in an HPC cluster leads to complex flows of data and control within the system, which are difficult to analyze. This paper presents a case study where process mining is applied to logs extracted from SLURM-based HPC clusters, in order to document the running workflows and find the performance bottlenecks. The challenge lies in correlating the jobs recorded in the system to enable the application of mainstream process mining techniques. Users may submit jobs with explicit or implicit interdependencies, leading to the consideration of different event correlation techniques. We present a log extraction technique from SLURM clusters, completed with an experimental.
Submitted 6 July, 2023;
originally announced July 2023.
-
Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study
Authors:
Alessandro Berti,
Daniel Schuster,
Wil M. P. van der Aalst
Abstract:
Large Language Models (LLMs) are capable of answering questions in natural language for various purposes. With recent advancements (such as GPT-4), LLMs perform at a level comparable to humans for many proficient tasks. The analysis of business processes could benefit from a natural process querying language and using the domain knowledge on which LLMs have been trained. However, it is impossible to provide a complete database or event log as an input prompt due to size constraints. In this paper, we apply LLMs in the context of process mining by i) abstracting the information of standard process mining artifacts and ii) describing the prompting strategies. We implement the proposed abstraction techniques into pm4py, an open-source process mining library. We present a case study using available event logs. Starting from different abstractions and analysis questions, we formulate prompts and evaluate the quality of the answers.
Submitted 14 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
A Collection of Simulated Event Logs for Fairness Assessment in Process Mining
Authors:
Timo Pohl,
Alessandro Berti,
Mahnaz Sadat Qafari,
Wil M. P. van der Aalst
Abstract:
The analysis of fairness in process mining is a significant aspect of data-driven decision-making, yet the advancement in this field is constrained due to the scarcity of event data that incorporates fairness considerations. To bridge this gap, we present a collection of simulated event logs, spanning four critical domains, which encapsulate a variety of discrimination scenarios. By simulating these event logs with CPN Tools, we ensure data with known ground truth, thereby offering a robust foundation for fairness analysis. These logs are made freely available under the CC-BY-4.0 license and adhere to the XES standard, thereby assuring broad compatibility with various process mining tools. This initiative aims to empower researchers with the requisite resources to test and develop fairness techniques within process mining, ultimately contributing to the pursuit of equitable, data-driven decision-making processes.
Submitted 20 June, 2023;
originally announced June 2023.
-
Revisiting the Alpha Algorithm To Enable Real-Life Process Discovery Applications -- Extended Report
Authors:
Aaron Küsters,
Wil M. P. van der Aalst
Abstract:
The Alpha algorithm was the first process discovery algorithm that was able to discover process models with concurrency based on incomplete event data while still providing formal guarantees. However, as was stated in the original paper, practical applicability is limited when dealing with exceptional behavior and processes that cannot be described as a structured workflow net without short loops. This paper presents the Alpha+++ algorithm that overcomes many of these limitations, making the algorithm competitive with more recent process mining approaches. The different steps provide insights into the practical challenges of learning process models with concurrency, choices, sequences, loops, and skipping from event data. The approach was implemented in ProM and tested on various publicly available, real-life event logs.
Submitted 3 October, 2023; v1 submitted 28 May, 2023;
originally announced May 2023.
-
Object-Centric Alignments
Authors:
Lukas Liss,
Jan Niklas Adams,
Wil M. P. van der Aalst
Abstract:
Processes tend to interact with other processes and operate on various objects of different types. These objects can influence each other creating dependencies between sub-processes. Analyzing the conformance of such complex processes challenges traditional conformance-checking approaches because they assume a single-case identifier for a process. To create a single-case identifier one has to flatten complex processes. This leads to information loss when separating the processes that interact on some objects. This paper introduces an alignment approach that operates directly on these object-centric processes. We introduce alignments that can give behavior-based insights into how closely related the event data generated by a process and the behavior specified by an object-centric Petri net are. The contributions of this paper include a definition for object-centric alignments, an algorithm to compute them, a publicly available implementation, and a qualitative and quantitative evaluation. The qualitative evaluation shows that object-centric alignments can give better insights into object-centric processes because they correctly consider inter-object dependencies. Findings from the quantitative evaluation show that the run-time grows exponentially with the number of objects, the length of the process execution, and the cost of the alignment. The evaluation results motivate future research to improve the run-time and make object-centric alignments more applicable for larger processes.
Submitted 8 May, 2023;
originally announced May 2023.
-
TraVaG: Differentially Private Trace Variant Generation Using GANs
Authors:
Majid Rafiei,
Frederik Wangelik,
Mahsa Pourbafrani,
Wil M. P. van der Aalst
Abstract:
Process mining is rapidly growing in the industry. Consequently, privacy concerns regarding sensitive and private information included in event data, used by process mining algorithms, are becoming increasingly relevant. State-of-the-art research mainly focuses on providing privacy guarantees, e.g., differential privacy, for trace variants that are used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques for releasing trace variants still do not fulfill all the requirements of industry-scale usage. Moreover, providing privacy guarantees when there exists a high rate of infrequent trace variants is still a challenge. In this paper, we introduce TraVaG as a new approach for releasing differentially private trace variants based on Generative Adversarial Networks (GANs) that provides industry-scale benefits and enhances the level of privacy guarantees when there exists a high ratio of infrequent variants. Moreover, TraVaG overcomes shortcomings of conventional privacy preservation techniques such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data show that our approach outperforms state-of-the-art techniques in terms of privacy guarantees, plain data utility preservation, and result utility preservation.
Submitted 29 March, 2023;
originally announced March 2023.
-
Performance-Preserving Event Log Sampling for Predictive Monitoring
Authors:
Mohammadreza Fani Sani,
Mozhgan Vazifehdoostirani,
Gyunam Park,
Marco Pegoraro,
Sebastiaan J. van Zelst,
Wil M. P. van der Aalst
Abstract:
Predictive process monitoring is a subfield of process mining that aims to estimate case or event features for running process instances. Such predictions are of significant interest to the process stakeholders. However, most of the state-of-the-art methods for predictive monitoring require the training of complex machine learning models, which is often inefficient. Moreover, most of these methods require a hyper-parameter optimization that requires several repetitions of the training process which is not feasible in many real-life applications. In this paper, we propose an instance selection procedure that allows sampling training process instances for prediction models. We show that our instance selection procedure allows for a significant increase of training speed for next activity and remaining time prediction methods while maintaining reliable levels of prediction accuracy.
Submitted 18 January, 2023;
originally announced January 2023.
-
Discovering Sound Free-choice Workflow Nets With Non-block Structures
Authors:
Tsung-Hao Huang,
Wil M. P. van der Aalst
Abstract:
Process discovery aims to discover models that can explain the behaviors of event logs extracted from information systems. While various approaches have been proposed, only a few guarantee desirable properties such as soundness and free-choice. State-of-the-art approaches that exploit the representational bias of process trees to provide the guarantees are constrained to be block-structured. Such constructs limit the expressive power of the discovered models, i.e., only a subset of sound free-choice workflow nets can be discovered. To support a more flexible structural representation, we aim to discover process models that provide the same guarantees but also allow for non-block structures. Inspired by existing works that utilize synthesis rules from the free-choice nets theory, we propose an automatic approach that incrementally adds activities to an existing process model with predefined patterns. Playing by the rules ensures that the resulting models are always sound and free-choice. Furthermore, the discovered models are not restricted to block structures and are thus more flexible. The approach has been implemented in Python and tested using various real-life event logs. The experiments show that our approach can indeed discover models with competitive quality and more flexible structures compared to the existing approach.
Submitted 3 January, 2023;
originally announced January 2023.
-
Comparing Ordering Strategies For Process Discovery Using Synthesis Rules
Authors:
Tsung-Hao Huang,
Wil M. P. van der Aalst
Abstract:
Process discovery aims to learn process models from observed behaviors, i.e., event logs, in the information systems. The discovered models serve as the starting point for process mining techniques that are used to address performance and compliance problems. Compared to the state-of-the-art Inductive Miner, the algorithm applying synthesis rules from the free-choice net theory discovers process models with more flexible (non-block) structures while ensuring the same desirable soundness and free-choiceness properties. Moreover, recent development in this line of work shows that the discovered models have compatible quality. Following the synthesis rules, the algorithm incrementally modifies an existing process model by adding the activities in the event log one at a time. As the applications of rules are highly dependent on the existing model structure, the model quality and computation time are significantly influenced by the order of adding activities. In this paper, we investigate the effect of different ordering strategies on the discovered models (w.r.t. fitness and precision) and the computation time using real-life event data. The results show that the proposed ordering strategy can improve the quality of the resulting process models while requiring less time compared to the ordering strategy solely based on the frequency of activities.
Submitted 4 January, 2023;
originally announced January 2023.
-
Discovering Process Models With Long-Term Dependencies While Providing Guarantees and Filtering Infrequent Behavior Patterns
Authors:
Lisa Luise Mannel,
Wil M. P. van der Aalst
Abstract:
In process discovery, the goal is to find, for a given event log, the model describing the underlying process. While process models can be represented in a variety of ways, Petri nets form a theoretically well-explored description language and are therefore often used. In this paper, we extend the eST-Miner process discovery algorithm. The eST-Miner computes a set of Petri net places which are considered to be fitting with respect to a certain fraction of the behavior described by the given event log as indicated by a given noise threshold. It evaluates all possible candidate places using token-based replay. The set of replayable traces is determined for each place in isolation, i.e., these sets do not need to be consistent. This allows the algorithm to abstract from infrequent behavioral patterns occurring only in some traces. However, when combining places into a Petri net by connecting them to the corresponding uniquely labeled transitions, the resulting net can replay exactly those traces from the event log that are allowed by the combination of all inserted places. Thus, inserting places one-by-one without considering their combined effect may result in deadlocks and low fitness of the Petri net. In this paper, we explore adaptions of the eST-Miner, that aim to select a subset of places such that the resulting Petri net guarantees a definable minimal fitness while maintaining high precision with respect to the input event log. Furthermore, current place evaluation techniques tend to block the execution of infrequent activity labels. Thus, a refined place fitness metric is introduced and thoroughly investigated. In our experiments we use real and artificial event logs to evaluate and compare the impact of the various place selection strategies and place fitness evaluation metrics on the returned Petri net.
Submitted 22 January, 2024; v1 submitted 21 December, 2022;
originally announced December 2022.
-
Resolving Uncertain Case Identifiers in Interaction Logs: A User Study
Authors:
Marco Pegoraro,
Merih Seran Uysal,
Tom-Hendrik Hülsmann,
Wil M. P. van der Aalst
Abstract:
Modern software systems are able to record vast amounts of user actions, stored for later analysis. One of the main types of such user interaction data is click data: the digital trace of the actions of a user through the graphical elements of an application, website or software. While readily available, click data is often missing a case notion: an attribute linking events from user interactions to a specific process instance in the software. In this paper, we propose a neural network-based technique to determine a case notion for click data, thus enabling process mining and other process analysis techniques on user interaction data. We describe our method, show its scalability to datasets of large dimensions, and we validate its efficacy through a user study based on the segmented event log resulting from interaction data of a mobility sharing company. Interviews with domain experts in the company demonstrate that the case notion obtained by our method can lead to actionable process insights.
Submitted 21 November, 2022;
originally announced December 2022.
-
Control-Flow-Based Querying of Process Executions from Partially Ordered Event Data
Authors:
Daniel Schuster,
Michael Martini,
Sebastiaan J. van Zelst,
Wil M. P. van der Aalst
Abstract:
Event logs, as viewed in process mining, contain event data describing the execution of operational processes. Most process mining techniques take an event log as input and generate insights about the underlying process by analyzing the data provided. Consequently, handling large volumes of event data is essential to apply process mining successfully. Traditionally, individual process executions are considered sequentially ordered process activities. However, process executions are increasingly viewed as partially ordered activities to more accurately reflect process behavior observed in reality, such as simultaneous execution of activities. Process executions comprising partially ordered activities may contain more complex activity patterns than sequence-based process executions. This paper presents a novel query language to call up process executions from event logs containing partially ordered activities. The query language allows users to specify complex ordering relations over activities, i.e., control flow constraints. Evaluating a query for a given log returns process executions satisfying the specified constraints. We demonstrate the implementation of the query language in a process mining tool and evaluate its performance on real-life event logs.
Submitted 4 January, 2023; v1 submitted 8 November, 2022;
originally announced November 2022.
-
High-Level Event Mining: A Framework
Authors:
Bianka Bakullari,
Wil M. P. van der Aalst
Abstract:
Process mining methods often analyze processes in terms of the individual end-to-end process runs. Process behavior, however, may materialize as a general state of many involved process components, which can not be captured by looking at the individual process instances. A more holistic state of the process can be determined by looking at the events that occur close in time and share common process capacities. In this work, we conceptualize such behavior using high-level events and propose a new framework for detecting and logging such high-level events. The output of our method is a new high-level event log, which collects all generated high-level events together with the newly assigned event attributes: activity, case, and timestamp. Existing process mining techniques can then be applied on the produced high-level event log to obtain further insights. Experiments on both simulated and real-life event data show that our method is able to automatically discover how system-level patterns such as high traffic and workload emerge, propagate and dissolve throughout the process.
Submitted 31 October, 2022;
originally announced November 2022.
-
Explainable Predictive Decision Mining for Operational Support
Authors:
Gyunam Park,
Aaron Küsters,
Mara Tews,
Cameron Pitsch,
Jonathan Schneider,
Wil M. P. van der Aalst
Abstract:
Several decision points exist in business processes (e.g., whether a purchase order needs a manager's approval or not), and different decisions are made for different process instances based on their characteristics (e.g., a purchase order higher than $500 needs a manager approval). Decision mining in process mining aims to describe/predict the routing of a process instance at a decision point of the process. By predicting the decision, one can take proactive actions to improve the process. For instance, when a bottleneck is developing in one of the possible decisions, one can predict the decision and bypass the bottleneck. However, despite its huge potential for such operational support, existing techniques for decision mining have focused largely on describing decisions but not on predicting them, deploying decision trees to produce logical expressions to explain the decision. In this work, we aim to enhance the predictive capability of decision mining to enable proactive operational support by deploying more advanced machine learning algorithms. Our proposed approach provides explanations of the predicted decisions using SHAP values to support the elicitation of proactive actions. We have implemented a Web application to support the proposed approach and evaluated the approach using the implementation.
Submitted 30 October, 2022;
originally announced October 2022.
-
TraVaS: Differentially Private Trace Variant Selection for Process Mining
Authors:
Majid Rafiei,
Frederik Wangelik,
Wil M. P. van der Aalst
Abstract:
In the area of industrial process mining, privacy-preserving event data publication is becoming increasingly relevant. Consequently, the trade-off between high data utility and quantifiable privacy poses new challenges. State-of-the-art research mainly focuses on differentially private trace variant construction based on prefix expansion methods. However, these algorithms face several practical limitations such as high computational complexity, introducing fake variants, removing frequent variants, and a bounded variant length. In this paper, we introduce a new approach for direct differentially private trace variant release which uses anonymized partition selection strategies to overcome the aforementioned restraints. Experimental results on real-life event data show that our algorithm outperforms state-of-the-art methods in terms of both plain data utility and result utility preservation.
Submitted 20 October, 2022;
originally announced October 2022.
-
Monitoring Constraints in Business Processes Using Object-Centric Constraint Graphs
Authors:
Gyunam Park,
Wil M. P. van der Aalst
Abstract:
Constraint monitoring aims to monitor the violation of constraints in business processes, e.g., an invoice should be cleared within 48 hours after the corresponding goods receipt, by analyzing event data. Existing techniques for constraint monitoring assume that a single case notion exists in a business process, e.g., a patient in a healthcare process, and each event is associated with the case notion. However, in reality, business processes are object-centric, i.e., multiple case notions (objects) exist, and an event may be associated with multiple objects. For instance, an Order-To-Cash (O2C) process involves order, item, delivery, etc., and they interact when executing an event, e.g., packing multiple items together for a delivery. The existing techniques produce misleading insights when applied to such object-centric business processes. In this work, we propose an approach to monitoring constraints in object-centric business processes. To this end, we introduce Object-Centric Constraint Graphs (OCCGs) to represent constraints that consider the interaction of objects. Next, we evaluate the constraints represented by OCCGs by analyzing Object-Centric Event Logs (OCELs) that store the interaction of different objects in events. We have implemented a web application to support the proposed approach and conducted two case studies using a real-life SAP ERP system.
Submitted 21 October, 2022;
originally announced October 2022.
-
Process Modeling and Conformance Checking in Healthcare: A COVID-19 Case Study
Authors:
Elisabetta Benevento,
Marco Pegoraro,
Mattia Antoniazzi,
Harry H. Beyel,
Viki Peeva,
Paul Balfanz,
Wil M. P. van der Aalst,
Lukas Martin,
Gernot Marx
Abstract:
The discipline of process mining has a solid track record of successful applications to the healthcare domain. Within such research space, we conducted a case study related to the Intensive Care Unit (ICU) ward of the Uniklinik Aachen hospital in Germany. The aim of this work is twofold: developing a normative model representing the clinical guidelines for the treatment of COVID-19 patients, and analyzing the adherence of the observed behavior (recorded in the information system of the hospital) to such guidelines. We show that, through conformance checking techniques, it is possible to analyze the care process for COVID-19 patients, highlighting the main deviations from the clinical guidelines. The results provide physicians with useful indications for improving the process and ensuring service quality and patient satisfaction. We share the resulting model as an open-source BPMN file.
Submitted 23 November, 2022; v1 submitted 22 September, 2022;
originally announced September 2022.
-
Conformance Checking for Trace Fragments Using Infix and Postfix Alignments
Authors:
Daniel Schuster,
Niklas Föcking,
Sebastiaan J. van Zelst,
Wil M. P. van der Aalst
Abstract:
Conformance checking deals with collating modeled process behavior with observed process behavior recorded in event data. Alignments are a state-of-the-art technique to detect, localize, and quantify deviations in process executions, i.e., traces, compared to reference process models. Alignments, however, assume complete process executions covering the entire process from start to finish or prefixes of process executions. This paper defines infix/postfix alignments, proposes approaches to their computation, and evaluates them using real-life event data.
Submitted 15 August, 2022;
originally announced September 2022.
-
A Framework for Extracting and Encoding Features from Object-Centric Event Data
Authors:
Jan Niklas Adams,
Gyunam Park,
Sergej Levich,
Daniel Schuster,
Wil M. P. van der Aalst
Abstract:
Traditional process mining techniques take event data as input where each event is associated with exactly one object. An object represents the instantiation of a process. Object-centric event data contain events associated with multiple objects expressing the interaction of multiple processes. As traditional process mining techniques assume events associated with exactly one object, these techniques cannot be applied to object-centric event data. To use traditional process mining techniques, the object-centric event data are flattened by removing all object references but one. The flattening process is lossy, leading to inaccurate features extracted from flattened data. Furthermore, the graph-like structure of object-centric event data is lost when flattening. In this paper, we introduce a general framework for extracting and encoding features from object-centric event data. We calculate features natively on the object-centric event data, leading to accurate measures. Furthermore, we provide three encodings for these features: tabular, sequential, and graph-based. While tabular and sequential encodings have been heavily used in process mining, the graph-based encoding is a new technique preserving the structure of the object-centric event data. We provide six use cases: a visualization and a prediction use case for each of the three encodings. We use explainable AI in the prediction use cases to show the utility of both the object-centric features and the structure of the sequential and graph-based encoding for a predictive model.
Submitted 2 September, 2022;
originally announced September 2022.
-
Detecting Surprising Situations in Event Data
Authors:
Christian Kohlschmidt,
Mahnaz Sadat Qafari,
Wil M. P. van der Aalst
Abstract:
Process mining is a set of techniques that are used by organizations to understand and improve their operational processes. The first essential step in designing any process reengineering procedure is to find process improvement opportunities. In existing work, it is usually assumed that the set of problematic process instances in which an undesirable outcome occurs is known prior or is easily detectable. So the process enhancement procedure involves finding the root causes and the treatments for the problem in those process instances. For example, the set of problematic instances is considered as those with outlier values or with values smaller/bigger than a given threshold in one of the process features. However, on various occasions, using this approach, many process enhancement opportunities, not captured by these problematic process instances, are missed. To overcome this issue, we formulate finding the process enhancement areas as a context-sensitive anomaly/outlier detection problem. We define a process enhancement area as a set of situations (process instances or prefixes of process instances) where the process performance is surprising. We aim to characterize those situations where process performance/outcome is significantly different from what was expected considering its performance/outcome in similar situations. To evaluate the validity and relevance of the proposed approach, we have implemented and evaluated it on several real-life event logs.
Submitted 29 August, 2022;
originally announced August 2022.
-
Defining Cases and Variants for Object-Centric Event Data
Authors:
Jan Niklas Adams,
Daniel Schuster,
Seth Schmitz,
Günther Schuh,
Wil M. P. van der Aalst
Abstract:
The execution of processes leaves traces of event data in information systems. These event data can be analyzed through process mining techniques. For traditional process mining techniques, one has to associate each event with exactly one object, e.g., the company's customer. Events related to one object form an event sequence called a case. A case describes an end-to-end run through a process. The cases contained in event data can be used to discover a process model, detect frequent bottlenecks, or learn predictive models. However, events encountered in real-life information systems, e.g., ERP systems, can often be associated with multiple objects. The traditional sequential case concept falls short of these object-centric event data as these data exhibit a graph structure. One might force object-centric event data into the traditional case concept by flattening it. However, flattening manipulates the data and removes information. Therefore, a concept analogous to the case concept of traditional event logs is necessary to enable the application of different process mining tasks on object-centric event data. In this paper, we introduce the case concept for object-centric process mining: process executions. These are graph-based generalizations of cases as considered in traditional process mining. Furthermore, we provide techniques to extract process executions. Based on these executions, we determine equivalent process behavior with respect to an attribute using graph isomorphism. Equivalent process executions with respect to the event's activity are object-centric variants, i.e., a generalization of variants in traditional process mining. We provide a visualization technique for object-centric variants. The contribution's scalability and efficiency are extensively evaluated. Furthermore, we provide a case study showing the most frequent object-centric variants of a real-life event log.
Submitted 5 August, 2022;
originally announced August 2022.
-
Quantifying Temporal Privacy Leakage in Continuous Event Data Publishing
Authors:
Majid Rafiei,
Gamal Elkoumy,
Wil M. P. van der Aalst
Abstract:
Process mining employs event data extracted from different types of information systems to discover and analyze actual processes. Event data often contain highly sensitive information about the people who carry out activities or the people for whom activities are performed. Therefore, privacy concerns in process mining are receiving increasing attention. To alleviate privacy-related risks, several privacy preservation techniques have been proposed. Differential privacy is one of these techniques which provides strong privacy guarantees. However, the proposed techniques presume that event data are released in only one shot, whereas business processes are continuously executed. Hence, event data are published repeatedly, resulting in additional risks. In this paper, we demonstrate that continuously released event data are not independent, and the correlation among different releases can result in privacy degradation when the same differential privacy mechanism is applied to each release. We quantify such privacy degradation in the form of temporal privacy leakages. We apply continuous event data publishing scenarios to real-life event logs to demonstrate privacy leakages.
Submitted 29 September, 2022; v1 submitted 3 August, 2022;
originally announced August 2022.
-
Clustering Object-Centric Event Logs
Authors:
Anahita Farhang Ghahfarokhi,
Fatemeh Akoochekian,
Fareed Zandkarimi,
Wil M. P. van der Aalst
Abstract:
Process mining provides various algorithms to analyze process executions based on event data. Process discovery, the most prominent category of process mining techniques, aims to discover process models from event logs; however, it leads to spaghetti models when working with real-life data. Therefore, several clustering techniques have been proposed on top of traditional event logs (i.e., event logs with a single case notion) to reduce the complexity of process models and discover homogeneous subsets of cases. Nevertheless, in real-life processes, particularly in the context of Business-to-Business (B2B) processes, multiple objects are involved in a process. Recently, Object-Centric Event Logs (OCELs) have been introduced to capture the information of such processes, and several process discovery techniques have been developed on top of OCELs. Yet, the output of the proposed discovery techniques on real OCELs leads to more informative but also more complex models. In this paper, we propose a clustering-based approach to cluster similar objects in OCELs to simplify the obtained process models. Using a case study of a real B2B process, we demonstrate that our approach reduces the complexity of the process models and generates coherent subsets of objects which help the end-users gain insights into the process.
Submitted 26 July, 2022;
originally announced July 2022.
-
Predictive Object-Centric Process Monitoring
Authors:
Timo Rohrer,
Anahita Farhang Ghahfarokhi,
Mohamed Behery,
Gerhard Lakemeyer,
Wil M. P. van der Aalst
Abstract:
The automation and digitalization of business processes have resulted in large amounts of data captured in information systems, which can aid businesses in understanding their processes better, improving workflows, or providing operational support. By making predictions about ongoing processes, bottlenecks can be identified and resources reallocated, and insights can be gained into the state of a process instance (case). Traditionally, data is extracted from systems in the form of an event log with a single identifying case notion, such as an order id for an Order to Cash (O2C) process. However, real processes often have multiple object types, for example, order, item, and package, so a format that forces the use of a single case notion does not reflect the underlying relations in the data. The Object-Centric Event Log (OCEL) format was introduced to correctly capture this information. The state-of-the-art predictive methods have been tailored only to traditional event logs. This thesis shows that a prediction method utilizing Generative Adversarial Networks (GAN), Long Short-Term Memory (LSTM) architectures, and Sequence to Sequence models (Seq2seq) can be augmented with the rich data contained in OCEL. Objects in OCEL can have attributes that are useful in predicting the next event and timestamp, such as a priority class attribute for an object type package indicating slower or faster processing. In the metrics of sequence similarity of predicted remaining events and mean absolute error (MAE) of the timestamp, the approach in this thesis matches or exceeds previous research, depending on whether selected object attributes are useful features for the model. Additionally, this thesis provides a web interface to predict the next sequence of activities from user input.
Submitted 20 July, 2022;
originally announced July 2022.
-
Detecting Context-Aware Deviations in Process Executions
Authors:
Gyunam Park,
Janik-Vasily Benzin,
Wil M. P. van der Aalst
Abstract:
Deviation detection aims to detect deviating process instances, e.g., patients in a healthcare process and products in a manufacturing process. A business process of an organization is executed in various contextual situations, e.g., a COVID-19 pandemic in the case of hospitals and a semiconductor chip shortage in the case of automobile companies. Thus, context-aware deviation detection is essential to provide relevant insights. However, existing work 1) does not provide a systematic way of incorporating various contexts, 2) is tailored to a specific approach without using an extensive pool of existing deviation detection techniques, and 3) does not distinguish positive and negative contexts that justify and refute deviations, respectively. In this work, we provide a framework to bridge the aforementioned gaps. We have implemented the proposed framework as a web service that can be extended to various contexts and deviation detection methods. We have evaluated the effectiveness of the proposed framework by conducting experiments using 255 different contextual scenarios.
Submitted 11 June, 2022;
originally announced June 2022.
-
OPerA: Object-Centric Performance Analysis
Authors:
Gyunam Park,
Jan Niklas Adams,
Wil M. P. van der Aalst
Abstract:
Performance analysis in process mining aims to provide insights on the performance of a business process by using a process model as a formal representation of the process. Such insights are reliably interpreted by process analysts in the context of a model with formal semantics. Existing techniques for performance analysis assume that a single case notion exists in a business process (e.g., a patient in a healthcare process). However, in reality, different objects might interact (e.g., order, item, delivery, and invoice in an O2C process). In such a setting, traditional techniques may yield misleading or even incorrect insights on performance metrics such as waiting time. More importantly, by considering the interaction between objects, we can define object-centric performance metrics such as synchronization time, pooling time, and lagging time. In this work, we propose a novel approach to performance analysis considering multiple case notions by using object-centric Petri nets as formal representations of business processes. The proposed approach correctly computes existing performance metrics, while supporting the derivation of newly introduced object-centric performance metrics. We have implemented the approach as a web application and conducted a case study based on a real-life loan application process.
Submitted 27 June, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
PM4Py-GPU: a High-Performance General-Purpose Library for Process Mining
Authors:
Alessandro Berti,
Minh Phan Nghia,
Wil M. P. van der Aalst
Abstract:
Open-source process mining provides many algorithms for the analysis of event data which could be used to analyze mainstream processes (e.g., O2C, P2P, CRM). However, compared to commercial tools, open-source tools lack performance and struggle to analyze large amounts of data. This paper presents PM4Py-GPU, a Python process mining library based on the NVIDIA RAPIDS framework. Thanks to the columnar storage of dataframes and the high level of parallelism, a significant speed-up is achieved on classic process mining computations and processing activities.
Submitted 11 April, 2022;
originally announced April 2022.
-
Uncertain Case Identifiers in Process Mining: A User Study of the Event-Case Correlation Problem on Click Data
Authors:
Marco Pegoraro,
Merih Seran Uysal,
Tom-Hendrik Hülsmann,
Wil M. P. van der Aalst
Abstract:
Among the many sources of event data available today, a prominent one is user interaction data. User activity may be recorded during the use of an application or website, resulting in a type of user interaction data often called click data. An obstacle to the analysis of click data using process mining is the lack of a case identifier in the data. In this paper, we show a case and user study for event-case correlation on click data, in the context of user interaction events from a mobility sharing company. To reconstruct the case notion of the process, we apply a novel method to aggregate user interaction data in separate user sessions (interpreted as cases) based on neural networks. To validate our findings, we qualitatively discuss the impact of process mining analyses on the resulting well-formed event log through interviews with process experts.
Submitted 8 April, 2022;
originally announced April 2022.
-
An XES Extension for Uncertain Event Data
Authors:
Marco Pegoraro,
Merih Seran Uysal,
Wil M. P. van der Aalst
Abstract:
Event data, often stored in the form of event logs, serve as the starting point for process mining and other evidence-based process improvements. However, event data in logs are often tainted by noise, errors, and missing data. Recently, a novel body of research has emerged, with the aim to address and analyze a class of anomalies known as uncertainty: imprecisions quantified with meta-information in the event log. This paper illustrates an extension of the XES data standard capable of representing uncertain event data. Such an extension enables input, output, and manipulation of uncertain data, as well as analysis through the process discovery and conformance checking approaches available in the literature.
Submitted 8 April, 2022;
originally announced April 2022.
-
Event Log Sampling for Predictive Monitoring
Authors:
Mohammadreza Fani Sani,
Mozhgan Vazifehdoostirani,
Gyunam Park,
Marco Pegoraro,
Sebastiaan J. van Zelst,
Wil M. P. van der Aalst
Abstract:
Predictive process monitoring is a subfield of process mining that aims to estimate case or event features for running process instances. Such predictions are of significant interest to the process stakeholders. However, state-of-the-art methods for predictive monitoring require the training of complex machine learning models, which is often inefficient. This paper proposes an instance selection procedure that allows sampling training process instances for prediction models. We show that our sampling method allows for a significant increase of training speed for next activity prediction methods while maintaining reliable levels of prediction accuracy.
Submitted 4 April, 2022;
originally announced April 2022.
-
A Web-Based Tool for Comparative Process Mining
Authors:
Madhavi Bangalore Shankara Narayana,
Elisabetta Benevento,
Marco Pegoraro,
Muhammad Abdullah,
Rahim Bin Shahid,
Qasim Sajid,
Muhammad Usman Mansoor,
Wil M. P. van der Aalst
Abstract:
Process mining techniques enable the analysis of a wide variety of processes using event data. Among the available process mining techniques, most consider a single process perspective at a time, in the shape of a model or log. In this paper, we have developed a tool that can compare and visualize the same process under different constraints, allowing the analysis of multiple aspects of the process. We describe the architecture, structure and use of the tool, and we provide a full open-source implementation.
Submitted 4 April, 2022; v1 submitted 1 April, 2022;
originally announced April 2022.
-
Analyzing Process-Aware Information System Updates Using Digital Twins of Organizations
Authors:
Gyunam Park,
Marco Comuzzi,
Wil M. P. van der Aalst
Abstract:
Digital transformation often entails small-scale changes to information systems supporting the execution of business processes. These changes may increase the operational frictions in process execution, which decreases the process performance. The contributions in the literature providing support to the tracking and impact analysis of small-scale changes are limited in scope and functionality. In this paper, we use the recently developed Digital Twins of Organizations (DTOs) to assess the impact of (process-aware) information system updates. In more detail, we model the updates using the configuration of DTOs and quantitatively assess different types of impacts of information system updates (structural, operational, and performance-related). We implemented a prototype of the proposed approach. Moreover, we discuss a case study involving a standard ERP procure-to-pay business process.
Submitted 24 March, 2022;
originally announced March 2022.
-
How to Write Beautiful Process-and-Data-Science Papers?
Authors:
Wil M. P. van der Aalst
Abstract:
After 25 years of PhD supervision, the author noted typical recurring problems that make papers look sloppy, difficult to read, and incoherent. The goal is not to write a paper for the sake of writing a paper, but to convey a valuable message that is clear and precise. The goal is to write papers that have an impact and are still understandable a couple of decades later. Our mission should be to create papers of high quality that people want to read and that can stand the test of time. We use Dijkstra's adage "Beauty Is Our Business" to stress the importance of simplicity, correctness, and cleanness.
Submitted 4 July, 2024; v1 submitted 17 March, 2022;
originally announced March 2022.
-
A Python Tool for Object-Centric Process Mining Comparison
Authors:
Anahita Farhang Ghahfarokhi,
Wil M. P. van der Aalst
Abstract:
Object-centric process mining provides a more holistic view of processes where we analyze processes with multiple case notions. However, most object-centric process mining techniques consider the whole event log rather than the comparison of existing behaviors in the log. In this paper, we introduce a stand-alone object-centric process cube tool built on the PM4PY-MDL process mining framework. Our infrastructure uses both object and event attributes to build the process cube, which leads to different types of materialization. Furthermore, our tool is equipped with state-of-the-art object-centric process mining techniques. Through our tool, the user can visualize the object-centric event log extracted by process cube operations, export the object-centric event log, discover a state-of-the-art object-centric process model for the extracted log, and compare the process models side by side.
Submitted 11 February, 2022;
originally announced February 2022.
-
A Scalable Database for the Storage of Object-Centric Event Logs
Authors:
Alessandro Berti,
Anahita Farhang Ghahfarokhi,
Gyunam Park,
Wil M. P. van der Aalst
Abstract:
Object-centric process mining provides a set of techniques for the analysis of event data where events are associated with several objects. To store Object-Centric Event Logs (OCELs), the JSON-OCEL and XML-OCEL formats have been recently proposed. However, the proposed implementations of the OCEL are file-based. This means that the entire file needs to be parsed in order to apply process mining techniques, such as the discovery of object-centric process models. In this paper, we propose a database storage for the OCEL format using the MongoDB document database. Since documents in MongoDB are equivalent to JSON objects, the current JSON implementation of the standard could be translated straightforwardly into a series of MongoDB collections.
Submitted 11 February, 2022;
originally announced February 2022.
-
Analyzing Medical Data with Process Mining: a COVID-19 Case Study
Authors:
Marco Pegoraro,
Madhavi Bangalore Shankara Narayana,
Elisabetta Benevento,
Wil M. P. van der Aalst,
Lukas Martin,
Gernot Marx
Abstract:
The recent increase in the availability of medical data, made possible through automation and digitization of medical equipment, has enabled more accurate and complete analysis of patients' medical data through many branches of data science. In particular, medical records that include timestamps showing the history of a patient have enabled the representation of medical information as sequences of events, effectively making process mining analyses possible. In this paper, we present some preliminary findings obtained with established process mining techniques with regard to the medical data of patients of the Uniklinik Aachen hospital affected by the recent epidemic of COVID-19. We show that process mining techniques are able to reconstruct a model of the ICU treatments for COVID patients.
Submitted 25 March, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Interactive Process Improvement using Simulation of Enriched Process Trees
Authors:
Mahsa Pourbafrani,
Wil M. P. van der Aalst
Abstract:
Event data provide the main source of information for analyzing and improving processes in organizations. Process mining techniques capture the state of running processes w.r.t. various aspects, such as activity-flow and performance metrics. The next step for process owners is to take the provided insights and turn them into actions in order to improve their processes. These actions may be taken in different aspects of a process. However, simply being aware of the process aspects that need to be improved as well as potential actions is insufficient. The key step in between is to assess the outcomes of the decisions and improvements. In this paper, we propose a framework to systematically compare event data and the simulated event data of organizations, as well as to compare the results of modified processes in different settings. The proposed framework could be provided as an analytic service to enable organizations to easily access event data analytics. The framework is supported by a simulation tool that enables applying changes to the processes and re-running the process in various scenarios. The simulation step includes different perspectives of a process that can be captured automatically and modified by the user. Then, we apply a state-of-the-art comparison approach for processes using their event data, which visually reflects the effects of these changes in the process, i.e., evaluating the process improvement. Our framework also includes the implementation of the change measurement module as a tool.
Submitted 18 January, 2022;
originally announced January 2022.