Challenges of Anomaly Detection in the Object-Centric Setting: Dimensions and the Role of Domain Knowledge

Authors: Alessandro Berti, Urszula Jessen, Wil M. P. van der Aalst, Dirk Fahland

Abstract: Object-centric event logs, allowing events related to different objects of different object types, naturally represent the execution of business processes, such as ERP (O2C and P2P) and CRM. However, modeling such complex information requires novel process mining techniques and might result in complex sets of constraints. Object-centric anomaly detection exploits both the lifecycle and the interactions between the different objects. Therefore, anomalous patterns are proposed to the user without requiring the definition of object-centric process models. This paper proposes different methodologies for object-centric anomaly detection and discusses the role of domain knowledge for these methodologies. We discuss the advantages and limitations of Large Language Models (LLMs) in the provision of such domain knowledge. Following our experience in a real-life P2P process, we also discuss the role of algorithms (dimensionality reduction + anomaly detection), suggest some pre-processing steps, and discuss the role of feature propagation.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2405.14435 [pdf, other]

High-Level Event Mining: Overview and Future Work

Authors: Bianka Bakullari, Wil M. P. van der Aalst

Abstract: Process mining traditionally relies on input consisting of low-level events that capture individual activities, such as filling out a form or processing a product. However, many of the complex problems inherent in processes, such as bottlenecks and compliance issues, extend beyond the scope of individual events and process instances. Consider congestion, for instance: it can involve and impact numerous cases, much like how a traffic jam affects many cars simultaneously. High-level event mining seeks to address such phenomena using the regular event data available. This report offers a comprehensive overview of existing work and the challenges encountered when lifting the perspective from individual events and cases to system-level events.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2403.10544 [pdf, other]

doi 10.5220/0012392600003657

Process-Aware Analysis of Treatment Paths in Heart Failure Patients: A Case Study

Authors: Harry H. Beyel, Marlo Verket, Viki Peeva, Christian Rennert, Marco Pegoraro, Katharina Schütt, Wil M. P. van der Aalst, Nikolaus Marx

Abstract: Process mining in healthcare presents a range of challenges when working with different types of data within the healthcare domain. There is high diversity considering the variety of data collected from healthcare processes: operational processes given by claims data, a collection of events during surgery, data related to pre-operative and post-operative care, and high-level data collections based on regular ambulant visits with no apparent events. In this case study, a data set from the last category is analyzed. We apply process-mining techniques on sparse patient heart failure data and investigate whether an information gain towards several research questions is achievable. Here, available data are transformed into an event log format, and process discovery and conformance checking are applied. Additionally, patients are split into different cohorts based on comorbidities, such as diabetes and chronic kidney disease, and multiple statistics are compared between the cohorts. Finally, we apply decision mining to determine whether a patient will have a cardiovascular outcome and whether a patient will die.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 10 pages, 3 figures, 9 tables, 31 references

arXiv:2403.07541 [pdf, other]

doi 10.1007/978-3-031-61007-3_18

Process Modeling With Large Language Models

Authors: Humam Kourani, Alessandro Berti, Daniel Schuster, Wil M. P. van der Aalst

Abstract: In the realm of Business Process Management (BPM), process modeling plays a crucial role in translating complex process dynamics into comprehensible visual representations, facilitating the understanding, analysis, improvement, and automation of organizational processes. Traditional process modeling methods often require extensive expertise and can be time-consuming. This paper explores the integration of Large Language Models (LLMs) into process modeling to enhance the accessibility of process modeling, offering a more intuitive entry point for non-experts while augmenting the efficiency of experts. We propose a framework that leverages LLMs for the automated generation and iterative refinement of process models starting from textual descriptions. Our framework involves innovative prompting strategies for effective LLM utilization, along with a secure model generation protocol and an error-handling mechanism. Moreover, we instantiate a concrete system extending our framework. This system provides robust quality guarantees on the models generated and supports exporting them in standard modeling notations, such as the Business Process Modeling Notation (BPMN) and Petri nets. Preliminary results demonstrate the framework's ability to streamline process modeling tasks, underscoring the transformative potential of generative AI in the BPM field.

Submitted 8 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.04327 [pdf, other]

ProMoAI: Process Modeling with Generative AI

Authors: Humam Kourani, Alessandro Berti, Daniel Schuster, Wil M. P. van der Aalst

Abstract: ProMoAI is a novel tool that leverages Large Language Models (LLMs) to automatically generate process models from textual descriptions, incorporating advanced prompt engineering, error handling, and code generation techniques. Beyond automating the generation of complex process models, ProMoAI also supports process model optimization. Users can interact with the tool by providing feedback on the generated model, which is then used for refining the process model. ProMoAI utilizes the capabilities of LLMs to offer a novel, AI-driven approach to process modeling, significantly reducing the barrier to entry for users without deep technical knowledge in process modeling.

Submitted 29 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.01975 [pdf, other]

OCEL (Object-Centric Event Log) 2.0 Specification

Authors: Alessandro Berti, Istvan Koren, Jan Niklas Adams, Gyunam Park, Benedikt Knopp, Nina Graves, Majid Rafiei, Lukas Liß, Leah Tacke Genannt Unterberg, Yisong Zhang, Christopher Schwanen, Marco Pegoraro, Wil M. P. van der Aalst

Abstract: Object-Centric Event Logs (OCELs) form the basis for Object-Centric Process Mining (OCPM). OCEL 1.0 was first released in 2020 and triggered the development of a range of OCPM techniques. OCEL 2.0 forms the new, more expressive standard, allowing for more extensive process analyses while remaining in an easily exchangeable format. In contrast to the first OCEL standard, it can depict changes in objects, provide information on object relationships, and qualify these relationships to other objects or specific events. Compared to XES, it is more expressive, less complicated, and better readable. OCEL 2.0 offers three exchange formats: a relational database (SQLite), XML, and JSON format. This OCEL 2.0 specification document provides an introduction to the standard, its metamodel, and its exchange formats, aimed at practitioners and researchers alike.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.14149 [pdf, other]

Developing a High-Performance Process Mining Library with Java and Python Bindings in Rust

Authors: Aaron Küsters, Wil M. P. van der Aalst

Abstract: The most commonly used open-source process mining software tools today are ProM and PM4Py, written in Java and Python, respectively. Such high-level, often interpreted, programming languages trade off performance with memory safety and ease-of-use. In contrast, traditional compiled languages, like C or C++, can achieve top performance but often suffer from instability related to unsafe memory management. Lately, Rust emerged as a highly performant, compiled programming language with inherent memory safety. In this paper, we describe our approach to developing a shared process mining library in Rust with bindings to both Java and Python, allowing full integration into the existing ecosystems, like ProM and PM4Py. By facilitating interoperability, our methodology enables researchers or industry to develop novel algorithms in Rust once and make them accessible to the entire community while also achieving superior performance.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 22 pages, 6 figures, 7 tables

arXiv:2311.08795 [pdf, other]

Advancements and Challenges in Object-Centric Process Mining: A Systematic Literature Review

Authors: Alessandro Berti, Marco Montali, Wil M. P. van der Aalst

Abstract: Recent years have seen the emergence of object-centric process mining techniques. Born as a response to the limitations of traditional process mining in analyzing event data from prevalent information systems like CRM and ERP, these techniques aim to tackle the deficiency, convergence, and divergence issues seen in traditional event logs. Despite the promise, the adoption in real-world process mining analyses remains limited. This paper embarks on a comprehensive literature review of object-centric process mining, providing insights into the current status of the discipline and its historical trajectory.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.03040 [pdf, other]

Grouping Local Process Models

Authors: Viki Peeva, Wil M. P. van der Aalst

Abstract: In recent years, process mining emerged as a proven technology to analyze and improve operational processes. An expanding range of organizations using process mining in their daily operation brings a broader spectrum of processes to be analyzed. Some of these processes are highly unstructured, making it difficult for traditional process discovery approaches to discover a start-to-end model describing the entire process. Therefore, the subdiscipline of Local Process Model (LPM) discovery tries to build a set of LPMs, i.e., smaller models that explain sub-behaviors of the process. However, like other pattern mining approaches, LPM discovery algorithms also face the problems of model explosion and model repetition, i.e., the algorithms may create hundreds if not thousands of models, and subsets of them are close in structure or behavior. This work proposes a three-step pipeline for grouping similar LPMs using various process model similarity measures. We demonstrate the usefulness of grouping through a real-life case study, and analyze the impact of different measures, the gravity of repetition in the discovered LPMs, and how it improves after grouping on multiple real event logs.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 12 pages, 5 figures

ACM Class: I.5.3

arXiv:2310.11332 [pdf, other]

Discovering High-Quality Process Models Despite Data Scarcity

Authors: Jan Niklas Adams, Jari Peeperkorn, Tobias Brockhoff, Isabelle Terrier, Heiko Göhner, Merih Seran Uysal, Seppe vanden Broucke, Jochen De Weerdt, Wil M. P. van der Aalst

Abstract: Process discovery algorithms learn process models from executed activity sequences, describing concurrency, causality, and conflict. Concurrent activities require observing multiple permutations, increasing data requirements, especially for processes with concurrent subprocesses such as hierarchical, composite, or distributed processes. While process discovery algorithms traditionally use sequences of activities as input, recently introduced object-centric process discovery algorithms can use graphs of activities as input, encoding partial orders between activities. As such, they contain the concurrency information of many sequences in a single graph. In this paper, we address the research question of reducing process discovery data requirements when using object-centric event logs for process discovery. We classify different real-life processes according to the control-flow complexity within and between subprocesses and introduce an evaluation framework to assess process discovery algorithm quality of traditional and object-centric process discovery based on the sample size. We complement this with a large-scale production process case study. Our results show reduced data requirements, enabling the discovery of large, concurrent processes such as manufacturing with little data, previously infeasible with traditional process discovery. Our findings suggest that object-centric process mining could revolutionize process discovery in various sectors, including manufacturing and supply chains.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.10174 [pdf, other]

Analyzing An After-Sales Service Process Using Object-Centric Process Mining: A Case Study

Authors: Gyunam Park, Sevde Aydin, Cuneyt Ugur, Wil M. P. van der Aalst

Abstract: Process mining, a technique turning event data into business process insights, has traditionally operated on the assumption that each event corresponds to a singular case or object. However, many real-world processes are intertwined with multiple objects, making them object-centric. This paper focuses on the emerging domain of object-centric process mining, highlighting its potential yet underexplored benefits in actual operational scenarios. Through an in-depth case study of Borusan Cat's after-sales service process, this study emphasizes the capability of object-centric process mining to capture entangled business process details. Utilizing an event log of approximately 65,000 events, our analysis underscores the importance of embracing this paradigm for richer business insights and enhanced operational improvements.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.02735 [pdf, other]

Extracting Rules from Event Data for Study Planning

Authors: Majid Rafiei, Duygu Bayrak, Mahsa Pourbafrani, Gyunam Park, Hayyan Helal, Gerhard Lakemeyer, Wil M. P. van der Aalst

Abstract: In this study, we examine how event data from campus management systems can be used to analyze the study paths of higher education students. The main goal is to offer valuable guidance for their study planning. We employ process and data mining techniques to explore the impact of sequences of taken courses on academic success. Through the use of decision tree models, we generate data-driven recommendations in the form of rules for study planning and compare them to the recommended study plan. The evaluation focuses on RWTH Aachen University computer science bachelor program students and demonstrates that the proposed course sequence features effectively explain academic performance measures. Furthermore, the findings suggest avenues for developing more adaptable study plans.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.01571 [pdf, other]

The Interplay Between High-Level Problems and The Process Instances That Give Rise To Them

Authors: Bianka Bakullari, Jules van Thoor, Dirk Fahland, Wil M. P. van der Aalst

Abstract: Business processes may face a variety of problems due to the number of tasks that need to be handled within short time periods, resources' workload and working patterns, as well as bottlenecks. These problems may arise locally and be short-lived, but as the process is forced to operate outside its standard capacity, the effect on the underlying process instances can be costly. We use the term high-level behavior to cover all process behavior which cannot be captured in terms of the individual process instances. Whenever such behavior emerges, we call the cases which are involved in it participating cases. The natural question arises as to how the characteristics of cases relate to the high-level behavior they give rise to. In this work, we first show how to detect and correlate observations of high-level problems, as well as determine the corresponding (non-)participating cases. Then we show how to assess the connection between any case-level characteristic and any given detected sequence of high-level problems. Applying our method on the event data of a real loan application process revealed which specific combinations of delays, batching and busy resources at which particular parts of the process correlate with an application's duration and chance of a positive outcome.

Submitted 4 September, 2023; originally announced September 2023.

arXiv:2307.02833 [pdf, other]

Applying Process Mining on Scientific Workflows: a Case Study

Authors: Zahra Sadeghibogar, Alessandro Berti, Marco Pegoraro, Wil M. P. van der Aalst

Abstract: Computer-based scientific experiments are becoming increasingly data-intensive. High-Performance Computing (HPC) clusters are ideal for executing large scientific experiment workflows. Executing large scientific workflows in an HPC cluster leads to complex flows of data and control within the system, which are difficult to analyze. This paper presents a case study where process mining is applied to logs extracted from SLURM-based HPC clusters, in order to document the running workflows and find the performance bottlenecks. The challenge lies in correlating the jobs recorded in the system to enable the application of mainstream process mining techniques. Users may submit jobs with explicit or implicit interdependencies, leading to the consideration of different event correlation techniques. We present a log extraction technique from SLURM clusters, completed with an experimental.

Submitted 6 July, 2023; originally announced July 2023.

arXiv:2307.02194 [pdf, other]

Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study

Authors: Alessandro Berti, Daniel Schuster, Wil M. P. van der Aalst

Abstract: Large Language Models (LLMs) are capable of answering questions in natural language for various purposes. With recent advancements (such as GPT-4), LLMs perform at a level comparable to humans for many proficient tasks. The analysis of business processes could benefit from a natural process querying language and using the domain knowledge on which LLMs have been trained. However, it is impossible to provide a complete database or event log as an input prompt due to size constraints. In this paper, we apply LLMs in the context of process mining by i) abstracting the information of standard process mining artifacts and ii) describing the prompting strategies. We implement the proposed abstraction techniques into pm4py, an open-source process mining library. We present a case study using available event logs. Starting from different abstractions and analysis questions, we formulate prompts and evaluate the quality of the answers.

Submitted 14 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.11453 [pdf, other]

A Collection of Simulated Event Logs for Fairness Assessment in Process Mining

Authors: Timo Pohl, Alessandro Berti, Mahnaz Sadat Qafari, Wil M. P. van der Aalst

Abstract: The analysis of fairness in process mining is a significant aspect of data-driven decision-making, yet the advancement in this field is constrained due to the scarcity of event data that incorporates fairness considerations. To bridge this gap, we present a collection of simulated event logs, spanning four critical domains, which encapsulate a variety of discrimination scenarios. By simulating these event logs with CPN Tools, we ensure data with known ground truth, thereby offering a robust foundation for fairness analysis. These logs are made freely available under the CC-BY-4.0 license and adhere to the XES standard, thereby assuring broad compatibility with various process mining tools. This initiative aims to empower researchers with the requisite resources to test and develop fairness techniques within process mining, ultimately contributing to the pursuit of equitable, data-driven decision-making processes.

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2305.17767 [pdf, other]

Revisiting the Alpha Algorithm To Enable Real-Life Process Discovery Applications -- Extended Report

Authors: Aaron Küsters, Wil M. P. van der Aalst

Abstract: The Alpha algorithm was the first process discovery algorithm that was able to discover process models with concurrency based on incomplete event data while still providing formal guarantees. However, as was stated in the original paper, practical applicability is limited when dealing with exceptional behavior and processes that cannot be described as a structured workflow net without short loops. This paper presents the Alpha+++ algorithm that overcomes many of these limitations, making the algorithm competitive with more recent process mining approaches. The different steps provide insights into the practical challenges of learning process models with concurrency, choices, sequences, loops, and skipping from event data. The approach was implemented in ProM and tested on various publicly available, real-life event logs.

Submitted 3 October, 2023; v1 submitted 28 May, 2023; originally announced May 2023.

Comments: 55 pages, 97 figures

arXiv:2305.05113 [pdf, other]

Object-Centric Alignments

Authors: Lukas Liss, Jan Niklas Adams, Wil M. P. van der Aalst

Abstract: Processes tend to interact with other processes and operate on various objects of different types. These objects can influence each other creating dependencies between sub-processes. Analyzing the conformance of such complex processes challenges traditional conformance-checking approaches because they assume a single-case identifier for a process. To create a single-case identifier one has to flatten complex processes. This leads to information loss when separating the processes that interact on some objects. This paper introduces an alignment approach that operates directly on these object-centric processes. We introduce alignments that can give behavior-based insights into how closely related the event data generated by a process and the behavior specified by an object-centric Petri net are. The contributions of this paper include a definition for object-centric alignments, an algorithm to compute them, a publicly available implementation, and a qualitative and quantitative evaluation. The qualitative evaluation shows that object-centric alignments can give better insights into object-centric processes because they correctly consider inter-object dependencies. Findings from the quantitative evaluation show that the run-time grows exponentially with the number of objects, the length of the process execution, and the cost of the alignment. The evaluation results motivate future research to improve the run-time and make object-centric alignments more applicable for larger processes.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2303.16704 [pdf, other]

TraVaG: Differentially Private Trace Variant Generation Using GANs

Authors: Majid Rafiei, Frederik Wangelik, Mahsa Pourbafrani, Wil M. P. van der Aalst

Abstract: Process mining is rapidly growing in the industry. Consequently, privacy concerns regarding sensitive and private information included in event data, used by process mining algorithms, are becoming increasingly relevant. State-of-the-art research mainly focuses on providing privacy guarantees, e.g., differential privacy, for trace variants that are used by the main process mining techniques, e.g., process discovery. However, privacy preservation techniques for releasing trace variants still do not fulfill all the requirements of industry-scale usage. Moreover, providing privacy guarantees when there exists a high rate of infrequent trace variants is still a challenge. In this paper, we introduce TraVaG as a new approach for releasing differentially private trace variants based on Generative Adversarial Networks (GANs) that provides industry-scale benefits and enhances the level of privacy guarantees when there exists a high ratio of infrequent variants. Moreover, TraVaG overcomes shortcomings of conventional privacy preservation techniques such as bounding the length of variants and introducing fake variants. Experimental results on real-life event data show that our approach outperforms state-of-the-art techniques in terms of privacy guarantees, plain data utility preservation, and result utility preservation.

Submitted 29 March, 2023; originally announced March 2023.

arXiv:2301.07624 [pdf, other]

Performance-Preserving Event Log Sampling for Predictive Monitoring

Authors: Mohammadreza Fani Sani, Mozhgan Vazifehdoostirani, Gyunam Park, Marco Pegoraro, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Predictive process monitoring is a subfield of process mining that aims to estimate case or event features for running process instances. Such predictions are of significant interest to the process stakeholders. However, most of the state-of-the-art methods for predictive monitoring require the training of complex machine learning models, which is often inefficient. Moreover, most of these methods require a hyper-parameter optimization that requires several repetitions of the training process, which is not feasible in many real-life applications. In this paper, we propose an instance selection procedure that allows sampling training process instances for prediction models. We show that our instance selection procedure allows for a significant increase of training speed for next activity and remaining time prediction methods while maintaining reliable levels of prediction accuracy.

Submitted 18 January, 2023; originally announced January 2023.

Comments: 25 pages, 1 figure, 13 tables, 47 references. arXiv admin note: substantial text overlap with arXiv:2204.01470

arXiv:2301.02185 [pdf, other]

doi 10.1007/978-3-031-17604-3_12

Discovering Sound Free-choice Workflow Nets With Non-block Structures

Authors: Tsung-Hao Huang, Wil M. P. van der Aalst

Abstract: Process discovery aims to discover models that can explain the behaviors of event logs extracted from information systems. While various approaches have been proposed, only a few guarantee desirable properties such as soundness and free-choice. State-of-the-art approaches that exploit the representational bias of process trees to provide the guarantees are constrained to be block-structured. Such constructs limit the expressive power of the discovered models, i.e., only a subset of sound free-choice workflow nets can be discovered. To support a more flexible structural representation, we aim to discover process models that provide the same guarantees but also allow for non-block structures. Inspired by existing works that utilize synthesis rules from the free-choice nets theory, we propose an automatic approach that incrementally adds activities to an existing process model with predefined patterns. Playing by the rules ensures that the resulting models are always sound and free-choice. Furthermore, the discovered models are not restricted to block structures and are thus more flexible. The approach has been implemented in Python and tested using various real-life event logs. The experiments show that our approach can indeed discover models with competitive quality and more flexible structures compared to the existing approach.

Submitted 3 January, 2023; originally announced January 2023.

Comments: Accepted and published at Enterprise Design, Operations, and Computing: 26th International Conference, EDOC 2022

arXiv:2301.02182 [pdf, other]

Comparing Ordering Strategies For Process Discovery Using Synthesis Rules

Authors: Tsung-Hao Huang, Wil M. P. van der Aalst

Abstract: Process discovery aims to learn process models from observed behaviors, i.e., event logs, in the information systems. The discovered models serve as the starting point for process mining techniques that are used to address performance and compliance problems. Compared to the state-of-the-art Inductive Miner, the algorithm applying synthesis rules from the free-choice net theory discovers process models with more flexible (non-block) structures while ensuring the same desirable soundness and free-choiceness properties. Moreover, recent development in this line of work shows that the discovered models have compatible quality. Following the synthesis rules, the algorithm incrementally modifies an existing process model by adding the activities in the event log one at a time. As the applications of rules are highly dependent on the existing model structure, the model quality and computation time are significantly influenced by the order of adding activities. In this paper, we investigate the effect of different ordering strategies on the discovered models (w.r.t. fitness and precision) and the computation time using real-life event data. The results show that the proposed ordering strategy can improve the quality of the resulting process models while requiring less time compared to the ordering strategy solely based on the frequency of activities.

Submitted 4 January, 2023; originally announced January 2023.

Comments: Accepted and to be published in the AIPA2022 workshop https://aip-research-center.github.io/AIPA_workshop/2022/, colocated with ICSOC2022

arXiv:2212.11047 [pdf, other]

Discovering Process Models With Long-Term Dependencies While Providing Guarantees and Filtering Infrequent Behavior Patterns

Authors: Lisa Luise Mannel, Wil M. P. van der Aalst

Abstract: In process discovery, the goal is to find, for a given event log, the model describing the underlying process. While process models can be represented in a variety of ways, Petri nets form a theoretically well-explored description language and are therefore often used. In this paper, we extend the eST-Miner process discovery algorithm. The eST-Miner computes a set of Petri net places which are considered to be fitting with respect to a certain fraction of the behavior described by the given event log as indicated by a given noise threshold. It evaluates all possible candidate places using token-based replay. The set of replayable traces is determined for each place in isolation, i.e., these sets do not need to be consistent. This allows the algorithm to abstract from infrequent behavioral patterns occurring only in some traces. However, when combining places into a Petri net by connecting them to the corresponding uniquely labeled transitions, the resulting net can replay exactly those traces from the event log that are allowed by the combination of all inserted places. Thus, inserting places one-by-one without considering their combined effect may result in deadlocks and low fitness of the Petri net. In this paper, we explore adaptations of the eST-Miner that aim to select a subset of places such that the resulting Petri net guarantees a definable minimal fitness while maintaining high precision with respect to the input event log. Furthermore, current place evaluation techniques tend to block the execution of infrequent activity labels. Thus, a refined place fitness metric is introduced and thoroughly investigated. In our experiments we use real and artificial event logs to evaluate and compare the impact of the various place selection strategies and place fitness evaluation metrics on the returned Petri net.

Submitted 22 January, 2024; v1 submitted 21 December, 2022; originally announced December 2022.

Comments: Fundamenta Informaticae, Petri Nets Special Issue 2022

Journal ref: Fundamenta Informaticae, Volume 190, issues 2-4: Petri Nets 2022 (February 12, 2024) fi:10535

arXiv:2212.00009 [pdf, other]

Resolving Uncertain Case Identifiers in Interaction Logs: A User Study

Authors: Marco Pegoraro, Merih Seran Uysal, Tom-Hendrik Hülsmann, Wil M. P. van der Aalst

Abstract: Modern software systems are able to record vast amounts of user actions, stored for later analysis. One of the main types of such user interaction data is click data: the digital trace of the actions of a user through the graphical elements of an application, website or software. While readily available, click data is often missing a case notion: an attribute linking events from user interactions to a specific process instance in the software. In this paper, we propose a neural network-based technique to determine a case notion for click data, thus enabling process mining and other process analysis techniques on user interaction data. We describe our method, show its scalability to datasets of large dimensions, and we validate its efficacy through a user study based on the segmented event log resulting from interaction data of a mobility sharing company. Interviews with domain experts in the company demonstrate that the case notion obtained by our method can lead to actionable process insights.

Submitted 21 November, 2022; originally announced December 2022.

Comments: 36 pages, 17 figures, 1 table, 45 references. arXiv admin note: substantial text overlap with arXiv:2204.04164

arXiv:2211.04146 [pdf, other]

doi 10.1007/978-3-031-20984-0_2

Control-Flow-Based Querying of Process Executions from Partially Ordered Event Data

Authors: Daniel Schuster, Michael Martini, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Event logs, as viewed in process mining, contain event data describing the execution of operational processes. Most process mining techniques take an event log as input and generate insights about the underlying process by analyzing the data provided. Consequently, handling large volumes of event data is essential to apply process mining successfully. Traditionally, individual process executions are considered sequentially ordered process activities. However, process executions are increasingly viewed as partially ordered activities to more accurately reflect process behavior observed in reality, such as simultaneous execution of activities. Process executions comprising partially ordered activities may contain more complex activity patterns than sequence-based process executions. This paper presents a novel query language to call up process executions from event logs containing partially ordered activities. The query language allows users to specify complex ordering relations over activities, i.e., control flow constraints. Evaluating a query for a given log returns process executions satisfying the specified constraints. We demonstrate the implementation of the query language in a process mining tool and evaluate its performance on real-life event logs.

Submitted 4 January, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

arXiv:2211.00006 [pdf, other]

High-Level Event Mining: A Framework

Authors: Bianka Bakullari, Wil M. P. van der Aalst

Abstract: Process mining methods often analyze processes in terms of the individual end-to-end process runs. Process behavior, however, may materialize as a general state of many involved process components, which cannot be captured by looking at the individual process instances. A more holistic state of the process can be determined by looking at the events that occur close in time and share common process capacities. In this work, we conceptualize such behavior using high-level events and propose a new framework for detecting and logging such high-level events. The output of our method is a new high-level event log, which collects all generated high-level events together with the newly assigned event attributes: activity, case, and timestamp. Existing process mining techniques can then be applied on the produced high-level event log to obtain further insights. Experiments on both simulated and real-life event data show that our method is able to automatically discover how system-level patterns such as high traffic and workload emerge, propagate and dissolve throughout the process.

Submitted 31 October, 2022; originally announced November 2022.

arXiv:2210.16786 [pdf, other]

Explainable Predictive Decision Mining for Operational Support

Authors: Gyunam Park, Aaron Küsters, Mara Tews, Cameron Pitsch, Jonathan Schneider, Wil M. P. van der Aalst

Abstract: Several decision points exist in business processes (e.g., whether a purchase order needs a manager's approval or not), and different decisions are made for different process instances based on their characteristics (e.g., a purchase order higher than $500 needs a manager's approval). Decision mining in process mining aims to describe/predict the routing of a process instance at a decision point of the process. By predicting the decision, one can take proactive actions to improve the process. For instance, when a bottleneck is developing in one of the possible decisions, one can predict the decision and bypass the bottleneck. However, despite its huge potential for such operational support, existing techniques for decision mining have focused largely on describing decisions but not on predicting them, deploying decision trees to produce logical expressions to explain the decision. In this work, we aim to enhance the predictive capability of decision mining to enable proactive operational support by deploying more advanced machine learning algorithms. Our proposed approach provides explanations of the predicted decisions using SHAP values to support the elicitation of proactive actions. We have implemented a Web application to support the proposed approach and evaluated the approach using the implementation.

Submitted 30 October, 2022; originally announced October 2022.

arXiv:2210.14951 [pdf, other]

TraVaS: Differentially Private Trace Variant Selection for Process Mining

Authors: Majid Rafiei, Frederik Wangelik, Wil M. P. van der Aalst

Abstract: In the area of industrial process mining, privacy-preserving event data publication is becoming increasingly relevant. Consequently, the trade-off between high data utility and quantifiable privacy poses new challenges. State-of-the-art research mainly focuses on differentially private trace variant construction based on prefix expansion methods. However, these algorithms face several practical limitations such as high computational complexity, introducing fake variants, removing frequent variants, and a bounded variant length. In this paper, we introduce a new approach for direct differentially private trace variant release which uses anonymized partition selection strategies to overcome the aforementioned restraints. Experimental results on real-life event data show that our algorithm outperforms state-of-the-art methods in terms of both plain data utility and result utility preservation.

Submitted 20 October, 2022; originally announced October 2022.

arXiv:2210.12080 [pdf, other]

Monitoring Constraints in Business Processes Using Object-Centric Constraint Graphs

Authors: Gyunam Park, Wil M. P. van der Aalst

Abstract: Constraint monitoring aims to monitor the violation of constraints in business processes, e.g., an invoice should be cleared within 48 hours after the corresponding goods receipt, by analyzing event data. Existing techniques for constraint monitoring assume that a single case notion exists in a business process, e.g., a patient in a healthcare process, and each event is associated with the case notion. However, in reality, business processes are object-centric, i.e., multiple case notions (objects) exist, and an event may be associated with multiple objects. For instance, an Order-To-Cash (O2C) process involves order, item, delivery, etc., and they interact when executing an event, e.g., packing multiple items together for a delivery. The existing techniques produce misleading insights when applied to such object-centric business processes. In this work, we propose an approach to monitoring constraints in object-centric business processes. To this end, we introduce Object-Centric Constraint Graphs (OCCGs) to represent constraints that consider the interaction of objects. Next, we evaluate the constraints represented by OCCGs by analyzing Object-Centric Event Logs (OCELs) that store the interaction of different objects in events. We have implemented a web application to support the proposed approach and conducted two case studies using a real-life SAP ERP system.

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2209.10897 [pdf, other]

Process Modeling and Conformance Checking in Healthcare: A COVID-19 Case Study

Authors: Elisabetta Benevento, Marco Pegoraro, Mattia Antoniazzi, Harry H. Beyel, Viki Peeva, Paul Balfanz, Wil M. P. van der Aalst, Lukas Martin, Gernot Marx

Abstract: The discipline of process mining has a solid track record of successful applications to the healthcare domain. Within such research space, we conducted a case study related to the Intensive Care Unit (ICU) ward of the Uniklinik Aachen hospital in Germany. The aim of this work is twofold: developing a normative model representing the clinical guidelines for the treatment of COVID-19 patients, and analyzing the adherence of the observed behavior (recorded in the information system of the hospital) to such guidelines. We show that, through conformance checking techniques, it is possible to analyze the care process for COVID-19 patients, highlighting the main deviations from the clinical guidelines. The results provide physicians with useful indications for improving the process and ensuring service quality and patient satisfaction. We share the resulting model as an open-source BPMN file.

Submitted 23 November, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 12 pages, 2 figures, 3 tables, 15 references

arXiv:2209.04290 [pdf, other]

doi 10.1007/978-3-031-17834-4_18

Conformance Checking for Trace Fragments Using Infix and Postfix Alignments

Authors: Daniel Schuster, Niklas Föcking, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Conformance checking deals with collating modeled process behavior with observed process behavior recorded in event data. Alignments are a state-of-the-art technique to detect, localize, and quantify deviations in process executions, i.e., traces, compared to reference process models. Alignments, however, assume complete process executions covering the entire process from start to finish or prefixes of process executions. This paper defines infix/postfix alignments, proposes approaches to their computation, and evaluates them using real-life event data.

Submitted 15 August, 2022; originally announced September 2022.

arXiv:2209.01219 [pdf, other]

A Framework for Extracting and Encoding Features from Object-Centric Event Data

Authors: Jan Niklas Adams, Gyunam Park, Sergej Levich, Daniel Schuster, Wil M. P. van der Aalst

Abstract: Traditional process mining techniques take event data as input where each event is associated with exactly one object. An object represents the instantiation of a process. Object-centric event data contain events associated with multiple objects expressing the interaction of multiple processes. As traditional process mining techniques assume events associated with exactly one object, these techniques cannot be applied to object-centric event data. To use traditional process mining techniques, the object-centric event data are flattened by removing all object references but one. The flattening process is lossy, leading to inaccurate features extracted from flattened data. Furthermore, the graph-like structure of object-centric event data is lost when flattening. In this paper, we introduce a general framework for extracting and encoding features from object-centric event data. We calculate features natively on the object-centric event data, leading to accurate measures. Furthermore, we provide three encodings for these features: tabular, sequential, and graph-based. While tabular and sequential encodings have been heavily used in process mining, the graph-based encoding is a new technique preserving the structure of the object-centric event data. We provide six use cases: a visualization and a prediction use case for each of the three encodings. We use explainable AI in the prediction use cases to show the utility of both the object-centric features and the structure of the sequential and graph-based encoding for a predictive model.

Submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.13515 [pdf, other]

Detecting Surprising Situations in Event Data

Authors: Christian Kohlschmidt, Mahnaz Sadat Qafari, Wil M. P. van der Aalst

Abstract: Process mining is a set of techniques that are used by organizations to understand and improve their operational processes. The first essential step in designing any process reengineering procedure is to find process improvement opportunities. In existing work, it is usually assumed that the set of problematic process instances in which an undesirable outcome occurs is known a priori or is easily detectable. So the process enhancement procedure involves finding the root causes and the treatments for the problem in those process instances. For example, the set of problematic instances is considered as those with outlier values or with values smaller/bigger than a given threshold in one of the process features. However, on various occasions, using this approach, many process enhancement opportunities, not captured by these problematic process instances, are missed. To overcome this issue, we formulate finding the process enhancement areas as a context-sensitive anomaly/outlier detection problem. We define a process enhancement area as a set of situations (process instances or prefixes of process instances) where the process performance is surprising. We aim to characterize those situations where process performance/outcome is significantly different from what was expected considering its performance/outcome in similar situations. To evaluate the validity and relevance of the proposed approach, we have implemented and evaluated it on several real-life event logs.

Submitted 29 August, 2022; originally announced August 2022.

Comments: 12 pages, 10 figures

arXiv:2208.03235 [pdf, other]

Defining Cases and Variants for Object-Centric Event Data

Authors: Jan Niklas Adams, Daniel Schuster, Seth Schmitz, Günther Schuh, Wil M. P. van der Aalst

Abstract: The execution of processes leaves traces of event data in information systems. These event data can be analyzed through process mining techniques. For traditional process mining techniques, one has to associate each event with exactly one object, e.g., the company's customer. Events related to one object form an event sequence called a case. A case describes an end-to-end run through a process. The cases contained in event data can be used to discover a process model, detect frequent bottlenecks, or learn predictive models. However, events encountered in real-life information systems, e.g., ERP systems, can often be associated with multiple objects. The traditional sequential case concept falls short of these object-centric event data as these data exhibit a graph structure. One might force object-centric event data into the traditional case concept by flattening it. However, flattening manipulates the data and removes information. Therefore, a concept analogous to the case concept of traditional event logs is necessary to enable the application of different process mining tasks on object-centric event data. In this paper, we introduce the case concept for object-centric process mining: process executions. These are graph-based generalizations of cases as considered in traditional process mining. Furthermore, we provide techniques to extract process executions. Based on these executions, we determine equivalent process behavior with respect to an attribute using graph isomorphism. Equivalent process executions with respect to the event's activity are object-centric variants, i.e., a generalization of variants in traditional process mining. We provide a visualization technique for object-centric variants. The contribution's scalability and efficiency are extensively evaluated. Furthermore, we provide a case study showing the most frequent object-centric variants of a real-life event log.

Submitted 5 August, 2022; originally announced August 2022.

arXiv:2208.01886 [pdf, other]

Quantifying Temporal Privacy Leakage in Continuous Event Data Publishing

Authors: Majid Rafiei, Gamal Elkoumy, Wil M. P. van der Aalst

Abstract: Process mining employs event data extracted from different types of information systems to discover and analyze actual processes. Event data often contain highly sensitive information about the people who carry out activities or the people for whom activities are performed. Therefore, privacy concerns in process mining are receiving increasing attention. To alleviate privacy-related risks, several privacy preservation techniques have been proposed. Differential privacy is one of these techniques which provides strong privacy guarantees. However, the proposed techniques presume that event data are released in only one shot, whereas business processes are continuously executed. Hence, event data are published repeatedly, resulting in additional risks. In this paper, we demonstrate that continuously released event data are not independent, and the correlation among different releases can result in privacy degradation when the same differential privacy mechanism is applied to each release. We quantify such privacy degradation in the form of temporal privacy leakages. We apply continuous event data publishing scenarios to real-life event logs to demonstrate privacy leakages.

Submitted 29 September, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

arXiv:2207.12764 [pdf, other]

Clustering Object-Centric Event Logs

Authors: Anahita Farhang Ghahfarokhi, Fatemeh Akoochekian, Fareed Zandkarimi, Wil M. P. van der Aalst

Abstract: Process mining provides various algorithms to analyze process executions based on event data. Process discovery, the most prominent category of process mining techniques, aims to discover process models from event logs; however, it leads to spaghetti models when working with real-life data. Therefore, several clustering techniques have been proposed on top of traditional event logs (i.e., event logs with a single case notion) to reduce the complexity of process models and discover homogeneous subsets of cases. Nevertheless, in real-life processes, particularly in the context of Business-to-Business (B2B) processes, multiple objects are involved in a process. Recently, Object-Centric Event Logs (OCELs) have been introduced to capture the information of such processes, and several process discovery techniques have been developed on top of OCELs. Yet, the output of the proposed discovery techniques on real OCELs leads to more informative but also more complex models. In this paper, we propose a clustering-based approach to cluster similar objects in OCELs to simplify the obtained process models. Using a case study of a real B2B process, we demonstrate that our approach reduces the complexity of the process models and generates coherent subsets of objects which help the end-users gain insights into the process.

Submitted 26 July, 2022; originally announced July 2022.

arXiv:2207.10017 [pdf, other]

Predictive Object-Centric Process Monitoring

Authors: Timo Rohrer, Anahita Farhang Ghahfarokhi, Mohamed Behery, Gerhard Lakemeyer, Wil M. P. van der Aalst

Abstract: The automation and digitalization of business processes has resulted in large amounts of data captured in information systems, which can aid businesses in understanding their processes better, improve workflows, or provide operational support. By making predictions about ongoing processes, bottlenecks can be identified and resources reallocated, as well as insights gained into the state of a process instance (case). Traditionally, data is extracted from systems in the form of an event log with a single identifying case notion, such as an order id for an Order to Cash (O2C) process. However, real processes often have multiple object types, for example, order, item, and package, so a format that forces the use of a single case notion does not reflect the underlying relations in the data. The Object-Centric Event Log (OCEL) format was introduced to correctly capture this information. The state-of-the-art predictive methods have been tailored to only traditional event logs. This thesis shows that a prediction method utilizing Generative Adversarial Networks (GAN), Long Short-Term Memory (LSTM) architectures, and Sequence to Sequence models (Seq2seq), can be augmented with the rich data contained in OCEL. Objects in OCEL can have attributes that are useful in predicting the next event and timestamp, such as a priority class attribute for an object type package indicating slower or faster processing.
In the metrics of sequence similarity of predicted remaining events and mean absolute error (MAE) of the timestamp, the approach in this thesis matches or exceeds previous research, depending on whether selected object attributes are useful features for the model. Additionally, this thesis provides a web interface to predict the next sequence of activities from user input.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2206.05532 [pdf, other]

doi 10.1007/978-3-031-16171-1_12

Detecting Context-Aware Deviations in Process Executions

Authors: Gyunam Park, Janik-Vasily Benzin, Wil M. P. van der Aalst

Abstract: Deviation detection aims to detect deviating process instances, e.g., patients in the healthcare process and products in the manufacturing process. A business process of an organization is executed in various contextual situations, e.g., a COVID-19 pandemic in the case of hospitals and a semiconductor chip shortage in the case of automobile companies. Thus, context-aware deviation detection is essential to provide relevant insights. However, existing work 1) does not provide a systematic way of incorporating various contexts, 2) is tailored to a specific approach without using an extensive pool of existing deviation detection techniques, and 3) does not distinguish positive and negative contexts that justify and refute deviation, respectively. In this work, we provide a framework to bridge the aforementioned gaps. We have implemented the proposed framework as a web service that can be extended to various contexts and deviation detection methods. We have evaluated the effectiveness of the proposed framework by conducting experiments using 255 different contextual scenarios.

Submitted 11 June, 2022; originally announced June 2022.

Journal ref: LNBIP 458 (2022) 190-206

arXiv:2204.10662 [pdf, other]

doi 10.1007/978-3-031-17995-2_20

OPerA: Object-Centric Performance Analysis

Authors: Gyunam Park, Jan Niklas Adams, Wil M. P. van der Aalst

Abstract: Performance analysis in process mining aims to provide insights on the performance of a business process by using a process model as a formal representation of the process. Such insights are reliably interpreted by process analysts in the context of a model with formal semantics. Existing techniques for performance analysis assume that a single case notion exists in a business process (e.g., a patient in a healthcare process). However, in reality, different objects might interact (e.g., order, item, delivery, and invoice in an O2C process). In such a setting, traditional techniques may yield misleading or even incorrect insights on performance metrics such as waiting time. More importantly, by considering the interaction between objects, we can define object-centric performance metrics such as synchronization time, pooling time, and lagging time. In this work, we propose a novel approach to performance analysis considering multiple case notions by using object-centric Petri nets as formal representations of business processes. The proposed approach correctly computes existing performance metrics, while supporting the derivation of newly-introduced object-centric performance metrics. We have implemented the approach as a web application and conducted a case study based on a real-life loan application process.

Submitted 27 June, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Journal ref: LNCS 13607 (2022) 281-292

arXiv:2204.04898 [pdf, ps, other]

PM4Py-GPU: a High-Performance General-Purpose Library for Process Mining

Authors: Alessandro Berti, Minh Phan Nghia, Wil M. P. van der Aalst

Abstract: Open-source process mining provides many algorithms for the analysis of event data which could be used to analyze mainstream processes (e.g., O2C, P2P, CRM). However, compared to commercial tools, they lack the performance and struggle to analyze large amounts of data. This paper presents PM4Py-GPU, a Python process mining library based on the NVIDIA RAPIDS framework. Thanks to the dataframe columnar storage and the high level of parallelism, a significant speed-up is achieved on classic process mining computations and processing activities.

Submitted 11 April, 2022; originally announced April 2022.

arXiv:2204.04164 [pdf, other]

Uncertain Case Identifiers in Process Mining: A User Study of the Event-Case Correlation Problem on Click Data

Authors: Marco Pegoraro, Merih Seran Uysal, Tom-Hendrik Hülsmann, Wil M. P. van der Aalst

Abstract: Among the many sources of event data available today, a prominent one is user interaction data. User activity may be recorded during the use of an application or website, resulting in a type of user interaction data often called click data. An obstacle to the analysis of click data using process mining is the lack of a case identifier in the data. In this paper, we show a case and user study for event-case correlation on click data, in the context of user interaction events from a mobility sharing company. To reconstruct the case notion of the process, we apply a novel method to aggregate user interaction data in separate user sessions (interpreted as cases) based on neural networks. To validate our findings, we qualitatively discuss the impact of process mining analyses on the resulting well-formed event log through interviews with process experts.

Submitted 8 April, 2022; originally announced April 2022.

Comments: 15 pages, 10 figures, 1 table, 18 references

arXiv:2204.04135 [pdf, other]

An XES Extension for Uncertain Event Data

Authors: Marco Pegoraro, Merih Seran Uysal, Wil M. P. van der Aalst

Abstract: Event data, often stored in the form of event logs, serve as the starting point for process mining and other evidence-based process improvements. However, event data in logs are often tainted by noise, errors, and missing data. Recently, a novel body of research has emerged, with the aim to address and analyze a class of anomalies known as uncertainty: imprecisions quantified with meta-information in the event log. This paper illustrates an extension of the XES data standard capable of representing uncertain event data. Such an extension enables input, output, and manipulation of uncertain data, as well as analysis through the process discovery and conformance checking approaches available in literature.

Submitted 8 April, 2022; originally announced April 2022.

Comments: 9 pages, 1 figure, 3 tables, 11 references

Journal ref: CEUR Workshop Proceedings 2973 (2021) 116-120

arXiv:2204.01470 [pdf, other]

doi 10.1007/978-3-030-98581-3_12

Event Log Sampling for Predictive Monitoring

Authors: Mohammadreza Fani Sani, Mozhgan Vazifehdoostirani, Gyunam Park, Marco Pegoraro, Sebastiaan J. van Zelst, Wil M. P. van der Aalst

Abstract: Predictive process monitoring is a subfield of process mining that aims to estimate case or event features for running process instances. Such predictions are of significant interest to the process stakeholders. However, state-of-the-art methods for predictive monitoring require the training of complex machine learning models, which is often inefficient. This paper proposes an instance selection procedure that allows sampling training process instances for prediction models. We show that our sampling method allows for a significant increase of training speed for next activity prediction methods while maintaining reliable levels of prediction accuracy.

Submitted 4 April, 2022; originally announced April 2022.

Comments: 7 pages, 1 figure, 4 tables, 34 references

Journal ref: ICPM Workshops (2021) 154-166

arXiv:2204.00547 [pdf, other]

A Web-Based Tool for Comparative Process Mining

Authors: Madhavi Bangalore Shankara Narayana, Elisabetta Benevento, Marco Pegoraro, Muhammad Abdullah, Rahim Bin Shahid, Qasim Sajid, Muhammad Usman Mansoor, Wil M. P. van der Aalst

Abstract: Process mining techniques enable the analysis of a wide variety of processes using event data. Among the available process mining techniques, most consider a single process perspective at a time, in the shape of a model or log. In this paper, we have developed a tool that can compare and visualize the same process under different constraints, making it possible to analyze multiple aspects of the process. We describe the architecture, structure and use of the tool, and we provide an open-source full implementation.

Submitted 4 April, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: 2 pages, 2 figures, 6 references

arXiv:2203.12969 [pdf, other]

doi 10.1007/978-3-031-05760-1_10

Analyzing Process-Aware Information System Updates Using Digital Twins of Organizations

Authors: Gyunam Park, Marco Comuzzi, Wil M. P. van der Aalst

Abstract: Digital transformation often entails small-scale changes to information systems supporting the execution of business processes. These changes may increase the operational frictions in process execution, which decreases the process performance. The contributions in the literature providing support to the tracking and impact analysis of small-scale changes are limited in scope and functionality. In this paper, we use the recently developed Digital Twins of Organizations (DTOs) to assess the impact of (process-aware) information systems updates. In more detail, we model the updates using the configuration of DTOs and quantitatively assess different types of impacts of information system updates (structural, operational, and performance-related). We implemented a prototype of the proposed approach. Moreover, we discuss a case study involving a standard ERP procure-to-pay business process.

Submitted 24 March, 2022; originally announced March 2022.

Journal ref: LNBIP 446 (2022) 159-176

arXiv:2203.09286 [pdf, other]

How to Write Beautiful Process-and-Data-Science Papers?

Authors: Wil M. P. van der Aalst

Abstract: After 25 years of PhD supervision, the author noted typical recurring problems that make papers look sloppy, difficult to read, and incoherent. The goal is not to write a paper for the sake of writing a paper, but to convey a valuable message that is clear and precise. The goal is to write papers that have an impact and are still understandable a couple of decades later. Our mission should be to create papers of high quality that people want to read and that can stand the test of time. We use Dijkstra's adagium "Beauty Is Our Business" to stress the importance of simplicity, correctness, and cleanness.

Submitted 4 July, 2024; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: 19 pages, 1 figure

ACM Class: A.0; K.3.0

arXiv:2202.05709 [pdf, other]

A Python Tool for Object-Centric Process Mining Comparison

Authors: Anahita Farhang Ghahfarokhi, Wil M. P. van der Aalst

Abstract: Object-centric process mining provides a more holistic view of processes where we analyze processes with multiple case notions. However, most object-centric process mining techniques consider the whole event log rather than the comparison of existing behaviors in the log. In this paper, we introduce a stand-alone object-centric process cube tool built on the PM4PY-MDL process mining framework. Our infrastructure uses both object and event attributes to build the process cube, which leads to different types of materialization. Furthermore, our tool is equipped with state-of-the-art object-centric process mining techniques. Through our tool, the user can visualize the extracted object-centric event log from process cube operations, export the object-centric event log, discover the state-of-the-art object-centric process model for the extracted log, and compare the process models side-by-side.

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2202.05639 [pdf, other]

A Scalable Database for the Storage of Object-Centric Event Logs

Authors: Alessandro Berti, Anahita Farhang Ghahfarokhi, Gyunam Park, Wil M. P. van der Aalst

Abstract: Object-centric process mining provides a set of techniques for the analysis of event data where events are associated to several objects. To store Object-centric Event Logs (OCELs), the JSON-OCEL and JSON-XML formats have been recently proposed. However, the proposed implementations of the OCEL are file-based. This means that the entire file needs to be parsed in order to apply process mining techniques, such as the discovery of object-centric process models. In this paper, we propose a database storage for the OCEL format using the MongoDB document database. Since documents in MongoDB are equivalent to JSON objects, the current JSON implementation of the standard could be translated straightforwardly in a series of MongoDB collections.

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2202.04625 [pdf, other]

Analyzing Medical Data with Process Mining: a COVID-19 Case Study

Authors: Marco Pegoraro, Madhavi Bangalore Shankara Narayana, Elisabetta Benevento, Wil M. P. van der Aalst, Lukas Martin, Gernot Marx

Abstract: The recent increase in the availability of medical data, made possible through automation and digitization of medical equipment, has enabled more accurate and complete analysis on patients' medical data through many branches of data science. In particular, medical records that include timestamps showing the history of a patient have enabled the representation of medical information as sequences of events, effectively enabling process mining analyses. In this paper, we will present some preliminary findings obtained with established process mining techniques regarding the medical data of patients of the Uniklinik Aachen hospital affected by the recent epidemic of COVID-19. We show that process mining techniques are able to reconstruct a model of the ICU treatments for COVID patients.

Submitted 25 March, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: 9 pages, 5 figures, 11 references

arXiv:2201.07755 [pdf, other]

Interactive Process Improvement using Simulation of Enriched Process Trees

Authors: Mahsa Pourbafrani, Wil M. P. van der Aalst

Abstract: Event data provide the main source of information for analyzing and improving processes in organizations. Process mining techniques capture the state of running processes w.r.t. various aspects, such as activity-flow and performance metrics. The next step for process owners is to take the provided insights and turn them into actions in order to improve their processes. These actions may be taken in different aspects of a process. However, simply being aware of the process aspects that need to be improved as well as potential actions is insufficient. The key step in between is to assess the outcomes of the decisions and improvements. In this paper, we propose a framework to systematically compare event data and the simulated event data of organizations, as well as comparing the results of modified processes in different settings. The proposed framework could be provided as an analytic service to enable organizations to easily access event data analytics. The framework is supported with a simulation tool that enables applying changes to the processes and re-running the process in various scenarios. The simulation step includes different perspectives of a process that can be captured automatically and modified by the user. Then, we apply a state-of-the-art comparison approach for processes using their event data which visually reflects the effects of these changes in the process, i.e., evaluating the process improvement. Our framework also includes the implementation of the change measurement module as a tool.

Submitted 18 January, 2022; originally announced January 2022.

Showing 1–50 of 104 results for author: van der Aalst, W M P