Process mining
Description
The activity extracts process sequences from raw event data to support use cases such as flow visualization, performance analysis, and bottleneck detection.
It operates on a list of events, where each event belongs to a group and has a name, a start time, and an end time. The columns that hold these four values (group, event, start time, and end time) are set in the activity configuration, so the activity can be adapted to different datasets.
The core logic begins by grouping all events based on their group identifier. Each group represents a single execution or instance of a process. Within each group:
- Events are sorted by their start and end times.
- Each event is assigned an index representing its position in the timeline.
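The grouping, sorting, and indexing steps can be sketched as follows. This is a minimal illustration, not the activity's actual implementation: the function names, the `index` field, and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def parse_time(text):
    """Parse an ISO 8601 timestamp such as 2025-01-02T03:29:07.000Z."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def group_and_index(events, group_col, start_col, end_col):
    """Group rows by case, sort each group by (start, end), add a position index."""
    groups = defaultdict(list)
    for row in events:
        groups[row[group_col]].append(row)

    indexed = {}
    for group_id, rows in groups.items():
        ordered = sorted(
            rows, key=lambda r: (parse_time(r[start_col]), parse_time(r[end_col]))
        )
        # "index" is a hypothetical field name for the timeline position.
        indexed[group_id] = [dict(row, index=i) for i, row in enumerate(ordered)]
    return indexed


# Illustrative rows only; the column names follow the sample configuration.
events = [
    {"caseId": "c1", "eventId": "B",
     "startTime": "2025-01-02T03:29:22.000Z", "endTime": "2025-01-02T03:29:32.000Z"},
    {"caseId": "c1", "eventId": "A",
     "startTime": "2025-01-02T03:29:07.000Z", "endTime": "2025-01-02T03:29:21.000Z"},
    {"caseId": "c2", "eventId": "A",
     "startTime": "2025-01-02T04:00:00.000Z", "endTime": "2025-01-02T04:00:05.000Z"},
]
indexed = group_and_index(events, "caseId", "startTime", "endTime")
for group_id, rows in indexed.items():
    print(group_id, [(r["index"], r["eventId"]) for r in rows])
```

Sorting on the pair (start, end) gives events that start at the same moment a deterministic order, with the shorter one first.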
The activity then establishes connections (or “edges”) between events based on their timing, typically from an earlier event to a later one.
Initial sequences start from the first events in a group (those that start earliest). Each later event is then connected to its most recent predecessor, that is, the latest event that ended before it began. This captures a linear or sequential flow of execution.
In parallel processes where multiple events may start around the same time, the logic also identifies and links these overlapping activities. Additionally, if certain events aren’t part of any sequence yet, the activity attempts to connect them to the next plausible event.
Finally, events that end last in their group are considered terminal points and linked accordingly to indicate the process end.
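The edge rules described above can be approximated with the sketch below for a single group. It is a simplified reading of the description, not the activity's real logic. The `Event` and `Edge` types, the integer timestamps, the `Notified` event, and the fallback for overlapping events are all assumptions. The step that reconnects events left without an outgoing edge is omitted.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    name: str
    start: int  # any comparable time value; integers keep the example short
    end: int


class Edge(NamedTuple):
    source: Optional[str]  # None marks the process start
    target: Optional[str]  # None marks the process end


def build_edges(events):
    """Link the events of ONE group into start, predecessor, and terminal edges."""
    if not events:
        return []
    ordered = sorted(events, key=lambda e: (e.start, e.end))
    edges = []

    first_start = ordered[0].start
    for i, event in enumerate(ordered):
        if event.start == first_start:
            # Events that start earliest open the process.
            edges.append(Edge(None, event.name))
            continue
        # Earlier events that ended at or before this one began.
        # Counting "ended exactly at the start" as a predecessor is an assumption.
        finished = [p for p in ordered[:i] if p.end <= event.start]
        if finished:
            predecessor = max(finished, key=lambda p: p.end)
        else:
            # Assumed fallback for overlapping events: link from the event
            # immediately before this one in sorted order.
            predecessor = ordered[i - 1]
        edges.append(Edge(predecessor.name, event.name))

    # Events that end last close the process.
    last_end = max(e.end for e in ordered)
    for event in ordered:
        if event.end == last_end:
            edges.append(Edge(event.name, None))
    return edges


ticket = [
    Event("Received", 0, 2),
    Event("Assigned", 3, 5),
    Event("In Progress", 6, 10),
    Event("Notified", 6, 7),  # hypothetical event running in parallel
    Event("Resolved", 11, 12),
]
edges = build_edges(ticket)
for edge in edges:
    print(edge.source or "", "->", edge.target or "")
```

In this example both `Notified` and `In Progress` link back to `Assigned`, because it is the latest event that finished before either of them started. `Notified` has no outgoing edge here, which is the kind of gap the reconnection step is described as handling.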
Each detected transition is stored with details such as the group, start and end timestamps, a descriptive sequence like “A->B”, and a count of how many times that transition occurred. This structured output helps in mapping the real execution path of processes for further analysis.
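A transition record and its count might be modeled as in the sketch below. The `Transition` type, its field names, and the ticket data are illustrative assumptions. The sketch counts transitions across all groups, although the description does not say whether the real count is per group or overall. Taking the source event's end as the transition start and the target event's start as the transition end is inferred from the sample output.

```python
from collections import Counter
from typing import NamedTuple


class Transition(NamedTuple):
    group: str
    source: str  # empty string marks the process start
    target: str
    start: str   # when the source event ended
    end: str     # when the target event started


def sequence_id(transition, separator="->"):
    """Join the two event names with the configured separator."""
    return f"{transition.source}{separator}{transition.target}"


def count_transitions(transitions, separator="->"):
    """Count how often each sequence string occurs across all groups."""
    return Counter(sequence_id(t, separator) for t in transitions)


# Illustrative data only.
transitions = [
    Transition("ticket-1", "Received", "Assigned", "09:00", "09:05"),
    Transition("ticket-1", "Assigned", "Resolved", "09:20", "09:30"),
    Transition("ticket-2", "Received", "Assigned", "10:00", "10:45"),
]
counts = count_transitions(transitions)
for name, count in counts.items():
    print(name, count)
```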
Use case: In a customer support process, each support ticket logs events like Received, Assigned, In Progress, and Resolved. By applying the Process Mining activity, teams can extract the actual flow of events per ticket and identify delays or skipped steps, helping improve response time and service quality. The extracted sequences can then be visualized or analyzed further using dashboards.
Input
Input Type | Required |
---|---|
Data | Required |
Output
Output Type | Description |
---|---|
Data | Transformed data with sequences |
Configuration Fields
Field | Description |
---|---|
Event group | Column representing the group identifier (e.g., case ID). |
Event | Column containing the event name or identifier; the same value can occur more than once in a group. |
Start time | Column indicating when an event starts. |
End time | Column indicating when an event ends. |
Column name for sequence Id | Name of the column to store the sequence string (e.g., A->B ). |
Sequence Id separator | Character or string used to join event IDs in a sequence (e.g., -> ). |
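For readers who script around this activity, the six fields can be pictured as a small settings object. The class and attribute names below are hypothetical and are not part of the product. The check only illustrates that the four input columns must exist in the incoming data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessMiningConfig:
    """Hypothetical container mirroring the six configuration fields."""
    event_group: str      # Event group
    event: str            # Event
    start_time: str       # Start time
    end_time: str         # End time
    sequence_column: str  # Column name for sequence Id
    separator: str        # Sequence Id separator

    def missing_columns(self, columns):
        """Return the configured input columns that are absent from the data."""
        required = [self.event_group, self.event, self.start_time, self.end_time]
        return [name for name in required if name not in columns]


config = ProcessMiningConfig(
    event_group="caseId",
    event="eventId",
    start_time="startTime",
    end_time="endTime",
    sequence_column="SequenceId",
    separator="->",
)
print(config.missing_columns(["eventId", "caseId", "startTime", "endTime"]))
print(config.missing_columns(["eventId", "caseId"]))
```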
Sample Input
eventId | caseId | startTime | endTime |
---|---|---|---|
2673023124 | 166153460 | 2025-01-02T03:29:07.000Z | 2025-01-02T03:29:21.000Z |
3470567566 | 166153460 | 2025-01-02T03:29:22.000Z | 2025-01-02T03:29:32.000Z |
1553158285 | 166153460 | 2025-01-02T03:29:56.000Z | 2025-01-02T03:29:56.000Z |
1553158285 | 166153460 | 2025-01-02T03:30:08.000Z | 2025-01-02T03:30:08.000Z |
86709890 | 166153460 | 2025-01-02T03:30:10.000Z | 2025-01-02T03:30:12.000Z |
Sample Configuration
Field | Value |
---|---|
Event group | caseId |
Event | eventId |
Start time | startTime |
End time | endTime |
Column name for sequence Id | SequenceId |
Sequence Id separator | -> |
Sample Output
Case Id | Sequence Id | Start time | End time |
---|---|---|---|
166153460 | ->2673023124 | 2025-01-02T03:29:07.000Z | |
166153460 | 2673023124->3470567566 | 2025-01-02T03:29:21.000Z | 2025-01-02T03:29:22.000Z |
166153460 | 3470567566->1553158285 | 2025-01-02T03:29:32.000Z | 2025-01-02T03:29:56.000Z |
166153460 | 1553158285->1553158285 | 2025-01-02T03:29:56.000Z | 2025-01-02T03:30:08.000Z |
166153460 | 1553158285->86709890 | 2025-01-02T03:30:08.000Z | 2025-01-02T03:30:10.000Z |
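The sample rows above can be reproduced with a short sketch. Because every event in this sample ends before the next one starts, the predecessor rule reduces to linking consecutive events, and that shortcut is all this sketch implements. The function name and tuple layout are assumptions. No terminal row is emitted, because the sample output does not show one.

```python
from datetime import datetime


def parse_time(text):
    """Parse an ISO 8601 timestamp such as 2025-01-02T03:29:07.000Z."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


# (eventId, caseId, startTime, endTime), copied from the sample input.
sample = [
    ("2673023124", "166153460", "2025-01-02T03:29:07.000Z", "2025-01-02T03:29:21.000Z"),
    ("3470567566", "166153460", "2025-01-02T03:29:22.000Z", "2025-01-02T03:29:32.000Z"),
    ("1553158285", "166153460", "2025-01-02T03:29:56.000Z", "2025-01-02T03:29:56.000Z"),
    ("1553158285", "166153460", "2025-01-02T03:30:08.000Z", "2025-01-02T03:30:08.000Z"),
    ("86709890", "166153460", "2025-01-02T03:30:10.000Z", "2025-01-02T03:30:12.000Z"),
]


def to_rows(events, separator="->"):
    """Build (caseId, sequenceId, start, end) rows for one sequential group."""
    ordered = sorted(events, key=lambda e: (parse_time(e[2]), parse_time(e[3])))
    first = ordered[0]
    # Start row: no source event, and the end time is left blank.
    rows = [(first[1], f"{separator}{first[0]}", first[2], "")]
    for prev, curr in zip(ordered, ordered[1:]):
        # The transition runs from the end of prev to the start of curr.
        rows.append((curr[1], f"{prev[0]}{separator}{curr[0]}", prev[3], curr[2]))
    return rows


rows = to_rows(sample)
for row in rows:
    print(" | ".join(row))
```

Each transition's start time is the end of the earlier event and its end time is the start of the later event, so the difference between the two is the waiting time between steps.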