Process mining

Description

The activity extracts process sequences from raw event data to support use cases such as flow visualization, performance analysis, and bottleneck detection.
It operates on a list of events, where each event belongs to a group and has a name, a start time, and an end time. These four columns (group, event, start time, and end time) are set in the activity configuration, so the activity can be adapted to different datasets.

The core logic begins by grouping all events based on their group identifier. Each group represents a single execution or instance of a process. Within each group:
Events are sorted by their start and end times.
Each event is assigned an index representing its position in the timeline.
The activity then establishes connections (or “edges”) between events based on their timing, typically from an earlier event to a later one.
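
The grouping, sorting, and indexing steps above can be sketched in Python as follows. This is a minimal illustration, not the activity's actual implementation: the function name `index_events`, the added `index` field, and the dictionary-based event shape are assumptions, and the rows are made-up data that only borrow the column names from the sample input. It also assumes timestamps are ISO 8601 strings in one uniform format, so that string comparison matches chronological order.

```python
from collections import defaultdict


def index_events(events, group_key="caseId", start_key="startTime", end_key="endTime"):
    """Group events, sort each group by (start, end), and number them."""
    groups = defaultdict(list)
    for event in events:
        groups[event[group_key]].append(event)

    indexed = {}
    for group_id, members in groups.items():
        # Sort by start time first, then by end time as a tie-breaker.
        ordered = sorted(members, key=lambda e: (e[start_key], e[end_key]))
        # Copy each event and attach its position in the group's timeline.
        indexed[group_id] = [{**e, "index": i} for i, e in enumerate(ordered)]
    return indexed


# Hypothetical rows, deliberately out of order.
events = [
    {"caseId": "c1", "eventId": "B",
     "startTime": "2025-01-02T03:29:22.000Z", "endTime": "2025-01-02T03:29:32.000Z"},
    {"caseId": "c1", "eventId": "A",
     "startTime": "2025-01-02T03:29:07.000Z", "endTime": "2025-01-02T03:29:21.000Z"},
    {"caseId": "c2", "eventId": "A",
     "startTime": "2025-01-02T04:00:00.000Z", "endTime": "2025-01-02T04:00:05.000Z"},
]
timeline = index_events(events)
```

Each group keeps its own index sequence starting at 0, which reflects the idea that a group is one independent execution of the process.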

The first sequences start from the earliest-starting events in a group. Each later event is then connected to its most recent predecessor, that is, the latest-ending event that finished before it began. This captures a linear, sequential flow of execution.
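
As a rough sketch of the predecessor rule, the Python below links each event in one group to the latest-ending event that finished before it started. The function name `link_predecessors` is hypothetical, the strict `<` comparison for ties is an assumption, and the rows are taken from the sample input later on this page. It is one possible reading of the rule, not the activity's internal code.

```python
def link_predecessors(ordered, event_key="eventId", start_key="startTime", end_key="endTime"):
    """Return (predecessor_id, event_id) pairs for one group's sorted events."""
    edges = []
    for i, current in enumerate(ordered):
        # Candidates: earlier-sorted events that ended before this one began.
        # Treating equal timestamps as "not before" is an assumption.
        finished = [p for p in ordered[:i] if p[end_key] < current[start_key]]
        if not finished:
            # No finished predecessor: treat the event as an entry point.
            edges.append((None, current[event_key]))
            continue
        latest = max(finished, key=lambda p: p[end_key])
        edges.append((latest[event_key], current[event_key]))
    return edges


# Rows from the sample input, already sorted by start and end time.
case = [
    {"eventId": "2673023124",
     "startTime": "2025-01-02T03:29:07.000Z", "endTime": "2025-01-02T03:29:21.000Z"},
    {"eventId": "3470567566",
     "startTime": "2025-01-02T03:29:22.000Z", "endTime": "2025-01-02T03:29:32.000Z"},
    {"eventId": "1553158285",
     "startTime": "2025-01-02T03:29:56.000Z", "endTime": "2025-01-02T03:29:56.000Z"},
    {"eventId": "1553158285",
     "startTime": "2025-01-02T03:30:08.000Z", "endTime": "2025-01-02T03:30:08.000Z"},
    {"eventId": "86709890",
     "startTime": "2025-01-02T03:30:10.000Z", "endTime": "2025-01-02T03:30:12.000Z"},
]
edges = link_predecessors(case)
```

For this case the resulting pairs match the event pairs listed in the Sample Output section, including the repeated `1553158285` event linking to itself.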

In parallel processes where multiple events may start around the same time, the logic also identifies and links these overlapping activities. Additionally, if certain events aren’t part of any sequence yet, the activity attempts to connect them to the next plausible event.
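
The description does not say exactly how overlapping events are detected, so the following Python is only a sketch using the standard interval-overlap test (two events overlap when each starts before the other ends). The function name `overlapping_pairs` and the rows are hypothetical, and the activity may apply a different or looser rule, for example a tolerance for events that start "around the same time".

```python
from itertools import combinations


def overlapping_pairs(events, event_key="eventId", start_key="startTime", end_key="endTime"):
    """Return pairs of event IDs whose time intervals overlap."""
    pairs = []
    for a, b in combinations(events, 2):
        # Two intervals overlap when each one starts before the other ends.
        if a[start_key] < b[end_key] and b[start_key] < a[end_key]:
            pairs.append((a[event_key], b[event_key]))
    return pairs


# Hypothetical group: A and B run in parallel, C starts after both have ended.
parallel_case = [
    {"eventId": "A",
     "startTime": "2025-01-02T03:00:00.000Z", "endTime": "2025-01-02T03:00:10.000Z"},
    {"eventId": "B",
     "startTime": "2025-01-02T03:00:05.000Z", "endTime": "2025-01-02T03:00:15.000Z"},
    {"eventId": "C",
     "startTime": "2025-01-02T03:00:20.000Z", "endTime": "2025-01-02T03:00:25.000Z"},
]
parallel = overlapping_pairs(parallel_case)
```

Note that with this strict comparison a zero-duration event never overlaps an event that begins or ends at the same instant. Whether that matches the activity's behavior is not stated in this description.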

Finally, events that end last in their group are considered terminal points and linked accordingly to indicate the process end.

Each detected transition is stored with the group, start and end timestamps, a sequence string such as “A->B”, and a count of how many times that transition occurred. In the sample output below, a transition's start time is the end time of its source event, and its end time is the start time of its target event. This structured output helps map the real execution path of a process for further analysis.
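
To make the shape of a transition record concrete, here is a small Python sketch that turns (predecessor, successor) event pairs into records with a sequence string and a count. The function name `to_records`, the record field names, and the choice to count repeats within the pairs passed in are all assumptions. The timestamp rule (start is the predecessor's end time, end is the successor's start time) follows the sample output on this page. The events are made-up.

```python
from collections import Counter


def to_records(group_id, pairs, separator="->"):
    """Build transition records from (predecessor, successor) event pairs."""
    records = []
    for prev, nxt in pairs:
        prev_id = prev["eventId"] if prev else ""  # entry points have no source
        records.append({
            "group": group_id,
            "sequence": f"{prev_id}{separator}{nxt['eventId']}",
            "start": prev["endTime"] if prev else None,
            "end": nxt["startTime"],
        })
    # Count how often each sequence string occurs (scope is an assumption).
    counts = Counter(r["sequence"] for r in records)
    for record in records:
        record["count"] = counts[record["sequence"]]
    return records


# Hypothetical group in which the transition A->B happens twice.
a1 = {"eventId": "A", "startTime": "2025-01-02T03:00:00.000Z", "endTime": "2025-01-02T03:00:05.000Z"}
b1 = {"eventId": "B", "startTime": "2025-01-02T03:00:06.000Z", "endTime": "2025-01-02T03:00:08.000Z"}
a2 = {"eventId": "A", "startTime": "2025-01-02T03:00:09.000Z", "endTime": "2025-01-02T03:00:10.000Z"}
b2 = {"eventId": "B", "startTime": "2025-01-02T03:00:11.000Z", "endTime": "2025-01-02T03:00:12.000Z"}
records = to_records("c1", [(None, a1), (a1, b1), (b1, a2), (a2, b2)])
```

Here both `A->B` records carry a count of 2, while `->A` and `B->A` carry a count of 1.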

Use case: In a customer support process, each support ticket logs events like Received, Assigned, In Progress, and Resolved. By applying the Process Mining activity, teams can extract the actual flow of events per ticket and identify delays or skipped steps, helping improve response time and service quality. The extracted sequences can then be visualized or analyzed further using dashboards.

Input

Input Type	Required
Data	Required

Output

Output Type	Description
Data	Transformed data with sequences

Configuration Fields

Field	Description
Event group	Column representing the group identifier (e.g., case ID).
Event	Column that identifies the event (e.g., event name or ID). The same value may occur more than once within a group, as in the sample input.
Start time	Column indicating when an event starts.
End time	Column indicating when an event ends.
Column name for sequence Id	Name of the column to store the sequence string (e.g., `A->B`).
Sequence Id separator	Character or string used to join event IDs in a sequence (e.g., `->`).

Sample Input

eventId	caseId	startTime	endTime
2673023124	166153460	2025-01-02T03:29:07.000Z	2025-01-02T03:29:21.000Z
3470567566	166153460	2025-01-02T03:29:22.000Z	2025-01-02T03:29:32.000Z
1553158285	166153460	2025-01-02T03:29:56.000Z	2025-01-02T03:29:56.000Z
1553158285	166153460	2025-01-02T03:30:08.000Z	2025-01-02T03:30:08.000Z
86709890	166153460	2025-01-02T03:30:10.000Z	2025-01-02T03:30:12.000Z

Sample Configuration

Field	Value
Event group	`caseId`
Event	`eventId`
Start time	`startTime`
End time	`endTime`
Column name for sequence Id	`SequenceId`
Sequence Id separator	`->`

Sample Output

Case Id	Sequence Id	Start time	End time
166153460	->2673023124		2025-01-02T03:29:07.000Z
166153460	2673023124->3470567566	2025-01-02T03:29:21.000Z	2025-01-02T03:29:22.000Z
166153460	3470567566->1553158285	2025-01-02T03:29:32.000Z	2025-01-02T03:29:56.000Z
166153460	1553158285->1553158285	2025-01-02T03:29:56.000Z	2025-01-02T03:30:08.000Z
166153460	1553158285->86709890	2025-01-02T03:30:08.000Z	2025-01-02T03:30:10.000Z
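
One way to use this output for delay analysis is to compute the gap between each transition's start and end time, which is the idle time between two consecutive events. The Python below is a sketch of that downstream step, not part of the activity itself. The helper name `waiting_seconds` is hypothetical, and the timestamp format string is an assumption based on the values shown in the sample output.

```python
from datetime import datetime

# Assumed format of the timestamps shown in the sample output.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def waiting_seconds(start, end):
    """Seconds between a transition's start and end, or None if either is missing."""
    if not start or not end:
        return None
    delta = datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(start, TIMESTAMP_FORMAT)
    return delta.total_seconds()


# (Sequence Id, Start time, End time) rows copied from the sample output.
rows = [
    ("->2673023124", "", "2025-01-02T03:29:07.000Z"),
    ("2673023124->3470567566", "2025-01-02T03:29:21.000Z", "2025-01-02T03:29:22.000Z"),
    ("3470567566->1553158285", "2025-01-02T03:29:32.000Z", "2025-01-02T03:29:56.000Z"),
    ("1553158285->1553158285", "2025-01-02T03:29:56.000Z", "2025-01-02T03:30:08.000Z"),
    ("1553158285->86709890", "2025-01-02T03:30:08.000Z", "2025-01-02T03:30:10.000Z"),
]
waits = [(sequence, waiting_seconds(start, end)) for sequence, start, end in rows]
```

For the sample case, the longest gap is the 24 seconds between `3470567566` and `1553158285`. The entry row has no start time, so its gap is left undefined.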