Description
The Group Longtail Values activity helps streamline datasets by consolidating lesser-used, low-frequency, or non-priority values in a column into a single replacement value.
It is often used to reduce category fragmentation and simplify downstream analysis by focusing on the most relevant or allowed values and grouping the remaining entries under a common label (e.g., Others).
Use this activity to:
- Clean and normalize long-tail categorical data
- Replace values not in an allow-list with a defined label
- Focus analysis on key brands, categories, or terms
- Minimize noise from low-frequency entries in visualization or reporting
Use case:
A dataset contains numerous product brands, many of which appear only once or twice. To improve chart readability, you can group all brands not in the top 3 (Apple, Samsung, Google) as Others, using this activity before visualizing brand performance.
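The use case above can be sketched in plain Python. This is an illustration only, not the activity's implementation: the activity takes an explicit allow list, so deriving the top 3 brands from frequency counts is an assumption made here, and the helper names `top_n_allow_list` and `group_longtail`, along with the brand data, are hypothetical.

```python
from collections import Counter


def top_n_allow_list(values, n):
    """Return the n most frequent values, most frequent first."""
    return [value for value, _ in Counter(values).most_common(n)]


def group_longtail(values, allow_list, replacement="Others"):
    """Replace every value that is not in allow_list with replacement."""
    allowed = set(allow_list)
    return [v if v in allowed else replacement for v in values]


# Hypothetical brand column: three frequent brands plus a long tail.
brands = ["Apple", "Samsung", "Apple", "Google", "Nokia", "Samsung",
          "Apple", "Google", "Oppo", "Samsung", "Vivo"]

allow_list = top_n_allow_list(brands, 3)      # assumed way to pick the allow list
grouped = group_longtail(brands, allow_list)  # long-tail brands become "Others"
print(Counter(grouped))
```

After grouping, a chart of this column shows four categories instead of six, and the row count is unchanged.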
Input
Input Type | Description |
---|---|
Data | Input dataset to transform |
Output
Output Type | Format | Description |
---|---|---|
Data | Table | Transformed data with grouped values |
Configuration Fields
Field Name | Description |
---|---|
Column Name | The name of the column in which long-tail values should be grouped. |
Allow List | List of allowed values. Any value in the column not in this list will be replaced. |
Replacement Value | The value used to replace entries not in the allow list (e.g., Others, Misc, Unknown). |
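To make the three fields concrete, here is a minimal sketch of the behavior they describe, assuming rows are represented as Python dictionaries. The function name `group_longtail_values` and the order data are hypothetical, and the exact, case-sensitive comparison is an assumption; the activity's actual matching rules may differ.

```python
def group_longtail_values(rows, column_name, allow_list, replacement_value):
    """Return new rows with non-allowed values in column_name replaced."""
    allowed = set(allow_list)
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row[column_name] not in allowed:
            new_row[column_name] = replacement_value
        result.append(new_row)
    return result


# Hypothetical input rows.
rows = [
    {"order_id": 1, "brand": "Apple"},
    {"order_id": 2, "brand": "Nokia"},
    {"order_id": 3, "brand": "apple"},  # differs from "Apple" only by case
]

grouped = group_longtail_values(
    rows,
    column_name="brand",
    allow_list=["Apple", "Samsung", "Google"],
    replacement_value="Others",
)
for row in grouped:
    print(row)
```

Under the case-sensitive assumption, the lowercase `apple` is treated as a long-tail value, so normalizing case before grouping may be worth considering.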
Sample Input
product_id | product_name | brand_names |
---|---|---|
P001 | Smartphone | Apple, Samsung, Google |
P002 | Laptop | Dell, HP, Lenovo |
P003 | Headphones | Bose, Sony, Sennheiser |
P004 | TV | LG, Samsung, Sony |
P005 | Smartwatch | Fitbit, Garmin, Apple |
Sample Configuration
Field | Value |
---|---|
columnName | product_name |
allowList | Smartphone, Headphones |
replacementValue | Others |
Sample Output
product_id | product_name | brand_names |
---|---|---|
P001 | Smartphone | Apple, Samsung, Google |
P002 | Others | Dell, HP, Lenovo |
P003 | Headphones | Bose, Sony, Sennheiser |
P004 | Others | LG, Samsung, Sony |
P005 | Others | Fitbit, Garmin, Apple |
In the above example, only Smartphone and Headphones were part of the allow list. All other values in the product_name column were replaced with Others.
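The sample can be reproduced with a short Python sketch. The data and the configuration keys are taken from the tables above; representing the configuration as a dictionary and parsing allowList as a comma-separated string are assumptions made for illustration, not a description of how the activity stores its settings.

```python
sample_input = [
    {"product_id": "P001", "product_name": "Smartphone", "brand_names": "Apple, Samsung, Google"},
    {"product_id": "P002", "product_name": "Laptop", "brand_names": "Dell, HP, Lenovo"},
    {"product_id": "P003", "product_name": "Headphones", "brand_names": "Bose, Sony, Sennheiser"},
    {"product_id": "P004", "product_name": "TV", "brand_names": "LG, Samsung, Sony"},
    {"product_id": "P005", "product_name": "Smartwatch", "brand_names": "Fitbit, Garmin, Apple"},
]

# Assumed representation of the sample configuration.
config = {
    "columnName": "product_name",
    "allowList": "Smartphone, Headphones",
    "replacementValue": "Others",
}

column = config["columnName"]
# Split the comma-separated allow list and trim surrounding whitespace.
allowed = {item.strip() for item in config["allowList"].split(",")}

# Build new rows; only the configured column can change.
sample_output = [
    {**row, column: row[column] if row[column] in allowed else config["replacementValue"]}
    for row in sample_input
]

for row in sample_output:
    print(row["product_id"], row["product_name"])
```

Only the configured column is rewritten; product_id and brand_names pass through unchanged, as in the Sample Output table.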