---
title: Group longtail values
description: Group infrequent or non-priority values in a column into a single replacement value.
category: Data Transformation
tags: [transformation, grouping, longtail, normalization, data clean-up]
---

# Group longtail values

## Description

The **Group Longtail Values** activity streamlines a dataset by consolidating low-frequency or non-priority values in a column into a single replacement value.

It is often used to reduce category fragmentation and simplify downstream analysis: values on the allow list are kept as they are, and all remaining entries are grouped under a common label (e.g., `Others`).

Use this activity to:

- Clean and normalize long-tail categorical data
- Replace values not in an allow-list with a defined label
- Focus analysis on key brands, categories, or terms
- Minimize noise from low-frequency entries in visualization or reporting

> **Use case**:  
> A dataset contains numerous product brands, many of which appear only once or twice. To improve chart readability, put the top 3 brands (`Apple`, `Samsung`, `Google`) on the allow list and group all other brands as `Others` before visualizing brand performance.
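
The activity takes the allow list as an input, so a "top 3" list like the one in the use case has to be determined beforehand. The following is a minimal sketch of one way to do that outside the activity, using only the Python standard library. The helper name `top_n_allow_list` and the brand values are hypothetical and are not part of the activity.

```python
from collections import Counter


def top_n_allow_list(values, n):
    """Return the n most frequent values, most frequent first."""
    return [value for value, _count in Counter(values).most_common(n)]


# Hypothetical brand column with one brand per row.
brands = [
    "Apple", "Samsung", "Apple", "Google", "Nokia",
    "Samsung", "Apple", "Google", "Sony",
]

# Derive the allow list from frequencies, then group everything else.
allow_list = top_n_allow_list(brands, 3)
grouped = [brand if brand in allow_list else "Others" for brand in brands]

print(allow_list)
print(grouped)
```

The resulting list can then be entered as the allow list in the activity configuration.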

## Input

| Input Type | Description                |
| ---------- | -------------------------- |
| **Data**   | Input dataset to transform |

## Output

| Output Type | Format | Description                          |
| ----------- | ------ | ------------------------------------ |
| **Data**    | Table  | Transformed data with grouped values |

## Configuration Fields

| Field Name            | Description                                                                                  |
| --------------------- | -------------------------------------------------------------------------------------------- |
| **Column Name**       | The name of the column where longtail values should be grouped.                              |
| **Allow List**        | List of allowed values. Any value in the column not in this list will be replaced.           |
| **Replacement Value** | The value used to replace entries not in the allow list (e.g., `Others`, `Misc`, `Unknown`). |
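
The three fields above can be read as a simple rule: keep a value if it is on the allow list, otherwise substitute the replacement value. The sketch below illustrates that rule in plain Python. It is not the activity's actual implementation: the function name `group_longtail` is hypothetical, and exact, case-sensitive matching is an assumption.

```python
def group_longtail(rows, column_name, allow_list, replacement_value):
    """Return copies of rows with non-allow-listed values in column_name replaced."""
    allowed = set(allow_list)  # set gives fast membership checks
    return [
        {
            **row,
            column_name: (
                row[column_name]
                if row[column_name] in allowed
                else replacement_value
            ),
        }
        for row in rows
    ]


# Hypothetical rows for illustration.
rows = [
    {"id": 1, "brand": "Apple"},
    {"id": 2, "brand": "Nokia"},
    {"id": 3, "brand": "apple"},  # differs in case; replaced under the exact-match assumption
]

result = group_longtail(rows, "brand", ["Apple", "Samsung"], "Others")
print(result)
```

Only the target column changes; all other columns and the row order are left untouched, and the input rows are not modified.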

## Sample Input

| product_id | product_name | brand_names            |
| ---------- | ------------ | ---------------------- |
| P001       | Smartphone   | Apple, Samsung, Google |
| P002       | Laptop       | Dell, HP, Lenovo       |
| P003       | Headphones   | Bose, Sony, Sennheiser |
| P004       | TV           | LG, Samsung, Sony      |
| P005       | Smartwatch   | Fitbit, Garmin, Apple  |

## Sample Configuration

| Field              | Value                    |
| ------------------ | ------------------------ |
| `columnName`       | `product_name`           |
| `allowList`        | `Smartphone, Headphones` |
| `replacementValue` | `Others`                 |

## Sample Output

| product_id | product_name | brand_names            |
| ---------- | ------------ | ---------------------- |
| P001       | Smartphone   | Apple, Samsung, Google |
| P002       | Others       | Dell, HP, Lenovo       |
| P003       | Headphones   | Bose, Sony, Sennheiser |
| P004       | Others       | LG, Samsung, Sony      |
| P005       | Others       | Fitbit, Garmin, Apple  |

> In this example, only `Smartphone` and `Headphones` are on the allow list, so every other value in the `product_name` column (`Laptop`, `TV`, `Smartwatch`) is replaced with `Others`. The `product_id` and `brand_names` columns are unchanged.
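
To make the sample concrete, the sketch below reproduces the transformation on the sample input in plain Python. It assumes the `allowList` value is treated as a list of two separate entries and that matching is exact. How the activity itself parses and matches values is not specified here, so treat this as an illustration only.

```python
# Sample configuration, with allowList written as a Python list (assumption).
config = {
    "columnName": "product_name",
    "allowList": ["Smartphone", "Headphones"],
    "replacementValue": "Others",
}

sample_input = [
    {"product_id": "P001", "product_name": "Smartphone", "brand_names": "Apple, Samsung, Google"},
    {"product_id": "P002", "product_name": "Laptop", "brand_names": "Dell, HP, Lenovo"},
    {"product_id": "P003", "product_name": "Headphones", "brand_names": "Bose, Sony, Sennheiser"},
    {"product_id": "P004", "product_name": "TV", "brand_names": "LG, Samsung, Sony"},
    {"product_id": "P005", "product_name": "Smartwatch", "brand_names": "Fitbit, Garmin, Apple"},
]

column = config["columnName"]
allowed = set(config["allowList"])

sample_output = []
for row in sample_input:
    new_row = dict(row)  # copy so the input stays unchanged
    if new_row[column] not in allowed:
        new_row[column] = config["replacementValue"]
    sample_output.append(new_row)

for row in sample_output:
    print(row["product_id"], row["product_name"], row["brand_names"], sep=" | ")
```

Rows P002, P004, and P005 come out with `Others` in `product_name`, matching the sample output table.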
