---
title: Remove duplicate rows
description: Eliminate duplicate records from a dataset by comparing values in a specified column.
category: Data Transformation
tags: [deduplicate, clean, remove duplicates, filter]
---

# Remove duplicate rows

## Description

The **Remove Duplicate Rows** activity filters out duplicate entries from a dataset by evaluating values in a specified column. It preserves the **first occurrence** of each unique value and removes all subsequent duplicates, helping to clean and normalize data for further processing.

This is particularly useful when merging or importing data has introduced duplicate records and only the first instance of each should be kept.

Use this activity to:

- Clean data before analysis or export
- Remove redundant records during ETL processing
- Prepare datasets for machine learning or reporting by ensuring uniqueness

> **Use case**:
> In a CRM export where a customer appears several times because of recent activity, deduplicate by **Email** or **Customer ID** to keep only the first row for each customer.

## Input

| Type     | Status   |
| -------- | -------- |
| **Data** | Required |

## Output

| Output Type | Format | Description                                                                  |
| ----------- | ------ | ---------------------------------------------------------------------------- |
| **Data**    | Table  | The cleaned dataset with only the first instance of each duplicate retained. |

## Configuration Fields

| Field Name      | Description                                                                                                                                          |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Column Name** | The column used to detect duplicates. If two or more rows share the same value in this column, only the first row is retained.                       |

> Rows with empty or null values in this column are treated as duplicates of each other only when those values are exactly identical.
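
The first-occurrence rule can be illustrated with a minimal Python sketch. This is not the activity's actual implementation: `remove_duplicate_rows` is a hypothetical helper, the rows are modeled as plain dictionaries, and the customer data is invented for illustration. The sketch compares values with Python equality, so its handling of empty or null values is an assumption and may differ from the activity's.

```python
from typing import Any, Hashable


def remove_duplicate_rows(rows: list[dict[str, Any]], column_name: str) -> list[dict[str, Any]]:
    """Keep the first row for each distinct value in `column_name`.

    Hypothetical helper; values are compared with Python equality,
    which is an assumption about how the activity matches values.
    """
    seen: set[Hashable] = set()
    kept: list[dict[str, Any]] = []
    for row in rows:
        key = row[column_name]
        if key in seen:
            continue  # a row with this value was already kept
        seen.add(key)
        kept.append(row)
    return kept


# Invented sample data: customer 3 repeats the email of customer 1.
customers = [
    {"Customer ID": 1, "Email": "ana@example.com"},
    {"Customer ID": 2, "Email": "li@example.com"},
    {"Customer ID": 3, "Email": "ana@example.com"},
]

deduplicated = remove_duplicate_rows(customers, "Email")
print([row["Customer ID"] for row in deduplicated])  # [1, 2]
```

Because the rows are scanned once in input order, the result depends on how the data is sorted before it reaches this step.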

## Sample Input

| ID  | Name  | Age | City     |
| --- | ----- | --- | -------- |
| 101 | John  | 25  | New York |
| 102 | Alice | 30  | Chicago  |
| 103 | John  | 25  | New York |
| 104 | Bob   | 40  | Boston   |
| 105 | Alice | 30  | Chicago  |

> In this example, the rows with ID 103 and 105 duplicate the rows with ID 101 and 102, respectively, based on the **Name** column.

## Sample Configuration

| Field       | Value |
| ----------- | ----- |
| Column Name | Name  |

## Sample Output

| ID  | Name  | Age | City     |
| --- | ----- | --- | -------- |
| 101 | John  | 25  | New York |
| 102 | Alice | 30  | Chicago  |
| 104 | Bob   | 40  | Boston   |

> The duplicate rows for **John** and **Alice** were removed, keeping only the first occurrence based on their appearance in the input data.
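
As a rough cross-check, the sample above can be reproduced in a few lines of Python. This is only a sketch under the assumption that rows are represented as dictionaries; it relies on Python dictionaries preserving insertion order, and the variable names are illustrative rather than part of the activity.

```python
# Sample input from the table above, one dictionary per row.
rows = [
    {"ID": 101, "Name": "John", "Age": 25, "City": "New York"},
    {"ID": 102, "Name": "Alice", "Age": 30, "City": "Chicago"},
    {"ID": 103, "Name": "John", "Age": 25, "City": "New York"},
    {"ID": 104, "Name": "Bob", "Age": 40, "City": "Boston"},
    {"ID": 105, "Name": "Alice", "Age": 30, "City": "Chicago"},
]

column_name = "Name"  # mirrors the Column Name field in the sample configuration

# setdefault stores a row only the first time its Name is seen,
# so later rows with the same Name are ignored.
first_by_value = {}
for row in rows:
    first_by_value.setdefault(row[column_name], row)

output = list(first_by_value.values())
print([row["ID"] for row in output])  # [101, 102, 104]
```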
