Duplicate Record
The Duplicate Record check detects duplicate values in a specific column. The rule counts the distinct values in the column and verifies that this count meets the configured success criteria.
Success criteria
The success criteria are evaluated based on the number of distinct values in the column.
- For a column with N rows, the rule calculates the number of distinct values.
- The success condition is met if the distinct value count satisfies the configured operator and value.
- For example, if the operator is Greater than and the value is 3, then the column must have more than 3 distinct values to be within the threshold.
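The evaluation described above can be sketched in a few lines of Python. This is only an illustration: the `COMPARATORS` mapping and the `within_threshold` function are hypothetical names, not part of the product, and the sketch ignores null handling and the Between operator.

```python
import operator

# Hypothetical mapping from the operator names used in this document
# to Python comparison functions.
COMPARATORS = {
    "Greater than": operator.gt,
    "Less than": operator.lt,
    "Equal to": operator.eq,
}


def within_threshold(values, op, value):
    """Return (distinct_count, passed) for a list of column values."""
    distinct_count = len(set(values))
    passed = COMPARATORS[op](distinct_count, value)
    return distinct_count, passed


# Five rows, four distinct values: "Greater than 3" is satisfied.
count, passed = within_threshold(["a", "b", "c", "d", "a"], "Greater than", 3)
print(count, passed)  # 4 True
```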
Configuration fields
- Operator defines the comparison operation. Operator options: Greater than, Less than, Equal to, Between (requires specifying a start and end range).
- Value is the threshold value used for success criteria. It is required for the Greater than, Less than, and Equal to operators.
- Value range is required only when the Between operator is selected. You need to specify the start and end range.
- Threshold type indicates whether the Value or Value range should be considered as a percentage or an absolute count.
- Allow null values determines if null values are permitted.
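The dependencies between these fields (Value for the single-threshold operators, Value range for Between) can be sketched as a small validation routine. The class and attribute names below are hypothetical and chosen only for illustration. The actual configuration format is not described here, and the sketch additionally assumes that the start of a range must not exceed its end.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SINGLE_VALUE_OPERATORS = ("Greater than", "Less than", "Equal to")


@dataclass
class DuplicateRecordConfig:
    """Hypothetical container for the configuration fields listed above."""
    operator: str
    value: Optional[float] = None
    value_range: Optional[Tuple[float, float]] = None  # (start, end)
    threshold_type: str = "Absolute Count"
    allow_null_values: bool = False

    def validate(self) -> None:
        if self.operator in SINGLE_VALUE_OPERATORS:
            # Value is required for Greater than, Less than, and Equal to.
            if self.value is None:
                raise ValueError(f"Value is required for '{self.operator}'")
        elif self.operator == "Between":
            # Value range is required only for Between.
            if self.value_range is None:
                raise ValueError("Value range is required for 'Between'")
            start, end = self.value_range
            if start > end:  # assumption: start must not exceed end
                raise ValueError("Range start must not exceed range end")
        else:
            raise ValueError(f"Unknown operator: {self.operator}")


DuplicateRecordConfig(operator="Greater than", value=3).validate()
DuplicateRecordConfig(operator="Between", value_range=(2, 5)).validate()
```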
Sample input
ID | Name | Age |
---|---|---|
1 | Alice | 25 |
2 | Bob | 30 |
3 | Alice | 25 |
4 | Charlie | 40 |
5 | Alice | NULL |
Sample rule configuration
- Operator: Greater than
- Value: 3
- Threshold type: Absolute Count
- Allow null values: False
Sample output
Column Name | Rule Name | Success Count | Within Threshold | Null Count |
---|---|---|---|---|
Name | Name Duplicate Check | 3 | No | 0 |
Age | Age Duplicate Check | 3 | No | 1 |
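The sample can be reproduced with a short Python sketch. It assumes, based on the Age row above, that null values are left out of the distinct count and reported separately. The `check_column` function is a hypothetical helper, and the Greater than comparison from the sample configuration is hard-coded.

```python
rows = [
    {"ID": 1, "Name": "Alice", "Age": 25},
    {"ID": 2, "Name": "Bob", "Age": 30},
    {"ID": 3, "Name": "Alice", "Age": 25},
    {"ID": 4, "Name": "Charlie", "Age": 40},
    {"ID": 5, "Name": "Alice", "Age": None},
]


def check_column(rows, column, threshold):
    """Count distinct non-null values and compare with 'Greater than'."""
    values = [row[column] for row in rows]
    null_count = sum(v is None for v in values)
    # Assumption: nulls do not count as a distinct value.
    distinct_count = len({v for v in values if v is not None})
    return {
        "Column Name": column,
        "Success Count": distinct_count,
        "Within Threshold": "Yes" if distinct_count > threshold else "No",
        "Null Count": null_count,
    }


for column in ("Name", "Age"):
    print(check_column(rows, column, threshold=3))
```

Both columns have exactly 3 distinct non-null values, which is not greater than 3, so neither is within the threshold.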