Distinct Value Set
The Distinct Value Set rule validates the distinct values in a specified data field. It verifies that each entry in the field is distinct and, depending on the configuration, that it belongs to the expected set. This helps prevent redundancy and keeps the dataset accurate and reliable.
Rule configuration
A value is marked as a success when it matches the expected distinct set and complies with the match type. If the value is unique and fits within the defined set, the rule is considered passed.
- Match type is the method used to compare values in a dataset. It defines how strict or flexible the comparison should be, and so determines how uniqueness and matching are evaluated. The options are Contained by, Contains, and Equals.
- Expected distinct set specifies the set of distinct values that the data is expected to contain. It helps you identify any discrepancies or missing values in the dataset.
Success criteria
The success criteria for the Distinct Value Set check are met when the count of successful values in the specified column satisfies the configured operator and threshold. When it does, the column is flagged as “withinThreshold”, which indicates that the data complies with the rule’s requirements.
- The success condition depends on how the match type is configured.
- For example, if “Alice” must be distinct, no two records in the dataset can have “Alice” as the name.
Configuration fields
- Operator defines the comparison operation. The options are Greater than, Less than, Equal to, and Between (Between requires specifying a start and end range).
- Value is the threshold value used for success criteria. It is required for the Greater than, Less than, and Equal to operators.
- Value range is required only when the Between operator is selected. You need to specify the start and end of the range.
- Threshold type indicates whether the Value or Value range should be considered as a percentage or an absolute count.
- Allow null values determines if null values are permitted.
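The way the operator, value, value range, and threshold type combine can be sketched in Python. This is only an illustrative sketch, not the product's implementation: the function name `within_threshold`, the string spellings of the operators, and the inclusive handling of the Between bounds are all assumptions.

```python
from typing import Optional, Tuple


def within_threshold(
    success_count: int,
    total_count: int,
    operator: str,
    value: Optional[float] = None,
    value_range: Optional[Tuple[float, float]] = None,
    threshold_type: str = "percentage",
) -> bool:
    """Hypothetical helper: decide whether a success count meets the criteria."""
    # Threshold type: compare either a percentage of rows or the raw count.
    if threshold_type == "percentage":
        measured = 100.0 * success_count / total_count if total_count else 0.0
    elif threshold_type == "absolute":
        measured = float(success_count)
    else:
        raise ValueError(f"unknown threshold type: {threshold_type}")

    # Between needs a start and end; bounds are treated as inclusive (assumption).
    if operator == "between":
        if value_range is None:
            raise ValueError("Between requires a value range")
        start, end = value_range
        return start <= measured <= end

    # The remaining operators need a single threshold value.
    if value is None:
        raise ValueError(f"{operator} requires a value")
    if operator == "greater_than":
        return measured > value
    if operator == "less_than":
        return measured < value
    if operator == "equal_to":
        return measured == value
    raise ValueError(f"unknown operator: {operator}")
```

For instance, 3 successes out of 5 rows is 60%, so `within_threshold(3, 5, "greater_than", value=50)` is true, while 1 success out of 5 rows (20%) is not.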
Sample Input
ID | Customer | Country |
---|---|---|
1 | Fallon | Great Britain |
2 | Franklyn | France |
3 | Kathleen | United States |
4 | Judie | France |
5 | Etta | |
Sample rule configuration
- Match type: Contains
- Expected distinct set: Customer = Fallon, Kathleen, Judie; Country = Brazil, United States
- Case sensitive: True
Sample success criteria configuration
- Operator: Greater than
- Value: 50%
- Threshold type: Percentage
Sample Output
Column Name | Rule Name | Success Count | Failure Count | Within Threshold | Null Count |
---|---|---|---|---|---|
Customer | Distinct Value Set check | 3 | 2 | Yes | 0 |
Country | Distinct Value Set check | 1 | 4 | No | 1 |
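
The counts in the table above can be reproduced with a short Python sketch. It is a reading of the sample under stated assumptions rather than the product's actual logic: each row succeeds when its value is in the expected set, a null value counts as a failure and is also tallied separately, and the 50% value is treated as a percentage of all rows. The function name `check_column` and the dictionary keys are hypothetical.

```python
# Sample input rows; the missing Country for ID 5 is modelled as None.
rows = [
    {"ID": 1, "Customer": "Fallon", "Country": "Great Britain"},
    {"ID": 2, "Customer": "Franklyn", "Country": "France"},
    {"ID": 3, "Customer": "Kathleen", "Country": "United States"},
    {"ID": 4, "Customer": "Judie", "Country": "France"},
    {"ID": 5, "Customer": "Etta", "Country": None},
]

# Expected distinct set per column, taken from the sample rule configuration.
expected = {
    "Customer": {"Fallon", "Kathleen", "Judie"},
    "Country": {"Brazil", "United States"},
}


def check_column(rows, column, expected_set, threshold_pct=50.0, case_sensitive=True):
    """Hypothetical helper: count rows whose value is in the expected set."""
    values = [row[column] for row in rows]
    if not case_sensitive:
        # Compare lower-cased strings when case sensitivity is switched off.
        expected_set = {v.lower() for v in expected_set}
        values = [v.lower() if v is not None else None for v in values]

    null_count = sum(v is None for v in values)
    # Assumption: a null value is never in the expected set, so it fails.
    success = sum(v is not None and v in expected_set for v in values)
    failure = len(values) - success
    pct = 100.0 * success / len(values) if values else 0.0
    return {
        "success": success,
        "failure": failure,
        "within_threshold": pct > threshold_pct,  # operator: Greater than
        "null_count": null_count,
    }


for column, expected_set in expected.items():
    print(column, check_column(rows, column, expected_set))
```

Under these assumptions, Customer yields 3 successes and 2 failures (60%, within threshold), and Country yields 1 success and 4 failures with 1 null (20%, not within threshold), which matches the sample output.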