String Value Matching
String Value Matching is a data quality rule that compares and validates text data to ensure accuracy and consistency across datasets. It uses methods such as exact matching, contains, and pattern recognition to detect discrepancies or variations. This helps you maintain reliable, standardized data for analysis and reporting.
Rule configuration
A value is marked as a success when it matches the expected distinct set and complies with the match type. If the value is unique and fits within the defined set, the rule is considered passed.
Match type defines how the values in the dataset are compared. It sets the condition under which a match is considered valid, such as exact matching, partial matching, or pattern-based matching. The available match types are:
- Exact Match
- Starts and Ends with
- Contains
Pattern string is a sequence of characters that defines a specific format or structure. It is used for searching, matching, or validating text based on predefined rules or patterns.
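The three match types can be illustrated with a short Python sketch. This is not the product's implementation: the function name `matches`, the match-type identifiers, and the reading of Starts and Ends with as "starts with or ends with any pattern string" are assumptions made for illustration.

```python
def matches(value, patterns, match_type, case_sensitive=True):
    """Return True if value satisfies match_type against any pattern string.

    Hypothetical helper: the names and exact semantics are assumptions.
    """
    if value is None:
        return False  # a null value never matches a pattern string
    if not case_sensitive:
        value = value.lower()
        patterns = [p.lower() for p in patterns]
    if match_type == "exact":
        return value in patterns
    if match_type == "starts_ends_with":
        # Assumption: the value passes if it starts OR ends with a pattern.
        return any(value.startswith(p) or value.endswith(p) for p in patterns)
    if match_type == "contains":
        return any(p in value for p in patterns)
    raise ValueError(f"unknown match type: {match_type}")


print(matches("Great Britain", ["Britain"], "starts_ends_with"))  # True
print(matches("great britain", ["Britain"], "exact", case_sensitive=False))  # False
```

Under these assumptions, "Great Britain" passes a Starts and Ends with check against "Britain" because it ends with that string, but it fails an Exact Match check against the same pattern.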
Success criteria
- The success condition depends on how the Match Type is configured.
- For example, if “Alice” must be distinct, no two names in the database can have the same “Alice” value.
Configuration fields
- Operator defines the comparison operation. You can select one of the following options:
  - Greater than
  - Less than
  - Equal to
  - Between (requires specifying a start and end range)
- Value is the threshold value used for the success criteria. It is required for the Greater than, Less than, and Equal to operators.
- Value range is required only when the Between operator is selected. You must specify the start and end of the range.
- Threshold type indicates whether the Value or Value range should be considered as a percentage or an absolute count.
- Allow null values determines if null values are permitted.
- Check for match determines if data values align with predefined standards, formats, or reference values to ensure accuracy, consistency, and integrity.
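The way Operator, Value, Value range, and Threshold type could combine is sketched below in Python. The function `within_threshold` and its argument names are hypothetical. The sketch also assumes that the threshold is compared against the success count (or success percentage) and that Between is inclusive at both ends; the text above does not specify either point.

```python
def within_threshold(success_count, total_count, operator, threshold_type,
                     value=None, value_range=None):
    """Evaluate success criteria for one column (illustrative sketch only)."""
    # Threshold type decides whether we compare a percentage or a raw count.
    if threshold_type == "percentage":
        observed = 100.0 * success_count / total_count if total_count else 0.0
    elif threshold_type == "absolute_count":
        observed = success_count
    else:
        raise ValueError(f"unknown threshold type: {threshold_type}")

    if operator == "between":
        if value_range is None:
            raise ValueError("Between requires a start and end range")
        start, end = value_range
        return start <= observed <= end  # assumption: inclusive bounds

    if value is None:
        raise ValueError(f"{operator} requires a value")
    if operator == "greater_than":
        return observed > value
    if operator == "less_than":
        return observed < value
    if operator == "equal_to":
        return observed == value
    raise ValueError(f"unknown operator: {operator}")


# 4 successes out of 5 rows is 80%, which is greater than 50%.
print(within_threshold(4, 5, "greater_than", "percentage", value=50))  # True
```

Keeping the percentage conversion separate from the operator comparison means the same comparison logic serves both threshold types.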
Sample Input
ID | Customer | Country |
---|---|---|
1 | Fallon | Great Britain |
2 | Franklyn Fryer | France |
3 | Kathleen | United States |
4 | Judie Green | France |
5 | John Doe | |
Sample rule configuration
- Match type Starts and Ends With
- Starts and Ends string Customer = Fallon, John Doe; Country = Britain, United States, France
- Case sensitive True
Sample success criteria configuration
- Operator Greater than
- Value 50%
- Threshold type Percentage
- Allow null values True
- Check for match False
Sample Output
Column Name | Rule Name | Success Count | Failure Count | Within Threshold | Null Count |
---|---|---|---|---|---|
Customer | Distinct Value Set check | 2 | 3 | No | 0 |
Country | Distinct Value Set check | 4 | 1 | Yes | 1 |
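The counts in the sample output can be reproduced with a small Python sketch over the sample input. It rests on several assumptions: a value passes when it starts or ends with any configured string, the 50% threshold is read as a percentage of all rows, and a null value is counted as a failure, as the sample counts suggest. The Allow null values and Check for match settings are not modelled, and the `summarize` helper is hypothetical.

```python
rows = [
    {"Customer": "Fallon", "Country": "Great Britain"},
    {"Customer": "Franklyn Fryer", "Country": "France"},
    {"Customer": "Kathleen", "Country": "United States"},
    {"Customer": "Judie Green", "Country": "France"},
    {"Customer": "John Doe", "Country": None},  # missing country
]

# Starts and Ends strings from the sample rule configuration.
patterns = {
    "Customer": ["Fallon", "John Doe"],
    "Country": ["Britain", "United States", "France"],
}


def summarize(column):
    """Tally successes, failures, and nulls for one column (sketch only)."""
    values = [row[column] for row in rows]
    null_count = sum(v is None for v in values)
    success = sum(
        v is not None
        and any(v.startswith(p) or v.endswith(p) for p in patterns[column])
        for v in values
    )
    failure = len(values) - success  # assumption: nulls count as failures
    success_pct = 100.0 * success / len(values)
    return {
        "success": success,
        "failure": failure,
        "within_threshold": success_pct > 50,  # Operator: Greater than 50%
        "null_count": null_count,
    }


for column in patterns:
    print(column, summarize(column))
```

For Customer, only Fallon and John Doe match, giving 2 successes out of 5 (40%), which is not above the threshold. For Country, four values match and the one null fails, giving 80%, which is above it.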