
String Value Matching

String value matching is a data quality check that compares and validates text data to ensure accuracy and consistency across datasets. It uses methods such as exact matching, contains matching, and pattern recognition to detect discrepancies or variations. This helps keep data reliable and standardized for analysis and reporting.

Rule configurations

A value is marked as a success when it matches the expected distinct set and complies with the match type. If the value is unique and fits within the defined set, the rule is considered passed.

Match type This configuration defines how values in the dataset are compared, that is, the condition under which a match is considered valid: exact matching, partial matching, or pattern-based matching. The available match types are:

Exact Match

Starts and Ends with

Contains
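The three match types can be illustrated with a minimal Python sketch. The function name `matches`, the match type keys, and the `require_both` flag are hypothetical and not part of the product. In particular, this page does not state whether "Starts and Ends with" requires the value to both start and end with a configured string or only one of the two, so the sketch exposes both readings and defaults to the looser one as an assumption.

```python
def matches(value, candidates, match_type,
            case_sensitive=True, require_both=False):
    """Return True if value matches any candidate string under match_type.

    Illustrative only: names and semantics are assumptions, not the
    product's actual implementation.
    """
    if value is None:
        return False  # null handling is decided elsewhere (Allow null values)
    if not case_sensitive:
        value = value.lower()
        candidates = [c.lower() for c in candidates]

    if match_type == "exact":
        return value in candidates
    if match_type == "starts_ends":
        if require_both:
            # Stricter reading: the value must start AND end with the string.
            return any(value.startswith(c) and value.endswith(c)
                       for c in candidates)
        # Looser reading (assumed default): start OR end is enough.
        return any(value.startswith(c) or value.endswith(c)
                   for c in candidates)
    if match_type == "contains":
        return any(c in value for c in candidates)
    raise ValueError(f"unknown match type: {match_type}")
```

Under the looser reading, `matches("Great Britain", ["Britain"], "starts_ends")` is true because the value ends with "Britain", while an exact match against the same string fails.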

Pattern string A pattern string is a sequence of characters that defines a specific format or structure. It is used to search, match, or validate text against predefined rules or patterns.
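As a rough illustration of pattern-based validation, the sketch below treats the pattern string as a regular expression. This is an assumption: the page does not specify the pattern syntax the product accepts, and the helper name `matches_pattern` and the example pattern are hypothetical.

```python
import re

def matches_pattern(value, pattern, case_sensitive=True):
    """Return True if the whole value conforms to the pattern.

    Assumes the pattern string is a Python regular expression.
    """
    if value is None:
        return False
    flags = 0 if case_sensitive else re.IGNORECASE
    # fullmatch requires the entire value to fit the pattern,
    # not just a prefix or a substring.
    return re.fullmatch(pattern, value, flags) is not None

# Hypothetical format: two uppercase letters, a hyphen, four digits.
CODE_PATTERN = r"[A-Z]{2}-\d{4}"
print(matches_pattern("AB-1234", CODE_PATTERN))
print(matches_pattern("ab-1234", CODE_PATTERN))
```

Using a full match rather than a search keeps values with extra leading or trailing characters from passing the check.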

Success criteria

  • The success condition depends on how the Match Type is configured.
  • For example, “Alice” should be distinct, meaning no two names in the database can have the value “Alice”.

Configuration fields

  • Operator Defines the comparison operation. The options are:

    Greater than

    Less than

    Equal to

    Between (requires specifying a start and end range)

  • Value The threshold value used for success criteria. Required for Greater than, Less than, and Equal to operators.

  • Value range Required only when the Between operator is selected, specifying the start and end range.

  • Threshold type Indicates whether the Value or Value range is interpreted as a percentage or an absolute count.

  • Allow null values Determines if null values are permitted.

  • Check for match Determines whether data values align with predefined standards, formats, or reference values to ensure accuracy, consistency, and integrity.
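The way the operator, value, and threshold type combine can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function name `within_threshold` and the string keys are hypothetical, the percentage is assumed to be computed over all evaluated rows, and the Between range is assumed to be inclusive.

```python
def within_threshold(success_count, total_count, operator,
                     value=None, value_range=None,
                     threshold_type="percentage"):
    """Decide whether a rule result satisfies the success criteria."""
    # Percentage thresholds compare the success rate; absolute-count
    # thresholds compare the raw number of successes.
    if threshold_type == "percentage":
        measured = 100.0 * success_count / total_count if total_count else 0.0
    elif threshold_type == "absolute_count":
        measured = success_count
    else:
        raise ValueError(f"unknown threshold type: {threshold_type}")

    if operator == "greater_than":
        return measured > value
    if operator == "less_than":
        return measured < value
    if operator == "equal_to":
        return measured == value
    if operator == "between":
        start, end = value_range  # inclusive bounds are an assumption
        return start <= measured <= end
    raise ValueError(f"unknown operator: {operator}")
```

For instance, with Greater than and a value of 50 read as a percentage, 2 successes out of 5 rows (40%) falls outside the threshold, while 4 out of 5 (80%) falls within it.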

Sample Input

ID | Customer       | Country
1  | Fallon         | Great Britain
2  | Franklyn Fryer | France
3  | Kathleen       | United States
4  | Judie Green    | France
5  | John Doe       |

Sample rule configuration

  • Match type Starts and Ends With
  • Starts and Ends string Customer = Fallon,John Doe; Country = Britain,United States,France
  • Case sensitive True

Sample success criteria configuration

  • Operator Greater than
  • Value 50%
  • Threshold type Percentage
  • Allow null values True
  • Check for match False


Sample output

Column Name | Rule Name                | Success Count | Failure Count | Within Threshold | Null Count
Customer    | Distinct Value Set check | 2             | 3             | No               | 0
Country     | Distinct Value Set check | 4             | 1             | Yes              | 1