Skip to content

SQL Pattern Matching

The SQL pattern matching in data quality involves using SQL queries to identify and validate specific patterns in the data, such as phone numbers, emails, or dates. By using operators like LIKE, REGEXP, or SIMILAR TO, you ensure accuracy, consistency, and integrity of the data. This technique helps you detect errors or inconsistencies and plays a key role in data cleansing. It helps maintain high-quality datasets for analysis and decision-making.

Rule configuration

You use these rules to identify issues such as inconsistencies, inaccuracies, or missing data. One common type of rule is the SQL LIKE pattern, which specifies patterns. This rule helps you check whether data entries conform to expected formats, such as matching specific text patterns, numbers, or structures. It ensures that the data meets predefined standards and improves overall data quality.

SQL like patterns The SQL LIKE pattern is an operator used in SQL queries to search for a specified pattern in the data. You use wildcards like % (which represents any sequence of characters) and _ (which represents a single character) to find partial matches in string values.

Success criteria

  • The success condition depends on how you configure the SQL like pattern.
  • For example, using LIKE 'a%' finds all entries starting with the letter “a”. Using LIKE '%@gmail.com' matches all email addresses ending with “@gmail.com”.

Configuration fields

  • Operator options

    Greater than

    Less than

    Equal to

    Between (requires specifying a start and end range)

  • Operator defines the comparison operation. You can choose from Greater Than, Less Than, Equal To, or Between.

  • Value is the threshold used for the success criteria. It is required when you use the Greater Than, Less Than, or Equal To operators.

  • Value range is required only when you select the Between operator. You need to specify the start and end range.

  • Threshold type indicates whether the Value or Value Range should be considered as a percentage or an absolute count.

  • Allow null values determines if null values are permitted.

  • Check for match determines if the data values align with predefined standards, formats, or reference values. This ensures accuracy, consistency, and integrity.

Sample Input

IDEmailName
1[email protected]Alice
2[email protected]Bob
3[email protected]Charlie
4[email protected]Dave
5[email protected]Eve

Sample rule configuration

  • Sql like Pattern
    SELECT * FROM users WHERE Email LIKE ’%@gmail.com’;

Sample success criteria configuration

  • Operator Greater than
  • Value 50%
  • Threshold type Absolute Count
  • Allow null values True
  • Check for match False

alt text

Sample Output

Column NameRule NameSuccess CountFailure CountWithin ThresholdNull Count
Emailsql pattern matching check32Yes0
Namesql pattern matching check05No0