Consistent Casing
In data quality, Consistent Casing means that text values follow one standard letter casing (uppercase versus lowercase) across a dataset. This consistency is important for maintaining data integrity, because variations in casing (for example, “John Doe” versus “john doe” or “USA” versus “usa”) can lead to errors during data processing, matching, and analysis.
Rule configuration
A value is marked as a success when it matches the selected case type. If the value is unique and fits within the defined set, the rule is considered passed.
Case type is the letter-casing pattern that values are expected to follow. The pattern dictates how the characters in a string are capitalized or separated, which helps ensure consistency and improves readability across datasets. The available case types are:
- Upper Case means all letters are capitalized.
- Lower Case means all letters are in lowercase.
- Title Case means the first letter of each word is capitalized.
- Sentence Case means only the first letter of the first word is capitalized.
- Camel Case means the first word is lowercase and each subsequent word starts with an uppercase letter, with no spaces between words.
- Pascal Case is similar to camel case, but the first letter of the first word is also capitalized.
- Kebab Case means words are in lowercase and separated by hyphens.
- Snake Case means words are in lowercase and separated by underscores.
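The case types above can be approximated with one regular expression each. The following is a minimal sketch, not the product's actual implementation: the `CASE_PATTERNS` table and the `matches_case` helper are hypothetical names, and the patterns assume ASCII letters with single spaces between words.

```python
import re

# Hypothetical patterns, one per case type described above.
# Assumption: ASCII letters only, words separated by a single space.
CASE_PATTERNS = {
    "upper": re.compile(r"[A-Z]+(?: [A-Z]+)*"),
    "lower": re.compile(r"[a-z]+(?: [a-z]+)*"),
    "title": re.compile(r"[A-Z][a-z]*(?: [A-Z][a-z]*)*"),
    "sentence": re.compile(r"[A-Z][a-z]*(?: [a-z]+)*"),
    "camel": re.compile(r"[a-z]+(?:[A-Z][a-z]*)*"),
    "pascal": re.compile(r"(?:[A-Z][a-z]*)+"),
    "kebab": re.compile(r"[a-z]+(?:-[a-z]+)*"),
    "snake": re.compile(r"[a-z]+(?:_[a-z]+)*"),
}


def matches_case(value: str, case_type: str) -> bool:
    """Return True when the whole value fits the pattern for case_type."""
    return CASE_PATTERNS[case_type].fullmatch(value) is not None


print(matches_case("John Doe", "title"))   # True
print(matches_case("john doe", "title"))   # False
print(matches_case("USA", "upper"))        # True
print(matches_case("usa", "upper"))        # False
```

Some values fit more than one pattern in this sketch. A single lowercase word such as “usa” matches the lower, camel, kebab, and snake patterns, so the check only makes sense against the one case type selected in the rule.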
Success criteria
The success condition depends on how the Case Type is configured. For example, when Case Type is set to Pascal Case, only inputs where each word starts with an uppercase letter are valid: “DropDown” is valid, but “dropDown” is not.
Configuration fields
- Operator defines the comparison operation. It can be Greater than, Less than, Equal to, or Between (which requires specifying a start and end range).
- Value is the threshold used for the success criteria. It is required for the Greater than, Less than, and Equal to operators.
- Value range is required only when the Between operator is selected. You must specify a start and end range.
- Threshold type indicates whether the Value or Value range is treated as a percentage or an absolute count.
- Allow null values determines whether null values are permitted.
- Check for match verifies whether data values align with predefined standards, formats, or reference values. This helps ensure accuracy, consistency, and integrity.
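How the Operator, Value, Value range, and Threshold type fields combine can be sketched as a single comparison. This is an illustrative sketch under stated assumptions: the `within_threshold` function, its parameter names, and the operator strings are hypothetical, the percentage is taken over all rows, and Between is assumed to include both ends of the range.

```python
def within_threshold(success_count, total_count, operator,
                     value=None, value_range=None,
                     threshold_type="percentage"):
    """Compare the success measure against the configured threshold."""
    # Threshold type decides what is compared: a percentage or a raw count.
    if threshold_type == "percentage":
        measured = 100.0 * success_count / total_count if total_count else 0.0
    elif threshold_type == "absolute":
        measured = float(success_count)
    else:
        raise ValueError(f"unknown threshold type: {threshold_type}")

    # Between uses the value range (assumed inclusive on both ends).
    if operator == "between":
        if value_range is None:
            raise ValueError("Between requires a start and end range")
        start, end = value_range
        return start <= measured <= end

    # The remaining operators all require a single value.
    if value is None:
        raise ValueError(f"{operator} requires a value")
    if operator == "greater_than":
        return measured > value
    if operator == "less_than":
        return measured < value
    if operator == "equal_to":
        return measured == value
    raise ValueError(f"unknown operator: {operator}")


# 5 of 5 rows pass, threshold "greater than 75 percent".
print(within_threshold(5, 5, "greater_than", value=75))  # True
# 2 of 5 rows pass, same threshold.
print(within_threshold(2, 5, "greater_than", value=75))  # False
```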
Sample input
ID | Customer | Country |
---|---|---|
1 | Fallon | greatBritain |
2 | FranklynFryer | France |
3 | Kathleen | unitedStates |
4 | JudieGreen | |
5 | JohnDoe | France |
Sample rule configuration
- Case type: Pascal Case
Sample success criteria configuration
- Operator: Greater than
- Value: 75%
- Threshold type: Percentage
- Allow null values: False
- Check for match: True
Sample output
Column Name | Rule Name | Success Count | Failure Count | Within Threshold | Null Count |
---|---|---|---|---|---|
Customer | Consistent Casing check | 5 | 0 | Yes | 0 |
Country | Consistent Casing check | 2 | 3 | No | 1 |
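The counts in the sample output can be reproduced with a short script. This is a sketch only, with several assumptions: the `check_column` helper and the `PASCAL` pattern are hypothetical, a null value counts as a failure when nulls are not allowed, and the 75% threshold is measured against all rows in the column.

```python
import re

# Assumed Pascal Case pattern: one or more words, each starting uppercase.
PASCAL = re.compile(r"(?:[A-Z][a-z]*)+")

# The sample input table; the empty Country cell is modeled as None.
rows = [
    {"ID": 1, "Customer": "Fallon", "Country": "greatBritain"},
    {"ID": 2, "Customer": "FranklynFryer", "Country": "France"},
    {"ID": 3, "Customer": "Kathleen", "Country": "unitedStates"},
    {"ID": 4, "Customer": "JudieGreen", "Country": None},
    {"ID": 5, "Customer": "JohnDoe", "Country": "France"},
]


def check_column(rows, column, threshold_pct=75.0, allow_null=False):
    """Count Pascal Case successes and failures for one column."""
    values = [row[column] for row in rows]
    null_count = sum(1 for v in values if v is None)
    success = sum(1 for v in values if v is not None and PASCAL.fullmatch(v))
    if allow_null:
        # Assumption: permitted nulls are counted as successes.
        success += null_count
    failure = len(values) - success
    pct = 100.0 * success / len(values) if values else 0.0
    return {
        "Column Name": column,
        "Success Count": success,
        "Failure Count": failure,
        # Operator "Greater than" with a percentage threshold.
        "Within Threshold": "Yes" if pct > threshold_pct else "No",
        "Null Count": null_count,
    }


for column in ("Customer", "Country"):
    print(check_column(rows, column))
```

Under these assumptions, Customer yields 5 successes (100%, within threshold) and Country yields 2 successes and 3 failures, one of which is the null value (40%, not within threshold).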