Description
The Normalize Columns activity allows you to apply one or more normalization techniques to selected text columns within a dataset. This is useful for improving data quality and consistency across structured text inputs.
You can configure each column with its own normalization method, such as trimming whitespace, standardizing capitalization, or removing special characters. The full list appears under Normalization Options.
Use this activity to:
- Standardize inconsistent text formats (e.g., names, categories)
- Prepare data for matching, lookups, or deduplication
- Clean imported datasets for better presentation or downstream processing
Use case:
Normalize names, job titles, and category fields in user-uploaded datasets for consistency before mapping them to master data or applying classification rules.
Input
Input Type | Status |
---|---|
Data | Required |
Output
Output Type | Format | Description |
---|---|---|
Data | Table | Data with normalized column values |
Configuration Fields
Field Name | Description |
---|---|
Column Map | Specify columns and assign one or more normalization techniques to each. |
Include Original | If enabled, the output includes both the original and the normalized columns. If disabled, only the normalized columns are returned. |
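The interaction between Column Map and Include Original can be sketched in plain Python. This is a minimal illustration, not the activity's implementation: the function `normalize_columns`, the technique keys in `TECHNIQUES`, and the `_normalized` column suffix are all hypothetical names chosen for the example, and applying the techniques in the order listed is an assumption.

```python
from typing import Callable, Dict, List

# Hypothetical technique names mapped to plain Python string operations.
TECHNIQUES: Dict[str, Callable[[str], str]] = {
    "trim": str.strip,
    "lowercase": str.lower,
    "uppercase": str.upper,
    "title_case": str.title,
}


def normalize_columns(
    rows: List[dict],
    column_map: Dict[str, List[str]],
    include_original: bool,
) -> List[dict]:
    """Apply each column's techniques in the order listed (an assumption)."""
    result = []
    for row in rows:
        out = dict(row)  # copy so the input rows are left untouched
        for column, techniques in column_map.items():
            value = row[column]
            for name in techniques:
                value = TECHNIQUES[name](value)
            if include_original:
                # Assumed naming: keep the original, add a suffixed column.
                out[f"{column}_normalized"] = value
            else:
                # Only the normalized value is kept under the column name.
                out[column] = value
        result.append(out)
    return result


rows = [{"ID": 1, "Name": " jOHN DOE "}]
column_map = {"Name": ["trim", "title_case"]}

replaced = normalize_columns(rows, column_map, include_original=False)
kept = normalize_columns(rows, column_map, include_original=True)
```

With `include_original=False` the `Name` value is overwritten in place; with `include_original=True` the row keeps `Name` and gains `Name_normalized`.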
Normalization Options
- Convert to lowercase – e.g., “John” → “john”
- Convert to uppercase – e.g., “John” → “JOHN”
- Convert to title case – e.g., “john doe” → “John Doe”
- Capitalize first letter – e.g., “software engineer” → “Software engineer”
- Trim whitespace – Removes leading and trailing spaces
- Remove whitespace – Deletes all whitespace characters
- Remove special characters – Strips out non-alphanumeric characters (e.g., &, %, #)
- Normalize accents – Replaces accented characters with their unaccented equivalents (e.g., “é” → “e”)
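Each option above corresponds to a simple string operation. The following Python sketch shows one plausible reading of each option using only the standard library. The function names are hypothetical, and the exact rules the activity applies (for example, whether Remove special characters keeps whitespace) are assumptions made for illustration.

```python
import re
import unicodedata


def to_lowercase(text: str) -> str:
    return text.lower()


def to_uppercase(text: str) -> str:
    return text.upper()


def to_title_case(text: str) -> str:
    # str.title() capitalizes the first letter of every word.
    return text.title()


def capitalize_first(text: str) -> str:
    # Only the first character changes; the rest is left as-is.
    return text[:1].upper() + text[1:]


def trim_whitespace(text: str) -> str:
    return text.strip()


def remove_whitespace(text: str) -> str:
    return re.sub(r"\s+", "", text)


def remove_special_characters(text: str) -> str:
    # Assumption: letters, digits, and whitespace are kept; all else is dropped.
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def normalize_accents(text: str) -> str:
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

In this sketch, removing `&` from `IT & Tech` leaves two adjacent spaces, because whitespace is not collapsed; the activity itself may handle that case differently.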
Sample Input
ID | Name | Description | Category |
---|---|---|---|
1 | jOHN DOE | Software Engineer | IT & Tech |
2 | jane SMITH | Data Scientist | Analytics |
3 | mark_o’leary | Machine Learning | AI & ML |
Sample Configuration
Column | Normalization Applied |
---|---|
Name | Title Case |
Description | Uppercase |
Category | Remove Special Characters, Title Case |
Include Original: Disabled
Sample Output
ID | Name | Description | Category |
---|---|---|---|
1 | John Doe | SOFTWARE ENGINEER | IT Tech |
2 | Jane Smith | DATA SCIENTIST | Analytics |
3 | Mark O’Leary | MACHINE LEARNING | AI ML |