Normalize columns

Description

The Normalize Columns activity applies one or more normalization techniques to selected text columns in a dataset. Normalizing these columns improves data quality and keeps structured text inputs consistent.

Each column can be configured with its own set of techniques, such as trimming whitespace, standardizing capitalization, or removing special characters.

Use this activity to:

  • Standardize inconsistent text formats (e.g., names, categories)
  • Prepare data for matching, lookups, or deduplication
  • Clean imported datasets for better presentation or downstream processing

Use case:
Normalize names, job titles, and category fields in user-uploaded datasets for consistency before mapping them to master data or applying classification rules.
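
As a rough illustration of why normalization helps before mapping to master data, the sketch below normalizes both sides of a lookup so that differences in case, spacing, and accents no longer prevent a match. The `normalize_key` helper, the `master_titles` table, and its codes are hypothetical and are not part of the activity itself.

```python
import unicodedata


def normalize_key(text: str) -> str:
    # Hypothetical helper: strip accents, trim whitespace, and lowercase.
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.strip().lower()


# Hypothetical master data, keyed by already-normalized job titles.
master_titles = {
    "software engineer": "ENG",
    "data scientist": "DS",
}

# Values as they might arrive in a user-uploaded dataset.
uploaded = ["  Software Engineer", "DATA SCIENTIST ", "Machine Learning"]

# Look up each uploaded value by its normalized form; unmatched values map to None.
matches = {raw: master_titles.get(normalize_key(raw)) for raw in uploaded}

for raw, code in matches.items():
    print(repr(raw), "->", code)
```

Values that still fail to match after normalization (here, "Machine Learning") are the ones that need manual review or a new master entry.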

Input

| Input Type | Status |
|------------|----------|
| Data | Required |

Output

| Output Type | Format | Description |
|-------------|--------|-------------|
| Data | Table | Data with normalized column values |

Configuration Fields

| Field Name | Description |
|------------|-------------|
| Column Map | Specify columns and assign one or more normalization techniques to each. |
| Include Original | If enabled, both the original and the transformed columns are included in the output. If disabled, only the normalized columns are returned. |
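
The two configuration fields can be pictured with a minimal Python sketch, shown here only to illustrate the idea and not as the activity's actual implementation. The function name `normalize_columns`, the `_normalized` suffix for transformed columns, and the pass-through of unmapped columns such as ID are all assumptions.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Technique = Callable[[str], str]


def normalize_columns(
    rows: List[Row],
    column_map: Dict[str, List[Technique]],
    include_original: bool = False,
    suffix: str = "_normalized",  # assumed naming; the real output may differ
) -> List[Row]:
    output = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for column, techniques in column_map.items():
            value = row[column]
            # Apply the techniques in the order they are listed.
            for technique in techniques:
                value = technique(value)
            if include_original:
                # Keep the original column and add the transformed one beside it.
                new_row[column + suffix] = value
            else:
                # Overwrite the original column with the transformed value.
                new_row[column] = value
        output.append(new_row)
    return output


rows = [{"ID": "1", "Name": "jOHN DOE", "Description": "Software Engineer"}]
column_map = {"Name": [str.strip, str.title], "Description": [str.upper]}

with_original = normalize_columns(rows, column_map, include_original=True)
replaced = normalize_columns(rows, column_map, include_original=False)
print(with_original)
print(replaced)
```

Because techniques run in list order, the same set of techniques can give different results depending on how they are arranged in the Column Map.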

Normalization Options

  • Convert to lowercase – e.g., “John” → “john”
  • Convert to uppercase – e.g., “John” → “JOHN”
  • Convert to title case – e.g., “john doe” → “John Doe”
  • Capitalize first letter – e.g., “software engineer” → “Software engineer”
  • Trim whitespace – Removes leading and trailing spaces
  • Remove whitespace – Deletes all whitespace characters
  • Remove special characters – Strips out non-alphanumeric characters (e.g., &, %, #)
  • Normalize accents – Replaces accented characters with their unaccented equivalents (e.g., “é” → “e”)
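
For readers who want to reason about these options concretely, here is one way each could be approximated with the Python standard library. This is a sketch under assumptions: the function names are hypothetical, and the activity's exact rules (for example, which characters count as "special") may differ.

```python
import re
import unicodedata


def to_lowercase(text: str) -> str:
    return text.lower()


def to_uppercase(text: str) -> str:
    return text.upper()


def to_title_case(text: str) -> str:
    # str.title() uppercases the first letter of each run of letters
    # and lowercases the rest.
    return text.title()


def capitalize_first_letter(text: str) -> str:
    # Uppercase only the first character; leave the rest unchanged.
    return text[:1].upper() + text[1:]


def trim_whitespace(text: str) -> str:
    return text.strip()


def remove_whitespace(text: str) -> str:
    # split() with no arguments splits on any run of whitespace.
    return "".join(text.split())


def remove_special_characters(text: str) -> str:
    # Assumption: letters, digits, and whitespace are kept;
    # punctuation, symbols, and underscores are dropped.
    return re.sub(r"[^\w\s]|_", "", text)


def normalize_accents(text: str) -> str:
    # Decompose accented characters (NFKD), then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(to_title_case("john doe"))
print(capitalize_first_letter("software engineer"))
print(normalize_accents("café"))
```

In this sketch, removing special characters from "IT & Tech" leaves two consecutive spaces where the ampersand was, so a follow-up step that collapses whitespace may be worth considering.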

Sample Input

| ID | Name | Description | Category |
|----|------|-------------|----------|
| 1 | jOHN DOE | Software Engineer | IT & Tech |
| 2 | jane SMITH | Data Scientist | Analytics |
| 3 | mark_o’leary | Machine Learning | AI & ML |

Sample Configuration

| Column | Normalization Applied |
|--------|-----------------------|
| Name | Title Case |
| Description | Uppercase |
| Category | Remove Special Characters, Title Case |

Include Original: Enabled

Sample Output

| ID | Name | Description | Category |
|----|------|-------------|----------|
| 1 | John Doe | SOFTWARE ENGINEER | IT Tech |
| 2 | Jane Smith | DATA SCIENTIST | Analytics |
| 3 | Mark O’Leary | MACHINE LEARNING | AI ML |