Fixed Length File Parsing

Description

Fixed-length file parsing processes files in which each field occupies a fixed number of characters. Data is extracted by Start Index and Length rather than by separators such as commas or tabs, which makes the approach reliable for files where spacing and alignment are part of the structure. Because fields are read by position against a predefined schema, values that contain spaces, leading zeros, or special characters are parsed consistently.

This type of parser is particularly useful for structured legacy formats such as mainframe exports, banking transaction records, and government datasets. It supports precise data extraction, validation, and conversion into structured formats like JSON or database tables. Because it preserves positional integrity, its output can feed modern data pipelines for tasks such as data cleaning, transformation, and reporting.

This activity is especially useful for:

Batch processing – Handles large files with consistent field positions efficiently.
Data cleaning and validation – Ensures field-level accuracy, trims extra spaces, and checks for format compliance.
Archival data retrieval – Reads historical datasets stored in fixed-width text files for re-analysis or migration.

By reading structured data from files with predefined field widths, this activity ensures precise extraction, preserves positional integrity, and streamlines integration into modern data workflows.

Tip: Ensure the field lengths and positions in the configuration match the file structure to avoid misaligned data.
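
One way to catch misaligned columns early is to check the layout before parsing. The Python sketch below is illustrative only and is not part of the activity: `find_layout_problems`, the `(name, start, length)` tuple layout, and the record width of 171 are assumptions made for the example, and Start Index is assumed to be zero-based.

```python
from typing import List, Tuple

# Hypothetical column definition: (name, start_index, length).
# Assumption: start_index is zero-based.
Column = Tuple[str, int, int]


def find_layout_problems(columns: List[Column], record_width: int) -> List[str]:
    """Return human-readable problems found in a fixed-length column layout."""
    problems = []
    ordered = sorted(columns, key=lambda c: c[1])
    for name, start, length in ordered:
        if start < 0 or length <= 0:
            problems.append(f"{name}: start must be >= 0 and length must be > 0")
        if start + length > record_width:
            problems.append(f"{name}: extends past record width {record_width}")
    # Columns that are adjacent by start position must not share characters.
    for (prev_name, prev_start, prev_len), (name, start, _) in zip(ordered, ordered[1:]):
        if prev_start + prev_len > start:
            problems.append(f"{name}: overlaps {prev_name}")
    return problems


layout = [("Column1", 0, 40), ("Column2", 41, 60), ("Column3", 101, 70)]
print(find_layout_problems(layout, record_width=171))  # layout fits the record
print(find_layout_problems(layout, record_width=150))  # Column3 runs past the end
```

Running a check like this against a few real records before a full run makes a wrong Start Index or Length visible as an explicit message rather than as shifted values in the output.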

Input

Field	Description
File	Fixed-width text file passed from the previous activity, containing the records to parse.

Output

Field	Description
Data	Structured data extracted from fixed positions, formatted into a table.

Configuration Fields

Field	Description
Add Columns	Defines the fields to extract from the fixed-length file by specifying each column’s name, starting position, and character length.
File Encoding	Specifies the character set used to read the input file correctly.
Line Separator	Defines the character(s) that mark the end of each record in the file.
Max row lookup count	Limits the number of rows processed during data preview or validation. Default: 10000.

Each Add Columns item includes:

Subfield	Description
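Column Name	Specifies the identifier for each extracted field, used to label and reference data within the parsed output.
Start Index	Indicates the character position where a field begins within each record. The sample configuration uses 0 for the first column, so positions appear to be zero-based.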
Length	Specifies the number of characters to read for a field.
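
Conceptually, each column definition amounts to a slice of the record. The following Python sketch shows that idea under the assumption that Start Index is zero-based; `extract_field` and the sample record are hypothetical and do not reflect the activity's internal implementation.

```python
def extract_field(record: str, start_index: int, length: int) -> str:
    """Return the characters in [start_index, start_index + length) of a record."""
    return record[start_index:start_index + length]


# Hypothetical 25-character record: 5-char ID, 10-char name, 10-char date.
record = "00042Jane Doe  2024-01-15"

customer_id = extract_field(record, 0, 5)    # leading zeros are kept as text
name = extract_field(record, 5, 10)          # still includes trailing padding
date = extract_field(record, 15, 10)

print(repr(customer_id), repr(name), repr(date))
print(repr(name.rstrip()))                   # optional trimming step
```

Reading fields as text and trimming as a separate step keeps padding and leading zeros intact until a later stage decides how to handle them.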

Sample Input

Not Applicable

Sample Configuration

Field	Value
`Column Name`	`Column1`,`Column2`,`Column3`
`Start Index`	`0`,`41`,`101`
`Length`	`40`,`60`,`70`
`File Encoding`	`Windows 1252`
`Line Separator`	`\r\n`
`Max row lookup count`	`10000`
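
To make the sample configuration concrete, the Python sketch below applies the same values to records built from the first two rows of the sample output. It is an approximation under stated assumptions, not the activity's implementation: `parse_fixed_length` and `make_record` are hypothetical helpers, Start Index is assumed to be zero-based, values are assumed to be space-padded and trimmed, and Max row lookup count is treated as a simple row cap.

```python
from typing import Dict, List, Tuple

# Values copied from the sample configuration: (name, start index, length).
COLUMNS: List[Tuple[str, int, int]] = [
    ("Column1", 0, 40),
    ("Column2", 41, 60),
    ("Column3", 101, 70),
]
ENCODING = "cp1252"       # Python's codec name for Windows-1252
LINE_SEPARATOR = "\r\n"
MAX_ROWS = 10000          # assumption: treated here as a simple row cap


def parse_fixed_length(raw: bytes) -> List[Dict[str, str]]:
    """Decode the file, split it into records, and slice each record by position."""
    text = raw.decode(ENCODING)
    rows: List[Dict[str, str]] = []
    for line in text.split(LINE_SEPARATOR):
        if not line:
            continue  # e.g. the empty string after a trailing separator
        if len(rows) >= MAX_ROWS:
            break
        rows.append({
            name: line[start:start + length].strip()
            for name, start, length in COLUMNS
        })
    return rows


def make_record(col1: str, col2: str, col3: str) -> str:
    """Build a test record: pad each value to its width; index 40 is an unused gap."""
    return col1.ljust(40) + " " + col2.ljust(60) + col3.ljust(70)


raw = LINE_SEPARATOR.join([
    make_record("CustomerID123456", "Transaction000001", "Mumbai, Maharashtra, India"),
    make_record("CustomerID789012", "Transaction000002", "New Delhi, Delhi, India"),
]).encode(ENCODING) + b"\r\n"

for row in parse_fixed_length(raw):
    print(row)
```

Note that with these values Column1 covers positions 0 to 39 and Column2 starts at 41, so position 40 is not read by any column.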

Sample Output

column1	column2	column3
CustomerID123456	Transaction000001	Mumbai, Maharashtra, India
CustomerID789012	Transaction000002	New Delhi, Delhi, India
CustomerID345678	Transaction000003	Kolkata, West Bengal, India
CustomerID901234	Transaction000004	Chennai, Tamil Nadu, India