
Fixed Length File Parsing

Description

The Fixed Length File Parsing activity processes files in which each field occupies a fixed number of characters, extracting every field by its Start Index and Length. Unlike delimited formats, it does not rely on separators such as commas or tabs, so it works well for files in which spacing and alignment are part of the structure. Because extraction follows a predefined schema, fields that contain spaces, leading zeros, or special characters such as commas are read intact.
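A minimal Python sketch of positional extraction, assuming zero-based start positions (the first Start Index in the sample configuration is 0). The function name `slice_field` and the sample record are illustrative and not part of the activity:

```python
def slice_field(record: str, start: int, length: int) -> str:
    """Return `length` characters of `record`, beginning at zero-based `start`.

    No delimiter is consulted, so commas, spaces, and leading zeros inside
    the slice are returned unchanged. A record shorter than start + length
    yields a shorter (possibly empty) string rather than an error.
    """
    return record[start:start + length]


# Hypothetical 25-character record: 5-digit account, 18-character name, 2-letter state.
record = "00042JOHN SMITH, JR.   NY"
account = slice_field(record, 0, 5)   # "00042" (leading zeros kept)
name = slice_field(record, 5, 18)     # "JOHN SMITH, JR.   " (comma and padding kept)
state = slice_field(record, 23, 2)    # "NY"
```

The comma inside the name field would split a CSV parser's output, but position-based slicing ignores it.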

This type of parser is particularly useful for structured legacy formats such as mainframe exports, banking transaction records, and government datasets. It extracts fields by position, supports validation, and converts records into structured formats such as JSON or database tables. Because it preserves each field's position, it lets these files feed modern data pipelines for cleaning, transformation, and reporting.

This activity is especially useful for:

  • Batch processing – Handles large files with consistent field positions efficiently.
  • Data cleaning and validation – Ensures field-level accuracy, trims extra spaces, and checks for format compliance.
  • Archival data retrieval – Reads historical datasets stored in fixed-width text files for re-analysis or migration.
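The cleaning and validation steps mentioned above might look like the following Python sketch. The helper names and the "exactly N digits" rule are hypothetical examples of a format check, not rules the activity is documented to apply:

```python
import re


def clean_field(raw: str) -> str:
    """Trim the spaces used to pad a value out to its fixed width."""
    return raw.strip()


def is_zero_padded_number(value: str, width: int) -> bool:
    """Hypothetical format rule: exactly `width` ASCII digits, leading zeros allowed."""
    return len(value) == width and re.fullmatch(r"[0-9]+", value) is not None


city = clean_field("Mumbai, Maharashtra, India    ")  # "Mumbai, Maharashtra, India"
ok = is_zero_padded_number("000001", 6)                # True
bad = is_zero_padded_number("  0001", 6)               # False (padding instead of zeros)
```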

By reading records against predefined field widths, this activity extracts each field precisely, preserves positional integrity, and delivers the results as structured data that downstream activities can consume.

Tip: Ensure the start indexes and lengths in the configuration match the file layout. A wrong value truncates a field or pulls in characters from a neighboring field, producing misaligned data.
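One way to follow this tip is to check the column layout before parsing. The Python sketch below is a hypothetical helper (not a feature of the activity) that assumes zero-based start indexes and represents each Add Columns item as a `(name, start_index, length)` tuple:

```python
from typing import List, Tuple

# Hypothetical representation of the Add Columns list: (name, start_index, length).
Column = Tuple[str, int, int]


def check_layout(columns: List[Column]) -> List[str]:
    """List gaps and overlaps between neighboring columns (zero-based starts assumed)."""
    problems = []
    ordered = sorted(columns, key=lambda c: c[1])  # sort by start index
    for (name_a, start_a, len_a), (name_b, start_b, _) in zip(ordered, ordered[1:]):
        end_a = start_a + len_a  # first position after column A
        if start_b < end_a:
            problems.append(f"{name_a} and {name_b} overlap by {end_a - start_b} character(s)")
        elif start_b > end_a:
            problems.append(f"gap of {start_b - end_a} character(s) between {name_a} and {name_b}")
    return problems
```

For example, `check_layout([("A", 0, 5), ("B", 4, 3)])` returns `["A and B overlap by 1 character(s)"]`. A gap is not necessarily an error, since some layouts contain filler characters, but an overlap usually points to a misconfigured column.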


Input

Field | Description
File | Fixed-width text file containing records from the previous activity.

Output

Field | Description
Data | Structured data extracted from fixed positions, formatted into a table.

Configuration Fields

Field | Description
Add Columns | Defines the fields to extract from the fixed-length file by specifying each column’s name, start index, and length in characters.
File Encoding | Character set used to decode the input file (for example, Windows-1252).
Line Separator | Character(s) that mark the end of each record in the file (for example, \r\n).
Max row lookup count | Maximum number of rows processed during data preview or validation; defaults to 10000.
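To show how File Encoding, Line Separator, and Max row lookup count interact, here is a minimal Python sketch. The function `read_records` and its parameter names are hypothetical; the defaults mirror the sample configuration, and `"cp1252"` is Python's codec name for Windows-1252:

```python
from typing import List


def read_records(path: str, encoding: str = "cp1252",
                 line_separator: str = "\r\n", max_rows: int = 10000) -> List[str]:
    """Decode the file, split it on the configured separator, and cap the row count.

    The whole file is read into memory, which keeps the sketch short but
    is not suitable for very large inputs.
    """
    # newline="" turns off universal-newline translation, so "\r\n" reaches
    # split() unchanged and the configured separator is matched exactly.
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    records = text.split(line_separator)
    if records and records[-1] == "":
        records.pop()  # drop the empty entry left by a trailing separator
    return records[:max_rows]
```

Decoding with the wrong encoding can silently change accented characters, and splitting on the wrong separator can leave stray `\r` characters at the end of the last field, so both settings affect the parsed values, not just the row boundaries.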

Each Add Columns item includes:

Subfield | Description
Column Name | Identifier for the extracted field, used to label and reference its data in the parsed output.
Start Index | Position where the field begins within each record (zero-based; the first column in the sample configuration starts at 0).
Length | Number of characters to read for the field.
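A sketch of how one Add Columns item could be modeled and applied to a record, written in Python. `ColumnSpec` and `parse_record` are hypothetical names; the sketch assumes zero-based start indexes and trims padding, following the data-cleaning use case above (whether the activity itself trims values is not stated here):

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ColumnSpec:
    """One Add Columns item: Column Name, Start Index (assumed zero-based), Length."""
    name: str
    start_index: int
    length: int


def parse_record(record: str, columns: List[ColumnSpec]) -> Dict[str, str]:
    """Slice every configured column out of one record and trim its padding."""
    return {
        c.name: record[c.start_index:c.start_index + c.length].strip()
        for c in columns
    }


columns = [ColumnSpec("id", 0, 6), ColumnSpec("city", 6, 10)]
row = parse_record("000001Mumbai    ", columns)
print(json.dumps(row))  # {"id": "000001", "city": "Mumbai"}
```

Each parsed record becomes one row keyed by column name, which maps directly onto a table row or a JSON object.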

Sample Input

Not Applicable


Sample Configuration

Field | Value
Column Name | Column1, Column2, Column3
Start Index | 0, 41, 101
Length | 40, 60, 70
File Encoding | Windows-1252
Line Separator | \r\n
Max row lookup count | 10000
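Read with zero-based positions, the sample layout can be checked with a little arithmetic. In the Python sketch below (helper names are hypothetical), Column1 ends at position 39 while Column2 starts at 41, so position 40 is not read by any column; depending on the file, that may be a deliberate filler character or a Start Index worth re-checking. A record also needs at least 171 characters for Column3 to be read in full.

```python
from typing import List, Tuple

# The sample configuration restated as (Column Name, Start Index, Length).
SAMPLE_COLUMNS: List[Tuple[str, int, int]] = [
    ("Column1", 0, 40),
    ("Column2", 41, 60),
    ("Column3", 101, 70),
]


def column_spans(columns: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Return (name, first_position, last_position), zero-based and inclusive."""
    return [(name, start, start + length - 1) for name, start, length in columns]


def min_record_length(columns: List[Tuple[str, int, int]]) -> int:
    """Characters a record needs for the last configured column to be complete."""
    return max(start + length for _, start, length in columns)


print(column_spans(SAMPLE_COLUMNS))
# [('Column1', 0, 39), ('Column2', 41, 100), ('Column3', 101, 170)]
print(min_record_length(SAMPLE_COLUMNS))  # 171
```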

Sample Output

Column1 | Column2 | Column3
CustomerID123456 | Transaction000001 | Mumbai, Maharashtra, India
CustomerID789012 | Transaction000002 | New Delhi, Delhi, India
CustomerID345678 | Transaction000003 | Kolkata, West Bengal, India
CustomerID901234 | Transaction000004 | Chennai, Tamil Nadu, India