Fixed Length File Parsing
Description
The Fixed Length File Parsing activity processes files in which each field occupies a fixed number of characters, extracting data by Start Index and Length. Unlike delimited formats, it does not rely on separators such as commas or tabs, which makes it reliable for files where spacing and alignment are part of the structure. Because parsing follows a predefined schema, fields are extracted consistently even when they contain spaces, leading zeros, or special characters.
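As a rough illustration of position-based extraction, the sketch below slices one record using name, Start Index, and Length triples. It is not the activity's implementation: the `parse_record` function, the field names, and the sample record are hypothetical, and zero-based indexing is an assumption.

```python
def parse_record(line, schema):
    """Slice one record into named fields.

    schema is a list of (name, start_index, length) tuples.
    Zero-based start indexes are an assumption of this sketch.
    """
    return {name: line[start:start + length] for name, start, length in schema}


# Hypothetical layout: 6-char id, 8-char amount, 10-char city.
schema = [("id", 0, 6), ("amount", 6, 8), ("city", 14, 10)]
record = "000123  150.75Mumbai    "

row = parse_record(record, schema)
# Leading zeros and padding survive because no separator is involved.
print(row)
```

Values are returned with their padding intact here; trimming is a separate step that a caller can apply per field.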
This type of parser is particularly useful for handling structured, legacy data formats such as mainframe exports, banking transaction records, and government datasets. It supports precise data extraction, validation, and conversion into structured formats like JSON or database tables. With its ability to maintain positional integrity, the fixed-length file parser enables seamless integration into modern data pipelines and facilitates tasks like data cleaning, transformation, and reporting.
This activity is especially useful for:
- Batch processing – Handles large files with consistent field positions efficiently.
- Data cleaning and validation – Ensures field-level accuracy, trims extra spaces, and checks for format compliance.
- Archival data retrieval – Reads historical datasets stored in fixed-width text files for re-analysis or migration.
By reading structured data from files with predefined field widths, this activity ensures precise extraction, preserves positional integrity, and streamlines integration into modern data workflows.
Tip: Ensure the field lengths and positions in the configuration match the file structure to avoid misaligned data.
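One way to catch a mismatch before parsing is to check the column definitions against the expected record length. The sketch below is a hypothetical helper, not part of the activity; it assumes zero-based start indexes and a known record length, and it reports overlapping fields and fields that run past the end of the record.

```python
def check_schema(schema, record_length):
    """Return a list of problems found in (name, start_index, length) tuples."""
    problems = []
    previous_end = 0
    # Walk the fields in positional order so overlaps are easy to spot.
    for name, start, length in sorted(schema, key=lambda field: field[1]):
        end = start + length
        if start < previous_end:
            problems.append(f"{name}: overlaps the preceding field")
        if end > record_length:
            problems.append(f"{name}: extends past the end of the record")
        previous_end = max(previous_end, end)
    return problems


ok = check_schema([("a", 0, 5), ("b", 5, 5)], record_length=10)
overlap = check_schema([("a", 0, 5), ("b", 3, 5)], record_length=10)
too_long = check_schema([("a", 0, 5), ("b", 5, 10)], record_length=10)
print(ok, overlap, too_long)
```

Gaps between fields are deliberately not reported, since a layout may leave filler characters unread.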
Input
Field | Description |
---|---|
File | Fixed-width text file, received from the previous activity, containing the records to parse. |
Output
Field | Description |
---|---|
Data | Structured data extracted from fixed positions, formatted into a table. |
Configuration Fields
Field | Description |
---|---|
Add Columns | Defines the fields to extract from the fixed-length file by specifying each column’s name, starting position, and character length. |
File Encoding | Specifies the character set used to read the input file correctly. |
Line Separator | Defines the character(s) that mark the end of each record in the file. |
Max row lookup count | Limits the number of rows processed during data preview or validation, defaulting to 10000. |
Each Add Columns item includes:
Subfield | Description |
---|---|
Column Name | Specifies the identifier for each extracted field, used to label and reference data within the parsed output. |
Start Index | Indicates the position where a field begins within each record. |
Length | Specifies the number of characters to read for a field. |
Sample Input
Not Applicable
Sample Configuration
Field | Value |
---|---|
Column Name | Column1, Column2, Column3 |
Start Index | 0, 41, 101 |
Length | 40, 60, 70 |
File Encoding | Windows 1252 |
Line Separator | \r\n |
Max row lookup count | 10000 |
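To show how these settings could fit together, the sketch below applies the same column positions, encoding, and line separator to a small in-memory file. It is an approximation under stated assumptions, not the activity's actual behavior: `parse_fixed_width` and `make_record` are hypothetical helpers, start indexes are treated as zero-based, values are trimmed, the character at position 40 is treated as unread filler, and `max_rows` simply caps the number of rows returned.

```python
COLUMNS = [("Column1", 0, 40), ("Column2", 41, 60), ("Column3", 101, 70)]


def parse_fixed_width(raw, columns, encoding="cp1252",
                      line_separator="\r\n", max_rows=10000):
    """Decode raw bytes, split them into records, and slice each record."""
    text = raw.decode(encoding)  # cp1252 is Python's codec name for Windows-1252
    rows = []
    for line in text.split(line_separator):
        if not line:
            continue  # skip the empty piece after a trailing separator
        if len(rows) >= max_rows:
            break
        # Trimming padded values is an assumption of this sketch.
        rows.append({name: line[start:start + length].strip()
                     for name, start, length in columns})
    return rows


def make_record(col1, col2, col3):
    """Pad values to the sample widths; position 40 is a one-character gap."""
    return col1.ljust(40) + " " + col2.ljust(60) + col3.ljust(70)


raw = "\r\n".join([
    make_record("CustomerID123456", "Transaction000001", "Mumbai, Maharashtra, India"),
    make_record("CustomerID789012", "Transaction000002", "New Delhi, Delhi, India"),
]).encode("cp1252") + b"\r\n"

rows = parse_fixed_width(raw, COLUMNS)
for row in rows:
    print(row)
```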
Sample Output
column1 | column2 | column3 |
---|---|---|
CustomerID123456 | Transaction000001 | Mumbai, Maharashtra, India |
CustomerID789012 | Transaction000002 | New Delhi, Delhi, India |
CustomerID345678 | Transaction000003 | Kolkata, West Bengal, India |
CustomerID901234 | Transaction000004 | Chennai, Tamil Nadu, India |