
Extract tables from files

Description

The Extract tables from files activity sends input files to the Infoveave AI service and extracts tabular data from them. It can send all files in a single request or process them one file at a time. It can also guide extraction with a column alias map or a custom structured response schema.

Supported Features

  • File-based table extraction: Extract tables from one or more attached files.
  • One-file-at-a-time mode: Process each file in a separate request and add a FileName column to the extracted rows.
  • Combined file mode: Send all files in a single AI request.
  • Column alias map: Define expected output columns and aliases that may appear in source documents.
  • Generated schema: When structured output is disabled, the activity builds a response schema from the configured column map.
  • Custom structured output: When enabled, provide a custom JSON response schema.
  • Raw response retention: The raw AI response is returned in AdditionalResponse. In one-file-at-a-time mode, this field may be empty.
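The generated-schema feature can be pictured as a function that maps each Column Map entry to one property of a row object, with the rows placed in an array under a top-level Response property. This is a minimal sketch, not the activity's actual code. The exact schema Infoveave generates is not documented, and the following details are assumptions: the `{"Column": ..., "Alias": ...}` dict shape, the `build_response_schema` name, and typing every value as a string.

```python
import json


def build_response_schema(column_map):
    """Build a JSON Schema whose top-level "Response" property is an array of rows.

    Assumption: column_map is a list of {"Column": ..., "Alias": ...} dicts and
    every extracted value is typed as a string. This mirrors the documented
    shape only; it is not the activity's real generated schema.
    """
    # One string property per configured output column.
    row_properties = {entry["Column"]: {"type": "string"} for entry in column_map}
    return {
        "type": "object",
        "properties": {
            "Response": {
                "type": "array",
                "items": {"type": "object", "properties": row_properties},
            }
        },
        "required": ["Response"],
    }


column_map = [
    {"Column": "InvoiceNumber", "Alias": "Invoice No, Bill No"},
    {"Column": "InvoiceDate", "Alias": "Date, Bill Date"},
    {"Column": "Amount", "Alias": "Total, Grand Total"},
]
schema = build_response_schema(column_map)
print(json.dumps(schema, indent=2))
```

Strings were chosen because the example outputs in this page return values such as "1250.00" as text.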

Input

Type | Required | Description
Files | Optional | Files passed to the AI service as inline attachments. Each file is converted to base64 and sent with a detected MIME type.
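Here is a rough sketch of preparing one file as an inline attachment: read its bytes, base64-encode them, and attach a MIME type. This page does not say how the activity detects the MIME type, so Python's extension-based `mimetypes.guess_type` is used here as a stand-in. The function name `to_inline_attachment` and the keys `fileName`, `mimeType`, and `data` are also hypothetical and are not the Infoveave request format.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def to_inline_attachment(file_name, full_path):
    """Read one input file and package it as a base64 inline attachment.

    The dictionary keys are hypothetical; the Infoveave AI service's real
    request format is not documented here.
    """
    raw_bytes = Path(full_path).read_bytes()
    # Guess the MIME type from the file extension; fall back to generic binary.
    mime_type, _ = mimetypes.guess_type(file_name)
    return {
        "fileName": file_name,
        "mimeType": mime_type or "application/octet-stream",
        "data": base64.b64encode(raw_bytes).decode("ascii"),
    }


# Demo: write a throwaway file and encode it.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "invoice.pdf"
    sample_path.write_bytes(b"%PDF-1.4 sample")
    attachment = to_inline_attachment("invoice.pdf", str(sample_path))

print(attachment["mimeType"], attachment["data"])
```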

Input Scenarios

1. No Files

The activity can run without files, but extraction normally requires attachments. If the AI service does not return rows, the activity fails.

{
  "Files": []
}

2. Single File

{
  "Files": [
    { "FileName": "invoice.pdf", "FullPath": "C:/Work/invoice.pdf" }
  ]
}

3. Multiple Files

{
  "Files": [
    { "FileName": "invoice-001.pdf", "FullPath": "C:/Work/invoice-001.pdf" },
    { "FileName": "invoice-002.pdf", "FullPath": "C:/Work/invoice-002.pdf" }
  ]
}

Output

The activity reads the Response property from the AI JSON response and converts each array item into one data row.

Field | Type | Description
Data | Array | Extracted table rows. Column names depend on Column Map or the custom ResponseSchema.
Errors | Array | Parsing or execution errors, if any.
AdditionalResponse | String | Raw AI response JSON. In one-file-at-a-time mode, this field currently holds only the combined-request response, not the per-file responses, so it may be empty even when every per-file request succeeded.
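Here is a sketch of the parsing step described above. It reads the top-level Response array from the raw AI JSON and turns each object into one row of Data. Raising `ValueError` stands in for the activity failing when no rows come back. The function name and the error messages are hypothetical.

```python
import json


def parse_ai_response(raw_response):
    """Convert a raw AI JSON response into the activity's output shape (sketch).

    Each object in the top-level "Response" array becomes one row in Data.
    Raising ValueError stands in for the activity failing when no rows come back.
    """
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"AI response is not valid JSON: {exc}") from exc

    items = payload.get("Response") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        items = []
    # Keep only object items; each one becomes a data row.
    rows = [item for item in items if isinstance(item, dict)]
    if not rows:
        raise ValueError("AI response contained no rows under 'Response'")
    return {"Data": rows, "Errors": [], "AdditionalResponse": raw_response}


raw = '{"Response":[{"InvoiceNumber":"INV-001","InvoiceDate":"2026-01-15","Amount":"1250.00"}]}'
result = parse_ai_response(raw)
print(result["Data"])
```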

Example Output

{
  "Data": [
    {
      "InvoiceNumber": "INV-001",
      "InvoiceDate": "2026-01-15",
      "Amount": "1250.00",
      "FileName": "invoice-001.pdf"
    },
    {
      "InvoiceNumber": "INV-002",
      "InvoiceDate": "2026-01-20",
      "Amount": "980.00",
      "FileName": "invoice-002.pdf"
    }
  ],
  "Errors": []
}

Configuration Fields

Field Name | Type | Required | Description
OneFileAtATime | Boolean | No | When true, each input file is sent in a separate AI request and extracted rows receive a FileName column. When false, all files are sent together in one request. Default: false.
Column Map | Object Array | No | List of target columns and their aliases. Column is the output column to extract; Alias lists alternate names that may appear in the document. Used to build the extraction objective and the generated response schema.
Prompt | Text | Yes | User instructions for extraction, inserted into the activity’s AI table-extraction template.
StructuredOutput | Boolean | No | When true, the activity uses the provided ResponseSchema. When false, it builds a schema from Column Map.
ResponseSchema | JSON | Conditional | Custom response schema used when StructuredOutput is true. For the default parser to produce rows, the schema must return the rows as an array under a top-level Response property.
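Here is one way to picture how OneFileAtATime changes the request pattern. Combined mode makes a single call with every file. Per-file mode makes one call per file and tags each returned row with FileName. `extract` is a hypothetical callable standing in for the AI request, and `fake_extract` only reports how many files it received.

```python
def extract_tables(files, one_file_at_a_time, extract):
    """Dispatch files to an extractor in combined or one-file-at-a-time mode.

    extract is a hypothetical callable that sends a list of files in one AI
    request and returns a list of row dicts.
    """
    if not one_file_at_a_time:
        # Combined mode: a single request carries every file.
        return extract(files)

    rows = []
    for file_entry in files:
        # Per-file mode: one request per file, tagging each row with its source.
        for row in extract([file_entry]):
            rows.append({**row, "FileName": file_entry["FileName"]})
    return rows


def fake_extract(batch):
    # Stand-in for the AI call: reports how many files arrived in the request.
    return [{"FilesInRequest": len(batch)}]


files = [
    {"FileName": "invoice-001.pdf", "FullPath": "C:/Work/invoice-001.pdf"},
    {"FileName": "invoice-002.pdf", "FullPath": "C:/Work/invoice-002.pdf"},
]
print(extract_tables(files, True, fake_extract))
print(extract_tables(files, False, fake_extract))
```

Per-file mode yields one row per file, each carrying a FileName. Combined mode yields whatever the single request returns, with no FileName column added.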

Conditional Field Rendering Rules

  • ResponseSchema is shown when StructuredOutput is true.

Sample Configuration

Scenario 1: Extract Invoices With Generated Schema

Field | Value
OneFileAtATime | true
Column Map | InvoiceNumber = Invoice No, Bill No; InvoiceDate = Date, Bill Date; Amount = Total, Grand Total
Prompt | Extract invoice header fields from the attached files.
StructuredOutput | false
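The Column Map is also used to build the extraction objective. This page does not show the template text, so the sketch below invents its own wording to illustrate the idea: each target column is listed with the aliases the model should recognize. It assumes that Alias holds comma-separated names, as in this scenario. `build_objective` and the phrase "may appear as" are hypothetical.

```python
def build_objective(column_map):
    """Render a Column Map as an extraction objective (hypothetical wording).

    Assumes Alias holds comma-separated alternate names, as in Scenario 1.
    """
    lines = []
    for entry in column_map:
        alias_text = entry.get("Alias") or ""
        aliases = [a.strip() for a in alias_text.split(",") if a.strip()]
        if aliases:
            lines.append(f"- {entry['Column']} (may appear as: {', '.join(aliases)})")
        else:
            lines.append(f"- {entry['Column']}")
    return "Extract the following columns:\n" + "\n".join(lines)


scenario_1_map = [
    {"Column": "InvoiceNumber", "Alias": "Invoice No, Bill No"},
    {"Column": "InvoiceDate", "Alias": "Date, Bill Date"},
    {"Column": "Amount", "Alias": "Total, Grand Total"},
]
objective = build_objective(scenario_1_map)
print(objective)
```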

Scenario 2: Use Custom Structured Output

Field | Value
OneFileAtATime | false
Prompt | Extract all line items from the purchase order.
StructuredOutput | true
ResponseSchema | { "type": "object", "properties": { "Response": { "type": "array", "items": { "type": "object" } } }, "required": ["Response"] }
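The default parser only produces rows from a top-level Response array. A custom schema can therefore be checked before a run to confirm that it declares one. The helper below is a hypothetical pre-flight check, not part of the activity. It inspects only the schema's structure and does not validate the full JSON Schema.

```python
import json


def has_response_array(schema_text):
    """Return True if a ResponseSchema declares a top-level "Response" array.

    A hypothetical pre-flight check, not part of the activity itself.
    """
    schema = json.loads(schema_text)
    if not isinstance(schema, dict):
        return False
    properties = schema.get("properties")
    if not isinstance(properties, dict):
        return False
    response = properties.get("Response")
    return isinstance(response, dict) and response.get("type") == "array"


scenario_2_schema = (
    '{ "type": "object", "properties": { "Response": { "type": "array", '
    '"items": { "type": "object" } } }, "required": ["Response"] }'
)
print(has_response_array(scenario_2_schema))
print(has_response_array('{"type": "object", "properties": {"Rows": {"type": "array"}}}'))
```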

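Sample Output

This output shows the Scenario 1 columns extracted in combined mode (OneFileAtATime set to false). As a result, the rows have no FileName column and AdditionalResponse holds the raw response.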

{
  "Data": [
    {
      "InvoiceNumber": "INV-001",
      "InvoiceDate": "2026-01-15",
      "Amount": "1250.00"
    }
  ],
  "Errors": [],
  "AdditionalResponse": "{\"Response\":[{\"InvoiceNumber\":\"INV-001\",\"InvoiceDate\":\"2026-01-15\",\"Amount\":\"1250.00\"}]}"
}