---
title: Extract tables from files
description: Extract tabular data from files using AI, with optional column aliases and custom structured output schema.
category: AI
tags: [ai, table extraction, documents, files, structured output]
---

# Extract tables from files

## **Description**

The **Extract tables from files** activity sends input files to the Infoveave AI service and extracts tabular information from them. It can process all files together or one file at a time. It can also guide extraction using a column alias map or a custom structured response schema.

### **Supported Features**

- **File-based table extraction**: Extract tables from one or more attached files.
- **One file at a time mode**: Process each file independently and add a `FileName` column to extracted rows.
- **Combined file mode**: Send all files in a single AI request.
- **Column alias map**: Define expected output columns and aliases that may appear in source documents.
- **Generated schema**: When structured output is disabled, the activity builds a response schema from the configured column map.
- **Custom structured output**: When structured output is enabled, the activity uses the custom JSON response schema you provide.
- **Raw response retention**: The raw AI response is returned in `AdditionalResponse`.

---

## **Input**

| Type | Required | Description |
| --- | --- | --- |
| Files | Optional | Files passed to the AI service as inline attachments. Each file is converted to base64 and sent with a detected MIME type. |

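As a rough illustration of the conversion described in the table, the Python sketch below builds an inline attachment from a file path. The function name `to_inline_attachment`, the payload keys `MimeType` and `Data`, and the fallback MIME type are assumptions for illustration; the actual payload format used by the Infoveave AI service is not documented here.

```python
import base64
import mimetypes
from pathlib import Path


def to_inline_attachment(path: str) -> dict:
    """Build a hypothetical inline-attachment payload for one file."""
    file_path = Path(path)
    # Guess the MIME type from the file extension (assumed detection strategy).
    mime_type, _ = mimetypes.guess_type(file_path.name)
    return {
        "FileName": file_path.name,
        # Fall back to a generic binary type when the extension is unknown.
        "MimeType": mime_type or "application/octet-stream",
        # Base64 text is ASCII-safe, so it can travel inside a JSON request.
        "Data": base64.b64encode(file_path.read_bytes()).decode("ascii"),
    }
```

Calling `to_inline_attachment("C:/Work/invoice.pdf")` would return a dictionary whose `MimeType` is `application/pdf` and whose `Data` holds the encoded file contents.
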
### **Input Scenarios**

#### **1. No Files**

The activity accepts an empty file list, but extraction normally requires attachments. If the AI service returns no rows, the activity fails.

```json
{
  "Files": []
}
```

#### **2. Single File**

```json
{
  "Files": [
    { "FileName": "invoice.pdf", "FullPath": "C:/Work/invoice.pdf" }
  ]
}
```

#### **3. Multiple Files**

```json
{
  "Files": [
    { "FileName": "invoice-001.pdf", "FullPath": "C:/Work/invoice-001.pdf" },
    { "FileName": "invoice-002.pdf", "FullPath": "C:/Work/invoice-002.pdf" }
  ]
}
```

---

## **Output**

The activity reads the `Response` property from the AI JSON response and converts each array item into one data row.

| Field | Type | Description |
| --- | --- | --- |
| Data | Array | Extracted table rows. Column names depend on `Column Map` or the custom `ResponseSchema`. |
| Errors | Array | Parsing errors or execution errors, if any. |
| AdditionalResponse | String | Raw AI response JSON. In one-file-at-a-time mode, this field currently holds only the combined-response variable, so it may be empty even when the per-file requests succeeded. |

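The row conversion described above can be sketched in Python as follows. This is a minimal illustration, not the activity's actual parser: the function name `rows_from_response`, the error messages, and the optional `file_name` argument (standing in for one-file-at-a-time mode) are assumptions.

```python
import json


def rows_from_response(raw, file_name=None):
    """Turn a raw AI JSON response into (rows, errors)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"Invalid JSON: {exc}"]

    # Rows are expected as an array under the `Response` property.
    items = payload.get("Response") if isinstance(payload, dict) else None
    if not isinstance(items, list):
        return [], ["No 'Response' array in AI response"]

    rows, errors = [], []
    for index, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"Item {index} is not an object")
            continue
        row = dict(item)
        # In one-file-at-a-time mode, each row is tagged with its source file.
        if file_name is not None:
            row["FileName"] = file_name
        rows.append(row)
    return rows, errors
```

The two return values loosely correspond to the `Data` and `Errors` fields in the table above.
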
### **Example Output**

This example reflects one-file-at-a-time mode, so each row carries a `FileName` column.

```json
{
  "Data": [
    {
      "InvoiceNumber": "INV-001",
      "InvoiceDate": "2026-01-15",
      "Amount": "1250.00",
      "FileName": "invoice-001.pdf"
    },
    {
      "InvoiceNumber": "INV-002",
      "InvoiceDate": "2026-01-20",
      "Amount": "980.00",
      "FileName": "invoice-002.pdf"
    }
  ],
  "Errors": []
}
```

---

## **Configuration Fields**

| Field Name | Type | Required | Description |
| --- | --- | --- | --- |
| OneFileAtATime | Boolean | No | When `true`, each input file is sent in a separate AI request. Extracted rows receive a `FileName` column. When `false`, all files are sent together in one request. Default is `false`. |
| Column Map | Object Array | No | List of target columns and aliases. `Column` is the output column to extract. `Alias` describes alternate names that may appear in the document. Used to build the extraction objective and generated response schema. |
| Prompt | Text | Yes | User instructions for extraction. This is inserted into the activity's AI table-extraction template. |
| StructuredOutput | Boolean | No | When `true`, the activity uses the provided `ResponseSchema`. When `false`, it builds a schema from `Column Map`. |
| ResponseSchema | JSON | Conditional | Custom response schema used when `StructuredOutput` is `true`. For the default parser to produce rows, the schema should return the data as an array under a `Response` property. |

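To make the effect of `OneFileAtATime` concrete, the Python sketch below groups input files into AI requests. The function name `plan_requests` is hypothetical, and the behavior for an empty file list in one-file-at-a-time mode is an assumption, since it is not specified here.

```python
def plan_requests(files, one_file_at_a_time=False):
    """Group files into batches, where each batch is one AI request."""
    if one_file_at_a_time:
        # One request per file; an empty list yields no requests (assumed).
        return [[f] for f in files]
    # A single combined request carrying every file (possibly none).
    return [list(files)]
```

With two input files, `plan_requests(files, True)` yields two single-file batches, while the default yields one batch containing both files.
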
### **Conditional Field Rendering Rules**

- **ResponseSchema** is shown when **StructuredOutput** is `true`.

---

## **Sample Configuration**

### **Scenario 1: Extract Invoices With Generated Schema**

| Field | Value |
| --- | --- |
| OneFileAtATime | `true` |
| Column Map | `InvoiceNumber = Invoice No, Bill No`; `InvoiceDate = Date, Bill Date`; `Amount = Total, Grand Total` |
| Prompt | `Extract invoice header fields from the attached files.` |
| StructuredOutput | `false` |

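The exact schema the activity generates is not documented here. As a rough illustration only, the Python sketch below shows one way a column map like the one in this scenario could be turned into a JSON schema that returns rows under `Response`. The function name `build_response_schema`, the all-string column types, and the use of `description` to carry aliases are assumptions.

```python
def build_response_schema(column_map):
    """Build a hypothetical response schema from Column/Alias entries."""
    properties = {}
    for entry in column_map:
        column = {"type": "string"}  # assumed: every column is returned as text
        if entry.get("Alias"):
            # Aliases hint at alternate labels found in source documents.
            column["description"] = f"May appear as: {entry['Alias']}"
        properties[entry["Column"]] = column

    return {
        "type": "object",
        "properties": {
            "Response": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": properties,
                    "required": list(properties),
                },
            }
        },
        "required": ["Response"],
    }


column_map = [
    {"Column": "InvoiceNumber", "Alias": "Invoice No, Bill No"},
    {"Column": "InvoiceDate", "Alias": "Date, Bill Date"},
    {"Column": "Amount", "Alias": "Total, Grand Total"},
]
schema = build_response_schema(column_map)
```
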
### **Scenario 2: Use Custom Structured Output**

| Field | Value |
| --- | --- |
| OneFileAtATime | `false` |
| Prompt | `Extract all line items from the purchase order.` |
| StructuredOutput | `true` |
| ResponseSchema | `{ "type": "object", "properties": { "Response": { "type": "array", "items": { "type": "object" } } }, "required": ["Response"] }` |

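Before using a custom schema, it can help to confirm that it exposes an array under `Response`, since the default parser reads rows from that property. The check below is a minimal Python sketch; the function name `exposes_response_array` is hypothetical and the activity itself may not perform this validation.

```python
import json


def exposes_response_array(schema: dict) -> bool:
    """Return True when the schema declares `Response` as an array property."""
    response = schema.get("properties", {}).get("Response")
    return isinstance(response, dict) and response.get("type") == "array"


# The ResponseSchema value from this scenario, parsed from its JSON text.
custom_schema = json.loads(
    '{ "type": "object", "properties": { "Response": { "type": "array", '
    '"items": { "type": "object" } } }, "required": ["Response"] }'
)
is_usable = exposes_response_array(custom_schema)
```
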
---

## **Sample Output**

```json
{
  "Data": [
    {
      "InvoiceNumber": "INV-001",
      "InvoiceDate": "2026-01-15",
      "Amount": "1250.00"
    }
  ],
  "Errors": [],
  "AdditionalResponse": "{\"Response\":[{\"InvoiceNumber\":\"INV-001\",\"InvoiceDate\":\"2026-01-15\",\"Amount\":\"1250.00\"}]}"
}
```
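
Because `AdditionalResponse` is a string that itself contains JSON, a consumer has to decode it a second time to inspect the raw AI response. The Python sketch below shows one way to do that while tolerating an empty value; the function name `decode_additional_response` is hypothetical.

```python
import json
from typing import Optional


def decode_additional_response(activity_output: dict) -> Optional[dict]:
    """Decode the JSON text held in `AdditionalResponse`, if any."""
    raw = activity_output.get("AdditionalResponse") or ""
    # The field may be empty, for example in one-file-at-a-time mode.
    if not raw.strip():
        return None
    return json.loads(raw)
```
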
