---
title: Extract HTML
description: Extract table data from HTML documents into structured datasets.
category: Data Extraction
tags: [html, table, extract, parse, structured data]
---

import { Aside } from "@astrojs/starlight/components";

# Extract HTML

## Description

The **Extract HTML** activity extracts tabular data from HTML files and converts it into a structured dataset. It is especially useful for processing reports, web-scraped content, and tables embedded in web pages or system-generated HTML files.

> **Use case**:  
> Ideal for scenarios where data is embedded in HTML tables, such as downloaded web reports, email digests, or content management system exports.
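
To make the transformation concrete, here is a minimal Python sketch of what HTML table extraction involves, using only the standard library's `html.parser`. It is not the activity's own implementation. `TableParser`, `extract_records`, and the sample markup are hypothetical, and the sketch assumes the first row of the table holds the column headers and that tables are neither nested nor missing their closing tags.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the cell text of every <table> in a document as lists of rows.

    Assumption: tables are not nested and every <tr>, <td>, <th> is closed.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> encountered
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join text fragments (entities may split them) and trim whitespace.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Keep only text that sits inside a cell; layout whitespace is ignored.
        if self._cell is not None:
            self._cell.append(data)


def extract_records(html, table_index=0):
    """Return one table as a list of dicts keyed by its header row."""
    parser = TableParser()
    parser.feed(html)
    rows = parser.tables[table_index]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Hypothetical input markup for illustration.
html = """
<table>
  <tr><th>Name</th><th>Age</th><th>Country</th></tr>
  <tr><td>John Doe</td><td>28</td><td>USA</td></tr>
  <tr><td>Alice</td><td>31</td><td>Canada</td></tr>
  <tr><td>Bob</td><td>25</td><td>Australia</td></tr>
</table>
"""

for record in extract_records(html):
    print(record)
```

Each row becomes a record keyed by the header cells, which is the same shape as the Sample Output table. Every value stays a string in this sketch, so converting a column such as `Age` to a number would be a separate step.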

---

## Input

| Type | Description                 |
| ---- | --------------------------- |
| File | HTML document (.html, .htm) |

---

## Output

| Type | Description                                 |
| ---- | ------------------------------------------- |
| Data | Structured tabular data extracted from HTML |

---

## Configuration Fields

| Field Name           | Required | Description                                                               |
| -------------------- | -------- | ------------------------------------------------------------------------- |
| **Add HTML Extract** | Yes      | Defines extraction rule(s) to identify and parse one or more HTML tables. |

---

## Sample Input

Not applicable. Input is provided as an uploaded HTML file.

---

## Sample Configuration

| Field            | Value                            |
| ---------------- | -------------------------------- |
| Add HTML Extract | Table selector that identifies the table to parse |
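
This page does not specify the selector syntax that **Add HTML Extract** accepts. Purely as an illustration of why a selector matters when a document contains several tables, the sketch below treats a "table selector" as an ElementTree path expression such as `.//table[@id='people']`. The function name, the `id` values, and the sample page are hypothetical, and `xml.etree.ElementTree` only parses well-formed (XHTML-style) markup, so real-world HTML would normally need a more lenient parser.

```python
import xml.etree.ElementTree as ET


def extract_by_selector(markup, path):
    """Return the table matched by `path` as a list of dicts keyed by its header row.

    Assumption: `markup` is well-formed XML/XHTML and `path` uses the limited
    XPath subset that ElementTree supports (e.g. attribute predicates).
    """
    root = ET.fromstring(markup)
    table = root.find(path)
    if table is None:
        raise ValueError(f"no table matches {path!r}")
    # Collect the text of each <th>/<td> child of every <tr> under the table.
    rows = [
        ["".join(cell.itertext()).strip() for cell in tr if cell.tag in ("th", "td")]
        for tr in table.iter("tr")
    ]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Hypothetical page with two tables; the selector picks the one we want.
page = """
<html><body>
  <table id="summary"><tr><th>Total</th></tr><tr><td>3</td></tr></table>
  <table id="people">
    <tr><th>Name</th><th>Age</th><th>Country</th></tr>
    <tr><td>John Doe</td><td>28</td><td>USA</td></tr>
    <tr><td>Alice</td><td>31</td><td>Canada</td></tr>
  </table>
</body></html>
"""

print(extract_by_selector(page, ".//table[@id='people']"))
```

Selecting by a stable attribute such as `id` keeps an extraction rule from silently picking the wrong table if the page layout adds or reorders tables.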

---

## Sample Output

| Name     | Age | Country   |
| -------- | --- | --------- |
| John Doe | 28  | USA       |
| Alice    | 31  | Canada    |
| Bob      | 25  | Australia |

---

<Aside>
Extract HTML file connection

![Extract HTML activity connected to an HTML file input](extract-html.png)

</Aside>