Extract HTML

Description

The Extract HTML activity extracts tabular data from HTML files and converts it into a structured dataset. It is useful for processing reports, web-scraped content, and tables embedded in web pages or system-generated HTML files.

Use case:
Ideal for scenarios where data is embedded in HTML tables, such as downloaded web reports, email digests, or content management system exports.


Input

| Type | Description |
| --- | --- |
| File | HTML document (.html, .htm) |

Output

| Type | Description |
| --- | --- |
| Data | Structured tabular data extracted from HTML |

Configuration Fields

| Field Name | Required | Description |
| --- | --- | --- |
| Add HTML Extract | Yes | Defines extraction rule(s) to identify and parse one or more HTML tables. |
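
This page does not spell out the syntax of an extraction rule, so the following is only a rough sketch of what "identifying" one table among several could look like. It assumes a rule that selects a table by its `id` attribute. The class `TableById`, the function `select_table`, and the sample page are all hypothetical and are not part of the activity itself. The sketch uses Python's standard `html.parser` module and does not handle nested tables.

```python
from html.parser import HTMLParser


class TableById(HTMLParser):
    """Collects rows only from the <table> whose id equals target_id (hypothetical rule)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.rows = []
        self._active = False  # True while inside the selected table
        self._row = None      # cells of the row being read
        self._cell = None     # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.target_id:
            self._active = True
        elif not self._active:
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if not self._active:
            return
        if tag == "table":
            self._active = False
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def select_table(html, table_id):
    """Return the rows of the table with the given id, or [] if none matches."""
    parser = TableById(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page containing two tables; only one is selected.
PAGE = """
<table id="summary"><tr><td>Total</td><td>3</td></tr></table>
<table id="people">
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Alice</td><td>31</td></tr>
</table>
"""

people_rows = select_table(PAGE, "people")
```

Selecting by `id` is just one plausible rule; position on the page or a CSS class would work along the same lines.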

Sample Input

Not applicable — input is provided via uploaded HTML files.


Sample Configuration

| Field | Value |
| --- | --- |
| Add HTML Extract | Table selector for parsing table |

Sample Output

| Name | Age | Country |
| --- | --- | --- |
| John Doe | 28 | USA |
| Alice | 31 | Canada |
| Bob | 25 | Australia |
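
To make the conversion from an HTML table to structured rows concrete, here is a minimal sketch using Python's standard `html.parser` module. It is not the activity's actual implementation. The input `SAMPLE_HTML` is a hypothetical file written to match the sample output, and the names `TableRows` and `extract_records` are illustrative. The first row is assumed to be the header, and all values are kept as strings.

```python
from html.parser import HTMLParser

# Hypothetical input, written to match the sample output above.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Age</th><th>Country</th></tr>
  <tr><td>John Doe</td><td>28</td><td>USA</td></tr>
  <tr><td>Alice</td><td>31</td><td>Canada</td></tr>
  <tr><td>Bob</td><td>25</td><td>Australia</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Return one dict per data row, keyed by the header row's cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = extract_records(SAMPLE_HTML)
```

Here `records[0]` is `{"Name": "John Doe", "Age": "28", "Country": "USA"}`. Converting `Age` to a number would be a separate step that this sketch leaves out.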