Generate big data

Description

The Generate Big Data activity expands an existing dataset by replicating each row according to a configurable Expansion Factor. Use it to simulate large datasets for stress testing, machine learning training, or prototyping.

Each replicated row may contain subtle variations for realism, and the Key Column ensures uniqueness and traceability across the newly generated data.

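If the input has a rows and the expansion factor is b, the output contains a × b rows. For example, the 3 input rows in the sample further down, expanded with a factor of 2 and Include Original enabled, produce 6 output rows.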
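The row count can be illustrated with a minimal Python sketch. It is not the activity's actual implementation: the function name `expand_rows`, the integer keys, and the "continue from the largest key" rule are all assumptions chosen to match the sample on this page. In particular, returning only the a × (b − 1) copies when originals are excluded is a guess, since the page does not state the row count for that case.

```python
from typing import Any

Row = dict[str, Any]


def expand_rows(rows: list[Row], key_column: str, expansion_factor: int,
                include_original: bool = True) -> list[Row]:
    """Replicate rows so that originals plus copies total len(rows) * expansion_factor."""
    if expansion_factor < 1:
        raise ValueError("expansion_factor must be at least 1")

    # Assumption: keys are integers and new keys continue from the largest one.
    next_key = max(row[key_column] for row in rows) + 1 if rows else 1

    synthetic: list[Row] = []
    # Each original row contributes (expansion_factor - 1) copies.
    for _ in range(expansion_factor - 1):
        for row in rows:
            copy = dict(row)
            copy[key_column] = next_key  # fresh key keeps every row unique
            next_key += 1
            synthetic.append(copy)

    # Assumption: with originals excluded, only the synthetic copies are returned.
    return (list(rows) + synthetic) if include_original else synthetic


sample = [
    {"Transaction ID": 101, "Product": "Laptop", "Price": 800, "Quantity": 1},
    {"Transaction ID": 102, "Product": "Phone", "Price": 500, "Quantity": 2},
    {"Transaction ID": 103, "Product": "Tablet", "Price": 300, "Quantity": 1},
]
expanded = expand_rows(sample, "Transaction ID", 2)
print(len(expanded))  # 6, that is 3 input rows x factor 2
print([row["Transaction ID"] for row in expanded])  # [101, 102, 103, 104, 105, 106]
```

This sketch copies values unchanged; the real activity may also vary them.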


Input

| Type | Description |
|------|-------------|
| Data | Tabular dataset to expand. |

Output

| Type | Description |
|------|-------------|
| Transformed Data | Original and/or new rows based on configuration. |

Configuration Fields

| Field Name | Required | Description |
|------------|----------|-------------|
| Expansion Factor | Yes | Multiplier to determine how many times each row should be repeated. |
| Key Column | Yes | Column that uniquely identifies each original row. It is modified to ensure uniqueness in generated rows. |
| Include Original | No | If enabled, original data is included in the output. Otherwise, only synthetic rows are included. |
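Before running the activity, it can help to check the input against these fields. The following is a hedged sketch of such a pre-check, not part of the product: `validate_config` is a hypothetical helper, and rows are assumed to be plain Python dictionaries.

```python
from typing import Any


def validate_config(rows: list[dict[str, Any]], key_column: str,
                    expansion_factor: int) -> None:
    """Raise ValueError if the configuration cannot be applied to rows."""
    if not isinstance(expansion_factor, int) or expansion_factor < 1:
        raise ValueError("Expansion Factor must be a positive integer")

    # The Key Column has to be present in every row.
    missing = [i for i, row in enumerate(rows) if key_column not in row]
    if missing:
        raise ValueError(f"Key Column {key_column!r} missing in rows {missing}")

    # The Key Column has to identify each original row uniquely.
    keys = [row[key_column] for row in rows]
    if len(keys) != len(set(keys)):
        raise ValueError(f"Key Column {key_column!r} contains duplicate values")


# Passes silently: both keys are present and distinct.
validate_config([{"Transaction ID": 101}, {"Transaction ID": 102}], "Transaction ID", 2)
```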

Sample Input

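| Transaction ID | Product | Price | Quantity |
|----------------|---------|-------|----------|
| 101 | Laptop | 800 | 1 |
| 102 | Phone | 500 | 2 |
| 103 | Tablet | 300 | 1 |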

Sample Configuration

| Field | Value |
|-------|-------|
| Expansion Factor | 2 |
| Key Column | Transaction ID |
| Include Original | Enabled |

Sample Output (with Expansion Factor = 2)

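| Transaction ID | Product | Price | Quantity |
|----------------|---------|-------|----------|
| 101 | Laptop | 800 | 1 |
| 102 | Phone | 500 | 2 |
| 103 | Tablet | 300 | 1 |
| 104 | Laptop | 400 | 2 |
| 105 | Tablet | 358 | 1 |
| 106 | Phone | 462 | 2 |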

Values may vary slightly in generated rows to simulate realistic data distributions.


Notes

  • The key column values in new rows are auto-incremented or mutated to ensure uniqueness.
  • If Include Original is disabled, only synthetic (expanded) rows are returned.
  • Generated rows may include noise or variations, depending on the underlying implementation.
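One way such variation could be produced is to scale numeric columns by a small random factor. The sketch below only illustrates the idea. The helper `add_noise`, the uniform distribution, and the 10% bound are assumptions, and the activity's real noise model is not documented here.

```python
import random
from typing import Any


def add_noise(row: dict[str, Any], columns: list[str], max_relative_change: float,
              rng: random.Random) -> dict[str, Any]:
    """Return a copy of row with each listed numeric column scaled by a random factor."""
    noisy = dict(row)  # leave the input row untouched
    for column in columns:
        factor = 1 + rng.uniform(-max_relative_change, max_relative_change)
        noisy[column] = round(row[column] * factor)
    return noisy


rng = random.Random(42)  # fixed seed so repeated runs give the same rows
original = {"Transaction ID": 104, "Product": "Laptop", "Price": 800, "Quantity": 1}
varied = add_noise(original, ["Price"], 0.10, rng)
print(varied)  # Price lands somewhere between 720 and 880
```

Passing a seeded `random.Random` instance keeps generated datasets reproducible, which is useful when the expanded data feeds a repeatable stress test.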