Description
The Generate Big Data activity expands an existing dataset by replicating each row a number of times set by the Expansion Factor. This can be used to simulate large datasets for stress testing, machine learning training, or prototyping.
Each replicated row may contain subtle variations for realism, and the Key Column ensures uniqueness and traceability across the newly generated data.
If the number of input rows is a and the expansion factor is b, the total output will be a × b rows.
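The row-count rule can be checked with a few lines of Python. This is only an illustration of the arithmetic stated above: the function name `expected_output_rows` is hypothetical and not part of the activity, and the lower bound of 1 on the expansion factor is an assumption.

```python
def expected_output_rows(input_rows: int, expansion_factor: int) -> int:
    """Return a * b, the total row count stated in the description."""
    # Assumption: the expansion factor is a positive integer.
    if input_rows < 0 or expansion_factor < 1:
        raise ValueError("input_rows must be >= 0 and expansion_factor >= 1")
    return input_rows * expansion_factor


# Three input rows with an Expansion Factor of 2.
print(expected_output_rows(3, 2))  # 6
```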
Input
| Type | Description |
|---|---|
| Data | Tabular dataset to expand. |
Output
| Type | Description |
|---|---|
| Transformed Data | The generated rows, plus the original rows when Include Original is enabled. |
Configuration Fields
| Field Name | Required | Description |
|---|---|---|
| Expansion Factor | Yes | Multiplier to determine how many times each row should be repeated. |
| Key Column | Yes | Column that uniquely identifies each original row. It is modified to ensure uniqueness in generated rows. |
| Include Original | No | If enabled, original data is included in the output. Otherwise, only synthetic rows are included. |
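For readers scripting around this activity, the three fields above could be modelled as a small settings object. The sketch below is hypothetical: the class name `GenerateBigDataConfig`, its attribute names, the default for Include Original, and the validation rules are all assumptions for illustration, not the activity's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerateBigDataConfig:
    """Hypothetical container mirroring the three configuration fields."""

    expansion_factor: int            # Expansion Factor (required)
    key_column: str                  # Key Column (required)
    include_original: bool = False   # Include Original (optional; default is an assumption)

    def validate(self, columns: list[str]) -> None:
        """Raise ValueError if the settings cannot apply to the given columns."""
        # Assumption: the expansion factor must be a positive integer.
        if self.expansion_factor < 1:
            raise ValueError("Expansion Factor must be a positive integer")
        if self.key_column not in columns:
            raise ValueError(f"Key Column {self.key_column!r} not found in input")


config = GenerateBigDataConfig(
    expansion_factor=2,
    key_column="Transaction ID",
    include_original=True,
)
config.validate(["Transaction ID", "Product", "Price", "Quantity"])
```

Validating before running the expansion surfaces a misspelled key column early instead of partway through generation.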
Sample Input
| Transaction ID | Product | Price | Quantity |
|---|---|---|---|
| 101 | Laptop | 800 | 1 |
| 102 | Phone | 500 | 2 |
| 103 | Tablet | 300 | 1 |
Sample Configuration
| Field | Value |
|---|---|
| Expansion Factor | 2 |
| Key Column | Transaction ID |
| Include Original | Enabled |
Sample Output (with Expansion Factor = 2)
| Transaction ID | Product | Price | Quantity |
|---|---|---|---|
| 101 | Laptop | 800 | 1 |
| 102 | Phone | 500 | 2 |
| 103 | Tablet | 300 | 1 |
| 104 | Laptop | 400 | 2 |
| 105 | Tablet | 358 | 1 |
| 106 | Phone | 462 | 2 |
Values may vary slightly in generated rows to simulate realistic data distributions.
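One way to sanity-check an expanded dataset is to confirm that the key values are unique and that the row count matches the expected total. The sketch below runs those two checks against the sample output rows shown above; the helper name `check_expanded` is hypothetical and not part of the activity.

```python
def check_expanded(rows, key_column, expected_total):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    keys = [row[key_column] for row in rows]
    if len(set(keys)) != len(keys):
        problems.append("duplicate key values")
    if len(rows) != expected_total:
        problems.append(f"expected {expected_total} rows, found {len(rows)}")
    return problems


# Rows copied from the sample output (3 input rows, Expansion Factor = 2).
sample_output = [
    {"Transaction ID": 101, "Product": "Laptop", "Price": 800, "Quantity": 1},
    {"Transaction ID": 102, "Product": "Phone", "Price": 500, "Quantity": 2},
    {"Transaction ID": 103, "Product": "Tablet", "Price": 300, "Quantity": 1},
    {"Transaction ID": 104, "Product": "Laptop", "Price": 400, "Quantity": 2},
    {"Transaction ID": 105, "Product": "Tablet", "Price": 358, "Quantity": 1},
    {"Transaction ID": 106, "Product": "Phone", "Price": 462, "Quantity": 2},
]

print(check_expanded(sample_output, "Transaction ID", expected_total=3 * 2))  # []
```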
Notes
- The key column values in new rows are auto-incremented or mutated to ensure uniqueness.
- If Include Original is disabled, only synthetic (expanded) rows are returned.
- Generated rows may include noise or variations depending on the underlying implementation.
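The behaviour described in these notes can be approximated in plain Python. This is a minimal sketch under stated assumptions, not the activity's implementation: the function name `generate_big_data` is hypothetical, keys are assumed to be integers that continue from the largest existing key, original rows are assumed to count toward the a × b total when included, and the noise model (a uniform ±10% perturbation of non-key numeric fields) is invented for illustration.

```python
import random


def generate_big_data(rows, key_column, expansion_factor,
                      include_original=False, seed=0):
    """Sketch of row expansion: copy rows, assign fresh keys, perturb numbers.

    Assumptions (not confirmed by the activity's documentation):
    - keys are integers and new keys continue from max(existing key) + 1;
    - the total is len(rows) * expansion_factor, with originals counted
      toward that total when include_original is True;
    - noise is a uniform +/-10% perturbation of non-key numeric fields.
    """
    if expansion_factor < 1:
        raise ValueError("expansion_factor must be >= 1")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    next_key = max((row[key_column] for row in rows), default=0) + 1
    copies = expansion_factor - 1 if include_original else expansion_factor

    output = [dict(row) for row in rows] if include_original else []
    for _ in range(copies):
        for row in rows:
            new_row = dict(row)
            new_row[key_column] = next_key  # auto-incremented key
            next_key += 1
            for name, value in row.items():
                is_number = (isinstance(value, (int, float))
                             and not isinstance(value, bool))
                if name != key_column and is_number:
                    new_row[name] = round(value * rng.uniform(0.9, 1.1))
            output.append(new_row)
    return output


sample = [
    {"Transaction ID": 101, "Product": "Laptop", "Price": 800, "Quantity": 1},
    {"Transaction ID": 102, "Product": "Phone", "Price": 500, "Quantity": 2},
    {"Transaction ID": 103, "Product": "Tablet", "Price": 300, "Quantity": 1},
]
expanded = generate_big_data(sample, "Transaction ID", 2, include_original=True)
print(len(expanded))                               # 6
print([row["Transaction ID"] for row in expanded])  # [101, 102, 103, 104, 105, 106]
```

Because the perturbed values depend on the assumed noise model, they will not match the figures in the sample output; only the row count and key sequence are meant to line up.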