---
title: Generate big data
description: Expand datasets by replicating and varying rows based on user configuration.
category: Data Transformation
tags: [scaling, synthetic data, testing, simulation, replication]
---

# Generate big data

## Description

The **Generate Big Data** activity expands an existing dataset by replicating each row according to an **Expansion Factor**. Use it to simulate large datasets for stress testing, machine learning training, or prototyping.

Generated rows may contain variations in their non-key values for realism, while the **Key Column** keeps every row uniquely identifiable and traceable across the expanded dataset.

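If the input has `a` rows and the expansion factor is `b`, the output contains `a × b` rows in total. For example, the sample below expands 3 input rows with a factor of 2 into 6 output rows: the 3 originals plus 3 generated rows.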
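
The formula can be checked with a small calculation. The sketch below assumes, consistently with the sample output, that the total is always `a × b` and that original rows, when included, count toward that total. `expected_row_counts` is a hypothetical helper, not part of the activity.

```python
def expected_row_counts(input_rows: int, expansion_factor: int, include_original: bool) -> tuple[int, int]:
    """Return (total_rows, generated_rows) for an expansion run.

    Assumption: the output always has input_rows * expansion_factor rows, and
    when Include Original is enabled the original rows are part of that total.
    """
    if input_rows < 0 or expansion_factor < 1:
        raise ValueError("input_rows must be >= 0 and expansion_factor >= 1")
    total = input_rows * expansion_factor
    # Originals only occupy part of the total when they are included.
    originals = input_rows if include_original else 0
    return total, total - originals


# Sample on this page: 3 input rows, Expansion Factor = 2, Include Original enabled.
print(expected_row_counts(3, 2, include_original=True))  # (6, 3)
```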

---

## Input

| Type | Description                |
| ---- | -------------------------- |
| Data | Tabular dataset to expand. |

---

## Output

| Type             | Description                                                                |
| ---------------- | -------------------------------------------------------------------------- |
| Transformed Data | Generated rows, plus the original rows if **Include Original** is enabled. |

---

## Configuration Fields

| Field Name           | Required | Description                                                                                               |
| -------------------- | -------- | --------------------------------------------------------------------------------------------------------- |
| **Expansion Factor** | Yes      | Multiplier to determine how many times each row should be repeated.                                       |
| **Key Column**       | Yes      | Column that uniquely identifies each original row. It is modified to ensure uniqueness in generated rows. |
| **Include Original** | No       | If enabled, original data is included in the output. Otherwise, only synthetic rows are included.         |

---

## Sample Input

| Transaction ID | Product | Price | Quantity |
| -------------- | ------- | ----- | -------- |
| 101            | Laptop  | 800   | 1        |
| 102            | Phone   | 500   | 2        |
| 103            | Tablet  | 300   | 1        |

---

## Sample Configuration

| **Field**            | **Value**        |
| -------------------- | ---------------- |
| **Expansion Factor** | 2                |
| **Key Column**       | `Transaction ID` |
| **Include Original** | Enabled          |
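
The sample configuration can be sketched in Python to show how row count, key regeneration, and **Include Original** interact. This is a minimal sketch, not the activity's implementation: `generate_big_data` and its parameters are hypothetical, rows are assumed to be dictionaries, the **Key Column** is assumed to hold integers, and the output is assumed to total `a × b` rows with included originals counting toward it (which matches the sample output).

```python
def generate_big_data(rows, expansion_factor, key_column, include_original=False):
    """Expand rows to len(rows) * expansion_factor output rows.

    Assumptions (illustrative, not the activity's documented behavior):
    - the output always has len(rows) * expansion_factor rows, and included
      originals count toward that total;
    - key_column holds integers, and new keys continue from the current maximum;
    - non-key values are copied unchanged (no variation is added).
    """
    if expansion_factor < 1:
        raise ValueError("expansion_factor must be at least 1")
    total = len(rows) * expansion_factor
    # Start from copies of the originals only when Include Original is enabled.
    output = [dict(row) for row in rows] if include_original else []
    next_key = max((row[key_column] for row in rows), default=0) + 1
    i = 0
    while len(output) < total:
        # Cycle through the source rows in order, giving each copy a fresh key.
        copy = dict(rows[i % len(rows)])
        copy[key_column] = next_key
        next_key += 1
        output.append(copy)
        i += 1
    return output


sample = [
    {"Transaction ID": 101, "Product": "Laptop", "Price": 800, "Quantity": 1},
    {"Transaction ID": 102, "Product": "Phone", "Price": 500, "Quantity": 2},
    {"Transaction ID": 103, "Product": "Tablet", "Price": 300, "Quantity": 1},
]
result = generate_big_data(
    sample, expansion_factor=2, key_column="Transaction ID", include_original=True
)
print([row["Transaction ID"] for row in result])  # [101, 102, 103, 104, 105, 106]
```

Unlike the sample output, this sketch copies non-key values unchanged and emits generated rows in source order; it adds no value variation.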


---

## Sample Output (with Expansion Factor = 2)

| Transaction ID | Product | Price | Quantity |
| -------------- | ------- | ----- | -------- |
| 101            | Laptop  | 800   | 1        |
| 102            | Phone   | 500   | 2        |
| 103            | Tablet  | 300   | 1        |
| 104            | Laptop  | 400   | 2        |
| 105            | Tablet  | 358   | 1        |
| 106            | Phone   | 462   | 2        |

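> Non-key values such as **Price** and **Quantity** may differ between a generated row and its source row to simulate realistic data distributions. Generated rows may also appear in a different order from the input.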
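
This page does not document how variations are produced. One simple way to simulate them, shown here purely as an assumption, is multiplicative jitter on numeric columns with a seeded random generator. `jitter_row` and its `spread` parameter are hypothetical.

```python
import random


def jitter_row(row, numeric_columns, rng, spread=0.2):
    """Return a copy of row with each numeric column scaled by a random factor.

    Hypothetical noise model: the factor is drawn uniformly from
    [1 - spread, 1 + spread], and the result is rounded to an int no lower than 1.
    The source row is left unchanged.
    """
    varied = dict(row)
    for column in numeric_columns:
        factor = rng.uniform(1 - spread, 1 + spread)
        varied[column] = max(1, round(row[column] * factor))
    return varied


rng = random.Random(42)  # fixed seed so the variation is reproducible
original = {"Transaction ID": 104, "Product": "Laptop", "Price": 800, "Quantity": 1}
print(jitter_row(original, ["Price", "Quantity"], rng))
```

The sample output shows larger changes than this default spread allows (Laptop's price goes from 800 to 400 and its quantity from 1 to 2), so treat `spread` as a placeholder to tune.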

---

## Notes

- Key column values in generated rows are auto-incremented or otherwise mutated to ensure uniqueness. In the sample, new IDs continue from 104.
- If **Include Original** is **disabled**, only synthetic (expanded) rows are returned.
- Generated rows may include noise or other variations, depending on the underlying implementation.
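
Because traceability relies on the **Key Column** staying unique, it can be worth checking the expanded output. A minimal sketch, assuming rows are dictionaries keyed by column name; `duplicate_keys` is a hypothetical helper, not part of the activity.

```python
from collections import Counter


def duplicate_keys(rows, key_column):
    """Return key values that appear more than once, in first-seen order (empty if all unique)."""
    counts = Counter(row[key_column] for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Keys from the sample output: all unique.
output = [{"Transaction ID": k} for k in (101, 102, 103, 104, 105, 106)]
print(duplicate_keys(output, "Transaction ID"))  # []
# A repeated key is reported.
print(duplicate_keys(output + [{"Transaction ID": 104}], "Transaction ID"))  # [104]
```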
