
Numerical Binning

Description

The Numerical Binning activity transforms continuous numeric data into discrete intervals or “bins” using statistical or custom-defined rules. This technique is useful in data preprocessing, reporting, and visualization, where exact values can be generalized into groups to reveal patterns.

Binning reduces complexity in datasets by segmenting a continuous range of values into a finite number of intervals. These intervals can be computed automatically using statistical rules or configured manually by specifying bin sizes and value ranges.

Use this activity to:

  • Convert continuous data into categorized groups
  • Preprocess numerical columns for ML models or dashboards
  • Highlight distribution patterns in reporting and visualization
  • Simplify data when precision is unnecessary

Use case: When analyzing customer age distributions, binning age into ranges such as “20–30”, “30–40”, and so on helps simplify reports. This transformation can be useful for segmentation in sales, surveys, or behavioral analytics.

Input

| Type | Status | Description |
|---|---|---|
| Data | Required | A dataset that includes at least one column containing numerical values to be binned. |

Output

| Output Type | Format | Description |
|---|---|---|
| Data | Table | Transformed dataset with a new column containing bin labels. |

Configuration Fields

| Field Name | Description |
|---|---|
| Column | Select the column that contains the numerical values to group into bins. Only numerical data types are supported. |
| Binning Mode | Determines the method used to compute bin ranges. The available options are listed after this table. |
| Number Of Bins | Applies only to modes that take an explicit bin count, such as Fixed Size Intervals. Defines how many intervals the data is divided into. |
| Minimum Value | Visible only for Fixed Size Intervals. The lower bound of the first bin. Values below it will be excluded or grouped separately. |
| Maximum Value | Visible only for Fixed Size Intervals. The upper bound of the last bin. Values above it may be excluded or handled as overflow. |
| Output Column | The name of the new column that will contain the bin labels or categories (e.g., Age_Binned). |
| Include Original | Whether to retain the original unbinned numerical column in the output. Enabled: retain the original column. Disabled: include only the binned output. |

Binning Mode options:

  • Sturges – Best for smaller datasets. Calculates the number of bins using Sturges' formula: k = ⌈log₂(n) + 1⌉, where n is the number of data points.
  • Freedman Diaconis – Accounts for data variability and is robust to outliers. Bin width is calculated from the interquartile range (IQR); in its standard form, width = 2 · IQR / ∛n.
  • Scott – Selects a bin width based on the standard deviation σ of the dataset; in its standard form, width = 3.49 · σ / ∛n, which approximately minimizes binning error for normally distributed data.
  • Square Root – Uses a simple rule: number of bins = √n, rounded to a whole number. Useful for quick approximations.
  • Fixed Size Intervals – Divides the range from Minimum Value to Maximum Value into Number Of Bins equal-width intervals, so the bin width is (Maximum Value − Minimum Value) / Number Of Bins. Requires Minimum Value, Maximum Value, and Number Of Bins to be set.
  • Custom – Manually define specific ranges or thresholds for each bin.
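To make the statistical binning modes more concrete, the following is a minimal Python sketch of the textbook form of each rule, using only the standard library. The function names are hypothetical, and the choices of sample standard deviation and of the default quantile method in `statistics.quantiles` are assumptions; the activity's internal implementation may differ in rounding and quantile details.

```python
import math
import statistics


def sturges_bins(n: int) -> int:
    """Sturges' formula: k = ceil(log2(n) + 1)."""
    return math.ceil(math.log2(n) + 1)


def square_root_bins(n: int) -> int:
    """Square-root rule: k = sqrt(n), rounded up here (an assumption)."""
    return math.ceil(math.sqrt(n))


def scott_width(values: list[float]) -> float:
    """Scott's rule: width = 3.49 * s / n^(1/3).

    Assumption: s is the sample standard deviation.
    """
    return 3.49 * statistics.stdev(values) / len(values) ** (1 / 3)


def freedman_diaconis_width(values: list[float]) -> float:
    """Freedman-Diaconis rule: width = 2 * IQR / n^(1/3).

    Assumption: quartiles come from statistics.quantiles with its
    default ("exclusive") method; other methods give a different IQR.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) / len(values) ** (1 / 3)


def bins_from_width(values: list[float], width: float) -> int:
    """Convert a bin width into a bin count covering the data range."""
    return math.ceil((max(values) - min(values)) / width)


ages = [23, 35, 47, 59, 72, 89]
print(sturges_bins(len(ages)))       # → 4
print(square_root_bins(len(ages)))   # → 3
print(bins_from_width(ages, scott_width(ages)))
print(bins_from_width(ages, freedman_diaconis_width(ages)))
```

Sturges and Square Root produce a bin count directly, while Scott and Freedman Diaconis produce a bin width that is then divided into the data range.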

Sample Input

Age
23
35
47
59
72
89

Sample Configuration

| Field | Value |
|---|---|
| Column | Age |
| Binning Mode | Fixed Size Intervals |
| Number Of Bins | 4 |
| Minimum Value | 20 |
| Maximum Value | 100 |
| Output Column | Age_Binned |
| Include Original | Enabled |

If Binning Mode were set to Custom, you would instead define the bin thresholds manually, e.g., 0–25, 25–50, 50–75, 75–100.
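As an illustration of how Custom thresholds could map values to labels, here is a small Python sketch. The function name `custom_bin` is hypothetical, and the boundary handling is an assumption: each bin includes its lower edge and excludes its upper edge, except the last bin, which includes both. The activity's actual boundary rule is not stated here.

```python
from bisect import bisect_right

# Thresholds from the Custom example: 0–25, 25–50, 50–75, 75–100.
EDGES = [0, 25, 50, 75, 100]


def custom_bin(value, edges=EDGES):
    """Return a label such as '25–50', or None if value is outside all bins."""
    if value < edges[0] or value > edges[-1]:
        return None
    # Assumption: bins are [low, high), and the last bin also includes its top edge.
    index = min(bisect_right(edges, value) - 1, len(edges) - 2)
    return f"{edges[index]}–{edges[index + 1]}"


for age in [23, 35, 47, 59, 72, 89]:
    print(age, custom_bin(age))
# 23 0–25
# 35 25–50
# 47 25–50
# 59 50–75
# 72 50–75
# 89 75–100
```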

Sample Output

| Age | Age_Binned |
|---|---|
| 23 | 20–40 |
| 35 | 20–40 |
| 47 | 40–60 |
| 59 | 40–60 |
| 72 | 60–80 |
| 89 | 80–100 |
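The sample output follows from the Fixed Size Intervals arithmetic: (100 − 20) / 4 gives a bin width of 20, so the bins are 20–40, 40–60, 60–80, and 80–100. The Python sketch below reproduces that calculation. The function name `fixed_size_bin` is hypothetical, and returning `None` for out-of-range values and placing a boundary value in the higher bin are assumptions, since the activity may instead group such values separately.

```python
def fixed_size_bin(value, minimum, maximum, number_of_bins):
    """Label a value with its equal-width bin, e.g. '40–60'."""
    if value < minimum or value > maximum:
        return None  # Assumption: out-of-range values get no label.
    width = (maximum - minimum) / number_of_bins
    # Clamp so that value == maximum falls in the last bin.
    index = min(int((value - minimum) // width), number_of_bins - 1)
    low = minimum + index * width
    high = low + width
    return f"{low:g}–{high:g}"


# Sample configuration: Minimum Value 20, Maximum Value 100, Number Of Bins 4.
for age in [23, 35, 47, 59, 72, 89]:
    print(age, fixed_size_bin(age, 20, 100, 4))
# 23 20–40
# 35 20–40
# 47 40–60
# 59 40–60
# 72 60–80
# 89 80–100
```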