Description
The Extract Ngrams activity processes text data by extracting sequences of words (n-grams) from a selected column. This is useful for tasks involving natural language processing (NLP), pattern recognition, or feature generation for machine learning.
N-grams are contiguous sequences of n items (typically words) in a sentence. For example:
- 2-gram (bigram): "great product"
- 3-gram (trigram): "fast delivery time"
This activity provides options to remove stop words, apply stemming, and sort terms before forming n-grams.
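
As a rough illustration of the definition above, the sketch below forms n-grams by sliding a fixed-size window over a list of words. The function name `ngrams` is hypothetical and is not part of the activity; it only shows the general idea, not how the activity is implemented.

```python
def ngrams(tokens, size):
    """Return every contiguous run of `size` tokens, joined by spaces."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # A window starting at i covers tokens[i], ..., tokens[i + size - 1].
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


tokens = "fast delivery time".split()
print(ngrams(tokens, 2))  # ['fast delivery', 'delivery time']
print(ngrams(tokens, 3))  # ['fast delivery time']
```

A text with fewer words than the requested size yields no n-grams in this sketch.
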
Use case:
Extracting bigrams or trigrams from customer reviews, survey responses, or feedback fields for sentiment analysis or topic modeling.
Input
| Type | Description | 
|---|---|
| Data | Textual data containing the column to process | 
Output
| Type | Description | 
|---|---|
| Transformed Data | Table with extracted n-gram tokens | 
Configuration Fields
| Field Name | Required | Description | 
|---|---|---|
| Column To Extract | Yes | Name of the column containing text to extract n-grams from. (Uses Previous Data Column Editor) | 
| Output Method | Yes | Format for outputting extracted n-grams: One per row, One per column, or JSON | 
| Output Column | Yes | Name of the column to store the resulting n-grams | 
| Include Original | No | Whether to include the original columns in the output alongside the n-grams | 
| Size | Yes | Number of words per n-gram (e.g., 2 = bigram, 3 = trigram) | 
| Clear Stop Words | No | Remove common stop words (e.g., “the”, “is”, “and”) before generating n-grams | 
| Stem Words | No | Reduce words to their root form before generating n-grams (e.g., “running” → “run”) | 
| Sort Words | No | Sort words alphabetically within each n-gram (e.g., “product great” → “great product”) | 
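
To make the effect of Clear Stop Words, Stem Words, and Sort Words more concrete, here is a minimal sketch of how such options could be combined. Everything in it is an assumption for illustration: the `STOP_WORDS` list, the `toy_stem` suffix stripper, and the `extract_ngrams` function are hypothetical and do not reflect the activity's actual stop-word list, stemmer, or processing order. Sorting is applied within each n-gram, following the Sort Words description in the table.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the activity's real one.
STOP_WORDS = {"the", "is", "and", "a", "was", "but"}


def toy_stem(word):
    """Very rough suffix stripper (illustration only, not a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling left by stripping, e.g. "runn" -> "run".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def extract_ngrams(text, size, clear_stop_words=False, stem_words=False, sort_words=False):
    """Tokenize, apply the optional steps, then form n-grams of `size` words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if clear_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem_words:
        tokens = [toy_stem(t) for t in tokens]
    grams = [tokens[i:i + size] for i in range(len(tokens) - size + 1)]
    if sort_words:
        # Sort alphabetically inside each n-gram, not across the whole text.
        grams = [sorted(g) for g in grams]
    return [" ".join(g) for g in grams]


# Stop words only: "the" and "is" are dropped before the bigrams are formed.
print(extract_ngrams("The product quality is amazing", 2, clear_stop_words=True))

# With all three options, differently worded inputs yield the same bigram.
print(extract_ngrams("Great products", 2, stem_words=True, sort_words=True))
print(extract_ngrams("the product is great", 2, True, True, True))
```

In this sketch the last two calls both return `['great product']`, which is the kind of normalization that stemming and sorting are meant to provide.
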
Sample Input
| ReviewID | ReviewText | Rating | Date | Reviewer | 
|---|---|---|---|---|
| 101 | The product quality is amazing | 5 | 2024-02-01 | Alice | 
| 102 | Great service and fast delivery | 4 | 2024-02-02 | Bob | 
| 103 | The material is poor and fragile | 2 | 2024-02-03 | Charlie | 
| 104 | Excellent support and great help | 5 | 2024-02-04 | David | 
| 105 | Delivery was slow, but good item | 3 | 2024-02-05 | Emma | 
Sample Configuration
| Field | Value | 
|---|---|
| Column To Extract | ReviewText | 
| Output Method | One per row | 
| Output Column | ExtractedNgrams | 
| Include Original | No | 
| Size | 2 | 
| Clear Stop Words | Yes | 
| Stem Words | No | 
| Sort Words | No | 
Sample Output
| ExtractedNgrams | 
|---|
| product quality | 
| quality amazing | 
| great service | 
| service fast | 
| fast delivery | 
| material poor | 
| poor fragile | 
| excellent support | 
| support great | 
| great help | 
| delivery slow | 
| slow good | 
| good item | 
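
The rows above can be reproduced with a short sketch that lowercases each review, drops stop words, and emits one bigram per output row. The stop-word list here is an assumption chosen to match this sample, and `bigrams_per_row` is a hypothetical helper, not the activity's implementation. Because the sample output shows unstemmed words, the sketch does not apply stemming.

```python
import re

# Assumption: stop words chosen to fit this sample; the activity's real list may differ.
STOP_WORDS = {"the", "is", "and", "was", "but"}

reviews = [
    (101, "The product quality is amazing"),
    (102, "Great service and fast delivery"),
    (103, "The material is poor and fragile"),
    (104, "Excellent support and great help"),
    (105, "Delivery was slow, but good item"),
]


def bigrams_per_row(rows, size=2, output_column="ExtractedNgrams"):
    """Emit one output row per n-gram, without the original columns."""
    out = []
    for _review_id, text in rows:
        # Lowercase, strip punctuation, and remove stop words.
        tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]
        for i in range(len(tokens) - size + 1):
            out.append({output_column: " ".join(tokens[i:i + size])})
    return out


for row in bigrams_per_row(reviews):
    print(row["ExtractedNgrams"])
```

N-grams are formed within each review only, so no bigram spans two reviews.
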
Enable the Sort Words and Stem Words options when generating normalized features for text clustering or classification, so that variations in word order and word form map to the same n-gram.