Extract Ngrams
Description
This activity extracts n-grams (sequences of n consecutive words, each treated as a single element) from a specified text column. Users can set the n-gram size and output format, and can enable additional processing options: removing stop words, stemming, and sorting.
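To make the idea of an n-gram concrete, here is a minimal sketch of a sliding window over words. It assumes whitespace tokenization, and the function name `extract_ngrams` is hypothetical; the activity's actual tokenizer is not documented here.

```python
def extract_ngrams(text, size=2):
    """Return every run of `size` consecutive words in `text`."""
    words = text.split()  # assumption: words are separated by whitespace
    # Slide a window of `size` words across the text, one word at a time.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


print(extract_ngrams("Great service and fast delivery", size=2))
# ['Great service', 'service and', 'and fast', 'fast delivery']
```

A text with fewer words than the requested size yields an empty list in this sketch.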
Input
Data only
Output
Transformed data
Configuration Fields
- Column To Extract: The column containing the text from which n-grams are extracted (Previous Data Column Editor).
- Output Method: How the extracted n-grams are written (Options: One per row, One per column, JSON).
- Output Column: The column in which the extracted n-grams are stored.
- Include Original: If enabled, the columns from the previous activity are included; otherwise, only the output column is retained.
- Size: The number of words in each n-gram (for example, 2 for two-word n-grams).
- Clear Stop Words: Whether to remove common stop words.
- Stem Words: Whether to stem the extracted n-grams.
- Sort Words: Whether to sort the extracted n-grams.
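The fields above can be pictured as parameters of a small function. The following is only a sketch under stated assumptions: the stop-word list, the punctuation handling, the alphabetical interpretation of sorting, and the shape of each output method are all guesses for illustration, not the activity's documented behavior. Stemming is left out because the stemmer the activity uses is not specified.

```python
import json
import string

# Hypothetical stop-word list; the activity's real list is not documented here.
STOP_WORDS = {"the", "is", "and", "was", "but", "a", "an", "of"}


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (assumed behavior)."""
    words = (w.strip(string.punctuation) for w in text.split())
    return [w for w in words if w]


def extract_ngrams(text, size=2, clear_stop_words=False, sort_words=False):
    """Build n-grams of `size` words, optionally dropping stop words first."""
    words = tokenize(text)
    if clear_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    ngrams = [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]
    if sort_words:
        # Assumption: sorting means ordering the n-grams alphabetically.
        ngrams = sorted(ngrams, key=str.lower)
    return ngrams


def format_output(ngrams, output_method="One per row", output_column="ExtractedNgrams"):
    """Arrange n-grams as a list of row dictionaries (assumed shapes)."""
    if output_method == "One per row":
        return [{output_column: g} for g in ngrams]
    if output_method == "One per column":
        # Hypothetical naming scheme: one numbered column per n-gram.
        return [{f"{output_column}_{i + 1}": g for i, g in enumerate(ngrams)}]
    if output_method == "JSON":
        return [{output_column: json.dumps(ngrams)}]
    raise ValueError(f"Unknown output method: {output_method}")


grams = extract_ngrams("The product quality is amazing", size=2, clear_stop_words=True)
print(format_output(grams, "One per row"))
```

With these assumptions, the first sample review produces "product quality" and "quality amazing", each on its own row.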
Sample Input
ReviewID | ReviewText | Rating | Date | Reviewer |
---|---|---|---|---|
101 | The product quality is amazing | 5 | 2024-02-01 | Alice |
102 | Great service and fast delivery | 4 | 2024-02-02 | Bob |
103 | The material is poor and fragile | 2 | 2024-02-03 | Charlie |
104 | Excellent support and great help | 5 | 2024-02-04 | David |
105 | Delivery was slow, but good item | 3 | 2024-02-05 | Emma |
Sample Configuration
Sample Output
ExtractedNgrams |
---|
"product quality" |
"quality amazing" |
"Great service" |
"service fast" |
"material poor" |
"poor fragile" |
"Excellent support" |
"support great" |
"Delivery slow" |
"slow good" |