Extract Ngrams

Description

This activity extracts n-grams (contiguous sequences of n elements, such as words, each treated as a single unit) from a specified text column. Users can set the n-gram size and the output format, and can enable additional processing options: removing stop words, stemming, and sorting.
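
As a rough illustration of what an n-gram is, the sketch below slides a window of `size` words across a sentence. It is only a minimal sketch: the function name `extract_ngrams` and the whitespace tokenization are assumptions made for this example, not the activity's documented implementation.

```python
def extract_ngrams(text, size=2):
    """Return word-level n-grams of the given size, in order of appearance."""
    # Assumption: tokens are obtained by splitting on whitespace.
    words = text.split()
    # Slide a window of `size` words over the token list.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


bigrams = extract_ngrams("Great service and fast delivery", size=2)
print(bigrams)
# ['Great service', 'service and', 'and fast', 'fast delivery']
```

A text with fewer words than `size` yields an empty list in this sketch.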

Input

Data only

Output

Transformed data

Configuration Fields

  • Column To Extract: The column containing the text data from which n-grams will be extracted (Previous Data Column Editor).
  • Output Method: How the extracted n-grams are written to the output (Options: One per row, One per column, JSON).
  • Output Column: The column where the extracted n-grams will be stored.
  • Include Original: If enabled, the columns from the previous activity are included; otherwise, only the output column is retained.
  • Size: The number of elements in each n-gram (for example, 2 for bigrams).
  • Clear Stop Words: Whether to remove common stop words.
  • Stem Words: Whether to stem the extracted n-grams.
  • Sort Words: Whether to sort the extracted n-grams.
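
The sketch below shows one way the options above could combine. It is a hedged illustration, not the activity's actual logic: the stop-word list, the regex tokenizer, the pluggable `stem` callable, the order in which options are applied, and the `_1`, `_2` column suffixes are all assumptions made for this example.

```python
import json
import re

# Hypothetical stop-word list; the activity's real list is not documented here.
STOP_WORDS = {"the", "is", "and", "was", "but", "a", "an", "of"}


def extract_ngrams(text, size=2, clear_stop_words=False, stem=None, sort=False):
    """Extract word n-grams, optionally dropping stop words, stemming, and sorting."""
    # Assumption: words are runs of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if clear_stop_words:
        words = [w for w in words if w.lower() not in STOP_WORDS]
    if stem is not None:
        # `stem` is any callable mapping a word to its stem (assumed hook).
        words = [stem(w) for w in words]
    ngrams = [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]
    return sorted(ngrams) if sort else ngrams


def format_output(ngrams, method, column="ExtractedNgrams"):
    """Shape the n-grams as rows (a list of dicts) for each Output Method."""
    if method == "One per row":
        return [{column: g} for g in ngrams]
    if method == "One per column":
        # Hypothetical naming scheme: ExtractedNgrams_1, ExtractedNgrams_2, ...
        return [{f"{column}_{i + 1}": g for i, g in enumerate(ngrams)}]
    if method == "JSON":
        return [{column: json.dumps(ngrams)}]
    raise ValueError(f"Unknown output method: {method}")


grams = extract_ngrams("The product quality is amazing", size=2, clear_stop_words=True)
print(format_output(grams, "One per row"))
# [{'ExtractedNgrams': 'product quality'}, {'ExtractedNgrams': 'quality amazing'}]
```

With "One per row", each n-gram becomes its own record; "One per column" and "JSON" keep a single record per input text in this sketch.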

Sample Input

ReviewID | ReviewText                        | Rating | Date       | Reviewer
101      | The product quality is amazing    | 5      | 2024-02-01 | Alice
102      | Great service and fast delivery   | 4      | 2024-02-02 | Bob
103      | The material is poor and fragile  | 2      | 2024-02-03 | Charlie
104      | Excellent support and great help  | 5      | 2024-02-04 | David
105      | Delivery was slow, but good item  | 3      | 2024-02-05 | Emma

Sample Configuration

[Image: sample configuration]

Sample Output

ExtractedNgrams
"product quality"
"quality amazing"
"Great service"
"service fast"
"material poor"
"poor fragile"
"Excellent support"
"support great"
"Delivery slow"
"slow good"