asreview.models.feature_extraction.Tfidf

class asreview.models.feature_extraction.Tfidf(*args, ngram_max=1, stop_words='english', **kwargs)

TF-IDF feature extraction technique (tfidf).

Uses the standard TF-IDF (Term Frequency-Inverse Document Frequency) feature extraction technique from scikit-learn and returns a sparse matrix. Works well in combination with asreview.models.classifiers.NaiveBayesClassifier and other fast-training models (given that the feature vectors are relatively wide).

Parameters:
  • ngram_max (int) – Maximum n-gram length. N-grams of every length from 1 up to ngram_max can be used. For example, with ngram_max=2, both unigrams and bigrams can be used.

  • stop_words (str) – When set to ‘english’, remove English stop words. If set to None or ‘none’, do not remove stop words.
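
To make the two parameters concrete, here is a minimal pure-Python sketch of what "n-grams up to ngram_max" and stop-word removal mean for a single text. It is an illustration only, not the code this class runs: the helper word_ngrams and the tiny STOP_WORDS set are hypothetical, and filtering stop words before building n-grams is an assumption about the order of operations.

```python
def word_ngrams(tokens, ngram_max=1):
    """Return all n-grams of length 1..ngram_max as space-joined strings."""
    grams = []
    for n in range(1, ngram_max + 1):
        # Slide a window of width n over the token list.
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


# Hypothetical, tiny stop-word list for illustration only;
# the real English list is much longer.
STOP_WORDS = {"the", "of", "a"}

tokens = "the screening of systematic reviews".split()

# Assumption: stop words are dropped before n-grams are formed.
kept = [t for t in tokens if t not in STOP_WORDS]

unigrams = word_ngrams(kept, ngram_max=1)
uni_and_bigrams = word_ngrams(kept, ngram_max=2)

print(unigrams)
print(uni_and_bigrams)
```

With ngram_max=1 only single words become features. Raising it to 2 adds every pair of adjacent remaining words, which widens the feature matrix.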

Attributes

default_param

Get the default parameters of the model.

label

name

param

Get the (assigned) parameters of the model.

Methods

fit(texts)

Fit the model to the texts.

fit_transform(texts[, titles, abstracts, ...])

Fit and transform a list of texts.

full_hyper_space()

hyper_space()

transform(texts)

Transform a list of texts.
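
The fit / transform / fit_transform pattern listed above can be sketched with scikit-learn's TfidfVectorizer, which the description names as the underlying technique. This sketch does not call the ASReview class itself. Mapping ngram_max to ngram_range=(1, ngram_max) and passing stop_words through unchanged are assumptions made for illustration, and the example texts are invented.

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented example records (e.g. title + abstract joined into one string).
texts = [
    "Active learning for systematic reviews",
    "Screening abstracts with machine learning",
    "Naive Bayes classifier for text",
]

ngram_max = 1  # assumed to correspond to ngram_range=(1, ngram_max)

# Two-step usage: learn the vocabulary and IDF weights, then vectorize.
vectorizer = TfidfVectorizer(ngram_range=(1, ngram_max), stop_words="english")
vectorizer.fit(texts)
X = vectorizer.transform(texts)

# One-step usage: fit and transform in a single call.
X_same = TfidfVectorizer(
    ngram_range=(1, ngram_max), stop_words="english"
).fit_transform(texts)

print(sparse.issparse(X))  # the result is a SciPy sparse matrix
print(X.shape)             # (number of texts, number of features)
```

Fitting once and reusing the fitted vectorizer for later calls to transform keeps the feature columns consistent across calls.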