asreview.models.feature_extraction.Doc2Vec

class asreview.models.feature_extraction.Doc2Vec(*args, vector_size=40, epochs=33, min_count=1, n_jobs=1, window=7, dm_concat=0, dm=2, dbow_words=0, **kwargs)[source]

Doc2Vec feature extraction technique (doc2vec).

Feature extraction technique provided by the gensim package. Creating a feature matrix with this method takes relatively long, but this only has to be done once per simulation/review. The upside of this method is the dimensionality reduction that generally takes place, which makes the modelling quicker.

Note

This feature extraction technique requires gensim to be installed. Use pip install asreview[gensim], or install all optional ASReview dependencies with pip install asreview[all].

Parameters:
  • vector_size (int) – Output size of the vector.

  • epochs (int) – Number of epochs to train the doc2vec model.

  • min_count (int) – Minimum number of occurrences for a word in the corpus for it to be included in the model.

  • n_jobs (int) – Number of threads to train the model with.

  • window (int) – Maximum distance over which word vectors influence each other.

  • dm_concat (int) – Whether to concatenate word vectors or not. See the doc2vec paper for more detail.

  • dm (int) – Model to use. 0: Use distributed bag of words (DBOW). 1: Use distributed memory (DM). 2: Use both of the above with half the vector size and concatenate them.

  • dbow_words (int) – Whether to train the word vectors using the skipgram method.
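
The min_count and dm descriptions above can be illustrated with a small pure-Python sketch. It is not ASReview's or gensim's implementation: the helper names vocabulary, sub_model_sizes and concatenate are hypothetical, the integer halving for dm=2 is an assumption, and the order in which the two halves are joined is chosen arbitrarily here.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def vocabulary(tokenized_texts: Sequence[Sequence[str]], min_count: int = 1) -> Set[str]:
    """Keep only words that occur at least min_count times in the corpus."""
    counts = Counter(token for doc in tokenized_texts for token in doc)
    return {token for token, n in counts.items() if n >= min_count}


def sub_model_sizes(vector_size: int, dm: int) -> Dict[str, int]:
    """Vector size given to each sub-model for a given dm setting."""
    if dm == 0:
        return {"dbow": vector_size}
    if dm == 1:
        return {"dm": vector_size}
    if dm == 2:
        # Assumption: each sub-model gets half of vector_size (integer division).
        half = vector_size // 2
        return {"dm": half, "dbow": half}
    raise ValueError("dm must be 0, 1 or 2")


def concatenate(dm_vector: List[float], dbow_vector: List[float]) -> List[float]:
    """Join the two document vectors into one feature vector (order is illustrative)."""
    return list(dm_vector) + list(dbow_vector)


if __name__ == "__main__":
    docs = [["active", "learning"], ["active", "screening"]]
    print(vocabulary(docs, min_count=2))
    sizes = sub_model_sizes(40, dm=2)
    print(sizes)
    print(len(concatenate([0.0] * sizes["dm"], [0.0] * sizes["dbow"])))
```

With the default vector_size=40 and dm=2, each sub-model in this sketch produces 20 dimensions, so the concatenated output has the requested size of 40.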

Attributes

default_param

Get the default parameters of the model.

label

name

param

Get the (assigned) parameters of the model.

Methods

fit(texts)

Fit the model to the texts.

fit_transform(texts[, titles, abstracts, ...])

Fit and transform a list of texts.

full_hyper_space()

hyper_space()

transform(texts)

Transform a list of texts.
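
A minimal usage sketch of the methods listed above, assuming an ASReview version that ships this class and that the optional gensim dependency is installed. The helper embed_texts, the sample texts, and the small vector_size and epochs values are illustrative assumptions, not recommended settings; the import is guarded so the sketch degrades gracefully when the dependencies are missing.

```python
def embed_texts(texts, vector_size=8, epochs=5):
    """Return a feature matrix with one row per text, or None if unavailable.

    Hypothetical helper: it only wraps the constructor and fit_transform
    documented on this page.
    """
    try:
        # Import inside the function so a missing optional dependency
        # (asreview or gensim) does not break the rest of a script.
        from asreview.models.feature_extraction import Doc2Vec

        model = Doc2Vec(vector_size=vector_size, epochs=epochs, min_count=1)
        # Fit the doc2vec model on the texts and transform them in one call.
        return model.fit_transform(texts)
    except ImportError:
        return None


if __name__ == "__main__":
    sample_texts = [
        "active learning for systematic reviews",
        "document embeddings with doc2vec",
        "screening titles and abstracts",
    ]
    features = embed_texts(sample_texts)
    if features is None:
        print("asreview with gensim is not installed")
    else:
        print(len(features))
```

Because fitting is the slow step, it may be worth calling fit once and reusing the fitted model with transform when the same corpus is embedded repeatedly.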