asreview.models.feature_extraction.Doc2Vec

class asreview.models.feature_extraction.Doc2Vec(*args, vector_size=40, epochs=33, min_count=1, n_jobs=1, window=7, dm_concat=0, dm=2, dbow_words=0, **kwargs)

Doc2Vec feature extraction technique (doc2vec).

Feature extraction technique provided by the gensim package. Creating a feature matrix with this method takes relatively long, but it only has to be done once per simulation or review. The upside of this method is the dimensionality reduction that generally takes place, which makes the subsequent modelling quicker.

Note

This feature extraction technique requires gensim to be installed. Use pip install gensim, or install all optional ASReview dependencies with pip install asreview[all].
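Because gensim is an optional dependency, it can be useful to check for it before selecting this technique. The following is a minimal sketch using only the standard library; the helper name package_available is hypothetical and not part of ASReview.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(name) is not None


if not package_available("gensim"):
    print("gensim is not installed; run `pip install gensim` before using Doc2Vec.")
```

A check like this lets a script fall back to another feature extraction technique instead of failing partway through a simulation.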

Parameters
  • vector_size (int) – Output size of the vector.

  • epochs (int) – Number of epochs to train the doc2vec model.

  • min_count (int) – Minimum number of occurrences for a word in the corpus for it to be included in the model.

  • n_jobs (int) – Number of threads to train the model with.

  • window (int) – Maximum distance over which word vectors influence each other.

  • dm_concat (int) – Whether to concatenate word vectors or not. See paper for more detail.

  • dm (int) – Model to use. 0: Use distributed bag of words (DBOW). 1: Use distributed memory (DM). 2: Use both of the above, each with half the vector size, and concatenate the results.

  • dbow_words (int) – Whether to train the word vectors using the skip-gram method.
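The effect of the dm parameter on the vector size can be sketched as follows. This is an illustration under stated assumptions, not the ASReview implementation: the helper doc2vec_configs is hypothetical, it assumes the parameters are passed to gensim's Doc2Vec under the same names (with n_jobs passed as gensim's workers), and it assumes the DM vectors come first when both models are combined.

```python
def doc2vec_configs(vector_size=40, epochs=33, min_count=1, n_jobs=1,
                    window=7, dm_concat=0, dm=2, dbow_words=0):
    """Return one keyword dict per sub-model that would be trained.

    Hypothetical helper: it only builds settings and trains nothing.
    """
    base = {
        "vector_size": vector_size,
        "epochs": epochs,
        "min_count": min_count,
        "workers": n_jobs,  # gensim calls the thread count `workers`
        "window": window,
        "dm_concat": dm_concat,
        "dbow_words": dbow_words,
    }
    if dm in (0, 1):
        # A single model (DBOW or DM) at the full vector size.
        return [{**base, "dm": dm}]
    if dm == 2:
        # Two models at half the size each; their document vectors are
        # concatenated, so the output width stays close to vector_size.
        half = {**base, "vector_size": vector_size // 2}
        return [{**half, "dm": 1}, {**half, "dm": 0}]  # order is an assumption
    raise ValueError("dm must be 0, 1 or 2")


configs = doc2vec_configs()
print([(c["dm"], c["vector_size"]) for c in configs])  # [(1, 20), (0, 20)]
```

With the defaults shown in the signature (vector_size=40, dm=2), each sub-model would produce 20-dimensional vectors, giving 40 features per document after concatenation.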

Attributes

default_param

Get the default parameters of the model.

label

name

param

Get the (assigned) parameters of the model.

Methods

fit(texts)

Fit the model to the texts.

fit_transform(texts[, titles, abstracts, ...])

Fit and transform a list of texts.

full_hyper_space()

hyper_space()

transform(texts)

Transform a list of texts.