asreview.models.feature_extraction.Doc2Vec

class asreview.models.feature_extraction.Doc2Vec(*args, vector_size=40, epochs=33, min_count=1, n_jobs=1, window=7, dm_concat=0, dm=2, dbow_words=0, **kwargs)[source]

Doc2Vec feature extraction technique (doc2vec).

Feature extraction technique provided by the gensim package. Creating a feature matrix with this method takes relatively long, but it only has to be done once per simulation or review. The upside of this method is the dimensionality reduction that generally takes place, which makes the modelling quicker.

Note

This feature extraction technique requires gensim to be installed. Use pip install asreview[gensim], or install all optional ASReview dependencies with pip install asreview[all].

Parameters:

vector_size (int) – Output size of the vector.
epochs (int) – Number of epochs to train the doc2vec model.
min_count (int) – Minimum number of occurrences for a word in the corpus for it to be included in the model.
n_jobs (int) – Number of threads to train the model with.
window (int) – Maximum distance over which word vectors influence each other.
dm_concat (int) – Whether to concatenate word vectors or not. See paper for more detail.
dm (int) – Model to use. 0: Use distributed bag of words (DBOW). 1: Use distributed memory (DM). 2: Use both of the above, each with half the vector size, and concatenate the results.
dbow_words (int) – Whether to train the word vectors using the skip-gram method.
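The effect of dm=2 on the output dimension can be sketched in plain Python. This is only an illustration of the bookkeeping described above, not the library's implementation: the helper names submodel_sizes and concatenate are hypothetical, the zero-filled lists stand in for trained document vectors, and the order of concatenation (DM first, then DBOW) is an assumption.

```python
def submodel_sizes(vector_size, dm):
    """Return the vector size of each underlying model for a given dm mode.

    Hypothetical helper: mirrors the parameter description, where dm=2
    trains both models with half the vector size each.
    """
    if dm == 2:
        half = vector_size // 2
        return {"dm": half, "dbow": half}
    if dm == 1:
        return {"dm": vector_size}
    if dm == 0:
        return {"dbow": vector_size}
    raise ValueError("dm must be 0, 1, or 2")


def concatenate(dm_vector, dbow_vector):
    """Join the two document vectors (order is an assumption)."""
    return list(dm_vector) + list(dbow_vector)


# With the default vector_size=40 and dm=2, each model yields 20 dimensions.
sizes = submodel_sizes(vector_size=40, dm=2)

# Placeholder vectors standing in for the trained DM and DBOW outputs.
combined = concatenate([0.0] * sizes["dm"], [0.0] * sizes["dbow"])

print(sizes)          # {'dm': 20, 'dbow': 20}
print(len(combined))  # 40
```

Under these assumptions the final feature vector keeps the requested vector_size, which is presumably why an even vector_size is the natural choice when dm=2.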

Attributes

`default_param`	Get the default parameters of the model.
`label`
`name`
`param`	Get the (assigned) parameters of the model.

Methods

`fit`(texts)	Fit the model to the texts.
`fit_transform`(texts[, titles, abstracts, ...])	Fit and transform a list of texts.
`full_hyper_space`()
`hyper_space`()
`transform`(texts)	Transform a list of texts.
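A minimal usage sketch, assuming the import path and constructor arguments shown in the signature above and that asreview and gensim are both installed. The wrapper embed_texts and the sample texts are hypothetical, and epochs is lowered only to keep the example quick. Exact behaviour may differ between ASReview versions.

```python
def embed_texts(texts):
    """Return one Doc2Vec feature vector per text, or None if unavailable.

    Hypothetical wrapper: the import is guarded because gensim is an
    optional dependency of ASReview.
    """
    try:
        from asreview.models.feature_extraction import Doc2Vec

        # Keyword arguments taken from the class signature documented above.
        model = Doc2Vec(vector_size=40, epochs=5, min_count=1, dm=2)
        # fit_transform fits the model and returns the feature matrix.
        return model.fit_transform(texts)
    except ImportError:
        return None


# Made-up sample texts, for illustration only.
texts = [
    "systematic review of machine learning for screening",
    "active learning reduces the screening workload",
    "doc2vec embeds documents as dense vectors",
]

features = embed_texts(texts)
if features is None:
    print("asreview or gensim is not installed")
else:
    print(len(features), "feature vectors")
```

Because the feature matrix only has to be built once per simulation or review, it is usually worth computing it up front and reusing it.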