- class asreview.models.feature_extraction.Doc2Vec(*args, vector_size=40, epochs=33, min_count=1, n_jobs=1, window=7, dm_concat=0, dm=2, dbow_words=0, **kwargs)
Doc2Vec feature extraction technique.
Feature extraction technique provided by the gensim package. Creating a feature matrix with this method takes relatively long, but this only has to be done once per simulation or review. The upside of this method is the dimensionality reduction that generally takes place, which makes the modelling quicker.
This feature extraction technique requires gensim to be installed. Use pip install gensim, or install all optional ASReview dependencies with pip install asreview[all].
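Because gensim is an optional dependency, a script may want to check for it before constructing the extractor. The following is a minimal sketch using only the standard library; the helper name gensim_available is hypothetical and not part of ASReview.

```python
import importlib.util


def gensim_available() -> bool:
    """Return True if the optional gensim package can be found."""
    # find_spec returns None when a top-level package is not installed.
    return importlib.util.find_spec("gensim") is not None


if not gensim_available():
    # Point the user at the install command given in the note above.
    print("gensim is missing; install it with: pip install gensim")
```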
vector_size (int) – Output size of the vector.
epochs (int) – Number of epochs to train the doc2vec model.
min_count (int) – Minimum number of occurrences for a word in the corpus for it to be included in the model.
n_jobs (int) – Number of threads to train the model with.
window (int) – Maximum distance over which word vectors influence each other.
dm_concat (int) – Whether to concatenate word vectors or not. See paper for more detail.
dm (int) – Model to use. 0: Use distributed bag of words (DBOW). 1: Use distributed memory (DM). 2: Use both of the above, each with half the vector size, and concatenate them.
dbow_words (int) – Whether to train the word vectors using the skip-gram method.
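To make the dm=2 option concrete, the sketch below trains a DM and a DBOW model directly with gensim, each with half the vector size, and concatenates the inferred vectors. It illustrates the idea described in the parameter list and is not necessarily how ASReview implements it. The helper names sub_models and embed, the tokenisation with simple_preprocess, and the DM-before-DBOW ordering are all assumptions.

```python
def sub_models(vector_size=40, dm=2):
    """Return (dm_flag, size) pairs for the gensim models to train.

    Assumption: with dm=2 each model gets half of vector_size, as the
    parameter description states.
    """
    if dm == 2:
        half = vector_size // 2
        return [(1, half), (0, half)]  # DM first, then DBOW (order assumed)
    return [(dm, vector_size)]


def embed(texts, vector_size=40, epochs=33, min_count=1, n_jobs=1,
          window=7, dm_concat=0, dm=2, dbow_words=0):
    """Train Doc2Vec model(s) on texts and return one vector per text."""
    # Imported lazily so this module loads without the optional dependency.
    import numpy as np
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from gensim.utils import simple_preprocess

    # Tokenise each text and tag it with its position in the list.
    tokens = [simple_preprocess(text) for text in texts]
    corpus = [TaggedDocument(words, [i]) for i, words in enumerate(tokens)]

    parts = []
    for dm_flag, size in sub_models(vector_size, dm):
        model = Doc2Vec(
            documents=corpus, vector_size=size, epochs=epochs,
            min_count=min_count, workers=n_jobs, window=window,
            dm=dm_flag, dm_concat=dm_concat, dbow_words=dbow_words,
        )
        # Infer one vector per document with the trained model.
        parts.append(np.array([model.infer_vector(w) for w in tokens]))

    # Concatenate along the feature axis: one row per text.
    return np.hstack(parts)
```

Calling embed(texts) on a list of strings requires gensim and numpy to be installed. With the default vector_size=40 and dm=2, each row is the concatenation of two 20-dimensional vectors.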
Get the default parameters of the model.
Get the (assigned) parameters of the model.
Fit the model to the texts.
fit_transform(texts[, titles, abstracts, ...])
Fit and transform a list of texts.
Transform a list of texts.
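A typical call sequence might look like the sketch below. It uses only the class path, constructor keywords and fit_transform method listed on this page; the wrapper function build_feature_matrix is a hypothetical name, and the shape of the returned matrix is not specified here.

```python
def build_feature_matrix(texts):
    """Fit Doc2Vec on texts and return the resulting feature matrix."""
    # Imported lazily: asreview and gensim are only needed when this runs.
    from asreview.models.feature_extraction import Doc2Vec

    # Keyword values repeat the defaults from the class signature.
    extractor = Doc2Vec(vector_size=40, epochs=33, n_jobs=1)
    # fit_transform fits the model and transforms the same texts in one call.
    return extractor.fit_transform(texts)
```

Since fitting is the slow step, it is reasonable to compute the matrix once per simulation or review and reuse it.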