Simulation via command line#

ASReview LAB comes with a command line interface for simulating the performance of ASReview algorithms.

Getting started#

The simulation command line tool can be run directly:

asreview simulate MY_DATASET.csv -o MY_SIMULATION.asreview

This performs a simulation with the default active learning model, where MY_DATASET.csv is the path to the fully labeled dataset you want to simulate. After a successful simulation, the result is stored at MY_SIMULATION.asreview, where MY_SIMULATION is the filename you prefer and .asreview is the ASReview project file extension.

Simulation progress#

The progress of the simulation is shown with two progress bars. The top one counts the number of relevant records found. The bottom one tracks the number of records labeled. By default (see --n-stop), the simulation stops once the top progress bar reaches 100%.

Relevant records found: 100%|█████████████████████████████████████████████████████████| 38/38 [00:04<00:00,  7.83it/s]
Records labeled       :   7%|███▊                                                  | 322/4544 [00:04<01:03, 66.37it/s]

Loss: 0.021
NDCG: 0.530
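
The relationship between the two progress bars and the stopping rule can be illustrated with a small toy sketch. It is not ASReview's implementation: the function name `simulate_stop` and the hard-coded labeling order are assumptions made for illustration, with `1` marking a relevant record and `0` an irrelevant one.

```python
def simulate_stop(ordered_labels, n_stop=None):
    """Toy illustration of the stopping rule (not ASReview's actual code).

    ordered_labels: labels (1 = relevant, 0 = irrelevant) in the order
        the records would be presented for labeling.
    n_stop: None -> stop once every relevant record is found;
            -1   -> label all records;
            k    -> stop after k label actions.
    Returns (relevant_found, records_labeled).
    """
    total_relevant = sum(ordered_labels)
    relevant_found = 0
    records_labeled = 0
    for label in ordered_labels:
        # Default rule: stop as soon as all relevant records are found.
        if n_stop is None and relevant_found == total_relevant:
            break
        # Explicit limit: stop after n_stop label actions (-1 means no limit).
        if n_stop is not None and n_stop >= 0 and records_labeled >= n_stop:
            break
        records_labeled += 1
        relevant_found += label
    return relevant_found, records_labeled


# Hypothetical labeling order: 3 relevant records among 10.
order = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
print(simulate_stop(order))             # default: stops after the last relevant record
print(simulate_stop(order, n_stop=-1))  # labels every record
print(simulate_stop(order, n_stop=4))   # stops after 4 label actions
```

The first value of each returned pair corresponds to the top progress bar and the second to the bottom one.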

Command line arguments for simulating#

The command asreview simulate --help provides an overview of the available arguments for the simulation. Each of the sections below describes the available arguments. The example below shows how you can set the command line arguments.

asreview simulate MY_DATASET.csv -o MY_SIMULATION.asreview -q max_random
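
If you drive simulations from a script, the same call can be assembled in Python. The sketch below is one possible approach and not part of ASReview: the helper names `build_simulate_command` and `run_simulation` are hypothetical, and only the `-o` and `-q` options from the example command are used. Actually running the command requires `asreview` to be installed and available on your PATH.

```python
import subprocess


def build_simulate_command(dataset, output, querier=None):
    """Build the argument list for `asreview simulate` (hypothetical helper)."""
    command = ["asreview", "simulate", dataset, "-o", output]
    if querier is not None:
        # -q / --querier selects the query strategy, e.g. max_random
        command += ["-q", querier]
    return command


def run_simulation(command):
    """Run the simulation; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


command = build_simulate_command(
    "MY_DATASET.csv", "MY_SIMULATION.asreview", querier="max_random"
)
print(" ".join(command))
# Call run_simulation(command) to start the simulation.
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.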

Dataset#

dataset#: Required. File path or URL to the dataset or one of the SYNERGY datasets.

You can also use one of the SYNERGY datasets. Use the following command and replace DATASET_ID with the dataset ID.

asreview simulate synergy:DATASET_ID

For example:

asreview simulate synergy:van_de_schoot_2018 -o myreview.asreview

Active learning#

--ai AI#: The AI to simulate with. Default is elas_u4.

-c, --classifier CLASSIFIER#: The classifier for active learning. Default is Naive Bayes (nb).

-q, --querier QUERIER#: The querier for active learning. Default is Maximum (max).

-b, --balancer BALANCER#: Data rebalancing strategy mainly for RNN methods. Helps against imbalanced datasets with few inclusions and many exclusions. Default is balanced.

-e, --feature-extractor FEATURE_EXTRACTOR#: Feature extraction algorithm. Some combinations of feature extractors and classifiers are not supported or feasible. Default is TF-IDF (tfidf).

--seed SEED#: Seed for the model (classifiers, balance strategies, feature extraction techniques, and query strategies).

--prior-seed PRIOR_SEED#: Seed for selecting prior records if the --prior-idx option is not used. If the option --prior-idx is used with one or more indices, this option is ignored.

--embedding EMBEDDING_FP#: File path of embedding matrix. Required for LSTM models.
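
Because both seeds are ordinary command line options, repeated runs that differ only in their seed are easy to script. The following sketch assumes you want one project file per seed; the function name `seeded_commands` and the output file naming pattern are illustrative choices, not ASReview conventions.

```python
def seeded_commands(dataset, seeds, output_pattern="simulation_seed_{seed}.asreview"):
    """Return one `asreview simulate` argument list per seed (illustrative)."""
    commands = []
    for seed in seeds:
        commands.append([
            "asreview", "simulate", dataset,
            "-o", output_pattern.format(seed=seed),
            "--seed", str(seed),        # seed for the model components
            "--prior-seed", str(seed),  # seed for sampling prior records
        ])
    return commands


for command in seeded_commands("MY_DATASET.csv", seeds=[1, 2, 3]):
    print(" ".join(command))
```

Note that --prior-seed only matters when prior records are sampled rather than given explicitly with --prior-idx.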

Prior knowledge#

By default, the model initializes without any prior included or excluded records. You can set the number of priors with --n-prior-included and --n-prior-excluded. Alternatively, you can initialize your model with a specific set of starting records by using --prior-idx or --prior-record-id to select the indices or record IDs of the records you want to start the simulation with.

The following options can be used to label prior knowledge:

--n-prior-included N_PRIOR_INCLUDED#: Sample n prior included records. Only used when --prior-idx is not given. Default 0.

--n-prior-excluded N_PRIOR_EXCLUDED#: Sample n prior excluded records. Only used when --prior-idx is not given. Default 0.

--prior-idx [PRIOR_IDX [PRIOR_IDX ...]]#: Prior indices by row number (row numbers start at 0).

--prior-record-id [PRIOR_RECORD_ID [PRIOR_RECORD_ID ...]]#: Prior records by record ID.
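
Conceptually, sampling priors means drawing a fixed number of records from the relevant and irrelevant pools with a reproducible random generator. The toy sketch below illustrates that idea with Python's `random` module. It is not how ASReview implements sampling, and `sample_priors` is a hypothetical name.

```python
import random


def sample_priors(labels, n_included=0, n_excluded=0, seed=None):
    """Pick row indices of prior records from fully labeled data (toy sketch).

    labels: list of 1 (relevant) / 0 (irrelevant), one entry per row.
    Returns a sorted list of row indices (row numbers start at 0).
    """
    rng = random.Random(seed)
    included_rows = [i for i, label in enumerate(labels) if label == 1]
    excluded_rows = [i for i, label in enumerate(labels) if label == 0]
    chosen = rng.sample(included_rows, n_included) + rng.sample(excluded_rows, n_excluded)
    return sorted(chosen)


labels = [0, 1, 0, 0, 1, 0, 0, 1]
priors = sample_priors(labels, n_included=1, n_excluded=2, seed=42)
print(priors)  # the same seed always yields the same rows
```

Row indices chosen this way are the kind of values you would pass to --prior-idx when you want to fix the starting records yourself.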

Simulation setup#

--n-query N_QUERY#: Number of records queried in each query. Default 1.

--n-stop N_STOP#: The number of label actions to simulate. If not set, the simulation stops after the last relevant record is found. Use -1 to simulate all label actions.

--config-file CONFIG_FILE#: Configuration file for the learning cycle.

Results#

--output OUTPUT, -o OUTPUT#: Path to the ASReview project file in which the simulation is stored.

--verbose VERBOSE, -v VERBOSE#: Verbosity level.

Algorithms#

The command line interface provides an easy way to get an overview of all available active learning model elements (classifiers, query strategies, balance strategies, and feature extraction algorithms) and their names for command line usage in ASReview LAB. The following command lists the available models:

asreview algorithms

The command includes models added via extensions. See Developing Extensions for more information on developing new models and installing them via extensions.

Use pip install asreview-dory to get access to all Dory models. The Dory extension contains a collection of New and Exciting MOdels.