Simulation via command line
===========================

ASReview LAB comes with a command line interface for simulating the
performance of the ASReview algorithms.

.. _simulation-cli-getting-started:

Getting started
---------------

The simulation command line tool can be run directly:

.. code-block:: bash

    asreview simulate MY_DATASET.csv -o MY_SIMULATION.asreview

This performs a simulation with the default active learning model, where
``MY_DATASET.csv`` is the path to the :ref:`lab/data_labeled:Fully labeled
data` you want to simulate. After a successful simulation, the result is
stored at ``MY_SIMULATION.asreview``, where ``MY_SIMULATION`` is the filename
you prefer and ``.asreview`` is the ASReview project file extension.

Simulation progress
-------------------

The progress of the simulation is shown with two progress bars. The top one
counts the number of relevant records found. The bottom one counts the number
of records labeled. By default (see ``--n-stop``), the simulation stops once
the top progress bar reaches 100%.

.. code-block:: bash

    Relevant records found: 100%|█████████████████████████████████████████████████████████| 38/38 [00:04<00:00,  7.83it/s]
    Records labeled       :   7%|███▊                                                     | 322/4544 [00:04<01:03, 66.37it/s]

    Loss: 0.021
    NDCG: 0.530

Command line arguments for simulating
-------------------------------------

The command ``asreview simulate --help`` provides an overview of the available
arguments for the simulation.

Each of the sections below describes the available arguments. The example
below shows how you can set the command line arguments.

.. code-block:: bash

    asreview simulate MY_DATASET.csv -o MY_SIMULATION.asreview -q max_random

Dataset
~~~~~~~

.. option:: dataset

    Required. File path or URL to the dataset, or the ID of one of the
    SYNERGY datasets.

To use one of the SYNERGY datasets, run the following command and replace
``DATASET_ID`` with the dataset ID.

.. code:: bash

    asreview simulate synergy:DATASET_ID

For example:

.. code:: bash

    asreview simulate synergy:van_de_schoot_2018 -o myreview.asreview

Active learning
~~~~~~~~~~~~~~~

.. option:: --ai AI

    The AI to simulate with. Default is :code:`elas_u4`.

.. option:: -c, --classifier CLASSIFIER

    The classifier for active learning. Default is Naive Bayes (:code:`nb`).

.. option:: -q, --querier QUERIER

    The querier for active learning. Default is Maximum (:code:`max`).

.. option:: -b, --balancer BALANCER

    Data rebalancing strategy, mainly for RNN methods. Helps against
    imbalanced datasets with few inclusions and many exclusions. Default is
    :code:`balanced`.

.. option:: -e, --feature-extractor FEATURE_EXTRACTOR

    Feature extraction algorithm. Some combinations of feature extractors and
    classifiers are not supported or feasible. Default is TF-IDF
    (:code:`tfidf`).

.. option:: --seed SEED

    Seed for the model (classifiers, balance strategies, feature extraction
    techniques, and query strategies).

.. option:: --prior-seed PRIOR_SEED

    Seed for selecting prior records when the ``--prior-idx`` option is not
    used. If ``--prior-idx`` is used with one or more indices, this option is
    ignored.

.. option:: --embedding EMBEDDING_FP

    File path of the embedding matrix. Required for LSTM models.

Prior knowledge
~~~~~~~~~~~~~~~

By default, the model initializes with no prior included or excluded records.
You can set the number of priors with ``--n-prior-included`` and
``--n-prior-excluded``. Alternatively, you can initialize your model with a
specific set of starting papers by using ``--prior-idx`` or
``--prior-record-id`` to select the indices or record IDs of the papers you
want to start the simulation with.

The following options can be used to label prior knowledge:

.. option:: --n-prior-included N_PRIOR_INCLUDED

    Sample n prior included records. Only used when ``--prior-idx`` is not
    given. Default 0.

.. option:: --n-prior-excluded N_PRIOR_EXCLUDED

    Sample n prior excluded records. Only used when ``--prior-idx`` is not
    given. Default 0.

.. option:: --prior-idx [PRIOR_IDX [PRIOR_IDX ...]]

    Prior records by row number (row numbers start at 0).

.. option:: --prior-record-id [PRIOR_RECORD_ID [PRIOR_RECORD_ID ...]]

    Prior records by record ID.

Simulation setup
~~~~~~~~~~~~~~~~

.. option:: --n-query N_QUERY

    Number of records queried in each query. Default 1.

.. option:: --n-stop N_STOP

    The number of label actions to simulate. If not set, the simulation stops
    after the last relevant record is found. Use -1 to simulate all label
    actions.

.. option:: --config-file CONFIG_FILE

    Configuration file for the learning cycle.

Results
~~~~~~~

.. option:: --output OUTPUT, -o OUTPUT

    Path of the ASReview project file in which the simulation is stored.

.. option:: --verbose VERBOSE, -v VERBOSE

    Verbosity level.

Algorithms
----------

The command line interface provides an easy way to get an overview of all
available active learning model elements (classifiers, query strategies,
balance strategies, and feature extraction algorithms) and their names for
command line usage in ASReview LAB. The following command lists the available
models:

.. code:: bash

    asreview algorithms

The output includes models added via :doc:`../technical/extensions`. See
:doc:`../technical/extensions` for more information on developing new models
and installing them via extensions.

Use :code:`pip install asreview-dory` to get access to all Dory models. The
Dory extension contains a collection of New and Exciting MOdels.
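
To tie the options documented on this page together, the sketch below repeats
one simulation over several seeds. It only uses flags listed above
(``--seed``, ``--prior-seed``, ``--n-prior-included``, ``--n-prior-excluded``
and ``-o``). The dataset path, the seed values, the output naming scheme, the
``build_simulate_cmd`` helper and the ``DRY_RUN`` switch are assumptions made
for illustration and are not part of ASReview LAB. With ``DRY_RUN=1`` (the
default in this sketch) the commands are only printed, so they can be checked
before anything is run.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical inputs: adjust the dataset path and the seeds to your own setup.
DATASET="${DATASET:-MY_DATASET.csv}"
SEEDS="${SEEDS:-1 2 3}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Print the simulate command for one seed.
# Only flags documented on this page are used; the helper itself is illustrative.
build_simulate_cmd() {
    local seed="$1"
    echo "asreview simulate ${DATASET}" \
         "--seed ${seed} --prior-seed ${seed}" \
         "--n-prior-included 1 --n-prior-excluded 1" \
         "-o simulation_seed_${seed}.asreview"
}

for seed in ${SEEDS}; do
    cmd="$(build_simulate_cmd "${seed}")"
    if [ "${DRY_RUN}" = "1" ]; then
        echo "${cmd}"
    else
        # Word splitting is intended here; paths containing spaces are not
        # supported in this sketch.
        ${cmd}
    fi
done
```

Setting ``DRY_RUN=0`` runs the simulations. Each run writes its own project
file, so the results for different seeds can be compared afterwards.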