Overview

What is a simulation?

A simulation mimics the screening process with a given model. Because it is already known which records are labeled relevant, the software can automatically reenact the screening process as if a human were labeling the records.
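The idea can be illustrated with a minimal, self-contained sketch. It is not how ASReview LAB implements simulations: the word-count "model", the function names, and the six example records are all hypothetical. It only shows the principle that the known labels stand in for the human screener.

```python
from collections import Counter


def tokenize(text):
    """Split a record into lowercase words."""
    return text.lower().split()


def score_record(text, relevant_words, irrelevant_words):
    """Toy relevance score: words seen in relevant records count for,
    words seen in irrelevant records count against."""
    return sum(relevant_words[w] - irrelevant_words[w] for w in tokenize(text))


def simulate_screening(records, labels, prior_indices):
    """Replay a screening process on a fully labeled dataset.

    Returns the order in which records were 'screened'. The known label
    is looked up instead of asking a human.
    """
    labeled = list(prior_indices)
    pool = [i for i in range(len(records)) if i not in prior_indices]
    while pool:
        # "Train" the toy model on everything labeled so far.
        relevant_words, irrelevant_words = Counter(), Counter()
        for i in labeled:
            target = relevant_words if labels[i] == 1 else irrelevant_words
            target.update(tokenize(records[i]))
        # Query the record the toy model considers most likely relevant.
        best = max(
            pool,
            key=lambda i: score_record(records[i], relevant_words, irrelevant_words),
        )
        pool.remove(best)
        labeled.append(best)  # labels[best] plays the role of the human decision
    return labeled


# Hypothetical example data (0 = irrelevant, 1 = relevant).
records = [
    "active learning for systematic reviews",
    "cooking pasta at home",
    "machine learning screening of abstracts for reviews",
    "history of medieval castles",
    "active learning text classification",
    "gardening tips for spring",
]
labels = [1, 0, 1, 0, 1, 0]

order = simulate_screening(records, labels, prior_indices=[0, 1])
print(order)
```

In this toy run the remaining relevant records (indices 2 and 4) are screened before the remaining irrelevant ones, which is the behavior a simulation is meant to measure.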

Why run a simulation?

Simulating with ASReview LAB serves multiple purposes. First, the performance of one or multiple models can be measured by different metrics (see Analyzing results). A convenient one is the amount of work you could have saved by using active learning compared to your manual screening process.
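As a rough illustration of "work saved", the sketch below computes how many records had to be screened to reach a target recall and the corresponding Work Saved over Sampling (WSS) value. The function names and the example labeling order are hypothetical, and the formula used here is the common textbook definition; the metrics reported by ASReview extensions may be defined slightly differently.

```python
import math


def records_needed(labels_in_order, recall=0.95):
    """Number of records screened before the target recall is reached.

    labels_in_order: labels (0/1) in the order the simulation screened them.
    """
    total_relevant = sum(labels_in_order)
    target = math.ceil(recall * total_relevant)
    found = 0
    for n_screened, label in enumerate(labels_in_order, start=1):
        found += label
        if found >= target:
            return n_screened
    return len(labels_in_order)


def work_saved_over_sampling(labels_in_order, recall=0.95):
    """WSS@recall: fraction of records left unscreened at the target recall,
    minus the fraction random screening would be expected to leave (1 - recall)."""
    n_total = len(labels_in_order)
    n_screened = records_needed(labels_in_order, recall)
    return (n_total - n_screened) / n_total - (1 - recall)


# Hypothetical simulation result: 10 records, 4 of them relevant.
simulated_order = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

n = records_needed(simulated_order)
wss = work_saved_over_sampling(simulated_order)
print(f"Screened {n} of {len(simulated_order)} records, WSS@95 = {wss:.2f}")
```

Here all relevant records are found after screening half of the dataset, so roughly 45% of the work is saved compared to screening in random order.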

Suppose you don’t know which model to choose for a new (unlabeled) dataset. In that case, you can experiment with combinations of classifier, feature extraction, query strategy, and balancing, and test their performance on a labeled dataset with similar characteristics to find the best-performing one.

You could also use the simulation mode to benchmark your own model against existing models for different available datasets. ASReview LAB allows for adding new models via a template.

Simulations can also reveal ‘odd’ relevant records found in a ‘classical’ search. Such records are typically isolated from most other records and might be worth closer inspection.

Datasets for simulation

Simulations require fully labeled datasets (labels: 0 = irrelevant, 1 = relevant). Such a dataset can be the result of an earlier study. ASReview also offers fully labeled datasets via the benchmark platform. These datasets are available via the user interface in the Data step of the setup and in the command line with the prefix benchmark: (e.g. benchmark:van_de_schoot_2017).
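Before starting a simulation, it can help to verify that every record actually carries a 0 or 1 label. The following sketch does this for CSV data using only the standard library. The column name "included" and the example rows are assumptions for illustration; use whatever label column your dataset has.

```python
import csv
import io


def count_labels(csv_text, label_column="included"):
    """Count relevant (1), irrelevant (0), and unlabeled records in CSV text.

    label_column is an assumed column name; adjust it to your dataset.
    """
    counts = {"relevant": 0, "irrelevant": 0, "unlabeled": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(label_column) or "").strip()
        if value == "1":
            counts["relevant"] += 1
        elif value == "0":
            counts["irrelevant"] += 1
        else:
            counts["unlabeled"] += 1
    return counts


# Hypothetical dataset with one record that still lacks a label.
example_csv = """title,abstract,included
Active learning for screening,We study active learning.,1
Cooking pasta,A recipe collection.,0
Unscreened record,Not yet labeled.,
"""

counts = count_labels(example_csv)
print(counts)
if counts["unlabeled"]:
    print("Dataset is not fully labeled; label or remove these records first.")
```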

Warning

When you import your data, make sure to remove duplicates and to retrieve as many abstracts as possible (see Importance-of-abstracts blog for help). With clean data you benefit most from what active learning has to offer.
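A minimal sketch of such a cleaning step is shown below. It assumes records are dictionaries with "title" and "abstract" keys (hypothetical field names) and treats two records as duplicates when their normalized titles match, which is a deliberately crude rule; dedicated deduplication tools compare more fields.

```python
def clean_records(records):
    """Drop records whose normalized title was already seen and
    count how many of the remaining records have no abstract."""
    seen_titles = set()
    unique = []
    for record in records:
        # Normalize case and whitespace so trivial variations still match.
        key = " ".join(record["title"].lower().split())
        if key in seen_titles:
            continue
        seen_titles.add(key)
        unique.append(record)
    missing_abstracts = sum(
        1 for record in unique if not (record.get("abstract") or "").strip()
    )
    return unique, missing_abstracts


# Hypothetical import with one duplicate and one missing abstract.
imported = [
    {"title": "Active learning for screening", "abstract": "We study..."},
    {"title": "active  learning for Screening", "abstract": "We study..."},
    {"title": "Cooking pasta", "abstract": ""},
]

unique, missing = clean_records(imported)
print(f"{len(unique)} unique records, {missing} without an abstract")
```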

Simulating with ASReview LAB

ASReview LAB offers three different solutions to run simulations:

Simulate with webapp

To run a simulation in the ASReview webapp, create a project as described in Create a project. Most of the setup steps are identical or straightforward. This section highlights some of the differences.

In the step on Project Information, select the “Simulation” mode (see figure below).

ASReview LAB simulate option

In the step Data, import a fully labeled dataset or use one of the benchmark datasets.

ASReview LAB benchmark datasets

Selecting prior knowledge is relatively easy. In case you know relevant records to start with, use the search function. In case you don’t, use the Random option. Toggle the button “Relevant” on top to see some random irrelevant records. Label some relevant and some irrelevant records.

The Warm up step differs slightly from the Oracle and Exploration modes. This step starts the simulation; after a few seconds, it returns “Got it”. This means the simulation continues running in the background. You are returned to the Analytics page.

ASReview LAB simulation runs in background

This page now has a refresh button on the top right. If the simulation is not finished yet, you can refresh the page or use the refresh button to follow the progress. After a while, the Elas mascot on the left will hold a sign with “finished”. Your simulation is now finished and you can study the results on the Analytics page.

Analyzing results

After a simulation, the results are stored in the ASReview project file (extension .asreview). This file contains a large number of variables and logs on the simulation. The data can be extracted from the project file via the API or with one of the available extensions. See these examples on the Project API for more information about opening the project file. An easier solution is to use an extension such as ASReview Insights.

The extension ASReview Insights offers useful tools, like plotting functions and metrics, to analyze results of a simulation.

Install ASReview Insights directly from PyPI:

pip install asreview-insights

Detailed documentation can be found on the ASReview Insights GitHub page.

The following command plots the recall over the course of the simulation:

asreview plot recall MY_SIMULATION.asreview
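To make clear what a recall plot shows, the sketch below computes the underlying curve by hand: the fraction of all relevant records found after each screened record. It is an illustration only and does not read a project file; the function name and the example labeling order are hypothetical.

```python
def recall_curve(labels_in_order):
    """Recall after each screened record.

    labels_in_order: labels (0/1) in the order the simulation screened them.
    """
    total_relevant = sum(labels_in_order)
    if total_relevant == 0:
        return [0.0] * len(labels_in_order)
    curve = []
    found = 0
    for label in labels_in_order:
        found += label
        curve.append(found / total_relevant)
    return curve


# Hypothetical labeling order from a simulation with 3 relevant records.
curve = recall_curve([1, 0, 1, 1, 0])
for n_screened, recall in enumerate(curve, start=1):
    print(f"after {n_screened} records: recall = {recall:.2f}")
```

A curve that rises steeply at the start indicates that the model surfaced most relevant records early in the screening process.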