Pre-Screening

Before you can start screening, you have to initialize a project and select a dataset, prior knowledge, and a model.

Start Setup

After you start a project, you are redirected to the project dashboard, where you are first asked to initialize the setup.

  1. Open ASReview LAB.

  2. Start a new project.

  3. Click the Start Setup button.

ASReview project setup

Note: some of the features available in the project dashboard are described in the Post-Screening section.

Select Dataset

To select a dataset:

  1. Open ASReview LAB.

  2. Start a new project.

  3. Click the Start Setup button.

  4. Choose one of the four options to select a dataset and click upload:

ASReview dataset selector

Warning

If you upload your own data, make sure to remove duplicates and to retrieve as many abstracts as possible (don’t know how?). With clean data you benefit most from what active learning has to offer.
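As a rough illustration of the cleaning step described in the warning above, the sketch below removes records with near-identical titles and reports how many of the remaining records have an abstract. It is not the procedure ASReview uses. The field names `title` and `abstract` and the helper names are assumptions made for this example.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def drop_duplicates(records):
    """Keep the first record for each normalized title, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_title(record["title"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


def abstract_coverage(records):
    """Return the fraction of records that have a non-empty abstract."""
    if not records:
        return 0.0
    with_abstract = sum(1 for r in records if (r.get("abstract") or "").strip())
    return with_abstract / len(records)


# Hypothetical example records (assumed field names: title, abstract).
records = [
    {"title": "Active Learning for Screening", "abstract": "We study ..."},
    {"title": "active learning for screening.", "abstract": ""},
    {"title": "A Different Study", "abstract": ""},
]
cleaned = drop_duplicates(records)
coverage = abstract_coverage(cleaned)
```

Matching on normalized titles is a deliberately simple choice. Real datasets may also need matching on DOI or other identifiers.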

From File

Upload your file by Drag ‘n’ Drop, or select your file via the browser. The data needs to adhere to a specific format. If a file is uploaded and recognized as one of the available formats, it will display the message Successful upload and state the number of records in the dataset.

From URL

Fill in a link to a file on the Internet. For example, a link from this dataset repository.

From Extension

Select a file available via an extension like the COVID-19 extension.

Benchmark Datasets

Select one of the benchmark datasets.

Partly Labeled Data

If you want to include decisions you’ve already made prior to setting up your project, you can upload a partly labeled dataset containing labels for part of the data and unlabeled records you want to screen with ASReview. This might be helpful if you switch from screening in another tool to screening with ASReview, or when updating an existing systematic review with more recent publications.

Currently, this can be done by merging your labeled and unlabeled records into one dataset, for example in Excel or in a reference manager. Your dataset should contain a column called label_included (alternative names: final_included, label, included_label, included_final, included, included_flag, include), which is filled with 1’s or 0’s for the publications that you have already screened (1 for relevant, 0 for irrelevant) and is left empty for the records that you still need to screen using ASReview.
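If you prefer to do the merge in a script, the following sketch shows one possible way to combine labeled and unlabeled records into a single CSV with a label_included column. The `title` and `abstract` fields, the `label` key on the input records, and the helper names are assumptions made for this example. Only the label_included column name comes from the text above.

```python
import csv
import io

LABEL_COLUMN = "label_included"
FIELDNAMES = ["title", "abstract", LABEL_COLUMN]


def merge_records(labeled, unlabeled):
    """Combine screened and unscreened records into rows for one dataset."""
    rows = []
    for record in labeled:
        if record["label"] not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        rows.append({
            "title": record["title"],
            "abstract": record["abstract"],
            LABEL_COLUMN: str(record["label"]),
        })
    for record in unlabeled:
        # Records that still need screening get an empty label cell.
        rows.append({
            "title": record["title"],
            "abstract": record["abstract"],
            LABEL_COLUMN: "",
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example data.
labeled = [
    {"title": "Study A", "abstract": "Text A", "label": 1},
    {"title": "Study B", "abstract": "Text B", "label": 0},
]
unlabeled = [{"title": "Study C", "abstract": "Text C"}]
csv_text = to_csv(merge_records(labeled, unlabeled))
```

Writing `csv_text` to a file gives a dataset that can be uploaded through the From File option.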

To use a partly labeled dataset:

  1. Open ASReview LAB.

  2. Start a new project.

  3. Click the Start Setup button.

  4. Select your partly labeled dataset.

ASReview will recognize the column with the labels and show you the number of prior relevant/irrelevant papers in the section Prior Knowledge.

Select Prior Knowledge

The first iteration of the active learning cycle requires prior knowledge to work. This knowledge is used to train the first model. In this step you need to provide at least one relevant and one irrelevant document. To facilitate this, it is possible to search within your dataset (for finding prior relevant papers) or ask the software to present a couple of random documents (for prior irrelevant papers).
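The idea behind the two selection routes can be sketched as follows: a keyword search to find likely relevant records, a random draw to find likely irrelevant ones, and a check that at least one of each has been labeled. This is an illustrative sketch, not the search that ASReview LAB implements. All function names and the `title` and `abstract` fields are assumptions.

```python
import random


def search_records(records, query):
    """Return indices of records whose title or abstract contains the query."""
    needle = query.lower()
    return [
        index
        for index, record in enumerate(records)
        if needle in record["title"].lower() or needle in record["abstract"].lower()
    ]


def random_records(records, k, seed=None):
    """Return up to k distinct record indices chosen at random."""
    rng = random.Random(seed)
    return rng.sample(range(len(records)), min(k, len(records)))


def has_minimum_prior_knowledge(prior_labels):
    """Check that at least one relevant (1) and one irrelevant (0) label exist."""
    values = set(prior_labels.values())
    return 1 in values and 0 in values


# Hypothetical example records.
records = [
    {"title": "Depression treatment trial", "abstract": "A randomized trial."},
    {"title": "Soil erosion survey", "abstract": "Field measurements."},
    {"title": "Urban traffic model", "abstract": "Simulation study."},
]
hits = search_records(records, "depression")
candidates = random_records(records, k=2, seed=1)

prior_labels = {hits[0]: 1}      # one relevant record found via search
ready_before = has_minimum_prior_knowledge(prior_labels)
prior_labels[1] = 0              # one irrelevant record labeled by the reviewer
ready_after = has_minimum_prior_knowledge(prior_labels)
```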

  1. Open ASReview LAB.

  2. Start a new project.

  3. Click the Start Setup button.

  4. Select a dataset.

  5. Click Search or Random to select your prior knowledge.

ASReview prior knowledge selector

After selecting some prior information, you can click Next.

ASReview prior knowledge selector next

Random

You also need to provide at least one prior irrelevant document. One way to find an irrelevant document is by labeling a set of random records from the dataset. Because the large majority of documents in the dataset are typically irrelevant (an extremely imbalanced data problem), the documents presented here are likely to be irrelevant for your study. Click Random to show a few random documents. Indicate for each document whether it is relevant or irrelevant.

ASReview prior knowledge random

After labeling a couple of randomly selected documents, ASReview LAB will ask you whether you want to stop. Click on STOP and click Next.

Select Model

It is possible to change the settings of the active learning model. Three components can be changed in the software: the type of classifier, the query strategy, and the feature extraction technique.

To change the default setting:

  1. Open ASReview LAB.

  2. Start a new project, upload a dataset and select prior knowledge.

  3. Click on the edit icon (top right).

  4. Using the drop-down menu select a different classifier, query strategy or feature extraction technique.

  5. Click Finish.

ASReview model

The classifier is the machine learning model used to compute the relevance scores. The available classifiers are Naive Bayes, Support Vector Machine, Logistic Regression, and Random Forest. More classifiers can be selected via the API. The default is Naive Bayes. Though relatively simplistic, it seems to work quite well on a wide range of datasets.
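To give an impression of what computing a relevance score means, here is a minimal multinomial Naive Bayes written from scratch. It is a teaching sketch under simple assumptions (whitespace tokenization, add-one smoothing, raw word counts) and not the classifier code that ASReview runs. All names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase whitespace-separated tokens."""
    return text.lower().split()


def train_naive_bayes(texts, labels, alpha=1.0):
    """Fit word log-likelihoods per class; needs at least one text per class."""
    vocabulary = {token for text in texts for token in tokenize(text)}
    model = {"vocabulary": vocabulary, "log_prior": {}, "log_likelihood": {}}
    for cls in (0, 1):
        class_texts = [t for t, y in zip(texts, labels) if y == cls]
        model["log_prior"][cls] = math.log(len(class_texts) / len(texts))
        counts = Counter(tok for t in class_texts for tok in tokenize(t))
        total = sum(counts.values()) + alpha * len(vocabulary)
        model["log_likelihood"][cls] = {
            tok: math.log((counts[tok] + alpha) / total) for tok in vocabulary
        }
    return model


def relevance_score(model, text):
    """Return the estimated probability that the text is relevant (class 1)."""
    log_scores = {}
    for cls in (0, 1):
        score = model["log_prior"][cls]
        for tok in tokenize(text):
            if tok in model["vocabulary"]:  # unseen words are ignored
                score += model["log_likelihood"][cls][tok]
        log_scores[cls] = score
    # Normalize in a numerically stable way.
    highest = max(log_scores.values())
    weights = {cls: math.exp(s - highest) for cls, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


# Hypothetical prior knowledge: one relevant and one irrelevant document.
model = train_naive_bayes(
    ["depression treatment trial", "soil erosion survey"], [1, 0]
)
score_relevant = relevance_score(model, "depression trial")
score_irrelevant = relevance_score(model, "soil survey")
```

Note that training fails if one of the two classes has no examples, which mirrors the requirement to provide at least one relevant and one irrelevant document as prior knowledge.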

The query strategy determines which document is shown after the model has computed the relevance scores. The three options are: certainty-based, mixed and random. When certainty-based is selected, the documents are shown in the order of relevance score. The document most likely to be relevant is shown first. When mixed is selected, the next document will be selected certainty-based 95% of the time, and randomly chosen otherwise. When random is selected, documents are shown in a random order (ignoring the model output completely). Warning: selecting this option means your review is not going to be accelerated by using ASReview.
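The three query strategies can be sketched in a few lines. This is an illustration of the selection rules described above, not ASReview's implementation. The function names and the `certainty_fraction` parameter are assumptions, with 0.95 taken from the 95% figure in the text.

```python
import random


def certainty_query(scores):
    """Pick the index of the document with the highest relevance score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def random_query(scores, rng):
    """Pick a document index uniformly at random, ignoring the scores."""
    return rng.randrange(len(scores))


def mixed_query(scores, rng, certainty_fraction=0.95):
    """Use certainty-based selection most of the time, random otherwise."""
    if rng.random() < certainty_fraction:
        return certainty_query(scores)
    return random_query(scores, rng)


# Hypothetical relevance scores for three unlabeled documents.
scores = [0.1, 0.9, 0.4]
rng = random.Random(0)

next_certain = certainty_query(scores)
next_mixed = mixed_query(scores, rng)
next_random = random_query(scores, rng)
```

The occasional random pick in the mixed strategy lets the model see documents it currently scores low, at the cost of showing slightly fewer likely relevant documents.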

The feature extraction technique determines how text is translated into a vector that can be used by the classifier. The default is TF-IDF (Term Frequency-Inverse Document Frequency) from scikit-learn. It works well in combination with Naive Bayes and other fast-training models. Another option is Doc2Vec, provided by the gensim package. To use it, install gensim manually:

pip install gensim

Creating a feature matrix with Doc2Vec takes relatively long. However, this only has to be done once per simulation/review. The upside of this method is the dimension reduction that generally takes place, which makes the modelling quicker.
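For comparison, the default TF-IDF technique mentioned earlier in this section can be illustrated with a small from-scratch sketch of how text becomes a vector. It uses a smoothed inverse document frequency and L2 normalization, which to my understanding match the scikit-learn defaults, but the whitespace tokenizer and all names here are simplifying assumptions, so the numbers will not match scikit-learn output on real text.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Return (vocabulary, matrix) with one L2-normalized row per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({tok for tokens in tokenized for tok in tokens})
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(tok for tokens in tokenized for tok in set(tokens))
    # Smoothed inverse document frequency.
    idf = {
        tok: math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1
        for tok in vocabulary
    }
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = [counts[tok] * idf[tok] for tok in vocabulary]
        norm = math.sqrt(sum(value * value for value in row))
        matrix.append([value / norm if norm else 0.0 for value in row])
    return vocabulary, matrix


# Hypothetical two-document corpus.
documents = ["active learning screening", "random screening"]
vocabulary, matrix = tfidf_matrix(documents)
```

Terms that occur in fewer documents receive a higher weight. In this example "active" outweighs "screening" in the first row because "screening" appears in both documents.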