ASReview LAB: Active learning for Systematic Reviews

Welcome to the ASReview LAB Documentation!

Get Started

What is ASReview LAB?

ASReview LAB is a free (Libre) open-source machine learning tool for screening and systematically labeling a large collection of textual data. It is sometimes referred to as a tool for title and abstract screening in systematic reviews or meta-analyses, but it can handle any type of textual data that must be screened systematically; see the paper published in Nature Machine Intelligence.

ASReview LAB implements three different options:

  • Oracle: Screen textual data in interaction with the active learning model. The reviewer is the ‘oracle’, making the labeling decisions.

  • Simulation: Evaluate the performance of active learning models on fully labeled data.

  • Validation: Validate labels provided by another screener or derived from an LLM or AI, and explore benchmark datasets without being an oracle.

ASReview LAB is one of the products of the ASReview research project initiated at Utrecht University, which has grown into a vibrant community of researchers, users, and developers from around the world.

What is active learning?

Artificial Intelligence (AI) and machine learning have enabled the development of AI-aided pipelines that assist in finding relevant texts for search tasks. A well-established approach to increasing the efficiency of screening large amounts of textual data is screening prioritization through Active Learning: a constant interaction between a human who labels records and a machine learning model that selects the record most likely to be relevant based on a minimal training dataset. The active learning cycle is repeated until the annotator is sufficiently confident that they have seen all relevant records. Thus, the machine learning model is responsible for ranking the records, and the human provides the labels; this is called Researcher-In-The-Loop (RITL).

This allows the screening of large amounts of text in an intelligent and time-efficient manner. ASReview LAB, published in Nature Machine Intelligence, has shown the benefits of active learning, reducing the required screening time by up to 95%.

Labeling workflow with ASReview

Start and finish a systematic labeling process with ASReview LAB by following these steps:

  1. Create a dataset with potentially relevant records you want to screen systematically. Improve the quality of the data and specify clear reviewing (inclusion/exclusion) criteria.

  2. Specify a stopping criterion

  3. Start ASReview LAB

  4. Create a project

  5. Import your dataset

  6. Select Prior Knowledge

  7. Select the four components of the Active learning model (feature extractor, classifier, balancing method, query strategy)

  8. Wait until the warm-up of the AI is finished (the software extracts the features and trains the classifier on the prior knowledge)

  9. Start Screening until you reach your stopping criterion

  10. At any time, you can export the dataset with the labeling decisions, or the entire project.

Quick start

  1. Check if Python 3.8 or later is installed (if not, install Python)

python --version

  2. Install ASReview LAB

pip install asreview

  3. Open ASReview LAB

asreview lab

  4. Click Create to create a project

  5. Select a mode (Oracle, Validation, Simulation)

  6. Name the project, and if you want, add author name(s) and a description

  7. Import a dataset you want to review, or select a benchmark dataset (only available for the Validation and Simulation modes)

  8. Add prior knowledge. Select at least 1 relevant and 1 irrelevant record to warm up the AI. You can search for a specific record or request random records

  9. Select the four components of the active learning model, or rely on the default settings that have shown fast and excellent performance in many simulation studies

  10. ASReview LAB starts extracting the features and runs the classifier with the prior knowledge

You’re ready to start labeling your data! All your labeling actions are automatically saved, so there is no need to click the save button (we don’t even have one).

ASReview LAB terminology

When you do text screening for a systematic review in ASReview LAB, it can be useful to understand some basic concepts about systematic reviewing and machine learning. The following overview describes some terms you might encounter as you use ASReview LAB.

Active learning model

An active learning model is the combination of four elements: a feature extraction technique, a classifier, a balance strategy, and a query strategy.

ASReview

ASReview stands for Active learning for Systematic Reviews or AI-assisted Systematic Reviews, depending on the context. The expanded form is best used only as a tagline.

ASReview CLI

ASReview CLI is the command line interface, developed for advanced options and for running simulation studies.

Data

Data includes dataset, prior knowledge, labels, and notes.

Dataset

A dataset is the collection of records that the user imports and exports.

ELAS

ELAS stands for “Electronic Learning Assistant”. It is the name of the ASReview mascot, used for storytelling and to increase explainability.

Export

Export is the action of exporting a dataset or a project from ASReview LAB.

Extension

An extension is an add-on that extends the functionality of ASReview LAB, such as the ASReview Datatools extension.

Import

Import is the action of importing a dataset or a project into ASReview LAB.

Model configuration

Model configuration is the action of the user to configure the active learning model.

Note

A note is the information added by the user in the note field and stored in the project file. It can be edited on the History page.

Project

A project is a container created in ASReview LAB that holds your dataset, settings, labeling decisions, and machine learning models.

Projects dashboard

The project dashboard is the landing page containing an overview of all projects in ASReview LAB.

Project file

The project file is the .asreview file containing the data and model configuration. The file is exported from ASReview LAB and can be imported back.

Project mode

The project mode includes oracle, simulation, and validation in ASReview LAB:

Oracle mode is used when a user reviews a dataset systematically with interactive artificial intelligence (AI).

Validation mode is used when a user validates existing labels or engages in a review process without being an oracle.

Simulation mode is used when a user simulates a review on a completely labeled dataset to see the performance of ASReview LAB.

Status

The project status is the stage that a project is at in ASReview LAB.

Setup: the user adds project information, imports the dataset, selects the prior knowledge, configures the model, and initiates the first iteration of model training.

In Review: in oracle or validation mode, the user adds labels to records; in simulation mode, the simulation is running.

Finished: in oracle or validation mode, the user has decided to complete the reviewing process or has labeled all the records; in simulation mode, the simulation has been completed.

Published: the user has published the dataset and project file in a repository, preferably with a Digital Object Identifier (DOI).

Record

A record is the data point that needs to be labeled. A record can contain both information that is used for training the active learning model, and information that is not used for this purpose.

In the case of systematic reviewing, a record is the metadata of a scientific publication. Here, the information used for training purposes is the text in the title and abstract of the publication. The information that is not used for training typically consists of other metadata, for example, the authors, journal, or DOI of the publication.

Reviewing

Reviewing is the decision-making process on the relevance of records (“irrelevant” or “relevant”). It is interchangeable with Labeling, Screening, and Classifying.

User

The human annotator is the person who labels records.

Screener

An alternative term for the user when the context is PRISMA-based reviewing.

Key principles

The use of ASReview LAB comes with five fundamental principles:

  1. Humans are the oracle;

  2. Code is open & results are transparent;

  3. Decisions are unbiased;

  4. The interface shows an AI is at work;

  5. Users are responsible for importing high quality data.

Privacy

The ASReview LAB software doesn’t collect any information about its usage or its users. Great, isn’t it?

Research

The open source ASReview LAB software is one of the products of the ASReview research project. The ASReview research project is a fundamental and applied research project studying the application of AI in the field of systematically reviewing large amounts of text data.

Note

The ASReview project is developed by researchers for researchers, and anyone is welcome to join the community!

There are still 1001 scientific papers that can be published using the ASReview products. We welcome researchers worldwide to work on papers like applying the existing models to new types of datasets (different scientific fields, other languages, multilanguage data, text data outside academia, large datasets, etcetera), adding new models and testing the performance on the available benchmark datasets, adding and testing new stopping rules or performance metrics, and so on!

Scientific principles

The research team works according to the Open Science principles and invests in an inclusive community contributing to the project. In short, research is conducted according to the following fundamental principles:

  • Research output should be FAIR (Findable, Accessible, Interoperable, and Reusable).

  • Research should be conducted with integrity, and we commit ourselves to the Netherlands Code of Conduct for Research Integrity.

  • Output should be rewarded according to the Declaration on Research Assessment (DORA).

Utrecht University has established specific regulations governing conduct for its employees. These are based on the key principles of professional and quality academic conduct and ethically responsible research. Members of the team employed by Utrecht University commit themselves to these regulations in all their conduct, including all work related to ASReview. Adherence to similar key principles is expected of all researchers involved in all facets of the ASReview project.

Cite

For scientific use, we encourage users to cite:

  • the documentation: go to Zenodo to cite it (or to download the PDF);

  • the ASReview LAB software: refer to the specific release of the software, available on the Export screen.


Support

Questions can be asked on GitHub Discussions. For bug reports and feature requests, please submit an issue on GitHub.

Collaborate

If you are interested in (scientific) collaboration, contact Prof. Dr. Rens van de Schoot or send an email to asreview@uu.nl.

Contribute

How do you go from user to contributor? There are many ways to join in, and it might be less complicated than you expect. In a blogpost, we list some easy examples for first-time contributors, for example sharing your experiences or answering user questions on the Discussion platform.

Specific instructions for contributing code are available on GitHub, as well as instructions for developers.

Note

All contributions, small or large, are very much appreciated!

Questions can be asked on GitHub Discussions. For bug reports and feature requests, please submit an issue on GitHub.

Installation

Install ASReview

ASReview software requires an installation of Python 3.8 or later. Detailed step-by-step instructions to install Python (and ASReview) are available for Windows and macOS/Linux users.

Install the ASReview software with pip by running the following command in CMD.exe (Windows) or Terminal (macOS/Linux):

pip install asreview

Start the application with the following command (in CMD.exe or Terminal):

asreview lab

The ASReview LAB software starts in the browser. For more options on starting ASReview LAB, see Start ASReview LAB.

Note

See Troubleshooting for common problems during installation.

Tip

For users with Apple M1 computers, if you experience problems, follow the instructions.

Upgrade ASReview

Upgrade ASReview software with

pip install --upgrade asreview

Uninstall ASReview

Remove ASReview with

pip uninstall asreview

Enter y to confirm.

Warning

Note that your project files will not be deleted by this action. You can find them in the .asreview folder in your home folder.

Server Installation

It is possible to run the ASReview software on a server or a custom domain. Use the --ip and --port flags for configuration. ASReview should only be used in closed networks.

asreview lab --port 5555 --ip xxx.x.x.xx

Warning

Don’t use the development server in production. Read the Flask documentation about deploying a Flask app to production.

Install with Docker

ASReview is also available as a Docker container. Make sure you have Docker installed on your machine.

To install and start ASReview LAB at http://localhost:5000, run the following:

docker run -p 5000:5000 ghcr.io/asreview/asreview:latest lab

More advanced command line options can be given afterward, like this:

docker run -p 9000:9000 ghcr.io/asreview/asreview lab --port 9000

Tip

ASReview LAB is now installed. Open the URL in your host web browser: http://localhost:5000 and get started.

Mount local volume

To mount the container to your local project folder (or any other local folder), the -v flag can be used. To do so, adjust path-to-your-folder to your local folder. When a project folder is specified, ASReview LAB will store and load all its projects from this folder. Note that multiple containers can access the same folder.

docker run -p 5000:5000 -v path-to-your-folder:/project_folder ghcr.io/asreview/asreview lab

Named container

To make the usage easier, you can create a named container like the following:

docker create --name asreview-lab -p 5000:5000 -v path-to-your-folder:/project_folder ghcr.io/asreview/asreview lab

To start ASReview LAB, enter:

docker start asreview-lab

To stop it, just use stop instead of start. You can also check which containers are running with docker ps.

Customize the image

If you want to add more extensions, or build the Docker image yourself, check the Dockerfile in the ASReview repository (https://github.com/asreview/asreview/tree/master/Dockerfiles). Modify it as you see fit, and then build and run the image with:

docker build -t asreview/asreview:custom .
docker run -p 5000:5000 asreview/asreview:custom lab

Start ASReview LAB

After you install ASReview LAB, start the program via the command line to start using it.

asreview lab

On Windows, open CMD.exe and run the command. On macOS or Linux, open Terminal and run the command.

The information in the sections below is more advanced and not needed for the majority of the ASReview LAB users.

Command line arguments for starting ASReview LAB

ASReview LAB provides a powerful command line interface for running ASReview LAB with other options or even running tasks like simulations. For a list of available commands in ASReview LAB, type asreview lab --help.

asreview lab launches the ASReview LAB software (the frontend).

asreview lab [options]
-h, --help

Show help message and exit.

--ip IP

The IP address the server will listen on.

--port PORT

The port the server will listen on.

--port-retries NUMBER_RETRIES

The number of additional ports to try if the specified port is not available.

--enable-auth ENABLE_AUTH

Enable authentication.

--secret-key SECRET_KEY

Secret key for authentication.

--salt SALT

When using authentication, a salt code is needed for hashing passwords.

--flask-configfile FLASK_CONFIGFILE

Full path to a JSON file containing Flask parameters for authentication.

--no-browser NO_BROWSER

Do not open ASReview LAB in a browser after startup.

--certfile CERTFILE_FULL_PATH

The full path to an SSL/TLS certificate file.

--keyfile KEYFILE_FULL_PATH

The full path to a private key file for usage with SSL/TLS.

--embedding EMBEDDING_FP

File path of embedding matrix. Required for LSTM models.

--clean-project CLEAN_PROJECT

Safe cleanup of temporary files in the project.

--clean-all-projects CLEAN_ALL_PROJECTS

Safe cleanup of temporary files in all projects.

--seed SEED

Seed for the model (classifiers, balance strategies, feature extraction techniques, and query strategies). Use an integer between 0 and 2^32 - 1.
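As an illustration, several of the options above can be combined in one command. The example below is a sketch that serves ASReview LAB on port 5555 over SSL/TLS; the certificate and key paths are placeholders:

asreview lab --port 5555 --certfile /path/to/cert.pem --keyfile /path/to/key.pem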

Set environment variables

The following environment variables are available.

ASREVIEW_PATH

The path to the folder with project files. Default: ~/.asreview.

How you set environment variables depends on the operating system and the environment in which you deploy ASReview LAB.

In MacOS or Linux operating systems, you can set environment variables from the command line. For example:

export ASREVIEW_PATH=~/.asreview

On Windows, you can use the following syntax:

set ASREVIEW_PATH=~/.asreview

To check if you set an environment variable successfully, run the following on *nix operating systems:

echo $ASREVIEW_PATH

Or the following on Windows operating systems:

echo %ASREVIEW_PATH%

Run ASReview LAB on localhost with a different port

By default, ASReview LAB runs on port 5000. If that port is already in use or if you want to specify a different port, start ASReview LAB with the following command:

asreview lab --port <port>

For example, start ASReview LAB on port 5001:

asreview lab --port 5001

Troubleshooting

ASReview LAB is advanced machine learning software. In some situations, you might run into unexpected behavior. See below for solutions to problems.

Unknown Command “pip”

The command line returns one of the following messages:

-bash: pip: No such file or directory
'pip' is not recognized as an internal or external command, operable program or batch file.

First, check if Python is installed by using the following command:

python --version

If this doesn’t return a version number, Python is either not installed or not installed correctly.

Most likely, the environment variables aren’t configured correctly. Follow the step-by-step installation instructions on the ASReview website (Windows and macOS).

However, a simple workaround for incorrectly configured environment variables is to add python -m in front of the command. For example:

python -m pip install asreview

Unknown command “asreview”

In some situations, the entry point “asreview” cannot be found after installation. First check whether the package is correctly installed with the command python -m asreview -h. If this shows a description of the program, use python -m in front of all your commands. For example:

python -m asreview lab

Build dependencies error

The command line returns the following message:

"Installing build dependencies ... error"

This error typically happens when the version of your Python installation has been released very recently. Because of this, the dependencies of ASReview are not yet compatible with your Python installation. It is advised to install the second most recent version of Python instead. Detailed step-by-step instructions to install Python (and ASReview) are available for Windows and macOS users.

Remove temporary files

In case ASReview runs into unexpected errors or doesn’t work as expected, it is advised to try to remove temporary files from the project first. These files can be found in the .asreview/ folder in your home directory. However, the easiest way to remove these files is with:

asreview lab --clean-all-projects

This will safely remove temporary files; nothing will harm your review. To clean a specific project, use

asreview lab --clean-project my-project

in which my-project is your project name.

Prepare your data

ASReview LAB requires a dataset containing a set of textual records (e.g., titles and abstracts of scientific papers, newspaper articles, or policy reports) obtained via a systematic search. The goal is to review all records systematically using predetermined inclusion and exclusion criteria. Also, it should be expected that only a fraction of the records in the dataset is relevant.

Datasets can be unlabeled, partially labeled, or fully labeled. See Project modes for more information.

The easiest way to obtain a dataset is via a search engine or with the help of a reference manager. See Compatibility for the reference manager export formats supported by ASReview. For more information about the format of the dataset, see Data format.

High-quality data

The algorithms of ASReview LAB work best with high-quality datasets. A high-quality dataset is a dataset from which duplicate records have been removed and in which the data are complete. Complete data implies that titles and abstracts are available for all (or most) records. See the ASReview blog Importance of Abstracts for more ideas on composing a high-quality dataset.
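For example, duplicates can be removed from the command line with the ASReview Datatools extension. The commands below are a sketch; the exact dedup options may differ between Datatools versions, so check its documentation:

pip install asreview-datatools
asreview data dedup your_dataset.csv -o your_dataset_deduplicated.csv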

Compatibility

Citation Managers

The following table provides an overview of export file formats from citation managers that are accepted by ASReview.

                 .ris     .csv     .xlsx
EndNote          ✅       N/A      N/A
Excel            N/A      ✅       ✅
Mendeley         ✅       N/A      N/A
Refworks         ✅       N/A      N/A
Zotero           ✅       ✅       N/A

  • ✅ = The data can be exported from the citation manager and imported in ASReview.

  • N/A = This format does not exist.

RIS files used for screening in ASReview LAB can be imported back into the reference software and the decision labels can be found in the notes field. For more information see this instruction video.

Note: the RIS pipeline is extensively tested for the reference managers Zotero and EndNote. It might also work for other reference managers, but these are currently not officially supported.

Note

When using EndNote, use the following steps to export a RIS file (.ris):

  • In EndNote, click on the style selection dropdown menu from the main EndNote toolbar.

  • Click “Select Another Style”.

  • Browse to RefMan (RIS) Export and click “Choose”.

  • Click on the file menu and select “Export”.

  • Pick a name and location for the text file.

  • Choose the output format RefMan (RIS) Export and click “Save”.

Search Engines

When using search engines, it is often possible to store the articles of interest in a list or folder within the search engine itself. Thereafter, you can choose from different ways to export the list/folder. When you have the option to select parts of the citation to be exported, choose the option which will provide the most information.

The export files of the following search engines have been tested for their acceptance in ASReview:

                    .ris    .tsv    .csv    .xlsx
CINAHL (EBSCO)      ✅      N/A     X       N/A
Cochrane            ✅      N/A     ✅      N/A
Embase              ✅      N/A     ✅      ✅
Eric (Ovid)         ✅*     N/A     N/A     N/A
Psychinfo (Ovid)    ✅*     N/A     N/A     N/A
Pubmed              X       N/A     X       N/A
Scopus              ✅      N/A     ✅      N/A
Web of Science      ✅      N/A     N/A     N/A

  • ✅ = The data can be exported from the search engine and imported in ASReview.

  • N/A = This format does not exist.

  • X = Not supported, (see Data format for other options).

* Make sure to uncheck all inclusion options (e.g., “URL”) when exporting from Ovid.

Tip

If the export of your search engine is not accepted in ASReview, you can also try the following: import the search engine file first into one of the citation managers mentioned in the previous part, and export it again into a format that is accepted by ASReview.

Systematic Review Software

There are several software packages available for systematic reviewing, see https://www.nature.com/articles/s42256-020-00287-7. Some of them use machine learning, while others focus on screening and management. The table below gives an overview of alternative software programs and their compatibility with ASReview.

                    .ris    .tsv    .csv    .xlsx
Abstrackr           ✅      N/A     ✅      N/A
Covidence*          ✅      N/A     ✅      N/A
Distiller           X       N/A     ✅**    ✅**
EPPI-reviewer       ✅      N/A     N/A     X
Rayyan              ✅      N/A     ✅      N/A
Robotreviewer       N/A     N/A     N/A     N/A

  • ✅ = The data can be exported from the third-party review software and imported in ASReview.

  • N/A = This format does not exist.

  • X = Not supported.

* When using Covidence it is possible to export articles in .ris format for different citation managers, such as EndNote, Mendeley, Refworks and Zotero. All of these are compatible with ASReview.

** When exporting from Distiller, if the error Unable to parse string "Yes (include)" at position 0 occurs, set the sort references by option to Authors. Then the data can be imported in ASReview.

Data format

To carry out a systematic review with ASReview on your own dataset, your data file needs to adhere to a certain format. ASReview accepts the following formats:

Tabular file format

Tabular datasets with extensions .csv, .tab, .tsv, or .xlsx can be used in ASReview LAB. CSV and TAB files are preferably comma, semicolon, or tab-delimited. The preferred file encoding is UTF-8 or latin1.

For tabular data files, the software accepts a set of predetermined column names:

Table with column name definitions

Name        Column names                                                Mandatory
Title       title, primary_title                                        yes*
Abstract    abstract, abstract note                                     yes*
Keywords    keywords                                                    no
Authors     authors, author names, first_authors                        no
DOI         doi                                                         no
URL         url                                                         no
Included    final_included, label, label_included, included_label,      no
            included_final, included, included_flag, include

* Only a title or an abstract is mandatory.

Title, Abstract: Each record (i.e., entry in the dataset) should hold metadata on a paper. Only a title or an abstract is mandatory. If both title and abstract are available, the text is combined and used for training the model. If the column title is empty, the software will search for the next column, primary_title; the same holds for abstract and abstract_note.

Keywords, Authors: If keywords and/or authors (or, if that column is empty, author names or first_authors) are available, they can be used for searching prior knowledge. Note that this information is not shown during the screening phase and is not used for training the model, but it is available via the API.

DOI and URL: If a Digital Object Identifier (DOI) is available, it will be displayed during the screening phase as a clickable hyperlink to the full-text document. Similarly, if a URL is provided, it is also displayed as a clickable link. Note that using ASReview does not automatically give you access to full-text documents; if you do not have access, you might want to read this blog post.

Included: A binary variable indicating existing labeling decisions, with 0 = irrelevant/excluded and 1 = relevant/included. If no label is present, we assume the record has not been seen by the reviewer. Different column names are allowed, see the table. The behavior of the labels differs per mode, see Fully, partially, and unlabeled data.
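For illustration, a minimal tabular dataset using the column names above could look like the (made-up) CSV below, with one record labeled relevant, one labeled irrelevant, and one not yet screened:

title,abstract,doi,included
"Active learning for screening","This study evaluates ...",10.1000/example1,1
"An unrelated clinical trial","We report ...",10.1000/example2,0
"A new record to screen","This paper describes ...",10.1000/example3,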

RIS file format

RIS file formats (with extensions .ris or .txt) are used by digital libraries, like IEEE Xplore, Scopus, and ScienceDirect. The citation managers Mendeley, RefWorks, Zotero, and EndNote support the RIS file format as well. See Wikipedia for detailed information about the format.

For parsing the RIS file format, ASReview LAB uses a Python RIS file parser and reader (rispy). Successful import/export depends on a proper dataset structure. The complete list of accepted fields and the default mapping can be found on the rispy GitHub page.

The labels ASReview_relevant, ASReview_irrelevant, and ASReview_not_seen are stored with the N1 (Notes) tag, and can be re-imported into ASReview LAB. The behavior of the labels is different for each mode, see Fully, partially, and unlabeled data.
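For illustration, a minimal (made-up) RIS record with a stored labeling decision could look like this:

TY  - JOUR
TI  - Active learning for screening
AB  - This study evaluates ...
N1  - ASReview_relevant
ER  -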

Tip

In the citation managers Zotero and EndNote, these labels can be used for making selections; see the screenshots or watch the instruction video.

Note

When re-importing a partly labeled dataset in the RIS file format, the labels stored in the N1 field are used as prior knowledge. When a completely labeled dataset is re-imported, it can be used in the Validation and Simulation modes.

Example record with a labeling decision imported to Zotero

Example record with a labeling decision imported to Endnote

Fully, partially, and unlabeled data

Fully and partially labeled datasets serve a special role in the ASReview context. These datasets have review decisions for a subset of the records or for all records in the dataset.

Label format

For tabular datasets (e.g., CSV, XLSX), the dataset should contain a column called “included” or “label” (see Data format for all naming conventions), which is filled with 1’s or 0’s for the records that have already been screened. The value is left empty for the records that you haven’t screened yet, or that are added to the dataset when updating a review. For the RIS file format, the labels ASReview_relevant, ASReview_irrelevant, and ASReview_not_seen can be stored with the N1 (Notes) tag.

Exported files containing labeling decisions can be re-imported into ASReview LAB, after which all labels are recognized. Their behavior differs per mode:

  • In Oracle mode existing labels are used for prior knowledge.

  • In Validation mode records are presented along with an indication of their previous labeling status: relevant, irrelevant, or not seen. This status is displayed via a color-coded bar above each record.

  • In Simulation mode, the column containing the labels is used to simulate a systematic review.

Unlabeled data

Unlabeled datasets do not contain any labels and can be used in the Oracle mode to start a review from scratch. Prior knowledge has to be selected in the Prior Knowledge step of the project set-up.

Partially labeled data

Partially labeled datasets are datasets with a labeling decision for a subset of the records in the dataset and no decision for another subset.

In Oracle mode, if labels are available for a part of the dataset, they will be automatically detected and used as prior knowledge. The first iteration of the model will then be based on these decisions and used to predict relevance scores for the unlabeled part of the data. This is useful when a large number of records is needed for training, when updating a systematic review, or to continue the screening process with model switching.

In Validation mode, the available labels are presented in the review screen along with an indication of their previous labeling status: relevant, irrelevant, or not seen. This status is displayed via a color-coded bar above each record, and you have the opportunity to refine the dataset by correcting any potential misclassifications. This is useful for quality evaluation (see, for example, the SAFE procedure).

Note

Merging labeled with unlabeled data should be done outside ASReview LAB, for example, with the compose function of ASReview Datatools, or via Citation Managers.
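As a rough sketch of the Datatools route, merging could look like the command below; the compose options shown (-l for labeled and -u for unlabeled records) are assumptions, so check the ASReview Datatools documentation for the exact syntax of your version:

asreview data compose merged_dataset.ris -l labeled_records.ris -u unlabeled_records.ris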

Fully labeled data

Fully labeled datasets are datasets with a labeling decision for all records in the dataset.

In Simulation mode, the labels are used to mimic the review process for a simulation study. Only records containing labels are used for the simulation; unlabeled records are ignored.

In Validation mode, the labels available in a fully labeled dataset are presented in the review screen along with an indication of their previous labeling status: relevant or irrelevant. This is useful for validating, as a human, labels that were predicted by a large language model (LLM) such as ChatGPT. One can also use this mode for teaching purposes.

Benchmark datasets

The ASReview research project collects fully labeled datasets that are published open access. The labeled datasets are PRISMA-based systematic reviews or meta-analyses on various research topics. They can be useful for teaching purposes or for testing the performance of (new) active learning models. The datasets and their metadata are available via the SYNERGY dataset repository. In ASReview LAB, these datasets are found under “Benchmark Datasets”; they are only available in the Validation and Simulation modes.

The Benchmark Datasets are directly available in the software. During the Add dataset step of the project setup, there is a panel with all the datasets. The datasets can be selected and used directly. Benchmark datasets are also available for simulation via the command line. Use the prefix synergy: followed by the identifier of the dataset (see the SYNERGY dataset repository). For example, to use the Van de Schoot et al. (2018) dataset, use synergy:van_de_schoot_2018.
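For example, a command line simulation on the Van de Schoot et al. (2018) benchmark dataset could look as follows; the -s option for writing the results to a project file is an assumption based on asreview simulate --help and may differ per version:

asreview simulate synergy:van_de_schoot_2018 -s van_de_schoot_2018_sim.asreview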

Create a project

To start reviewing a dataset with ASReview LAB, you first need to create a project. The project will contain your dataset, settings, labeling decisions, and machine learning models. You can choose from three different project types: Oracle, Validation, and Simulation. The project setup consists of 4 steps: Project information, Data, Model, and Warm up. The sections below explain each of the steps of the setup.

To create a project:

  1. Start ASReview LAB.

  2. Go to the Projects dashboard if you are not already there (http://localhost:5000/projects)

  3. Click on the Create button on the bottom left

Project information

In Step 1, you provide all relevant information about your project as well as the type of project you want (the mode). The sections below provide more information on the input fields. After you complete this step, click Next.

Project modes

In this step, you have to select a mode. The default is Oracle. For a description of all modes, see Fully, partially, and unlabeled data. In short, if you want to:

  • screen a dataset from scratch -> Oracle mode with unlabeled data;

  • continue screening, for example using a different model -> Oracle mode with partly labeled data;

  • validate labels provided by another screener or predicted by a Large Language Model (e.g., ChatGPT) -> Validation mode with partly or fully labeled data;

  • learn how the software and active learning work -> Validation mode with fully labeled data;

  • mimic the screening process in a simulation study -> Simulation mode with fully labeled data.

Project modes

Project details

Provide project details like name of the project (required), author(s) (for example, the name of the screener), and a description. You can edit these values later in the Details page.

Data and Prior Knowledge

In Step 2, you import a dataset and select prior knowledge.

Add dataset

Click on Add to select a dataset. The data needs to adhere to a specific format. Keep in mind that in Oracle mode, your dataset should be unlabeled or partially labeled; in Validation mode, partially or fully labeled; and in Simulation mode, fully labeled.

Tip

You will benefit most from what active learning has to offer with High-quality data.

Depending on the Project mode, you are offered different options for adding a dataset:

From File

Drag and drop your file or select your file. Click on Save on the top right.

Note

After adding your dataset, ASReview LAB shows the approximate number of duplicates. This number is based on duplicate titles and abstracts and if available, on the Digital Object Identifier (DOI). Removing duplicates can be done via ASReview Datatools, which also allows using a persistent identifier (PID) other than DOI for identifying and removing duplicates.

From URL or DOI

Insert a URL to a dataset. For example, use a URL from this dataset repository. It is also possible to provide a DOI to a data repository (supported for many data repositories via Datahugger). If a DOI points to multiple files, select the file you want to use (e.g., 10.17605/OSF.IO/WDZH5).

Click on Add to add the dataset.

From Extension

Select a file available via an extension (Oracle and Validation only). Click on Save on the top right.

Benchmark Datasets

Select one of the Benchmark datasets (Simulation and Validation only). Click on Save on the top right.

Prior Knowledge

The first iteration of the active learning cycle requires training data, referred to as prior knowledge. This knowledge is used by the classifier to create an initial ranking of the unseen records. In this step, you need to provide a minimum training data set of size two, with at least one relevant and one irrelevant labeled record.

Note

If you use Partially labeled data in the Oracle mode, you can skip this step, because the labels available in the dataset are used for training the first iteration of the model.

To facilitate prior selection, it is possible to search within your dataset. This is especially useful for finding records that are relevant based on previous studies or expert consensus.

You can also let ASReview LAB present you with random records. This can be useful for finding irrelevant records.

The interface works as follows: on the left, you will see methods to find records to use as prior knowledge; on the right, you will see your selected prior knowledge. If you have at least one relevant and one irrelevant record, you can click Close and go to the next step.

ASReview prior knowledge selector
Random

Warning

Do not use the random option to search for the sparse relevant records!

You also need to provide at least one irrelevant record as prior knowledge. One way to find an irrelevant record is by labeling a set of random records from the dataset. Given that the majority of records in the dataset are irrelevant (an extremely imbalanced data problem), the records presented here are likely to be irrelevant for your study. Click on Random to show a few random records. For each record you want to use as training data, indicate whether it is irrelevant (or relevant).

ASReview prior knowledge random

In the Validation mode, when selecting random records, one can choose random records from the subset of initially labeled relevant, irrelevant, or not seen records. The initial labels are displayed via a color-coded bar above each record.

ASReview prior knowledge selector

Model

In the next step of the setup, you can select the active learning model. The default settings (Naïve Bayes, TF-IDF, Max) have shown fast and excellent performance. Most users can skip this step and click Next. More information about the active learning process can be found in the blog post Active learning explained.

Select model

It is possible to change the settings of the Active learning model. There are four settings that can be changed in the software:

Feature extraction

The feature extraction technique determines how text is translated into a vector that can be used by the classifier. The default is TF-IDF (Term Frequency-Inverse Document Frequency) from SKLearn. It works well in combination with Naive Bayes and other fast-training models.

Another recommended option is Doc2Vec provided by the gensim package. Before starting ASReview LAB, first, install gensim:

pip install asreview[gensim]

Note

It takes relatively long to create a feature matrix with Doc2Vec, but this only has to be done once. The upside of this method is that it takes context into account. Another benefit is the dimension reduction that generally takes place, which makes the modeling quicker.

Several other feature extractors are available in the software (sentence BERT, embedding IDF/LSTM), and more feature extractors can be selected via the API or added via a Model Extension.

Classifier

The classifier is the machine learning model used to compute the relevance scores. The default is Naive Bayes. Though relatively simplistic, it seems to work quite well on a wide range of datasets. Several other classifiers are available in the software (logistic regression, random forest, SVM, LSTM, neural net), and more classifiers can be selected via the API or added via a Model Extension.

The neural nets require TensorFlow; install it with

pip install asreview[tensorflow]

Balancing Strategy

To decrease the class imbalance in the training data, the default is to rebalance the training set with a technique called dynamic resampling (DR) (Ferdinands et al., 2020). DR undersamples the number of irrelevant records in the training data, whereas the number of relevant records is oversampled such that the size of the training data remains the same. The ratio between relevant and irrelevant records in the rebalanced training data is not fixed, but dynamically updated; it depends on the number of records in the available training data, the total number of records in the dataset, and the ratio between relevant and irrelevant records in the available training data. The other options are no balancing and undersampling. Other strategies can be selected via the API or added via a Model Extension.

Query Strategy

The query strategy determines which record is shown after the model has computed the relevance scores. The options are: maximum (certainty-based), mixed, uncertainty, random, and clustering. When maximum (certainty-based) is selected, the records are shown in order of relevance score; the record most likely to be relevant is shown first. When mixed is selected, the next record is selected certainty-based 95% of the time, and uncertainty-based or randomly chosen otherwise. When random is selected, records are shown in a random order (ignoring the model output completely). Other strategies can be selected via the API or added via a Model Extension.

Warning

Selecting random means your review will not be accelerated by using ASReview.
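The same four components can also be configured when running a simulation from the command line. The sketch below assumes the -m (classifier), -e (feature extraction), and -q (query strategy) options of asreview simulate; check asreview simulate --help for the exact option names and model identifiers in your version:

asreview simulate synergy:van_de_schoot_2018 -s output.asreview -m logistic -e doc2vec -q max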

Model switching

During the screening phase, it is not possible to change the model. However, it is possible to select a first model, screen part of the data, and export the dataset with the labeling decisions of the first model. This partly-labeled dataset can be imported into a new project and the labels based on the first model will be recognized as prior knowledge. Then, a second model can be trained on the partly-labeled data, and the new predictions will be based on the second model.

Tip

It is suggested to screen with a simple active learning model (e.g., the defaults) first until you reach your stopping criteria, then switch to a different model (e.g., doc2vec plus a neural net) and screen again until you reach your stopping criteria.

Warm up

In the last step of the setup, Step 4, ASReview LAB runs the feature extractor, trains a model, and ranks the records in your dataset. Depending on the model and the size of your dataset, this can take a couple of minutes (or even longer; you can enjoy the animation video in the meantime). After the project is successfully initialized, you can start reviewing.

Note

In Simulation mode, this step starts the simulation. As simulations usually take longer to complete, the simulation will run in the background. After a couple of seconds, you will see a message and a button “Got it”. You will be navigated to the Analytics page, where you can follow the progress (see the Refresh button on the top right).

ASReview LAB warmup

Screening

Note

Only for Oracle and Validation. Read more about the options for the Simulation mode in the Overview.

Introduction

As soon as your project is initiated, you can start reviewing. Click on Review in the left menu if your project is not on the review page yet. ASReview LAB presents you with a title and abstract to screen and label.

You are asked to make a decision: relevant or irrelevant?

ASReview Screening

Screening in Oracle mode

In the Oracle mode, unlabeled records are presented to you. Depending on the selected query strategy, this is the record most likely to be relevant (default setting) or a record selected by another Query Strategy.

Click on the decision of your choice, and a new record is presented to you. While you review the next record, a new model is being trained. ASReview LAB continuously improves its understanding of your decisions, constantly updating the underlying ordering of the records.

Each labeling decision of the user starts the training of a new model, provided no model is being trained at that time. When this new model is trained, the rank order of the unseen records is updated. Training and labeling occur asynchronously. With fast models, a new ranking will probably be available before the user finishes reading the text. With slower models, training continues while the user screens the next records in line (2nd, 3rd, etc.). Therefore, the record shown to the user can be the highest-ranked record of the second-to-last model, or the highest-ranked record of the latest model once it has been trained. For a detailed description of the data model, see the paper Reproducibility and Data Storage Checklist.

As you keep reviewing records and providing labels, you will probably see fewer relevant records over time. When to stop screening is left to you. See Progress and results for more information on progress monitoring and on when to stop.

Tip

If you are in doubt about your decision, take your time, as you are the oracle. Based on your input, a new model will be trained, and you do not want to confuse the prediction model. It may be better to consult others and to read the full text (in case of reviewing abstracts of scientific papers).

Screening in Validation mode

The Validation mode (formerly known as Exploration mode) is tailored for scenarios where it’s necessary to validate existing labels or engage in a review process without being an oracle. This mode is especially beneficial for validating labels made by a first screener, reviewing labels predicted by Large Language Models (LLMs) such as ChatGPT, or for educational and training purposes.

In this mode, records are presented along with an indication of their previous labeling status: relevant, irrelevant, or not seen. This status is displayed via a color-coded bar above each record. If a record was labeled by another screener or an AI model, you have the opportunity to validate or challenge these labels, helping to refine the dataset by correcting any potential misclassifications, which is useful for the quality evaluation of the SAFE procedure.

Additionally, the Validation mode is useful for educational use. Instructors and learners can utilize this mode to simulate the screening process without being the expert decision-maker. This setup is particularly advantageous in workshop settings, where participants can engage with the screening process using the labeled SYNERGY datasets. This hands-on experience offers valuable insights into the software’s functionality and the systematic review process without the need to be a content expert. For comprehensive online teaching materials and tutorials on using ASReview LAB effectively, please visit the ASReview Academy.

ASReview Screening in Validation Mode

Autosave

Your decisions (and notes) are saved automatically into your ASReview project file. There is no need to press any buttons to save your work anywhere in ASReview LAB (in fact, there is not even a save button).

Change decisions

In some cases, you might want to change your previous decision. The screening interface of ASReview LAB offers two options to change your decision.

Undo last decision

You can return to your previous decision during screening.

  1. Start ASReview LAB.

  2. Open or Create a project.

  3. Label the record displayed in the screen as relevant or irrelevant.

  4. Click on Undo (At the bottom right)

  5. Click on Keep (ir)relevant or Convert to (ir)relevant.

  6. Continue screening.

You can disable this option in the Settings menu.

Screening history

An overview of your decisions made during screening can be found on the History page. You can change decisions on this page.

  1. Start ASReview LAB.

  2. Open or Create a project.

  3. Click on History in the menu on the left.

Show project history

Changing decisions on the history page

  1. To change a label of a record, click the heart icon. The next iteration of the model will take the new label into account.

Full Text

If a column with Digital Object Identifiers (DOI) or URLs is available in the metadata of your dataset, ASReview LAB will display the DOI and URL during screening. Most of the time, DOIs point to the full-text of a publication. See datasets for more information on including DOI and URL values to your datasets.

Keyboard shortcuts

ASReview LAB supports the use of keyboard shortcuts during screening. The table below lists the available keyboard shortcuts.

You can press a key (or a combination of keys) to label a record as relevant or irrelevant, or to return to the previous decision during screening. By default, keyboard shortcuts are disabled.

Action                         Shortcut
Label record as relevant       r or Shift + r
Label record as irrelevant     i or Shift + i
Return to previous decision    u or Shift + u

Note

Keyboard shortcuts are only available when the Undo feature has been enabled in the Settings (bottom left).

Display

Dark mode

ASReview LAB offers the option to customize the screening appearance and functionality.

  1. Start ASReview LAB.

  2. Click on Settings (bottom left).

  3. Go to Display and toggle the dark mode

Note

Your preference is saved in the browser.

Font size

You can make the text on the review screen smaller or larger.

  1. Start ASReview LAB.

  2. Click on Settings (bottom left).

  3. Go to Display and click on Font size.

  4. Slide the slider to the desired font size.

ELAS Memory Game

If you want a break from screening, you can search for the hidden ELAS memory game.

ELAS Memory Game

If you really need a long break, try the expert mode.

ELAS Memory Game - expert mode

Progress and results

During screening, you might want to keep track of your progress and to obtain information for your stopping criteria. This section provides documentation on useful tools for these purposes.

Analytics

ASReview LAB offers insightful statistics, a progress chart, and a recall chart to keep track of your screening process and to help you decide when to stop screening.

To open the statistics panel:

  1. Start ASReview LAB.

  2. Open or Create a project.

  3. Click on Analytics in the left menu.

ASReview LAB progress analytics for fully labeled

This figure shows the Analytics page of a fully labeled dataset. All relevant records are found in the first part of the screening.

Summary statistics

The summary statistics are counts of the records in your dataset.

  • Total records: the total number of records in your dataset.

  • Labeled records: the number of records that you labeled as relevant or irrelevant, including those you added as prior knowledge.

  • Relevant records: the number of records that you labeled as relevant, including those you added as prior knowledge.

  • Irrelevant records: the number of records that you labeled as irrelevant, including those you added as prior knowledge.

  • Irrelevant since last relevant: the number of irrelevant records you have seen since the last relevant record.

Charts

The charts on the analytics page can be useful to monitor your progress. There is a Progress and a Recall chart. The charts do not include prior knowledge and are most relevant after you have screened at least 10 records.

Progress chart

The progress chart plots the number of relevant records in the last 10 records that you reviewed in ASReview LAB. For example, when you reviewed 100 records, you labeled 3 relevant records between the 91st and 100th reviewed records.

Recall chart

The recall chart plots the number of relevant records against the number of records that you reviewed in ASReview LAB. Relevant by ASReview LAB refers to the relevant records that you labeled with the assistance of the active learning model. Random relevant refers to the relevant records that you might find if you manually reviewed the records so far without the assistance of the active learning model.

Export Figure

The plots can be exported as a figure:

  1. Start ASReview LAB.

  2. Open or Create a project.

  3. Click on Analytics in the left menu.

  4. Click on the hamburger menu next to the Progress or Recall chart.

  5. Select Download SVG or PNG to export a figure, or select Download CSV to export the data behind the figure.

Stop screening

The blog post ASReview Class 101 and the discussion How to stop screening? provide tips on when to stop screening.

Tip

The number of irrelevant records since last relevant will increase the longer you screen.

Tip

With Maximum as query strategy, you will see a decline in the number of relevant items in the plots the longer you screen. This may help you decide when to stop screening (see https://github.com/asreview/asreview/discussions/557).

Tip

The data behind the recall chart can be used to apply the knee algorithm as a stopping criterion.

Mark project as finished

When you decide to stop screening, you can mark the project as finished. You can undo this at any time. To mark your project as finished:

  1. Start ASReview LAB.

  2. Go to the Projects dashboard (http://localhost:5000/projects)

  3. Hover the project you want to mark as finished and click on Options.

  4. Click on Mark as finished.

The button to continue screening is now disabled. This can be undone by clicking again on Mark as in review.

Export results

You can export the results of your labeling to a RIS, CSV, TSV, or Excel file. You can export either a file containing all imported data, including your decisions, or a file with a selection of the relevant records only.

The following variables will be added to your tabular dataset:

  • The column titled included contains the labels as provided by the user: 0 = not relevant, 1 = relevant and if missing it means the record is not seen during the screening process.

  • The column titled asreview_ranking contains an identifier to preserve the rank ordering as described below.

  • The column ASReview_prior contains a label 1 if a record has been used to train the first iteration of the model, a label 0 if not used for training, and empty when the record was not seen.

  • The column asreview_label_to_validate is added in the Validation mode and contains the labels initially present in the data.

  • The column Notes contains any notes you made during screening.

For RIS files, the labels ASReview_relevant, ASReview_irrelevant, ASReview_not_seen, and ASReview_prior, ASReview_validate_relevant/irrelevant/not_seen are stored with the N1 (Notes) tag. In citation managers like Zotero and Endnote, the labels can be used for making selections.

The file is ordered as follows:

  1. All relevant records you have seen in the order they were shown during the screening process.

  2. All records not seen during the screening and ordered from most to least relevant according to the last iteration of the model.

  3. All non-relevant records you have seen, in the order they were shown during the screening process.

To download your results follow these steps:

  1. Start ASReview LAB.

  2. Open or Create a project.

  3. Click on Export in the menu on the left.

  4. Select Dataset.

  5. Select the file type you prefer: Excel, RIS, TSV, or CSV.

  6. Save the file to your device.

ASReview LAB dataset download

Note

A RIS file can only be exported if a RIS file was imported.

Love using ASReview? On the Export screen you can get inspired about how to give back to our open-source and community-driven project.

ASReview LAB Export screen

Manage projects

ASReview LAB offers the options to import and export projects. This can be useful for sharing results, archiving projects, and for backup purposes.

ASReview LAB Projects dashboard show options

Import Project

To import a project:

  1. Start ASReview LAB.

  2. Go to the Projects dashboard (http://localhost:5000/projects)

  3. Click on the Import project icon on the top right.

  4. Click on Select file and select a project from your device (with the .asreview extension).

  5. Open the project from the Projects dashboard.

Export Project

The ASReview project file (extension .asreview) can be exported from ASReview LAB. The file contains the dataset, review history, notes, and model configuration. It can be imported into ASReview LAB on a different device, which allows other users to replicate the project, or continue the systematic review.

To export your project:

  1. Start ASReview LAB.

  2. Go to the Projects dashboard (http://localhost:5000/projects)

  3. Hover the project you want to export and click on the Export icon.

  4. Select Project.

  5. Click on Export

You will be asked where to save the ASReview file (extension .asreview).

Delete Project

To permanently delete a project, including ALL files:

  1. Start ASReview LAB.

  2. Go to the Projects dashboard (http://localhost:5000/projects)

  3. Hover the project you want to delete and click on Options.

  4. Click on Delete forever.

  5. This action cannot be undone. ASReview LAB will ask you to confirm by typing in the project title.

Extensions

ASReview has extensive support for extensions. They can extend the functionality of ASReview LAB and the Command Line Interface. There are officially supported extensions and community-maintained extensions.

Looking to develop your own extension? See Create extensions for detailed instructions.

Installation

Most extensions are installable from PyPI (the same way ASReview LAB is installed) or GitHub. It is preferred to follow the installation instructions provided by the extension.

The following example shows the installation of ASReview Insights, an extension for plotting and computing metrics for simulations in ASReview.

pip install asreview-insights

Extensions published only on GitHub can be installed directly from the repository. Replace {USER_NAME} and {REPO_NAME} with the corresponding values for the extension.

pip install git+https://github.com/{USER_NAME}/{REPO_NAME}.git

Supported Extensions

The following extensions are officially supported and maintained by the maintainers of ASReview LAB. They are extensively tested and integrate well with ASReview LAB.

  • ASReview Datatools
    • ASReview-datatools: Tool for describing, cleaning (input) data, and converting file formats via the command line.

  • ASReview Insights
    • ASReview-insights: Advanced insights to ASReview simulations like performance plots and metrics.

  • ASReview Wordcloud
    • ASReview-wordcloud: Creates a word cloud to give a visual impression of the content of a dataset via the command line.

  • ASReview Makita
    • ASReview-makita: ASReview's Makita (MAKe IT Automatic) is a workflow generator for simulation studies using the command line interface of ASReview LAB. Makita simplifies your research by generating the framework and code for your simulation study.

List of extensions for ASReview LAB

The List of extensions for ASReview LAB on the Discussion platform gives an overview of known extensions to ASReview LAB and other useful tools in the AI-aided systematic review pipeline. These extensions can extend the software with new models, subcommands, and datasets.

Overview

ASReview LAB offers three different solutions to run simulations: with the webapp, with the command line interface, and with the Python API.

What is a simulation?

A simulation involves mimicking the screening process with a certain model. As it is already known which records are labeled as relevant, the software can automatically reenact the screening process as if a human was labeling the records in interaction with the Active Learning model.

Why run a simulation?

Simulating with ASReview LAB has multiple purposes. First, the performance of one or multiple models can be measured with different metrics (see Analyzing results). A convenient use is to investigate the amount of work you could have saved by using active learning compared to your manual screening process.

Suppose you don’t know which model to choose for a new (unlabeled) dataset. In that case, you can experiment with the best performing combination of the classifier, feature extraction, query strategy, and balancing and test the performance on a labeled dataset with similar characteristics.

You could also use the simulation mode to benchmark your own model against existing models for different available datasets. ASReview LAB allows for adding new models via a template.

You can also find ‘odd’ relevant records in a ‘classical’ search. Such records are typically found isolated from most other records and might be worth closer inspection.

Datasets for simulation

Simulations require fully labeled datasets (labels: 0 = irrelevant, 1 = relevant). Such a dataset can be the result of an earlier study. ASReview also offers fully labeled datasets via the SYNERGY dataset. These datasets are available via the user interface in the Data step of the setup and in the command line with the prefix synergy: (e.g. synergy:van_de_schoot_2018).

Tip

When you import your data, make sure to remove duplicates and to retrieve as many abstracts as possible (See Importance-of-abstracts blog for help). With clean data you benefit most from what active learning has to offer.

Cloud environments

For advanced scenarios, such as executing ASReview simulations in cloud environments or running them in parallel, consult the specialized cloud usage guide. This guide provides tailored instructions for a variety of use cases, including simulations on cloud platforms such as SURF, Digital Ocean, AWS, Azure, and leveraging Kubernetes for large-scale simulation tasks. More information can be found in the paper: Optimizing ASReview simulations: A generic multiprocessing solution for ‘light-data’ and ‘heavy-data’ users

Simulate via the webapp

To run a simulation in the ASReview webapp, create a project as described in Create a project. Most of the steps of the setup are identical or straightforward. In this section, some of the differences are highlighted.

In the step on Project Information, select the “Simulation” mode (see figure below).

ASReview LAB simulate option

In the step Data, import a fully labeled dataset or use one of the benchmark datasets.

Selecting prior knowledge is relatively easy. In case you know relevant records to start with, use the search function. In case you don’t, use the Random option. Toggle the button “Relevant” on top to see some random irrelevant records. Label some relevant and some irrelevant records.

ASReview LAB Prior selection for simulation study

The step Warm up differs slightly from the Oracle and Validation modes. This step starts the simulation; after a few seconds, it will return “Got it”. This means the simulation continues to run in the background. You are returned to the Analytics page.

ASReview LAB simulation runs in background

This page now has a refresh button on the top right. If the simulation is not finished yet, you can refresh the page or use the refresh button to follow the progress. After a while, the Elas mascot on the left will hold a sign with “finished”. Your simulation is now finished and you can study the results in the analytics page.

Simulation via command line

ASReview LAB comes with a command line interface for simulating the performance of the ASReview algorithms.

Getting started

The simulation command line tool can be accessed directly like:

asreview simulate MY_DATASET.csv -s MY_SIMULATION.asreview

This performs a simulation with the default active learning model, where MY_DATASET.csv is the path to the fully labeled dataset you want to simulate. After a successful simulation, the result is stored at MY_SIMULATION.asreview, where MY_SIMULATION is the filename you prefer and the extension is .asreview (the ASReview project file extension).

Simulation progress

The progress of the simulation is given with two progress bars. The top one counts the number of relevant records found. The bottom one monitors the number of records labeled. By default (with --stop_if min), the simulation stops once the top progress bar reaches 100%.

Simulation started

Relevant records found: 100%|███████████████████████████████████████████████| 43/43 [00:03<00:00, 13.42it/s]
Records labeled       :   7%|██▉                                        | 420/6189 [00:03<00:43, 133.58it/s]

Simulation finished

Command line arguments for simulating

The command asreview simulate --help provides an overview of available arguments for the simulation.

Each of the sections below describes the available arguments. The example below shows how to set command line arguments, which can be helpful if you are new to using the command line. Suppose you want to change the query strategy being used. The command line help and this documentation show -q, --query_strategy QUERY_STRATEGY. The default is max. If you want to change it to max_random, you use:

asreview simulate MY_DATASET.csv -s MY_SIMULATION.asreview -q max_random

Dataset

dataset

Required. File path or URL to the dataset, or one of the SYNERGY datasets.

You can also use one of the SYNERGY datasets. Use the following command and replace DATASET_ID with the dataset ID.

asreview simulate synergy:DATASET_ID

For example:

asreview simulate synergy:van_de_schoot_2018 -s myreview.asreview

Active learning

-e, --feature_extraction FEATURE_EXTRACTION

The default is TF-IDF (tfidf). More options and details are listed in asreview.models.feature_extraction.

-m, --model MODEL

The default is Naive Bayes (nb). More options and details are listed in asreview.models.classifiers.

-q, --query_strategy QUERY_STRATEGY

The default is Maximum (max). More options and details are listed in asreview.models.query.

-b, --balance_strategy BALANCE_STRATEGY

The default is double. The balancing strategy is used to deal with the sparsity of relevant records. More options and details are listed in asreview.models.balance

--seed SEED

To make your simulations reproducible you can use the --seed and --init_seed options. ‘init_seed’ controls the starting set of papers to train the model on, while the ‘seed’ controls the seed of the random number generation that is used after initialization.

--embedding EMBEDDING_FP

File path of embedding matrix. Required for LSTM models.
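
The options above can be combined in a single call. For example, the following illustrative command runs a simulation with logistic regression, TF-IDF features, the max_random query strategy, the double balance strategy, and a fixed seed (the model names are taken from the lists referenced above; the seed value is arbitrary):

asreview simulate MY_DATASET.csv -s MY_SIMULATION.asreview -m logistic -e tfidf -q max_random -b double --seed 535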

Prior knowledge

By default, the model initializes with one relevant and one irrelevant record. You can set the number of priors by --n_prior_included and --n_prior_excluded. However, if you want to initialize your model with a specific set of starting papers, you can use --prior_idx to select the indices of the papers you want to start the simulation with. When no prior knowledge is assigned (using --n_prior_included 0 --n_prior_excluded 0), the first records from the dataset are employed as priors in the order they were provided until the first 0 and 1 are encountered.

The following options can be used to label prior knowledge:

--n_prior_included N_PRIOR_INCLUDED

The number of prior included papers. Only used when prior_idx is not given. Default 1.

--n_prior_excluded N_PRIOR_EXCLUDED

The number of prior excluded papers. Only used when prior_idx is not given. Default 1.

--prior_idx [PRIOR_IDX [PRIOR_IDX ...]]

Prior indices by rownumber (rownumbers start at 0).

--init_seed INIT_SEED

Seed for setting the prior indices if the prior_idx option is not used. If the option prior_idx is used with one or more index, this option is ignored.
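
For example, to start a simulation from specific records instead of randomly selected priors, pass their row numbers to --prior_idx (the row numbers below are only illustrative):

asreview simulate MY_DATASET.csv -s MY_SIMULATION.asreview --prior_idx 0 42 113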

Simulation setup

--n_instances N_INSTANCES

Controls the number of records to be labeled before the model is retrained. Increase n_instances, for example, to reduce the time it takes to simulate. Default 1.

--stop_if STOP_IF

The number of label actions to simulate. The default, ‘min’, stops the simulation when all relevant records have been found. Use -1 to simulate all label actions.
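
For example, the following illustrative command retrains the model after every 10 labeled records and simulates all label actions instead of stopping when all relevant records have been found:

asreview simulate MY_DATASET.csv -s MY_SIMULATION.asreview --n_instances 10 --stop_if -1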

Save

--state_file STATE_FILE, -s STATE_FILE

Location of the ASReview project file for the simulation.

Algorithms

The command line interface provides an easy way to get an overview of all available active learning model elements (classifiers, query strategies, balance strategies, and feature extraction algorithms) and their names for command line usage in ASReview LAB. It also includes models added via Extensions. The following command lists the available models:

asreview algorithms

See Create extensions for more information on developing new models and install them via extensions.

Some models require additional dependencies to be installed. Use pip install asreview[all] to install all additional dependencies at once or check the installation instruction in section Models of the API Reference.

Analyzing results

After a simulation, the results are stored in the ASReview project file (extension .asreview). This file contains a large number of variables and logs on the simulation. The data can be extracted from the project file via the API or with one of the available extensions. See these examples on the Project API for more information about opening the project file.

One readily available extension for analyzing the results of a simulation is ASReview Insights. This extension offers valuable tools for plotting the recall and extracting statistical results for several performance metrics, such as the Work Saved over Sampling (WSS), the proportion of Relevant Records Found (RRF), the Extra Relevant records Found (ERF), and the Average Time to Discover (ATD).

Install ASReview Insights directly from PyPI:

pip install asreview-insights

Detailed documentation on the extension can be found on the ASReview Insights project page.
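
For example, at the time of writing, ASReview Insights adds plot and metrics subcommands to the command line, so a recall curve for the simulation above can be plotted with a command along the lines of the one below (consult the ASReview Insights documentation for the exact subcommands and options):

asreview plot recall MY_SIMULATION.asreview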

Simulate with Python API

The API is still under development and can change at any time without warning.

For more control over the workings of the ASReview software, the ASReview Python API can be used. For example, it is possible to use custom models or implement different sampling strategies. This example shows how to simulate a review with the ASReview API and store the results in an ASReview project file.

Please keep in mind that the ASReview API is experimental at the moment. Improvements and simplifications are planned.

[1]:
from pathlib import Path

from asreview import ASReviewData, ASReviewProject
from asreview.review import ReviewSimulate

Create a temporary folder for the results and examples in this document.

[2]:
project_path = Path("tmp_data")
project_path.mkdir(exist_ok=True)

Create an ASReviewProject to store the results

[3]:
# Create a project object and folder
project = ASReviewProject.create(
    project_path=project_path / "api_simulation",
    project_id="api_example",
    project_mode="simulate",
    project_name="api_example",
)

Add a dataset to the project folder in the folder data (can also be stored somewhere else, but it is advised to use the data folder). In the following example, a dataset is downloaded from the benchmark platform with CURL (macOS, Unix systems).

[4]:
%%bash
curl https://raw.githubusercontent.com/asreview/systematic-review-datasets/metadata-v1-final/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv > tmp_data/api_simulation/data/van_de_Schoot_2017.csv
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  9.9M  100  9.9M    0     0  13.2M      0 --:--:-- --:--:-- --:--:-- 13.2M

Add the reference to the dataset to the project.

[5]:
project.add_dataset("van_de_Schoot_2017.csv")

Setup the models.

[6]:
from asreview.models.classifiers import NaiveBayesClassifier
from asreview.models.query import MaxQuery
from asreview.models.balance import DoubleBalance
from asreview.models.feature_extraction import Tfidf

# Select models to use
train_model = NaiveBayesClassifier()
query_model = MaxQuery()
balance_model = DoubleBalance()
feature_model = Tfidf()

Run the simulation with the ReviewSimulate class.

[7]:
data_obj = ASReviewData.from_file(
    Path("tmp_data", "api_simulation", "data", "van_de_Schoot_2017.csv")
)
[8]:
# Initialize the simulation reviewer
reviewer = ReviewSimulate(
    as_data=data_obj,
    model=train_model,
    query_model=query_model,
    balance_model=balance_model,
    feature_model=feature_model,
    n_instances=10,
    project=project,
    n_prior_included=1,
    n_prior_excluded=1,
)
[9]:
# Start the review process
project.update_review(status="review")
try:
    reviewer.review()
    project.mark_review_finished()
except Exception as err:
    project.update_review(status="error")
    raise err

Export the project to a location of choice, in this case tmp_data/api_example.asreview.

[10]:
# Finish and export the project
project.export(Path("tmp_data", "api_example.asreview"))

The following code removes the temporary folder that was created:

[11]:
import shutil

shutil.rmtree(project_path)

ASReview LAB Server

ASReview LAB Server is a self-hosted, secure version of ASReview LAB. It is designed to facilitate users who want to use ASReview LAB without installing it on their own computer. The web application can be accessed from any device with a web browser and can be used on desktops, laptops, tablets, and mobile devices. ASReview LAB Server enables users to create an account or connect via their GitHub, ORCID, or Google accounts.

See the server configuration details for more information on how to configure your ASReview LAB on your server.

Features

ASReview LAB provides two options for creating an account: by connecting with your GitHub, ORCID, or Google account, or by creating an account. All information is stored securely on the ASReview LAB server and fully self-hosted.

Log in with GitHub, ORCID, or Google

ASReview LAB Server provides an easy way to log in with your GitHub, ORCID, or Google account.

Sign in with GitHub, ORCID, or Google account

See the server configuration details for more information on how to configure your ASReview on your server to enable this feature.

Create account

ASReview LAB Server provides an easy way to create an account with your email address.

Create an account with email and password

Installation

ASReview LAB server is installed in the same way as ASReview LAB. See the installation instructions for more information. See server configuration for more information on how to configure authentication on your ASReview LAB server.

Optional: If you want to make use of the PostgreSQL database, you need to install the psycopg2 package. This can be done by running the following command:

pip install psycopg2

Server configuration

ASReview LAB offers a number of options to run the application on a server. It is possible to run the application on a server with or without authentication; the latter is the default. This page describes how to configure the ASReview LAB application to run on a server with authentication enabled. With authentication enabled, users can run their projects in their own separate workspaces. Authentication requires storing user accounts and linking these accounts to projects. Currently a SQLite database (asreview.development.sqlite or asreview.production.sqlite) in the ASReview projects folder is used to store that information.

Bare bones authentication

The most basic configuration of the ASReview application with authentication is to run the application from the CLI with the --enable-auth flag. The application will start with authentication enabled and will create a SQLite database if it does not exist. The database will be stored in the ASReview projects folder. The database contains the user accounts and links them to projects.

Start the application with authentication enabled:

asreview lab --enable-auth --secret-key=<secret key> --salt=<salt>

where --enable-auth forces the application to run in authenticated mode, <secret key> is a string used to encrypt cookies, and <salt> is a string used to hash passwords. The --secret-key and --salt parameters are mandatory if authentication is required.
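
The secret key and salt can be any sufficiently long random strings. One way to generate such a string is with Python's secrets module, for example:

python3 -c "import secrets; print(secrets.token_hex(32))"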

To create user accounts, use the add-users command of the auth-tool subcommand of the ASReview application:

asreview auth-tool add-users --db-uri=sqlite:////path/example.sqlite

For more information about auth-tool and creating users, see the section Create user accounts below.

Full authentication configuration

To configure authentication in more detail, create a TOML file that contains all relevant authentication parameters. The parameters in the TOML file override parameters that were passed on the CLI. Below is an example of a TOML file (extension .toml) that enables authentication and OAuth with GitHub, ORCID, and Google. It also allows users to create their own accounts, and it configures an email server to confirm new accounts and to let users retrieve a new password if they forget it (set EMAIL_VERIFICATION to true to send the verification emails). The TOML file also contains the necessary parameters to run the application in a secure way (https).

DISABLE_LOGIN = false
SECRET_KEY = "<secret key>"
SECURITY_PASSWORD_SALT = "<salt>"
SESSION_COOKIE_SECURE = true
REMEMBER_COOKIE_SECURE = true
SESSION_COOKIE_SAMESITE = "Lax"
SQLALCHEMY_TRACK_MODIFICATIONS = true
ALLOW_ACCOUNT_CREATION = true
ALLOW_TEAMS = false
EMAIL_VERIFICATION = false

MAIL_SERVER = "<smtp-server>"
MAIL_PORT = 465
MAIL_USERNAME = "<smtp-server-username>"
MAIL_PASSWORD = "<smtp-server-password>"
MAIL_USE_TLS = false
MAIL_USE_SSL = true
MAIL_DEFAULT_SENDER = "<preferred reply email address>"

[OAUTH]
        [OAUTH.GitHub]
        AUTHORIZATION_URL = "https://github.com/login/oauth/authorize"
        TOKEN_URL = "https://github.com/login/oauth/access_token"
        CLIENT_ID = "<GitHub client ID>"
        CLIENT_SECRET = "<GitHub client secret>"
        SCOPE = ""

        [OAUTH.Orcid]
        AUTHORIZATION_URL = "https://sandbox.orcid.org/oauth/authorize"
        TOKEN_URL = "https://sandbox.orcid.org/oauth/token"
        CLIENT_ID = "<Orcid client ID>"
        CLIENT_SECRET = "<Orcid client secret>"
        SCOPE = "/authenticate"

        [OAUTH.Google]
        AUTHORIZATION_URL = "https://accounts.google.com/o/oauth2/auth"
        TOKEN_URL = "https://oauth2.googleapis.com/token"
        CLIENT_ID = "<Google client ID>"
        CLIENT_SECRET = "<Google client secret>"
        SCOPE = "profile email"

Store the TOML file on the server and start the ASReview application from the CLI with the --flask-configfile parameter:

asreview lab --flask-configfile=<path-to-TOML-config-file>

A number of the keys in the TOML file are standard Flask parameters. The keys that are specific for authenticating ASReview are summarized below:

  • DISABLE_LOGIN: if set to false the application will start with authentication. If the SQLite database does not exist, one will be created during startup.

  • SECRET_KEY: the secret key is a string that is used to encrypt cookies and is mandatory if authentication is required.

  • SECURITY_PASSWORD_SALT: another string used to hash passwords, also mandatory if authentication is required.

  • SESSION_COOKIE_SAMESITE: Restrict how cookies are sent with requests from external sites. In the example the value is set to “Lax” which is the recommended option. If backend and frontend are served on different domains set to the string “None”.

  • ALLOW_ACCOUNT_CREATION: enables account creation by users, either by front- or backend.

  • EMAIL_VERIFICATION: used in conjunction with ALLOW_ACCOUNT_CREATION. If set to true the system sends a verification email after account creation. Only relevant if the account is not created by OAuth. This parameter can be omitted if you don’t want verification.

  • MAIL_<PAR>: configuration parameters to set up the SMTP email server used for email verification and for letting users retrieve a new password after forgetting it. Don’t forget to enter the reply address (MAIL_DEFAULT_SENDER) of your system emails. Remove these parameters if system emails for verification and password retrieval are not wanted.

  • OAUTH: an authenticated ASReview application may integrate with the OAuth functionality of GitHub, ORCID, and Google. Provide the necessary OAuth login credentials for these providers. Please note that the AUTHORIZATION_URL and TOKEN_URL of the ORCID entry are sandbox URLs and thus not to be used in production. Omit this parameter if OAuth is unwanted.

The SQLALCHEMY_DATABASE_URI key is not included in the TOML file. This key is used to configure the database connection. The default value is sqlite:///asreview.production.sqlite. This means that the application will use the SQLite database in the ASReview projects folder. If you would like to use a different database, you can add the SQLALCHEMY_DATABASE_URI key to the TOML file.

Alternatively, set the SQLALCHEMY_DATABASE_URI as an environment variable (with the FLASK_ prefix) to point to the database. For example, to use the SQLite database in the ASReview projects folder:

FLASK_SQLALCHEMY_DATABASE_URI = "sqlite:///asreview.production.sqlite"

PostgreSQL database

You can replace the SQLite database with a PostgreSQL database. This requires an extra step during installation and an extra step in the configuration file:

  1. Install the psycopg2 package. At the time of writing, two versions of this package exist: psycopg2 and psycopg2-binary. According to the documentation, the binary version works on most operating systems.

  2. Then add the SQLALCHEMY_DATABASE_URI key to the config file:

SQLALCHEMY_DATABASE_URI = "postgresql+psycopg2://username:password@host:port/database_name"

Create authentication database and tables with auth-tool

Server administrators can create a database for authentication with the auth-tool sub command of the ASReview application:

asreview auth-tool create-db --db-uri=sqlite:////path/example.sqlite

Please note that in this example, the --db-uri option is explicitly configured. However, it is not mandatory. If access to the authentication database is needed, the auth-tool utility first checks whether the --db-uri option has been provided. If not, it examines whether the SQLALCHEMY_DATABASE_URI environment variable is set. In the absence of this variable as well, the script defaults to the database URI of the standard SQLite database pre-configured in the ASReview folder.
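
For example, instead of passing --db-uri on every call, the database URI can be exported once as an environment variable for the shell session (the path below is a placeholder):

export SQLALCHEMY_DATABASE_URI=sqlite:////path/example.sqlite
asreview auth-tool create-db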

Create user accounts with auth-tool

Create user accounts interactively or by using a JSON string to bulk insert the accounts with add-users. To add user accounts interactively run the following command:

asreview auth-tool add-users --db-uri=sqlite:////path/example.sqlite

The tool will prompt you if you would like to add a user account. Type Y to continue and enter an email address, name, affiliation (not required) and a password for every person. Continue to add as many users as you would like.

If you would like to bulk insert user accounts use the --json option:

asreview auth-tool add-users \
        --db-uri=sqlite:////path/example.sqlite \
        -j "[{\"email\": \"name@email.org\", \"name\": \"Name of User\", \"affiliation\": \"Some Place\", \"password\": \"1234@ABcd\"}]"

The JSON string represents a Python list with a dictionary for every user account, with the following keys: email, name, affiliation, and password. Note that passwords require at least one symbol. Symbols such as the exclamation mark may need escaping to keep the JSON string intact on the command line.
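
One way to avoid escaping issues is to let Python's json module build the string. Below is a minimal sketch; the account details are placeholders:

import json

users = [
    {
        "email": "name@email.org",
        "name": "Name of User",
        "affiliation": "Some Place",
        "password": "1234@ABcd",
    }
]

# Print the JSON string to pass to the -j/--json option of add-users.
print(json.dumps(users))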

List projects with auth-tool

The auth-tool sub command of the ASReview application can be used to list projects.

List all projects with the list-projects command:

asreview auth-tool list-projects

List the projects in JSON format with the --json flag:

asreview auth-tool list-projects --json

The command returns a convenient JSON string that can be used to bulk insert and link projects into the database. The string represents a list containing a dictionary for every project.

List users with auth-tool

The auth-tool sub command of the ASReview application can be used to list users.

List all users with the list-users command:

asreview auth-tool list-users

Migrate projects from unauthenticated to authenticated

By default, the ASReview application runs in an unauthenticated mode. This means that all projects are stored in the same workspace. This is fine for a single user, but not for multiple users. If you would like to run the application in an authenticated mode, you need to convert the existing projects into authenticated ones with user identifiers assigned to each project. If you don’t do this, you won’t see any projects in the authenticated mode.

First, list all users with the list-users command. Create users if you don’t have users yet.

asreview auth-tool list-users --db-uri=sqlite:////path/example.sqlite

List all projects with the list-projects command. The command returns an overview of all projects:

asreview auth-tool list-projects

Migrating the projects into the authenticated database can be done interactively:

asreview auth-tool link-projects --db-uri=sqlite:////path/example.sqlite

The tool will go through the projects one by one and ask for the ID of the owner. That ID can be found in the user list below the project information.

You can also insert all project information at once by using the JSON string that was produced with the list-projects command. Add the user identifier of the owner to each project in the JSON string. For example, if the user ID of the owner is 15, the JSON string should look like this:

asreview auth-tool link-projects \
        --db-uri=sqlite:////path/example.sqlite \
        --json "[{\"folder\": \"project-id\", \"version\": \"1.3\", \"project_id\": \"project-id\", \"name\": \"project 1\", \"authors\": \"Authors\", \"created\": \"2023-04-12 21:23:28.625859\", \"owner_id\": 15}]"

Overview

The development section is meant for users that need advanced functions of ASReview LAB and for developers. It contains technical information on the usage, instructions for developing extensions, and an extensive API reference.

ASReview architecture

ASReview provides users with an API to interact directly with the underlying ASReview machinery. This gives researchers an interface to study the behavior of algorithms and develop custom workflows. The following overview shows the available interfaces for interacting with the ASReview software:

ASReview API
  • Layer 5: ASReview CLOUD

    • ASReview CLOUD is currently in development. For information on ASReview CLOUD, be sure to visit our communication channels.

  • Layer 4: ASReview LAB

    • ASReview LAB is the user-friendly webapp and all underlying interfaces. Documentation on LAB can be found in the ASReview LAB section.

  • Layer 3: REST API

    • The REST API uses Flask to let the React webapp communicate with the backend and the algorithms. The REST API is not documented and should be considered ‘internal use only’.

  • Layer 2: Command Line

    • The command line interface provides entry points to start ASReview LAB, run simulations, and use subcommands installed by extensions. See the Command Line section for details.

  • Layer 1: API Reference

    • The ASReview API is a low level Python interface for ASReview. This interface requires detailed knowledge about the workings of the software. This reference contains extensive documentation on all functions, classes, and modules found in ASReview.

    • An outline for usage can be found in Simulate with Python API and Access data from ASReview file.

Extensions

The Create an extension section documents the creation of model, subcommand, and dataset extensions for ASReview. More information on extensions can be found in the Extensions section.

Command Line

ASReview provides a powerful command line interface for running tasks such as starting ASReview LAB (see Start ASReview LAB) and running simulations (see Simulation via command line). Extensions also often make use of the command line interface by extending it with subcommands.

The structure of the command line is given by:

asreview [-h] [-V] [subcommand]

A list of available and installed subcommands is given by asreview -h. Each subcommand is listed with its name, the package it comes from, and the version of that package. For example, the default subcommand lab (to start ASReview LAB) is listed as lab [asreview-1.0]. A subcommand installed via an extension, e.g. plot, is listed as plot [asreview-insights-1.1], where asreview-insights is the name of the extension that installed this subcommand and 1.1 is the version of this package.

Access data from ASReview file

The API is still under development and can change at any time without warning.

Data generated using ASReview LAB is stored in an ASReview project file. Via the ASReview Python API, there are two ways to access the data in the ASReview (extension .asreview) file: via the Project API and via the State API. The Project API is for retrieving general project settings, the imported dataset, the feature matrix, etc. The State API retrieves data related directly to the reviewing process, such as the labels, the time of labeling, and the classifier used.

Example Data

To illustrate the ASReview Python API, the benchmark dataset van_de_Schoot_2017 is used. The project file example.asreview can be obtained by running asreview simulate benchmark:van_de_Schoot_2017 -s example.asreview --seed 101.

The ASReview Python API can be used for project files obtained via the Oracle, Validation, and Simulation mode.

Python Imports

[1]:
import shutil
from pathlib import Path

import pandas as pd
from asreview import open_state
from asreview import ASReviewProject
from asreview import ASReviewData

Project API

The ASReview project file is a zipped folder. To unzip the folder and store its contents in a temporary directory, use the following code:

[2]:
project_path = Path("tmp_data")
project_path.mkdir()
project = ASReviewProject.load("example.asreview", project_path)

The returned project instance is of type ASReviewProject.

To inspect the project details, use the following code:

[3]:
project.config
[3]:
{'version': '1.2+6.g41c4257.dirty',
 'id': 'example',
 'mode': 'simulate',
 'name': 'example',
 'description': 'Simulation created via ASReview via command line interface',
 'authors': None,
 'created_at_unix': 1683798551,
 'datetimeCreated': '2023-05-11 11:49:11.327073',
 'reviews': [{'id': 'e611d2cbd89b401aa376fa4eca1c517e',
   'start_time': '2023-05-11 11:49:12.323797',
   'status': 'finished',
   'end_time': '2023-05-11 11:49:32.450593'}],
 'feature_matrices': [{'id': 'tfidf', 'filename': 'tfidf_feature_matrix.npz'}],
 'dataset_path': 'van_de_Schoot_2017.csv'}

The imported dataset is located at /tmp_data/{project_name}/data/{dataset_filename}, and can be inspected using the following code:

[4]:
dataset_fp = Path(
    project_path, project.config["id"], "data", project.config["dataset_path"]
)
dataset = ASReviewData.from_file(dataset_fp)
print(f"The dataset contains {len(dataset)} records.")
dataset.to_dataframe().head()
The dataset contains 6189 records.
[4]:
title abstract keywords authors year date doi label_included label_abstract_screening duplicate_record_id
record_id
0 Manual for ASEBA School-Age Forms & Profiles Achenbach, T. M., Rescorla, L. A. 2001.0 2001 NaN 0 0 NaN
1 Queensland Trauma Registry: A summary of paedi... Dallow, N., Lang, J., Bellamy, N. 2007.0 2007 NaN 0 0 NaN
2 Posttraumatic Stress Disorder: Scientific and ... This comprehensive overview of research and cl... Ford, J. D., Grasso, D. J., Elhai, J. D., Cour... 2015.0 NaN NaN 0 0 NaN
3 SOCIAL CLASS AND MENTAL ILLNESS Hollingshead, A. B., Redlich, F. C. 1958.0 NaN NaN 0 0 NaN
4 Computerised test generation for cross-nationa... “‘Computerised Test Generation for Cross-Natio... Irvine, S. H. 2014.0 NaN NaN 0 0 NaN

To obtain the content of the feature matrix, for example, the first row of the matrix, use the following code (note the matrix is in a sparse matrix format):

[5]:
feature_extraction_id = project.feature_matrices[0]["id"]
feature_matrix = project.get_feature_matrix(feature_extraction_id)
print(feature_matrix[0])
  (0, 20452)    0.35937211648312967
  (0, 18297)    0.26158369118434677
  (0, 13842)    0.3248271421716685
  (0, 9739)     0.38355660008860293
  (0, 3231)     0.7059309068495663
  (0, 2384)     0.22684547910949254

State API

The data stored during the review process can be accessed as a pandas DataFrame using the following code:

[6]:
with open_state("example.asreview") as state:
    df = state.get_dataset()
    print(f"The state contains {len(df)} records.")
The state contains 561 records.

The returned state instance is of type SQLiteState. Note that the state contains fewer records than the original dataset, because by default the simulation stops after finding all relevant records.

[7]:
df.to_csv(project_path / "example_state.csv", index=False)
df.head()
[7]:
record_id label classifier query_strategy balance_strategy feature_extraction training_set labeling_time notes
0 4435 1 None prior None None -1 2023-05-11 11:49:16.186034 None
1 5560 0 None prior None None -1 2023-05-11 11:49:16.186034 None
2 4434 1 nb max double tfidf 2 2023-05-11 11:49:16.420695 None
3 3668 0 nb max double tfidf 3 2023-05-11 11:49:16.444989 None
4 3142 0 nb max double tfidf 4 2023-05-11 11:49:16.505603 None

You can merge the information from the state file with the original dataset.

[8]:
df["labeling_order"] = df.index
dataset_with_results = dataset.df.join(df.set_index("record_id"))
dataset_with_results.to_csv(project_path / "data_and_state_merged.csv", index=False)
dataset_with_results
[8]:
title abstract keywords authors year date doi label_included label_abstract_screening duplicate_record_id label classifier query_strategy balance_strategy feature_extraction training_set labeling_time notes labeling_order
record_id
0 Manual for ASEBA School-Age Forms & Profiles Achenbach, T. M., Rescorla, L. A. 2001.0 2001 NaN 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Queensland Trauma Registry: A summary of paedi... Dallow, N., Lang, J., Bellamy, N. 2007.0 2007 NaN 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Posttraumatic Stress Disorder: Scientific and ... This comprehensive overview of research and cl... Ford, J. D., Grasso, D. J., Elhai, J. D., Cour... 2015.0 NaN NaN 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 SOCIAL CLASS AND MENTAL ILLNESS Hollingshead, A. B., Redlich, F. C. 1958.0 NaN NaN 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 Computerised test generation for cross-nationa... “‘Computerised Test Generation for Cross-Natio... Irvine, S. H. 2014.0 NaN NaN 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6184 Biological and clinical framework for posttrau... Three decades of posttraumatic stress disorder... Vermetten, E., Lanius, R. A. 2012.0 NaN NaN 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6185 Dividing traffic sub-areas based on a parallel... In order to alleviate the traffic congestion a... GPS trajectories, K-means, MapReduce, Traffic ... Wang, B., Tao, L., Gao, C., Xia, D., Rong, Z.,... 2014.0 NaN NaN 0 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6186 Quantifying resilience to enhance individualiz... Resilience is the human ability to adapt in th... Adaptation, Autonomic Nervous System, Resilien... Winslow, B., Carroll, M., Jones, D., Hannigan,... 2013.0 NaN NaN 0 0 NaN 0.0 nb max double tfidf 535.0 2023-05-11 11:49:31.593247 None 535.0
6187 A discriminant analysis of variables related t... Frye, James S. 1981.0 NaN NaN 0 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6188 Developmental trajectories of pain/disability ... Sterling, M., Hendrikz, J., Kenardy, J. 2010.0 NaN NaN 0 1 NaN 0.0 nb max double tfidf 333.0 2023-05-11 11:49:25.857883 None 333.0

6189 rows × 19 columns

There are also multiple functions to obtain one specific variable in the data. For example, to plot the labeling times in a graph, use the following code:

[9]:
with open_state("example.asreview") as state:
    labeling_times = state.get_labeling_times()
pd.to_datetime(labeling_times).plot(title="Time of labeling")
[9]:
<Axes: title={'center': 'Time of labeling'}>

By default, the records that are part of the prior knowledge are included in the results. To obtain the labels without the priors, use the following code:

[10]:
with open_state("example.asreview") as state:
    labels = state.get_labels(priors=False)
labels
[10]:
0      1
1      0
2      0
3      0
4      0
      ..
554    0
555    0
556    0
557    0
558    1
Name: label, Length: 559, dtype: int64

To obtain the data corresponding to a specific record identifier, use the following code:

[11]:
with open_state("example.asreview") as state:
    record_data = state.get_data_by_record_id(5176)
record_data
[11]:
record_id label classifier query_strategy balance_strategy feature_extraction training_set labeling_time notes
0 5176 0 nb max double tfidf 29 2023-05-11 11:49:17.247842 None

To obtain all settings used for the project, run the following code:

[12]:
with open_state("example.asreview") as state:
    settings = state.settings_metadata
settings
[12]:
{'settings': {'model': 'nb',
  'query_strategy': 'max',
  'balance_strategy': 'double',
  'feature_extraction': 'tfidf',
  'n_instances': 1,
  'stop_if': 'min',
  'n_prior_included': 1,
  'n_prior_excluded': 1,
  'model_param': {'alpha': 3.822},
  'query_param': {},
  'feature_param': {'ngram_max': 1,
   'stop_words': 'english',
   'split_ta': 0,
   'use_keywords': 0},
  'balance_param': {'a': 2.155, 'alpha': 0.94, 'b': 0.789, 'beta': 1.0}},
 'state_version': '1',
 'software_version': '1.2+6.g41c4257.dirty',
 'model_has_trained': True}

The state also contains the ranking and the relevance score (if the model uses relevance scores) of the last iteration of the machine learning model. To get these, use the following code:

[13]:
with open_state("example.asreview") as state:
    last_ranking = state.get_last_ranking()
    last_probabilities = state.get_last_probabilities()
print("RANKING:")
print(last_ranking[["record_id", "ranking"]])
print("RELEVANCE SCORES:")
print(last_probabilities)
RANKING:
      record_id  ranking
0          2445        0
1          2446        1
2          2444        2
3           720        3
4           719        4
...         ...      ...
6184       1766     6184
6185         63     6185
6186       4427     6186
6187       2851     6187
6188       4888     6188

[6189 rows x 2 columns]
RELEVANCE SCORES:
0       0.637417
1       0.671088
2       0.707728
3       0.777025
4       0.672183
          ...
6184    0.792209
6185    0.697030
6186    0.828880
6187    0.768638
6188    0.844882
Name: proba, Length: 6189, dtype: float64

Cleanup

The following code removes the temporary folder that was created:

[14]:
shutil.rmtree(project_path)

Create extensions

ASReview extensions enable you to integrate your programs with the ASReview framework seamlessly, by using the Python API. These extensions fall into three different categories, and interact with the API in different ways.

  1. Model extensions

  2. Subcommand extensions

  3. Dataset extensions

The extensibility of the framework is provided by the entrypoints of setuptools. You will need to create a package and install it (for example with pip).

Did you develop a useful extension to ASReview and want to list it on the Discussion platform? Create a Pull Request or open an issue on GitHub.

For more information on the ASReview API for creating an extension, a technical reference for development is found under the API reference. This technical reference contains functions for use in your extension, and an overview of all classes to extend on.

Model Extensions

An extension of an asreview.models.base.BaseModel type class.

Model extensions extend the ASReview software with new classifiers, query strategies, balance strategies, or feature extraction techniques. These extensions extend one of the model base classes (asreview.models.balance.base, asreview.models.classifiers.base, asreview.models.feature_extraction.base, asreview.models.query.base).

The easiest way to extend ASReview with a model is by using the template for extending ASReview. Create a copy of the template and add the new algorithm to a new model file. It is advised to use the following structure of the package:

├── README.md
├── asreviewcontrib
│   └── models
│       ├── classifiers
│          ├── __init__.py
│          └── example_model.py
│       ├── feature_extraction
│          ├── __init__.py
│          └── example_feature_extraction.py
│       ├── balance
│          ├── __init__.py
│          └── example_balance_strategies.py
│       └── query
│           ├── __init__.py
│           └── example_query_strategies.py
├── setup.py
└── tests

The next step is to add metadata to the setup.py file. Edit the name of the package and point the entry_points to the models.

entry_points={
    'asreview.models.classifiers': [
        'example = asreviewcontrib.models.classifiers.example_model:ExampleClassifier',
    ],
    'asreview.models.feature_extraction': [
        # define feature_extraction algorithms
    ],
    'asreview.models.balance': [
        # define balance_strategy algorithms
    ],
    'asreview.models.query': [
        # define query_strategy algorithms
    ]
},

This code registers the model with name example.
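
As an illustration, a minimal sketch of example_model.py could look like the code below. It assumes the asreview 1.x classifier interface, in which a classifier subclasses BaseTrainClassifier and exposes fit and predict methods that operate on the feature matrix; check the classifiers base class in the API Reference for the exact signatures in your version.

from sklearn.linear_model import LogisticRegression

from asreview.models.classifiers.base import BaseTrainClassifier


class ExampleClassifier(BaseTrainClassifier):
    """Example classifier wrapping a scikit-learn logistic regression."""

    name = "example"  # the name used on the command line and in ASReview LAB
    label = "Example classifier"

    def __init__(self, C=1.0):
        super().__init__()
        self.C = C
        self._model = LogisticRegression(C=C)

    def fit(self, X, y):
        # Train on the feature matrix X and the known labels y.
        return self._model.fit(X, y)

    def predict(self, X):
        # Return relevance scores used to rank the unlabeled records.
        return self._model.predict_proba(X)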

Subcommand Extensions

An extension of the asreview.entry_points.base.BaseEntryPoint class.

Subcommand extensions are programs that create a new entry point for ASReview. From this entry point the Python API can be used in many ways (like plot or simulate).

Extensions in ASReview are Python packages and can extend the subcommands of asreview (see asreview -h). An example of a subcommand extension is ASReview Insights.

The easiest way to create a new subcommand is by defining a class that can be used as a new entry point for ASReview. This class should inherit from asreview.entry_points.base.BaseEntryPoint. Add the functionality to the class method execute.

from asreview.entry_points import BaseEntryPoint

class ExampleEntryPoint(BaseEntryPoint):

    description = "Description of example extension"
    extension_name = "asreview-example"  # Name of the extension
    version = "1.0"  # Version of the extension in x.y(.z) format.

    def execute(self, argv):
        pass  # Implement your functionality here.

It is strongly recommended to define the attributes description, extension_name, and version.

The class method execute accepts a positional argument (argv in this example). First create the functionality you would like to be able to use in any directory. The argument argv contains the command line arguments left after removing asreview and the entry point.
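
For example, argv can be parsed with argparse inside execute. The sketch below is illustrative; the asreview_files argument is hypothetical and not part of the ASReview API:

import argparse

from asreview.entry_points import BaseEntryPoint


class ExampleEntryPoint(BaseEntryPoint):

    description = "Description of example extension"
    extension_name = "asreview-example"
    version = "1.0"

    def execute(self, argv):
        # Parse the command line arguments passed after 'asreview example'.
        parser = argparse.ArgumentParser(prog="asreview example")
        parser.add_argument(
            "asreview_files", nargs="+", help="One or more ASReview project files"
        )
        args = parser.parse_args(argv)

        for filepath in args.asreview_files:
            print(f"Processing {filepath}")  # implement your functionality here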

It is advised to place the newly defined class ExampleEntryPoint in the following package structure: asreviewcontrib.{extension_name}.{your_modules}. For example:

├── README.md
├── asreviewcontrib
│   └── example
│       ├── __init__.py
│       ├── entrypoint.py
│       └── example_utils.py
├── setup.py
└── tests

Create a setup.py in the root of the package, and set the keyword argument entry_points of setup() under asreview.entry_points, for example:

entry_points={
    "asreview.entry_points": [
        "example = asreviewcontrib.example.entrypoint:ExampleEntryPoint",
    ]
}

After installing this package, ASReview is extended with the asreview example subcommand. See asreview -h for this option.

Dataset Extensions

An extension of the asreview.datasets.BaseDataSet class.

Dataset extensions integrate new datasets for use in ASReview. Adding datasets via extension provides quick access to the dataset via Command Line Interface or in ASReview LAB.

It is advised to place the new dataset your_dataset in the following package structure:

├── README.md
├── asreviewcontrib
│   └── dataset_name
│       ├── __init__.py
│       └── your_dataset.py
├── data
│   └── your_dataset.csv
├── setup.py
└── tests

For minimal functionality, your_dataset.py should extend asreview.datasets.BaseDataSet and asreview.datasets.BaseDataGroup.

A working template to clone and use can be found at Template for extending ASReview with a new dataset.

Further functionality can be extensions of any other class in asreview.datasets.

API Reference

Data and datasets

Read data

load_data(name, **kwargs)

Load data from file, URL, or plugin.

ASReviewData([df, column_spec])

Data object to the dataset with texts, labels, DOIs etc.

Statistics

data.statistics.abstract_length(data)

Return the average length of the abstracts.

data.statistics.n_duplicates(data[, pid])

Number of duplicates.

data.statistics.n_irrelevant(data)

Return the number of irrelevant records.

data.statistics.n_keywords(data)

Return the number of keywords.

data.statistics.n_missing_abstract(data)

Return the number of records with missing abstracts.

data.statistics.n_missing_title(data)

Return the number of records with missing titles.

data.statistics.n_records(data)

Return the number of records.

data.statistics.n_relevant(data)

Return the number of relevant records.

data.statistics.n_unlabeled(data)

Return the number of unlabeled records.

data.statistics.title_length(data)

Return the average length of the titles.

Datasets

Available datasets

asreview.datasets.SynergyDataGroup()

Datasets available in the SYNERGY dataset.

asreview.datasets.NaturePublicationDataGroup()

Datasets used in the paper Van de Schoot et al. 2020.

Dataset managers

asreview.datasets.BaseDataSet(dataset_id[, ...])

asreview.datasets.BaseDataGroup(*datasets)

asreview.datasets.DatasetManager()

Reviewer

review.BaseReview(as_data, project[, model, ...])

Base class for Systematic Review.

review.ReviewSimulate(as_data, *args[, ...])

ASReview Simulation mode class.

Models

This section provides an overview of the available models for active learning in ASReview. For command line usage, use the name (example) given behind the model description (or see the name property of the model). Some models require additional dependencies, see the model class for more information and instructions.

Base class

models.base.BaseModel()

Abstract class for any kind of model.

asreview.models.feature_extraction

Classes

feature_extraction.base.BaseFeatureExtraction([...])

Base class for feature extraction methods.

feature_extraction.Tfidf(*args[, ngram_max, ...])

TF-IDF feature extraction technique (tfidf).

feature_extraction.Doc2Vec(*args[, ...])

Doc2Vec feature extraction technique (doc2vec).

feature_extraction.EmbeddingIdf(*args[, ...])

Embedding IDF feature extraction technique (embedding-idf).

feature_extraction.EmbeddingLSTM(*args[, ...])

Embedding LSTM feature extraction technique (embedding-lstm).

feature_extraction.SBERT(*args[, ...])

Sentence BERT feature extraction technique (sbert).

Functions

feature_extraction.get_feature_model(name, *args)

Get an instance of a feature extraction model from a string.

feature_extraction.get_feature_class(name)

Get class of feature extraction from string.

feature_extraction.list_feature_extraction()

List available feature extraction method classes.

asreview.models.classifiers

Classes

classifiers.base.BaseTrainClassifier()

Base model, abstract class to be implemented by derived ones.

classifiers.NaiveBayesClassifier([alpha])

Naive Bayes classifier (nb).

classifiers.RandomForestClassifier([...])

Random forest classifier (rf).

classifiers.SVMClassifier([gamma, ...])

Support vector machine classifier (svm).

classifiers.LogisticClassifier([C, ...])

Logistic regression classifier (logistic).

classifiers.LSTMBaseClassifier([...])

LSTM-base classifier (lstm-base).

classifiers.LSTMPoolClassifier([...])

LSTM-pool classifier (lstm-pool).

classifiers.NN2LayerClassifier([...])

Fully connected neural network (2 hidden layers) classifier (nn-2-layer).

Functions

classifiers.get_classifier(name, *args[, ...])

Get an instance of a model from a string.

classifiers.get_classifier_class(name)

Get class of model from string.

classifiers.list_classifiers()

List available classifier classes.

asreview.models.query

Classes

query.base.BaseQueryStrategy()

Abstract class for query strategies.

query.base.ProbaQueryStrategy()

query.MaxQuery()

Maximum query strategy (max).

query.MixedQuery([strategy_1, strategy_2, ...])

Mixed query strategy.

query.MaxRandomQuery([mix_ratio, random_state])

Mixed (95% Maximum and 5% Random) query strategy (max_random).

query.MaxUncertaintyQuery([mix_ratio, ...])

Mixed (95% Maximum and 5% Uncertainty) query strategy (max_uncertainty).

query.UncertaintyQuery()

Uncertainty query strategy (uncertainty).

query.RandomQuery([random_state])

Random query strategy (random).

query.ClusterQuery([cluster_size, ...])

Clustering query strategy (cluster).

Functions

query.get_query_model(name, *args[, ...])

Get an instance of the query strategy.

query.get_query_class(name)

Get class of query strategy from its name.

query.list_query_strategies()

List available query strategy classes.

asreview.models.balance

Classes

balance.base.BaseBalance()

Abstract class for balance strategies.

balance.SimpleBalance()

No balance strategy (simple).

balance.DoubleBalance([a, alpha, b, beta, ...])

Double balance strategy (double).

balance.TripleBalance([a, alpha, b, beta, ...])

Triple balance strategy (triple).

balance.UndersampleBalance([ratio, random_state])

Undersampling balance strategy (undersample).

Functions

balance.get_balance_model(name, *args[, ...])

Get an instance of a balance model from a string.

balance.get_balance_class(name)

Get class of balance model from string.

balance.list_balance_strategies()

List available balancing strategy classes.

Projects and States

Load, interact, and extract information from project files and states (the “diary” of the review).

ASReviewProject

ASReviewProject(project_path[, project_id])

Project class for ASReview project files.

State

open_state(asreview_obj[, review_id, read_only])

Initialize a state class instance from a project folder.

state.SQLiteState([read_only])

Class for storing the review state.

Utils

project.get_project_path(folder_id)

Get the project directory.

project.project_from_id(f)

Decorator function that takes a user account as a parameter; the user account is used to get the correct subfolder in which the project is stored.

project.get_projects([project_paths])

Get the ASReview projects at the given paths.

project.is_project(project_path)

project.is_v0_project(project_path)

Check if a project file is of a ASReview version 0 project.

Readers and writers

This module contains the input and output functionality. You can install them as extensions.

asreview.list_readers()

List available dataset reader classes.

asreview.list_writers()

List available dataset writer classes.

io.CSVReader()

CSV file reader.

io.CSVWriter()

CSV file writer.

io.ExcelReader()

Excel file reader.

io.ExcelWriter()

Excel file writer.

io.PaperRecord(record_id[, column_spec])

A single record from a paper in a systematic review.

io.RISReader()

RIS file reader.

io.RISWriter()

RIS file writer.

io.TSVWriter()

TSV file writer.

Misc

Classes

asreview.settings.ASReviewSettings(model, ...)

Object to store the configuration of a review session.

Functions

search.fuzzy_find(as_data, keywords[, ...])

Find a record using keywords.

asreview_path()

Get the location where projects are stored.

get_data_home([data_home])

Return the path of the ASR data dir.

Entry points

Entry points for ASReview LAB.

entry_points.BaseEntryPoint()

Base class for defining entry points.

entry_points.AlgorithmsEntryPoint()

Entry point to list available algorithms in ASReview LAB.

entry_points.LABEntryPoint()

Entry point to start the ASReview LAB webapp.

entry_points.SimulateEntryPoint()

Entry point for simulation with ASReview LAB.

entry_points.StateInspectEntryPoint()

Entry point to inspect ASReview LAB review progress.
