Simulate with Python API

The API is still under development and can change at any time without warning.

For more control over the workings of the ASReview software, the ASReview Python API can be used. It makes it possible, for example, to use custom models or to implement different sampling strategies. This example shows how to simulate a review with the ASReview API and store the results in an ASReview project file.

Keep in mind that the ASReview API is experimental at the moment; improvements and simplifications are planned.

[1]:
from pathlib import Path

from asreview import ASReviewData, ASReviewProject
from asreview.review import ReviewSimulate

Create a temporary folder for the results and example files used in this document.

[2]:
project_path = Path("tmp_data")
project_path.mkdir(exist_ok=True)
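Alternatively, if you prefer not to hard-code a folder name, a uniquely named temporary directory can be created with Python's standard tempfile module. This is an optional stdlib sketch, not part of the ASReview API:

```python
import tempfile
from pathlib import Path

# mkdtemp() creates a uniquely named directory; unlike a hard-coded
# "tmp_data" folder it cannot clash with an existing directory, but it is
# not removed automatically, so clean it up afterwards (see the last cell).
project_path = Path(tempfile.mkdtemp(prefix="asreview_"))
print(project_path)
```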

Create an ASReviewProject to store the results.

[3]:
# Create a project object and folder
project = ASReviewProject.create(
    project_path=project_path / "api_simulation",
    project_id="api_example",
    project_mode="simulate",
    project_name="api_example",
)

Add a dataset to the data folder inside the project folder (the dataset can also be stored elsewhere, but using the data folder is advised). In the following example, a dataset is downloaded from the benchmark platform with cURL (macOS, Unix systems).

[4]:
%%bash
curl https://raw.githubusercontent.com/asreview/systematic-review-datasets/metadata-v1-final/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv > tmp_data/api_simulation/data/van_de_Schoot_2017.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  9.9M  100  9.9M    0     0  13.2M      0 --:--:-- --:--:-- --:--:-- 13.2M
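If cURL is not available (for example, on Windows), the same download can be done with Python's standard library. The `download` helper below is a hypothetical convenience function written for this example, not part of ASReview; the URL in the usage comment is the same one used in the cURL call above:

```python
from pathlib import Path
from urllib.request import urlretrieve

def download(url, dest):
    """Download url to dest, creating parent folders as needed."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, dest)
    return dest

# Usage (same dataset as the cURL call above):
# download(
#     "https://raw.githubusercontent.com/asreview/systematic-review-datasets/"
#     "metadata-v1-final/datasets/van_de_Schoot_2017/output/van_de_Schoot_2017.csv",
#     "tmp_data/api_simulation/data/van_de_Schoot_2017.csv",
# )
```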

Register the dataset with the project.

[5]:
project.add_dataset("van_de_Schoot_2017.csv")

Set up the models. A simulation requires a classifier, a query strategy, a balance strategy, and a feature extraction technique.

[6]:
from asreview.models.classifiers import NaiveBayesClassifier
from asreview.models.query import MaxQuery
from asreview.models.balance import DoubleBalance
from asreview.models.feature_extraction import Tfidf

# Select models to use
train_model = NaiveBayesClassifier()
query_model = MaxQuery()
balance_model = DoubleBalance()
feature_model = Tfidf()

Run the simulation with the ReviewSimulate class. The n_instances argument sets the number of records queried in each iteration, and n_prior_included and n_prior_excluded set how many relevant and irrelevant records are randomly selected as prior knowledge.

[7]:
data_obj = ASReviewData.from_file(
    Path("tmp_data", "api_simulation", "data", "van_de_Schoot_2017.csv")
)

[8]:
# Initialize the simulation reviewer
reviewer = ReviewSimulate(
    as_data=data_obj,
    model=train_model,
    query_model=query_model,
    balance_model=balance_model,
    feature_model=feature_model,
    n_instances=10,
    project=project,
    n_prior_included=1,
    n_prior_excluded=1,
)

[9]:
# Start the review process
project.update_review(status="review")
try:
    reviewer.review()
    project.mark_review_finished()
except Exception as err:
    project.update_review(status="error")
    raise err

Export the project to a location of choice, in this case tmp_data/api_example.asreview.

[10]:
# Finish and export the project
project.export(Path("tmp_data", "api_example.asreview"))
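An .asreview project file is a regular ZIP archive, so its contents can be inspected with Python's standard zipfile module. The sketch below demonstrates this on a stand-in archive it creates itself; with a real export you would pass the path to api_example.asreview instead:

```python
import zipfile
from pathlib import Path

def list_project_contents(path):
    """Return the member names of an .asreview (ZIP) archive."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()

# Stand-in archive for demonstration; for the real export, use e.g.
# list_project_contents(Path("tmp_data", "api_example.asreview"))
demo = Path("demo_project.asreview")
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("project.json", "{}")

print(list_project_contents(demo))  # → ['project.json']
demo.unlink()
```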

The following code removes the temporary folder that was created:

[11]:
import shutil

shutil.rmtree(project_path)