Access data from ASReview file#

The API is still under development and can change at any time without warning.

Data generated using ASReview LAB is stored in an ASReview project file. Via the ASReview Python API, there are two ways to access the data in the ASReview (extension .asreview) file: Via the asreview.Project API and the asreview.SQLiteState API. The project API is for retrieving general project settings, the imported dataset, the feature matrix, etc. The state API retrieves data related directly to the reviewing process, such as the labels, the time of labeling, and the classifier used.

The ASReview Python API can be used for project files obtained reviews and simulations.

Example Data#

To illustrate the ASReview Python API, the benchmark dataset van_de_Schoot_2017 is used. The project file example.asreview can be obtained by running the following command:

[ ]:
%%bash

asreview simulate synergy:van_de_Schoot_2018 -o example.asreview

Python imports#

Import the asreview module as asr.

[1]:
from pathlib import Path

import asreview as asr

Project API#

The ASReview project file is a zipped folder with the extension .asreview. This makes inspection of it’s contents straightforward. The asreview Python package offers an API to open a project file directly as an asreview.Project.

For this example, we will create a temporary folder to unpack the project to:

[2]:
from tempfile import TemporaryDirectory

tmpdir = TemporaryDirectory()

Open the project with the load classmethod of asreview.Project.

[3]:
project = asr.Project.load("example.asreview", tmpdir.name)

The following files can be found in the folder

[4]:
tmpdir_path = Path(tmpdir.name)
for path in tmpdir_path.rglob("*"):
    print(path.relative_to(tmpdir_path))
example
example/project.json
example/data_store.db
example/feature_matrices
example/data
example/reviews
example/data/van_de_Schoot_2018.csv
example/reviews/3b7b6c2b3d6f487ba19ffcc4ac8adef5
example/reviews/3b7b6c2b3d6f487ba19ffcc4ac8adef5/results.db

To inspect the project details in project.json, use the following code:

[5]:
project.config
[5]:
{'version': '2.0b6.dev20+gf283ffd9.d20250505',
 'id': 'example',
 'mode': 'simulate',
 'name': 'van_de_Schoot_2018',
 'created_at_unix': 1746457760,
 'reviews': [{'id': '3b7b6c2b3d6f487ba19ffcc4ac8adef5', 'status': 'finished'}],
 'feature_matrices': [],
 'tags': None,
 'datasets': [{'id': 'van_de_Schoot_2018.csv',
   'name': 'van_de_Schoot_2018.csv'}]}

The imported dataset is located at data/{dataset_filename}, and can be inspected using the following code:

[6]:
datastore_fp = Path(tmpdir.name) / "example" / "data_store.db"

dataset = asr.DataStore(datastore_fp)
print(f"The dataset contains {len(dataset)} records.")
The dataset contains 4544 records.
[7]:
dataset.get_df().head()
[7]:
dataset_row dataset_id duplicate_of title abstract authors keywords year doi url included record_id
0 0 van_de_Schoot_2018.csv None Annual Research Review: Resilience and mental ... Researchers focused on mental health of confli... [] [] None https://doi.org/10.1111/jcpp.12053 None 0 0
1 1 van_de_Schoot_2018.csv None Profiling the Trauma Related Symptoms of Bosni... The objective of this study was to profile tra... [] [] None https://doi.org/10.1097/00005053-200007000-00004 None 0 1
2 2 van_de_Schoot_2018.csv None Acute panicogenic, anxiogenic and dissociative... Increased anxiety and panic to inhalation of c... [] [] None https://doi.org/10.1016/j.jpsychires.2011.01.009 None 0 2
3 3 van_de_Schoot_2018.csv None A Pooled Analysis of Gender and Trauma-Type Ef... To examine effects of gender and trauma type o... [] [] None https://doi.org/10.4088/jcp.v69n1002 None 0 3
4 4 van_de_Schoot_2018.csv None Twelve-Month Use of Mental Health Services in ... Dramatic changes have occurred in mental healt... [] [] None https://doi.org/10.1001/archpsyc.62.6.629 None 0 4

State API#

The data stored during the review process can be accessed as a pandas DataFrame using the following code:

[8]:
with asr.open_state("example.asreview") as state:
    df_results = state.get_results_table()
    print(f"The state contains {len(df_results)} records.")
The state contains 322 records.

The returned state instance is of type asreview.SQLState. Note that the state contains less records than the original dataset. This is because by default the simulation stopped after finding all relevant records.

[9]:
df_results.tail(10)
[9]:
record_id label classifier querier balancer feature_extractor training_set time note tags user_id
312 1376 0 svm max balanced tfidf 312 1746457765.998829 None None <NA>
313 1589 0 svm max balanced tfidf 313 1746457766.009229 None None <NA>
314 2368 0 svm max balanced tfidf 314 1746457766.019135 None None <NA>
315 1359 0 svm max balanced tfidf 315 1746457766.028713 None None <NA>
316 952 0 svm max balanced tfidf 316 1746457766.038262 None None <NA>
317 1754 0 svm max balanced tfidf 317 1746457766.04837 None None <NA>
318 1985 0 svm max balanced tfidf 318 1746457766.05749 None None <NA>
319 1196 0 svm max balanced tfidf 319 1746457766.06699 None None <NA>
320 1008 0 svm max balanced tfidf 320 1746457766.076556 None None <NA>
321 3442 1 svm max balanced tfidf 321 1746457766.086036 None None <NA>

You can merge the information from the state file with the original dataset.

[10]:
dataset_with_results = df_results.reset_index(names="labeling_order").join(
    dataset.get_df().set_index("record_id")
)
dataset_with_results.head()
[10]:
labeling_order record_id label classifier querier balancer feature_extractor training_set time note ... dataset_id duplicate_of title abstract authors keywords year doi url included
0 0 0 0 None top_down None None 0 1746457762.147489 None ... van_de_Schoot_2018.csv None Annual Research Review: Resilience and mental ... Researchers focused on mental health of confli... [] [] None https://doi.org/10.1111/jcpp.12053 None 0
1 1 1 0 None top_down None None 1 1746457762.150226 None ... van_de_Schoot_2018.csv None Profiling the Trauma Related Symptoms of Bosni... The objective of this study was to profile tra... [] [] None https://doi.org/10.1097/00005053-200007000-00004 None 0
2 2 2 0 None top_down None None 2 1746457762.15211 None ... van_de_Schoot_2018.csv None Acute panicogenic, anxiogenic and dissociative... Increased anxiety and panic to inhalation of c... [] [] None https://doi.org/10.1016/j.jpsychires.2011.01.009 None 0
3 3 3 0 None top_down None None 3 1746457762.153575 None ... van_de_Schoot_2018.csv None A Pooled Analysis of Gender and Trauma-Type Ef... To examine effects of gender and trauma type o... [] [] None https://doi.org/10.4088/jcp.v69n1002 None 0
4 4 4 0 None top_down None None 4 1746457762.154858 None ... van_de_Schoot_2018.csv None Twelve-Month Use of Mental Health Services in ... Dramatic changes have occurred in mental healt... [] [] None https://doi.org/10.1001/archpsyc.62.6.629 None 0

5 rows × 23 columns

There are also multiple functions to obtain one specific variable in the data. For example, to plot the labeling times in a graph, use the following code (keep in mind that this data is from a simulation):

[11]:
df_results["time"].plot(title="Time of labeling")
[11]:
<Axes: title={'center': 'Time of labeling'}>
../_images/technical_example_api_asreview_file_23_1.png

By default, the records that are part of the prior knowledge are included in the results. To obtain the labels use the following code:

[12]:
df_results["label"]
[12]:
0      0
1      0
2      0
3      0
4      0
      ..
317    0
318    0
319    0
320    0
321    1
Name: label, Length: 322, dtype: Int64

For normal reviews, the state also contains the ranking of the last iteration of the machine learning model. To get these, use the following code:

[13]:
with asr.open_state("example.asreview") as state:
    last_ranking = state.get_last_ranking_table()

last_ranking
[13]:
record_id ranking classifier querier balancer feature_extractor training_set time