Access data from ASReview file#

The API is still under development and can change at any time without warning.

Data generated using ASReview LAB is stored in an ASReview project file (extension .asreview). The ASReview Python API offers two ways to access the data in this file: the asreview.Project API and the asreview.SQLiteState API. The project API retrieves general project settings, the imported dataset, the feature matrix, etc. The state API retrieves data related directly to the reviewing process, such as the labels, the time of labeling, and the classifier used.

The ASReview Python API can be used for project files obtained from both reviews and simulations.

Example Data#

To illustrate the ASReview Python API, the benchmark dataset van_de_Schoot_2018 is used. The project file example.asreview can be obtained by running the following command:

[ ]:

%%bash

asreview simulate synergy:van_de_Schoot_2018 -o example.asreview

Python imports#

Import the asreview module as asr.

[1]:

from pathlib import Path

import asreview as asr

Project API#

The ASReview project file is a zipped folder with the extension .asreview. This makes inspection of its contents straightforward. The asreview Python package offers an API to open a project file directly as an asreview.Project.

For this example, we will create a temporary folder to unpack the project to:

[2]:

from tempfile import TemporaryDirectory

tmpdir = TemporaryDirectory()
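If the unpacked files are only needed briefly, TemporaryDirectory can also be used as a context manager, which deletes the folder when the block exits. A minimal sketch using only the standard library; the file name scratch.txt is a placeholder and not part of ASReview:

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    # Placeholder file standing in for unpacked project contents.
    (tmp_path / "scratch.txt").write_text("temporary data")
    existed_inside = tmp_path.exists()

# The folder and everything in it are deleted once the block exits.
removed_after = not tmp_path.exists()
print(existed_inside, removed_after)
```

In this notebook the directory object is kept alive across cells instead, so it can be removed explicitly with tmpdir.cleanup() once it is no longer needed.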

Open the project with the load classmethod of asreview.Project.

[3]:

project = asr.Project.load("example.asreview", tmpdir.name)

The following files can be found in the folder:

[4]:

tmpdir_path = Path(tmpdir.name)
for path in tmpdir_path.rglob("*"):
    print(path.relative_to(tmpdir_path))

example
example/project.json
example/data_store.db
example/feature_matrices
example/data
example/reviews
example/data/van_de_Schoot_2018.csv
example/reviews/3b7b6c2b3d6f487ba19ffcc4ac8adef5
example/reviews/3b7b6c2b3d6f487ba19ffcc4ac8adef5/results.db
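Because the project file is a zipped folder, a similar listing can in principle be produced with the standard library zipfile module, without unpacking anything. The sketch below builds a small stand-in archive so that it runs on its own. The two member names are borrowed from the listing above, but the internal layout of a real .asreview archive is an assumption here and may differ.

```python
import zipfile
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp:
    # Stand-in archive for illustration only; with a real project, point
    # `archive_fp` at the .asreview file instead of creating one.
    archive_fp = Path(tmp) / "stand_in.asreview"
    with zipfile.ZipFile(archive_fp, "w") as zf:
        zf.writestr("project.json", "{}")
        zf.writestr("data/van_de_Schoot_2018.csv", "title,abstract\n")

    # List the archive members without extracting them.
    with zipfile.ZipFile(archive_fp) as zf:
        members = sorted(zf.namelist())

print(members)
```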

To inspect the project details in project.json, use the following code:

[5]:

project.config

[5]:

{'version': '2.0b6.dev20+gf283ffd9.d20250505',
 'id': 'example',
 'mode': 'simulate',
 'name': 'van_de_Schoot_2018',
 'created_at_unix': 1746457760,
 'reviews': [{'id': '3b7b6c2b3d6f487ba19ffcc4ac8adef5', 'status': 'finished'}],
 'feature_matrices': [],
 'tags': None,
 'datasets': [{'id': 'van_de_Schoot_2018.csv',
   'name': 'van_de_Schoot_2018.csv'}]}
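The configuration is a plain dictionary, so its entries can be processed with ordinary Python. The sketch below copies a subset of the output above by hand and derives two things from it: a readable creation time and the expected path of each finished review's state file. It assumes that created_at_unix counts seconds since the Unix epoch and that the unpacked folder is named after the project id, as it is in the listing above.

```python
from datetime import datetime, timezone
from pathlib import Path

# Subset of the configuration shown above, copied by hand for illustration.
config = {
    "id": "example",
    "created_at_unix": 1746457760,
    "reviews": [{"id": "3b7b6c2b3d6f487ba19ffcc4ac8adef5", "status": "finished"}],
}

# Assumption: the value is seconds since the Unix epoch, interpreted as UTC.
created_at = datetime.fromtimestamp(config["created_at_unix"], tz=timezone.utc)

# Expected location of each finished review's state file, following the
# folder layout listed earlier (assumed, not taken from the ASReview API).
state_files = [
    Path(config["id"]) / "reviews" / review["id"] / "results.db"
    for review in config["reviews"]
    if review["status"] == "finished"
]

print(created_at.isoformat())
print(state_files[0].as_posix())
```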

The imported dataset file is located at data/{dataset_filename}. Its records are also stored in the data store (data_store.db), which can be inspected using the following code:

[6]:

datastore_fp = Path(tmpdir.name) / "example" / "data_store.db"

dataset = asr.DataStore(datastore_fp)
print(f"The dataset contains {len(dataset)} records.")

The dataset contains 4544 records.

[7]:

dataset.get_df().head()

[7]:

	dataset_row	dataset_id	duplicate_of	title	abstract	authors	keywords	year	doi	url	record_id
0	0	van_de_Schoot_2018.csv	None	Annual Research Review: Resilience and mental ...	Researchers focused on mental health of confli...	[]	[]	None	https://doi.org/10.1111/jcpp.12053	None	0
1	1	van_de_Schoot_2018.csv	None	Profiling the Trauma Related Symptoms of Bosni...	The objective of this study was to profile tra...	[]	[]	None	https://doi.org/10.1097/00005053-200007000-00004	None	1
2	2	van_de_Schoot_2018.csv	None	Acute panicogenic, anxiogenic and dissociative...	Increased anxiety and panic to inhalation of c...	[]	[]	None	https://doi.org/10.1016/j.jpsychires.2011.01.009	None	2
3	3	van_de_Schoot_2018.csv	None	A Pooled Analysis of Gender and Trauma-Type Ef...	To examine effects of gender and trauma type o...	[]	[]	None	https://doi.org/10.4088/jcp.v69n1002	None	3
4	4	van_de_Schoot_2018.csv	None	Twelve-Month Use of Mental Health Services in ...	Dramatic changes have occurred in mental healt...	[]	[]	None	https://doi.org/10.1001/archpsyc.62.6.629	None	4

State API#

The data stored during the review process can be accessed as a pandas DataFrame using the following code:

[8]:

with asr.open_state("example.asreview") as state:
    df_results = state.get_results_table()
    print(f"The state contains {len(df_results)} records.")

The state contains 322 records.

The returned state instance is of type asreview.SQLiteState. Note that the state contains fewer records than the original dataset. This is because, by default, the simulation stops after finding all relevant records.
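The difference between the two counts can be expressed as the share of the dataset that was screened before the simulation stopped. A small worked example, using the two numbers printed above:

```python
n_records = 4544  # records in the dataset (from the output above)
n_labeled = 322   # records in the state (from the output above)

# Share of the dataset that was labeled before the simulation stopped.
fraction_screened = n_labeled / n_records
summary = f"Screened {fraction_screened:.1%} of the dataset."
print(summary)
```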

[9]:

df_results.tail(10)

[9]:

	record_id	label	classifier	querier	balancer	feature_extractor	training_set	time	note	tags	user_id
312	1376	0	svm	max	balanced	tfidf	312	1746457765.998829	None	None	<NA>
313	1589	0	svm	max	balanced	tfidf	313	1746457766.009229	None	None	<NA>
314	2368	0	svm	max	balanced	tfidf	314	1746457766.019135	None	None	<NA>
315	1359	0	svm	max	balanced	tfidf	315	1746457766.028713	None	None	<NA>
316	952	0	svm	max	balanced	tfidf	316	1746457766.038262	None	None	<NA>
317	1754	0	svm	max	balanced	tfidf	317	1746457766.04837	None	None	<NA>
318	1985	0	svm	max	balanced	tfidf	318	1746457766.05749	None	None	<NA>
319	1196	0	svm	max	balanced	tfidf	319	1746457766.06699	None	None	<NA>
320	1008	0	svm	max	balanced	tfidf	320	1746457766.076556	None	None	<NA>
321	3442	1	svm	max	balanced	tfidf	321	1746457766.086036	None	None	<NA>

You can merge the information from the state file with the original dataset.

[10]:

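# Join on the record_id column so that each result row is matched to its own
# record, rather than to the record whose id equals the labeling order.
dataset_with_results = df_results.reset_index(names="labeling_order").join(
    dataset.get_df().set_index("record_id"), on="record_id"
)
dataset_with_results.head()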

[10]:

	labeling_order	record_id	classifier	querier	balancer	feature_extractor	training_set	time	note	...	dataset_id	duplicate_of	title	abstract	authors	keywords	year	doi	url
0	0	0	None	top_down	None	None	0	1746457762.147489	None	...	van_de_Schoot_2018.csv	None	Annual Research Review: Resilience and mental ...	Researchers focused on mental health of confli...	[]	[]	None	https://doi.org/10.1111/jcpp.12053	None
1	1	1	None	top_down	None	None	1	1746457762.150226	None	...	van_de_Schoot_2018.csv	None	Profiling the Trauma Related Symptoms of Bosni...	The objective of this study was to profile tra...	[]	[]	None	https://doi.org/10.1097/00005053-200007000-00004	None
2	2	2	None	top_down	None	None	2	1746457762.15211	None	...	van_de_Schoot_2018.csv	None	Acute panicogenic, anxiogenic and dissociative...	Increased anxiety and panic to inhalation of c...	[]	[]	None	https://doi.org/10.1016/j.jpsychires.2011.01.009	None
3	3	3	None	top_down	None	None	3	1746457762.153575	None	...	van_de_Schoot_2018.csv	None	A Pooled Analysis of Gender and Trauma-Type Ef...	To examine effects of gender and trauma type o...	[]	[]	None	https://doi.org/10.4088/jcp.v69n1002	None
4	4	4	None	top_down	None	None	4	1746457762.154858	None	...	van_de_Schoot_2018.csv	None	Twelve-Month Use of Mental Health Services in ...	Dramatic changes have occurred in mental healt...	[]	[]	None	https://doi.org/10.1001/archpsyc.62.6.629	None

5 rows × 23 columns
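When merging, it matters which key the rows are matched on. The toy example below is a sketch with made-up values, not ASReview data. It shows that passing on="record_id" to DataFrame.join matches each result row to the record with the same identifier, even when the labeling order differs from the record order.

```python
import pandas as pd

# Toy stand-ins for the results table and the dataset (values are made up).
results = pd.DataFrame({"record_id": [2, 0], "label": [1, 0]})
records = pd.DataFrame({"record_id": [0, 1, 2], "title": ["A", "B", "C"]})

# Keep the row position as an explicit "labeling_order" column.
ordered = results.rename_axis("labeling_order").reset_index()

# Match the left "record_id" column against the right index.
merged = ordered.join(records.set_index("record_id"), on="record_id")

print(merged)
```

Without the on argument, join matches index on index, so the first labeled row would be paired with record 0 regardless of its record_id.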

There are also multiple functions to obtain one specific variable in the data. For example, to plot the labeling times in a graph, use the following code (keep in mind that this data is from a simulation):

[11]:

df_results["time"].plot(title="Time of labeling")

[11]:

<Axes: title={'center': 'Time of labeling'}>

[Figure: line plot titled "Time of labeling"]
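The values in the time column look like Unix timestamps in seconds, because they fall a few seconds after the project's created_at_unix value. Under that assumption, the time between consecutive labels and a readable timestamp can be derived as sketched below, using three values copied from the table above:

```python
from datetime import datetime, timezone

# Last three values of the "time" column shown in the table above.
times = [1746457766.06699, 1746457766.076556, 1746457766.086036]

# Seconds elapsed between consecutive labeling decisions.
deltas = [later - earlier for earlier, later in zip(times, times[1:])]

# Readable UTC timestamp of the final label (assumes Unix seconds).
last_label_at = datetime.fromtimestamp(times[-1], tz=timezone.utc)

print([round(d, 3) for d in deltas])
print(last_label_at.date())
```

In a simulation these intervals reflect computation time only, not the time a human reviewer would need.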

By default, the records that are part of the prior knowledge are included in the results. To obtain the labels, use the following code:

[12]:

df_results["label"]

[12]:

0      0
1      0
2      0
3      0
4      0
      ..
317    0
318    0
319    0
320    0
321    1
Name: label, Length: 322, dtype: Int64
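Since the labels are stored in labeling order, a running total shows how many relevant records had been found at each point in the review. A minimal sketch on a toy label sequence, assuming 1 marks a relevant record and 0 an irrelevant one:

```python
from itertools import accumulate

# Toy label sequence in labeling order (assumed: 1 = relevant, 0 = irrelevant).
labels = [0, 0, 1, 0, 1]

# Running total of relevant records after each labeling decision.
relevant_found = list(accumulate(labels))

print(relevant_found)
```

On the real results table, the pandas equivalent would be df_results["label"].cumsum().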

For normal reviews, the state also contains the ranking from the last iteration of the machine learning model. To get this ranking, use the following code:

[13]:

with asr.open_state("example.asreview") as state:
    last_ranking = state.get_last_ranking_table()

last_ranking

[13]:

	record_id	ranking	classifier	querier	balancer	feature_extractor	training_set	time