Simulate with Python API

The ASReview Python API provides advanced control over the ASReview software, allowing users to customize models, implement different sampling strategies, and more. This example demonstrates how to simulate a systematic review using the ASReview API and save the results in an ASReview project file.

```python
import asreview as asr
from synergy_dataset import Dataset
```

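Here, we use the `Hall_2012` dataset from the SYNERGY collection, loaded as a pandas DataFrame with the synergy-dataset package. The `label_included` column holds the known screening decision for each record (1 = included, 0 = excluded), which the simulation uses in place of a human reviewer.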

```python
d = Dataset("Hall_2012").to_frame()
d.head()
```
| openalex_id | doi | title | abstract | label_included |
|---|---|---|---|---|
| https://openalex.org/W2131536587 | https://doi.org/10.1109/indcon.2010.5712716 | Computer vision based offset error computation... | The use of computer vision based approach has ... | 0 |
| https://openalex.org/W2557025555 | https://doi.org/10.1109/induscon.2010.5740045 | Design and development of a software for fault... | This paper presents an on-line fault diagnosis... | 0 |
| https://openalex.org/W2143148279 | https://doi.org/10.1109/tpwrd.2005.848672 | Analytical Approach to Internal Fault Simulati... | A new method for simulating faulted transforme... | 0 |
| https://openalex.org/W2111816457 | https://doi.org/10.1109/icelmach.2008.4799852 | Nonlinear equivalent circuit model of a tracti... | The paper presents the development of an equiv... | 0 |
| https://openalex.org/W3142547111 | https://doi.org/10.1109/ipdps.2006.1639408 | Fault tolerance with real-time Java | After having drawn up a state of the art on th... | 0 |

Next, we import the model components used in the simulation: a balancer, a classifier, a feature extractor, two queriers, and a stopper.

```python
from asreview.models.balancers import Balanced
from asreview.models.classifiers import SVM
from asreview.models.feature_extractors import Tfidf
from asreview.models.queriers import Max, TopDown
from asreview.models.stoppers import IsFittable
```

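We define a simulation workflow with two learners. The first uses the `TopDown` querier, which screens records in dataset order, together with the `IsFittable` stopper, so this phase ends as soon as at least one relevant and one irrelevant record have been labeled and a classifier can be trained. The second learner then takes over with active learning: an SVM classifier (`C=3`) is trained on TF-IDF features using the `Balanced` balancer (`ratio=5`), and the `Max` querier selects the unlabeled record with the highest predicted relevance.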
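To make the two phases concrete, here is a minimal plain-Python sketch of the same control flow. It is not ASReview's implementation: `is_fittable`, `overlap_score`, and `simulate` are hypothetical names, and the word-overlap scorer is a deliberately simple stand-in for the TF-IDF + SVM model. Like the ASReview run in this example, the sketch stops once every relevant record has been found.

```python
def is_fittable(labels):
    # Both classes must be present before a classifier can be trained.
    return 0 in labels and 1 in labels


def overlap_score(text, relevant_texts, irrelevant_texts):
    # Hypothetical stand-in for a trained model: words shared with labeled
    # relevant records count up, words shared with irrelevant ones count down.
    words = set(text.lower().split())
    rel = {w for t in relevant_texts for w in t.lower().split()}
    irr = {w for t in irrelevant_texts for w in t.lower().split()}
    return len(words & rel) - len(words & irr)


def simulate(texts, true_labels):
    """Return record indices in the order they are screened."""
    order, labels = [], []
    unlabeled = list(range(len(texts)))
    total_relevant = sum(true_labels)

    # Phase 1 (top-down): screen in dataset order until both classes are labeled.
    while unlabeled and not is_fittable(labels):
        idx = unlabeled.pop(0)
        order.append(idx)
        labels.append(true_labels[idx])

    # Phase 2 (max): screen the highest-scoring unlabeled record next,
    # stopping once all relevant records have been found.
    while unlabeled and sum(labels) < total_relevant:
        relevant = [texts[i] for i, y in zip(order, labels) if y == 1]
        irrelevant = [texts[i] for i, y in zip(order, labels) if y == 0]
        idx = max(unlabeled, key=lambda i: overlap_score(texts[i], relevant, irrelevant))
        unlabeled.remove(idx)
        order.append(idx)
        labels.append(true_labels[idx])

    return order


# Toy records (illustrative only, not taken from the Hall_2012 dataset).
texts = [
    "fault detection in transformers",   # relevant
    "java real time systems",            # irrelevant
    "transformer fault diagnosis",       # relevant
    "computer vision offset",            # irrelevant
    "fault simulation of transformers",  # relevant
]
true_labels = [1, 0, 1, 0, 1]
print(simulate(texts, true_labels))  # prints [0, 1, 4, 2]
```

The first two toy records are screened top-down (one relevant, one irrelevant); after that the scorer pulls the remaining relevant records ahead of the irrelevant one, and record 3 is never screened.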

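```python
learners = [
    # Phase 1: screen records in dataset order until both classes are labeled.
    asr.ActiveLearningCycle(querier=TopDown(), stopper=IsFittable()),
    # Phase 2: active learning with TF-IDF features and an SVM classifier;
    # the Max querier picks the record with the highest predicted relevance.
    asr.ActiveLearningCycle(
        querier=Max(),
        classifier=SVM(C=3),
        balancer=Balanced(ratio=5),
        feature_extractor=Tfidf(),
    ),
]

# Simulate screening, using the known labels as the reviewer's decisions.
sim = asr.Simulate(
    d,
    d["label_included"],
    learners,
)
sim.review()
```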
```text
Relevant records found: 100%|██████████| 104/104 [02:06<00:00,  1.22s/it]
Records labeled       :  65%|██████▍   | 5672/8793 [02:06<01:09, 44.75it/s]

Loss: 0.022
NDCG: 0.656
```
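NDCG (normalized discounted cumulative gain) rewards orderings that place relevant records early. Below is a minimal sketch of the textbook definition for binary labels in screening order, where the record at rank i contributes rel_i / log2(i + 1) and the total is divided by the score of the ideal ordering. ASReview's own computation may differ in details, and its Loss metric is not reproduced here; `dcg` and `ndcg` are hypothetical helper names.

```python
import math


def dcg(labels):
    # Discounted cumulative gain: the record at 1-based rank i contributes
    # rel_i / log2(i + 1), so relevant records found early count more.
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(labels, start=1))


def ndcg(labels):
    """NDCG of a binary relevance sequence given in screening order."""
    ideal = sorted(labels, reverse=True)  # all relevant records first
    best = dcg(ideal)
    return dcg(labels) / best if best > 0 else 0.0


print(round(ndcg([1, 0, 1, 1, 0]), 3))  # prints 0.906
print(ndcg([1, 1, 0]))  # prints 1.0 (already the ideal ordering)
```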

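The run ended once all 104 relevant records had been found, after 5,672 of the 8,793 records (about 65%) had been labeled; the simulation also reports a loss of 0.022 and an NDCG of 0.656. Finally, we inspect the screening log in `sim._results`, which holds one row per labeled record in screening order, including the model components that selected it.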

```python
sim._results
```
|  | record_id | label | classifier | querier | balancer | feature_extractor | training_set | time | note | tags | user_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | None | top_down | None | None | 0 | 1.745846e+09 | None | None | None |
| 1 | 1 | 0 | None | top_down | None | None | 1 | 1.745846e+09 | None | None | None |
| 2 | 2 | 0 | None | top_down | None | None | 2 | 1.745846e+09 | None | None | None |
| 3 | 3 | 0 | None | top_down | None | None | 3 | 1.745846e+09 | None | None | None |
| 4 | 4 | 0 | None | top_down | None | None | 4 | 1.745846e+09 | None | None | None |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 5667 | 8389 | 0 | svm | max | balanced | tfidf | 5667 | 1.745846e+09 | None | None | None |
| 5668 | 1739 | 0 | svm | max | balanced | tfidf | 5668 | 1.745846e+09 | None | None | None |
| 5669 | 4807 | 0 | svm | max | balanced | tfidf | 5669 | 1.745846e+09 | None | None | None |
| 5670 | 5160 | 0 | svm | max | balanced | tfidf | 5670 | 1.745846e+09 | None | None | None |
| 5671 | 5647 | 1 | svm | max | balanced | tfidf | 5671 | 1.745846e+09 | None | None | None |

5672 rows × 11 columns
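The log can be condensed further, for example into a recall curve and a count of records selected by each querier. A minimal sketch in plain Python, assuming each row is a dict carrying the `label` and `querier` fields shown in the table above; `summarize_log` and the toy rows are hypothetical. The rendered output suggests `sim._results` is a pandas DataFrame, so the same quantities could also be computed there.

```python
from collections import Counter


def summarize_log(rows, all_labels):
    """Summarize a screening log.

    rows: dicts with "label" and "querier" keys, in screening order.
    all_labels: ground-truth labels for every record in the dataset.
    """
    n_records = len(all_labels)
    n_relevant = sum(all_labels)
    if n_relevant == 0:
        raise ValueError("recall is undefined without relevant records")

    # Recall after each screened record, plus the 1-based position at which
    # the last relevant record was screened (None if full recall is never hit).
    recall_curve, found, full_recall_at = [], 0, None
    for position, row in enumerate(rows, start=1):
        found += row["label"]
        recall_curve.append(found / n_relevant)
        if full_recall_at is None and found == n_relevant:
            full_recall_at = position

    return {
        "per_querier": dict(Counter(row["querier"] for row in rows)),
        "recall_curve": recall_curve,
        "full_recall_at": full_recall_at,
        "fraction_screened": len(rows) / n_records,
    }


# Toy log (illustrative only): 4 of 6 records screened, both relevant ones found.
rows = [
    {"record_id": 0, "label": 0, "querier": "top_down"},
    {"record_id": 1, "label": 1, "querier": "top_down"},
    {"record_id": 4, "label": 0, "querier": "max"},
    {"record_id": 2, "label": 1, "querier": "max"},
]
all_labels = [0, 1, 1, 0, 0, 0]
summary = summarize_log(rows, all_labels)
print(summary["recall_curve"])    # prints [0.0, 0.5, 0.5, 1.0]
print(summary["full_recall_at"])  # prints 4
```

For the run in this example, the last logged row is the 104th relevant record, so full recall was reached after 5,672 of 8,793 records, a screened fraction of about 0.645.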