{ "cells": [ { "cell_type": "markdown", "id": "be79a9d5", "metadata": {}, "source": [ "# Simulate with Python API\n", "\n", "The ASReview Python API provides advanced control over the ASReview software, allowing users to customize models, implement different query strategies, and more. This example demonstrates how to simulate a systematic review using the ASReview API and inspect the simulation results." ] }, { "cell_type": "code", "execution_count": 1, "id": "5ea4a81e", "metadata": {}, "outputs": [], "source": [ "import asreview as asr\n", "from synergy_dataset import Dataset" ] }, { "cell_type": "markdown", "id": "656ddec4", "metadata": {}, "source": [ "Here, we use the `Hall_2012` dataset from the SYNERGY collection, loaded as a pandas DataFrame via the `synergy-dataset` package. Each record has a DOI, a title, and an abstract. The `label_included` column holds the screening decision from the original review (1 for included, 0 for excluded)." ] }, { "cell_type": "code", "execution_count": 2, "id": "b45e456e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " doi \\\n", "openalex_id \n", "https://openalex.org/W2131536587 https://doi.org/10.1109/indcon.2010.5712716 \n", "https://openalex.org/W2557025555 https://doi.org/10.1109/induscon.2010.5740045 \n", "https://openalex.org/W2143148279 https://doi.org/10.1109/tpwrd.2005.848672 \n", "https://openalex.org/W2111816457 https://doi.org/10.1109/icelmach.2008.4799852 \n", "https://openalex.org/W3142547111 https://doi.org/10.1109/ipdps.2006.1639408 \n", "\n", " title \\\n", "openalex_id \n", "https://openalex.org/W2131536587 Computer vision based offset error computation... \n", "https://openalex.org/W2557025555 Design and development of a software for fault... \n", "https://openalex.org/W2143148279 Analytical Approach to Internal Fault Simulati... \n", "https://openalex.org/W2111816457 Nonlinear equivalent circuit model of a tracti... \n", "https://openalex.org/W3142547111 Fault tolerance with real-time Java \n", "\n", " abstract \\\n", "openalex_id \n", "https://openalex.org/W2131536587 The use of computer vision based approach has ... \n", "https://openalex.org/W2557025555 This paper presents an on-line fault diagnosis... \n", "https://openalex.org/W2143148279 A new method for simulating faulted transforme... \n", "https://openalex.org/W2111816457 The paper presents the development of an equiv... \n", "https://openalex.org/W3142547111 After having drawn up a state of the art on th... \n", "\n", " label_included \n", "openalex_id \n", "https://openalex.org/W2131536587 0 \n", "https://openalex.org/W2557025555 0 \n", "https://openalex.org/W2143148279 0 \n", "https://openalex.org/W2111816457 0 \n", "https://openalex.org/W3142547111 0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = Dataset(\"Hall_2012\").to_frame()\n", "d.head()" ] }, { "cell_type": "markdown", "id": "d0c52295", "metadata": {}, "source": [ "Next, we import the components used in the simulation: a balancer, a classifier, a feature extractor, two queriers, and a stopper."
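, "\n", "\n", "To build intuition for how these components work together, here is a toy sketch of the two-phase workflow used in this example, written in plain Python. It is an illustration under stated assumptions, not ASReview's implementation. The helpers `is_fittable` and `simulate` are hypothetical. A fixed list of made-up scores stands in for the relevance predictions a trained classifier would produce. The first phase reads records in dataset order until both a relevant and an irrelevant record are labeled. The second phase repeatedly picks the unlabeled record with the highest score until every relevant record has been found.

```python
# Toy sketch of a two-phase screening simulation. The helpers below are
# hypothetical illustrations, not part of the ASReview API.


def is_fittable(labeled):
    # True once the labels contain at least one relevant (1) and one
    # irrelevant (0) record, mirroring the idea behind the IsFittable stopper
    values = set(labeled.values())
    return 0 in values and 1 in values


def simulate(labels, scores):
    # labels: true label per record, playing the role of the oracle
    # scores: made-up relevance scores standing in for classifier output
    labeled = {}
    order = []
    n_relevant = sum(labels)

    # Phase 1: top-down reading in dataset order
    record = 0
    while not is_fittable(labeled) and record < len(labels):
        labeled[record] = labels[record]
        order.append(record)
        record += 1

    # Phase 2: max querying, always taking the highest-scoring unlabeled record
    while sum(labeled.values()) < n_relevant:
        pool = [i for i in range(len(labels)) if i not in labeled]
        best = max(pool, key=lambda i: scores[i])
        labeled[best] = labels[best]
        order.append(best)

    return order


labels = [0, 0, 1, 0, 0, 1, 0, 1]
scores = [0.1, 0.2, 0.9, 0.3, 0.05, 0.8, 0.4, 0.7]
order = simulate(labels, scores)
print(order)  # [0, 1, 2, 5, 7]
print(len(order) / len(labels))  # 0.625
```

Here the first phase screens records 0, 1, and 2. The second phase then finds the two remaining relevant records by selecting records 5 and 7, so 5 of the 8 records are screened. In a real active learning cycle, the classifier would typically be refit as new labels arrive rather than reusing fixed scores."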
] }, { "cell_type": "code", "execution_count": 3, "id": "5220ee4c", "metadata": {}, "outputs": [], "source": [ "from asreview.models.balancers import Balanced\n", "from asreview.models.classifiers import SVM\n", "from asreview.models.feature_extractors import Tfidf\n", "from asreview.models.queriers import Max, TopDown\n", "from asreview.models.stoppers import IsFittable" ] }, { "cell_type": "markdown", "id": "a5ca50d7", "metadata": {}, "source": [ "We define the simulation as a list of two active learning cycles. The first cycle reads the records top-down, in dataset order, until the `IsFittable` stopper detects that at least one relevant and one irrelevant record have been labeled. The second cycle then takes over. It converts the records into TF-IDF features and trains an SVM classifier (`C=3`), using the `Balanced` balancer (`ratio=5`) to offset the class imbalance. The `Max` querier then presents the record with the highest predicted relevance next." ] }, { "cell_type": "code", "execution_count": 4, "id": "8d71a8f8", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Relevant records found: 100%|██████████| 104/104 [02:06<00:00, 1.22s/it]\n", "Records labeled : 65%|██████▍ | 5672/8793 [02:06<01:09, 44.75it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Loss: 0.022\n", "NDCG: 0.656\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "learners = [\n", "    asr.ActiveLearningCycle(querier=TopDown(), stopper=IsFittable()),\n", "    asr.ActiveLearningCycle(\n", "        querier=Max(),\n", "        classifier=SVM(C=3),\n", "        balancer=Balanced(ratio=5),\n", "        feature_extractor=Tfidf(),\n", "    ),\n", "]\n", "\n", "sim = asr.Simulate(\n", "    d,\n", "    d[\"label_included\"],\n", "    learners,\n", ")\n", "sim.review()" ] }, { "cell_type": "markdown", "id": "803806bd", "metadata": {}, "source": [ "Finally, we inspect the simulation results. Each row is one labeling decision, in the order it was made. It shows the `record_id` and the `label` it received. It also shows the querier and models that selected it (`querier`, `classifier`, `balancer`, `feature_extractor`) and the size of the training set at that point (`training_set`). In this run, all 104 relevant records were found after 5,672 of the 8,793 records had been labeled." ] }, { "cell_type": "code", "execution_count": 5, "id": "f5f443a5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " record_id label classifier querier balancer feature_extractor \\\n", "0 0 0 None top_down None None \n", "1 1 0 None top_down None None \n", "2 2 0 None top_down None None \n", "3 3 0 None top_down None None \n", "4 4 0 None top_down None None \n", "... ... ... ... ... ... ... \n", "5667 8389 0 svm max balanced tfidf \n", "5668 1739 0 svm max balanced tfidf \n", "5669 4807 0 svm max balanced tfidf \n", "5670 5160 0 svm max balanced tfidf \n", "5671 5647 1 svm max balanced tfidf \n", "\n", " training_set time note tags user_id \n", "0 0 1.745846e+09 None None None \n", "1 1 1.745846e+09 None None None \n", "2 2 1.745846e+09 None None None \n", "3 3 1.745846e+09 None None None \n", "4 4 1.745846e+09 None None None \n", "... ... ... ... ... ... \n", "5667 5667 1.745846e+09 None None None \n", "5668 5668 1.745846e+09 None None None \n", "5669 5669 1.745846e+09 None None None \n", "5670 5670 1.745846e+09 None None None \n", "5671 5671 1.745846e+09 None None None \n", "\n", "[5672 rows x 11 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sim._results" ] } ], "metadata": { "kernelspec": { "display_name": "asreview-dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.0" } }, "nbformat": 4, "nbformat_minor": 5 }