Anyone who has screened large amounts of text knows how labor-intensive this can be. Dealing with the enormous increase in available text will increasingly mean working together with machine learning algorithms. Therefore, an open-source, machine learning-aided pipeline that applies active learning was developed at Utrecht University, called ASReview. The goal of ASReview is to help scholars and practitioners get an overview of the most relevant records for their work as efficiently as possible, while being transparent in the process.
ASReview is a research project coordinated by Rens van de Schoot (full professor at the Department of Methodology & Statistics and ambassador of the focus area Applied Data Science at Utrecht University, The Netherlands), together with Jonathan de Bruin (lead engineer of the ASReview project, working at the Information and Technology Services department at Utrecht University).
Our advisory board consists of Daniel Oberski (machine learning expert, associate professor at Utrecht University’s Department of Methodology & Statistics and at the Department of Biostatistics at the Julius Center, University Medical Center Utrecht), Lars Tummers (full professor of Public Management and Behavior at Utrecht University), Ayoub Bagheri (NLP expert at Utrecht University), Bianca Kramer (Open Science expert at the Utrecht University library), Jan de Boer (information specialist at the Utrecht University library), Felix Weijdema (systematic review specialist at the Utrecht University library), and Martijn Hutijs (UX expert at the department Test and Quality Services at Utrecht University).
The artwork of ASReview was developed by Joukje Willemsen.
Many others have helped the project, including researchers Gerbrich Ferdinands and Laura Hofstee, as well as many students, such as Yongchao Terry Ma, Sofie van den Brand, Sybren Hindriks, and Albert Harkema. Many thanks to all the contributors!
The Case of Systematic Reviewing
With the emergence of online publishing, the number of scientific papers on any topic, e.g. COVID-19, is skyrocketing. Simultaneously, the public press and social media produce new data by the second. All this textual data presents opportunities to scholars, but it also confronts them with new challenges. To summarize this data, researchers write systematic reviews, which provide essential, comprehensive overviews of relevant topics. To achieve this, they have to screen thousands, or even tens of thousands, of studies by hand for inclusion in their overview. As truly relevant papers are very sparse (often fewer than 10% of the screened records), this is an extremely imbalanced data problem. The process of finding these rare relevant papers is error-prone and very time-intensive.
The rapidly evolving field of machine learning (ML) has allowed the development of ML-aided pipelines that assist in finding relevant texts for such search tasks. A well-established approach to increase the efficiency of title and abstract screening is screening prioritization with active learning: a model is retrained on the labels the reviewer has given so far and then proposes which record to screen next. This approach is very effective for systematic reviewing.
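To make the idea of screening prioritization concrete, here is a minimal sketch of an active learning loop in plain Python. It is an illustration under stated assumptions, not ASReview's implementation or API: the classifier (a small naive Bayes model over whitespace tokens), the query rule (always show the record that currently looks most relevant), the function names, and the toy records are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(labeled):
    """Count words per class. `labeled` is a list of (tokens, label), label in {0, 1}."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for tokens, label in labeled:
        word_counts[label].update(tokens)
        class_counts[label] += 1
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def relevance_score(model, tokens):
    """Log-odds that a record is relevant (label 1), using add-one smoothing."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    score = 0.0
    for label, sign in ((1, 1.0), (0, -1.0)):
        log_p = math.log((class_counts[label] + 1) / (total + 2))
        # The extra +1 reserves probability mass for words never seen in training.
        denom = sum(word_counts[label].values()) + len(vocab) + 1
        for tok in tokens:
            log_p += math.log((word_counts[label][tok] + 1) / denom)
        score += sign * log_p
    return score


def screen(records, oracle, seed_ids, budget):
    """Simulated screening: retrain after every label, then query the top-ranked record.

    `oracle(i)` stands in for the human reviewer and returns 1 (relevant) or 0.
    Returns the order in which records were screened and the collected labels.
    """
    labeled = {i: oracle(i) for i in seed_ids}
    order = list(seed_ids)
    while len(labeled) < len(records) and len(order) < budget:
        model = train_naive_bayes(
            [(tokenize(records[i]), y) for i, y in labeled.items()]
        )
        pool = [i for i in range(len(records)) if i not in labeled]
        best = max(pool, key=lambda i: relevance_score(model, tokenize(records[i])))
        labeled[best] = oracle(best)
        order.append(best)
    return order, labeled


# Toy data (made up for this sketch): 3 relevant records out of 10.
records = [
    "active learning for systematic review screening",
    "machine learning prioritization of abstracts in systematic review",
    "soil erosion in alpine meadows",
    "bird migration patterns in autumn",
    "screening abstracts with active learning classifiers",
    "tax policy and municipal budgets",
    "medieval trade routes in europe",
    "coral reef bleaching events",
    "volcanic ash and crop yields",
    "river sediment transport models",
]
truth = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Start from one known relevant and one known irrelevant record, then screen 2 more.
order, labeled = screen(records, lambda i: truth[i], seed_ids=[0, 2], budget=4)
print(order, sum(labeled.values()))
```

On this toy set, the loop surfaces the two remaining relevant records immediately after the two seed records, so all three relevant records are found after screening four of the ten. Real screening data is far noisier, and the choice of classifier, features, and query rule matters a great deal there.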
The goal of ASReview is to help scholars and practitioners get an overview of the most relevant records for their work as efficiently as possible, while being transparent in the process. It uses active learning, allows multiple ML models, and ships with a benchmark mode, which is especially useful for comparing and designing algorithms. Furthermore, it is intended to be easily extensible, allowing third parties to add modules that enhance the pipeline. It can process any text, although we consider systematic reviewing a particularly useful application.