# Fully and partially labeled data

Fully and partially labeled datasets serve a special role in the ASReview context. These datasets contain review decisions for every record (fully labeled) or for a subset of the records (partially labeled). The labels are dichotomous: relevant or irrelevant. Partially labeled data is useful in Oracle mode, whereas fully labeled data is useful in Simulation and Exploration mode. See Project modes for more information.

Any dataset exported from ASReview LAB can be imported into ASReview LAB again, and all labels are recognized by the software. In Oracle mode, the labels are added directly as Prior Knowledge.

## Labeled data format

For tabular datasets (e.g., CSV, XLSX), the dataset should contain a column called "included" or "label" (see Data format for all naming conventions), which is filled with 1's and 0's for the records that have already been screened. The value is left empty for the records that have not been screened yet.
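As a minimal sketch, a partially labeled tabular dataset can be built like this with pandas. The titles, abstracts, and file name are hypothetical; the only convention taken from the format above is the "included" column with 1, 0, or an empty value.

```python
import pandas as pd

# Hypothetical records: two screened, one not yet screened.
records = pd.DataFrame(
    {
        "title": [
            "Active learning for screening",  # screened, relevant
            "Unrelated clinical trial",       # screened, irrelevant
            "Not yet screened paper",         # unscreened
        ],
        "abstract": ["...", "...", "..."],
        # 1 = relevant, 0 = irrelevant, empty = not screened yet
        "included": [1, 0, None],
    }
)

# Written to CSV, the unscreened record gets an empty "included" cell.
records.to_csv("partially_labeled.csv", index=False)
```

The `None` value becomes an empty cell in the CSV, which ASReview LAB treats as an unscreened record.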

For the RIS file format, the labels ASReview_relevant, ASReview_irrelevant, and ASReview_not_seen can be stored with the N1 (Notes) tag. An example of a RIS file with labels in the N1 tag can be found in the ASReview GitHub repository; all labels in that example are valid ways to label the data. RIS files exported from ASReview LAB can be imported into ASReview LAB again, after which all labels are recognized.
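To illustrate the convention, the snippet below embeds a made-up RIS record (fabricated title and author) with a label in its N1 tag, and then extracts that label with a simple line scan. This is only a sketch of the tag layout, not a full RIS parser.

```python
# A minimal, hypothetical RIS record; the ASReview label sits in the
# N1 (Notes) tag.
ris_record = """TY  - JOUR
TI  - Example title of a screened record
AU  - Doe, J.
N1  - ASReview_relevant
ER  -
"""

# Extract the value of the first N1 tag.
label = next(
    line.split("-", 1)[1].strip()
    for line in ris_record.splitlines()
    if line.startswith("N1")
)
print(label)  # ASReview_relevant
```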

## Partially labeled data

Partially labeled datasets are datasets with a review decision for a subset of the records in the dataset. A partially labeled dataset can be obtained by exporting results from ASReview LAB or other software. It can also be constructed given the format described above.

Partially labeled datasets are useful because ASReview LAB recognizes the labels as Prior Knowledge and uses them to train the first iteration of the active learning model.

Note

Merging labeled with unlabeled data should be done outside ASReview LAB, for example, with Citation Managers.
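Besides a citation manager, such a merge can also be sketched with pandas. The file name and titles below are hypothetical; the point is that concatenating a labeled export with new, unlabeled records leaves the "included" cells of the new records empty.

```python
import pandas as pd

# Hypothetical labeled export from a previous screening round.
labeled = pd.DataFrame(
    {"title": ["Record A", "Record B"], "included": [1, 0]}
)

# Hypothetical new records without an "included" column yet.
unlabeled = pd.DataFrame({"title": ["Record C", "Record D"]})

# Concatenate: records without a decision get an empty label.
merged = pd.concat([labeled, unlabeled], ignore_index=True)
merged.to_csv("merged_dataset.csv", index=False)
```

Depending on the sources, deduplicating on title or DOI before importing the merged file may also be needed.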

## Fully labeled data

Note

A number of fully labeled Benchmark Datasets are directly available in the software. During the Add Dataset step of the project setup, there is a panel with all the datasets, which can be selected and used directly. Benchmark datasets are also available via the command line. Use the prefix benchmark: followed by the identifier of the dataset (see the Systematic Review Datasets repository). For example, to use the Van de Schoot et al. (2017) dataset, use benchmark:van_de_schoot_2017.