asreview.data.RISReader

class asreview.data.RISReader

Bases: BaseReader

RIS file reader.

Methods

__init__()

clean_data(df)

Clean the raw data.

read_data(fp)

Import dataset.

read_records(fp, dataset_id[, record_cls])

standardize_column_names(df)

Standardize column names of input data.

to_records(df[, dataset_id, record_cls])

Turn the cleaned data into records.

Attributes

read_format

write_format

classmethod clean_data(df)

Clean the raw data.

Parameters:

df (pd.DataFrame) – Data to clean. This should be of the same type as the output of read_data.

Returns:

pd.DataFrame – Cleaned data. By default it standardizes the column names, some data types and missing values.

classmethod read_data(fp)

Import dataset.

Parameters:

fp (str, pathlib.Path) – File path to the RIS file.

Returns:

pd.DataFrame – Dataframe with entries. If the notes field contains a note with the text ASReview_relevant, ASReview_irrelevant or ASReview_not_seen, the data frame will have a column named included with the value 1, 0 or None, respectively.

Raises:

ValueError – Raised when the input file has an unrecognized encoding.
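
The mapping from note text to label value described under Returns can be sketched in plain Python. This is an illustration only, not ASReview's implementation: the helper name label_from_notes and the assumption that the notes arrive as a list of strings are hypothetical. Only the three note texts and the values 1, 0 and None come from the description above.

```python
from __future__ import annotations

# Note texts and their label values, as listed in the read_data description.
NOTE_TO_LABEL = {
    "ASReview_relevant": 1,
    "ASReview_irrelevant": 0,
    "ASReview_not_seen": None,
}


def label_from_notes(notes: list[str]) -> int | None:
    """Return the label for the first ASReview label note found.

    Hypothetical helper: it also returns None when no label note is present,
    so "not seen" and "no label note" are indistinguishable here.
    """
    for note in notes:
        key = note.strip()
        if key in NOTE_TO_LABEL:
            return NOTE_TO_LABEL[key]
    return None


print(label_from_notes(["imported from database X", "ASReview_relevant"]))
print(label_from_notes(["ASReview_irrelevant"]))
```

Matching on the exact note text (after stripping whitespace) is a design choice of this sketch; the actual reader may match notes differently.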

read_format = ['.ris', '.txt']

classmethod read_records(fp, dataset_id, record_cls=<class 'asreview.data.record.Record'>, *args, **kwargs)

classmethod standardize_column_names(df)

Standardize column names of input data.

The reader can accept multiple names for a specific type of data, for example both ‘title’ and ‘primary_title’ could refer to the column containing the title data. This function makes sure the correct columns are used. See also the attribute __alternative_column_names__ for customizing this behavior.

Parameters:

df (pd.DataFrame) – Dataframe containing raw data.

Returns:

pd.DataFrame – Dataframe with column names lowercased and stripped of white space. In addition, for the columns in __alternative_column_names__, the first alternative column name in the data will be used as input for the column values.
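
The renaming described above can be sketched without pandas, working on a list of column names. This is a rough illustration under stated assumptions, not the library's code: the function standardize_columns and the ALTERNATIVES mapping are hypothetical stand-ins for __alternative_column_names__, and "first alternative" is read here as the first name in the alternatives list that occurs in the data.

```python
# Hypothetical stand-in for __alternative_column_names__: each standard name
# maps to the accepted alternatives, in priority order. Only the 'title' /
# 'primary_title' pair is taken from the description above.
ALTERNATIVES = {
    "title": ["title", "primary_title"],
}


def standardize_columns(columns, alternatives):
    """Return a dict mapping each raw column name to its standardized name."""
    # Step 1: lowercase the names and strip surrounding white space.
    cleaned = {raw: raw.strip().lower() for raw in columns}
    renamed = dict(cleaned)
    # Step 2: for each standard name, pick the first alternative that is
    # present in the data and rename that column to the standard name.
    for standard, names in alternatives.items():
        for name in names:
            match = next((raw for raw, c in cleaned.items() if c == name), None)
            if match is not None:
                renamed[match] = standard
                break
    return renamed


print(standardize_columns([" Primary_Title ", "Authors"], ALTERNATIVES))
```

With a pandas data frame, the resulting dict could be passed to DataFrame.rename(columns=...).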

classmethod to_records(df, dataset_id=None, record_cls=<class 'asreview.data.record.Record'>)

Turn the cleaned data into records.

Parameters:
  • df (pd.DataFrame) – Cleaned data.

  • dataset_id (str, optional) – Identifier of the dataset, by default None.

  • record_cls (asreview.data.record.Base, optional) – Record class to use, by default Record.

Returns:

list[Record] – List of records.

write_format = ['.csv', '.tsv', '.xlsx', '.ris']
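
The two format attributes list file extensions. One plausible use is checking a path's suffix before reading or writing. The sketch below assumes that interpretation: the helper has_supported_suffix is hypothetical, the lists are copied from the attribute values on this page, and the case-insensitive comparison is an assumption rather than documented behavior.

```python
from pathlib import Path

# Values copied from the read_format and write_format attributes.
READ_FORMAT = [".ris", ".txt"]
WRITE_FORMAT = [".csv", ".tsv", ".xlsx", ".ris"]


def has_supported_suffix(fp, formats):
    """Return True if the file suffix of fp is in the given list of formats."""
    # Lowercasing the suffix is an assumption made by this sketch.
    return Path(fp).suffix.lower() in formats


print(has_supported_suffix("references.ris", READ_FORMAT))
print(has_supported_suffix("export.xlsx", WRITE_FORMAT))
```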