asreview.data.ExcelReader

class asreview.data.ExcelReader

Bases: BaseReader

Excel file reader.

Methods

`__init__`()
`clean_data`(df)	Clean the raw data.
`read_data`(fp)	Import dataset.
`read_records`(fp, dataset_id[, record_cls])
`standardize_column_names`(df)	Standardize column names of input data.
`to_records`(df[, dataset_id, record_cls])	Turn the cleaned data into records.

Attributes

`mime_types`
`read_format`
`write_format`

classmethod clean_data(df)

Clean the raw data.

Parameters: df (pd.DataFrame) – Data to clean. This should be of the same type as the output of read_data.
Returns: pd.DataFrame – Cleaned data. By default, column names, some data types, and missing values are standardized.

mime_types = {'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['.xlsx']}

classmethod read_data(fp)

Import dataset.

Parameters: fp (str, pathlib.Path) – File path to the Excel file (.xlsx).
Returns: pd.DataFrame – Imported entries, in the form that clean_data expects as input.

read_format = ['.xlsx']
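
As an illustration of how a caller might use read_format, the following minimal sketch checks a file's suffix against the documented list before attempting to read it. The helper name `can_read` and the case-insensitive comparison are assumptions made for this example; they are not part of ExcelReader.

```python
from pathlib import Path

# Value copied from the documented ExcelReader.read_format attribute.
READ_FORMAT = [".xlsx"]


def can_read(fp):
    """Return True if the file suffix appears in READ_FORMAT.

    Hypothetical helper: the suffix is lowercased first, which is an
    assumption of this sketch, not documented ExcelReader behavior.
    """
    return Path(fp).suffix.lower() in READ_FORMAT
```

A check like this lets calling code reject an unsupported file early instead of failing inside the reader.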

classmethod read_records(fp, dataset_id, record_cls=<class 'asreview.data.record.Record'>, *args, **kwargs)

classmethod standardize_column_names(df)

Standardize column names of input data.

The reader can accept multiple names for a specific type of data, for example both ‘title’ and ‘primary_title’ could refer to the column containing the title data. This function makes sure the correct columns are used. See also the attribute __alternative_column_names__ for customizing this behavior.

Parameters: df (pd.DataFrame) – Dataframe containing raw data.
Returns: pd.DataFrame – Dataframe with column names lowercased and stripped of whitespace. In addition, for each column listed in __alternative_column_names__, the first alternative name found in the data supplies that column's values.
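
The renaming logic described above can be sketched in plain Python, without pandas. This is not ASReview's implementation: the function name `standardize_columns`, the mapping contents (apart from 'title' and 'primary_title'), and the reading of "first alternative" as "first in the list of alternatives" are all assumptions made for illustration.

```python
# Hypothetical mapping: canonical name -> accepted alternatives, in priority order.
# Only "title" / "primary_title" come from the documentation; "summary" is made up.
ALTERNATIVE_COLUMN_NAMES = {
    "title": ["title", "primary_title"],
    "abstract": ["abstract", "summary"],
}


def standardize_columns(columns, alternatives=ALTERNATIVE_COLUMN_NAMES):
    """Return a {original_name: standardized_name} mapping."""
    # Step 1: lowercase every name and strip surrounding whitespace.
    cleaned = {col: col.strip().lower() for col in columns}

    # Remember which original column each cleaned name came from.
    by_clean = {}
    for original, name in cleaned.items():
        by_clean.setdefault(name, original)

    # Step 2: for each canonical column, use the first alternative that is present.
    renamed = dict(cleaned)
    for canonical, names in alternatives.items():
        match = next((n for n in names if n in by_clean), None)
        if match is not None:
            renamed[by_clean[match]] = canonical
    return renamed
```

With a pandas dataframe, a mapping like this could be passed to `df.rename(columns=...)`. Columns that match no alternative are only lowercased and stripped.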

classmethod to_records(df, dataset_id=None, record_cls=<class 'asreview.data.record.Record'>)

Turn the cleaned data into records.

Parameters:

df (pd.DataFrame) – Cleaned data.
dataset_id (str, optional) – Identifier of the dataset. Defaults to None.
record_cls (asreview.data.record.Base, optional) – Record class to use. Defaults to Record.

Returns:

list[Record] – List of records.
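
The three documented steps (read_data, clean_data, to_records) can be chained as in the sketch below. The wrapper name `load_excel_records` and the file name in the usage comment are hypothetical, and this page does not state whether read_records performs exactly this sequence, so treat it as one plausible way to combine the calls rather than the library's own pipeline.

```python
def load_excel_records(fp, dataset_id=None):
    """Read an .xlsx file and return a list of records (hypothetical wrapper)."""
    # Imported lazily so the sketch can be defined without asreview installed.
    from asreview.data import ExcelReader

    raw = ExcelReader.read_data(fp)        # import the dataset
    cleaned = ExcelReader.clean_data(raw)  # standardize names, types, missing values
    # record_cls is left at its documented default (Record).
    return ExcelReader.to_records(cleaned, dataset_id=dataset_id)


# Example call, assuming a file "my_dataset.xlsx" exists:
# records = load_excel_records("my_dataset.xlsx", dataset_id="my-dataset")
```

Keeping the steps separate makes it possible to inspect or adjust the cleaned data before it is converted into records.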

write_format = ['.csv', '.tsv', '.xlsx']