asreview.data.ExcelReader
- class asreview.data.ExcelReader
Bases: BaseReader
Excel file reader.
Methods
__init__()
clean_data(df) – Clean the raw data.
read_data(fp) – Import dataset.
read_records(fp, dataset_id[, record_cls])
standardize_column_names(df) – Standardize column names of input data.
to_records(df[, dataset_id, record_cls]) – Turn the cleaned data into records.
Attributes
- classmethod clean_data(df)
Clean the raw data.
- Parameters:
df (pd.DataFrame) – Data to clean. This should be of the same type as the output of read_data.
- Returns:
pd.DataFrame – Cleaned data. By default it standardizes the column names, some data types and missing values.
- classmethod read_data(fp)
Import dataset.
- Parameters:
fp (str, pathlib.Path) – File path to the Excel file (.xlsx).
- Returns:
list – List with entries.
- read_format = ['.xlsx']
- classmethod read_records(fp, dataset_id, record_cls=<class 'asreview.data.record.Record'>, *args, **kwargs)
- classmethod standardize_column_names(df)
Standardize column names of input data.
The reader can accept multiple names for a specific type of data; for example, both ‘title’ and ‘primary_title’ could refer to the column containing the title data. This function makes sure the correct columns are used. See also the attribute __alternative_column_names__ for customizing this behavior.
- Parameters:
df (pd.DataFrame) – Dataframe containing raw data.
- Returns:
pd.DataFrame – Dataframe with column names lowercased and stripped of white space. In addition, for the columns in __alternative_column_names__, the first alternative column name in the data will be used as input for the column values.
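The name resolution described above can be illustrated with a minimal sketch that works on a plain list of column names instead of a DataFrame. The helper `standardize_names` and the mapping `ALTERNATIVE_COLUMN_NAMES` are hypothetical and only mimic the documented behavior; the real `__alternative_column_names__` may contain different names, and the reading of "first alternative" as "first entry of the alternatives list that occurs in the data" is an assumption.

```python
# Hypothetical mapping for illustration only; asreview's real
# __alternative_column_names__ may list different names.
ALTERNATIVE_COLUMN_NAMES = {
    "title": ["title", "primary_title"],
    "abstract": ["abstract", "notes_abstract"],
}


def standardize_names(columns, alternatives=ALTERNATIVE_COLUMN_NAMES):
    """Return a dict mapping each raw column name to its standardized name."""
    # Lowercase and strip white space, remembering the raw name.
    cleaned = {col: col.strip().lower() for col in columns}
    result = dict(cleaned)
    for canonical, candidates in alternatives.items():
        # Assumption: the first alternative found in the data supplies
        # the canonical column; later alternatives keep their own name.
        for candidate in candidates:
            matches = [raw for raw, name in cleaned.items() if name == candidate]
            if matches:
                result[matches[0]] = canonical
                break
    return result


mapping = standardize_names([" Primary_Title ", "Abstract", "Year"])
```

Here `mapping` sends `" Primary_Title "` to `"title"`, while `"Year"` is only lowercased. If both `title` and `primary_title` were present, this sketch would use `title` and leave `primary_title` under its own name.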
- classmethod to_records(df, dataset_id=None, record_cls=<class 'asreview.data.record.Record'>)
Turn the cleaned data into records.
- Parameters:
df (pd.DataFrame) – Cleaned data.
dataset_id (str, optional) – Identifier of the dataset, by default None.
record_cls (asreview.data.record.Base, optional) – Record class to use, by default Record.
- Returns:
list[Record] – List of records.
- write_format = ['.csv', '.tsv', '.xlsx']
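The `read_format` and `write_format` attributes list file suffixes, so a caller could use them to check a path before choosing a reader or writer. The sketch below copies the two lists from this page; the helper `supports` is hypothetical, and matching the suffix case-insensitively is an assumption, not documented behavior.

```python
from pathlib import Path

# Values copied from the attributes documented above.
READ_FORMAT = [".xlsx"]
WRITE_FORMAT = [".csv", ".tsv", ".xlsx"]


def supports(path, formats):
    """Return True if the suffix of `path` is in `formats` (case-insensitive)."""
    return Path(path).suffix.lower() in formats


can_read = supports("records.XLSX", READ_FORMAT)
can_write_csv = supports("export.csv", WRITE_FORMAT)
can_read_csv = supports("records.csv", READ_FORMAT)
```

With these lists, an `.xlsx` path passes the read check, while a `.csv` path passes only the write check.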