asreview.DataStore#

class asreview.DataStore(fp=':memory:', record_cls=<class 'asreview.data.record.Record'>, read_only=False, conn_uri=None)[source]#

Bases: object

Data store to hold user input data.

Data input always happens via the record class. This means that if you want to add data to the data store, you will first need to clean it, make sure it has the correct columns and make sure it passes the validations defined in the record class.

Data can be read from the store by row or by column. Reading rows returns record objects. Reading columns returns pandas objects: a single column gives a pandas Series, and multiple columns give a pandas DataFrame.

DataStore uses an SQLite database in the backend and SQLAlchemy ORM to interact with the database.

Methods

__init__([fp, record_cls, read_only, conn_uri])

Initialize the data store.

add_records(records)

Add records to the data store.

create_tables()

Initialize the tables containing the data.

delete_record(record_id)

Delete a record from the store.

get_df()

Get all data from the data store as a pandas DataFrame.

get_groups([record_id])

Get the record groups.

get_records([record_id])

Get the records with the given record identifiers.

is_empty()

set_groups(groups)

Add record group information to the data store.

Attributes

columns

pandas_dtype_mapping

Mapping {column name: pandas data type}

add_records(records)[source]#

Add records to the data store.

Parameters:

records (list[self.record_cls]) – List of records to add to the store.

Raises:

ValueError – If some record.duplicate_of points to a non-existing record_id.
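
The check behind this ValueError can be sketched in plain Python. This is only an illustration, not ASReview's implementation: the helper name `check_duplicate_targets` and the `(record_id, duplicate_of)` pair layout are hypothetical, and it assumes that records added in the same batch count as existing targets, which the documentation above does not state.

```python
def check_duplicate_targets(records, existing_ids=()):
    """Raise ValueError if any duplicate_of points at an unknown record_id.

    `records` is a list of (record_id, duplicate_of) pairs. This layout is a
    simplification for illustration, not the Record class itself.
    """
    # Assumption: ids already stored and ids in this batch are both valid targets.
    known = set(existing_ids) | {record_id for record_id, _ in records}
    missing = sorted(
        {dup for _, dup in records if dup is not None and dup not in known}
    )
    if missing:
        raise ValueError(
            f"duplicate_of points to non-existing record_id(s): {missing}"
        )


# Record 1 points at record 0, which exists, so no error is raised.
check_duplicate_targets([(0, None), (1, 0)])

# Record 1 points at record 7, which does not exist.
try:
    check_duplicate_targets([(0, None), (1, 7)])
    message = None
except ValueError as err:
    message = str(err)
```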

property columns#

create_tables()[source]#

Initialize the tables containing the data.

If you are creating a new data store, you will need to call this method before adding data to the data store.
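
Why the tables must exist first can be shown with the standard-library `sqlite3` module alone. The sketch below does not use ASReview or SQLAlchemy; the table name `record` and its columns are hypothetical stand-ins chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Inserting into a fresh database fails because no table exists yet.
try:
    conn.execute("INSERT INTO record (title) VALUES (?)", ("A paper",))
    insert_failed = False
except sqlite3.OperationalError:
    insert_failed = True

# Create the table first (the step create_tables() is described as performing),
# then the same insert succeeds.
conn.execute("CREATE TABLE record (record_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO record (title) VALUES (?)", ("A paper",))
row_count = conn.execute("SELECT COUNT(*) FROM record").fetchone()[0]
conn.close()
```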

delete_record(record_id)[source]#

Delete a record from the store.

WARNING: This method exists only for completeness and should not be used in any production setting. Deleting records can lead to undefined behavior because other parts of the code make assumptions about the record_id.

get_df()[source]#

Get all data from the data store as a pandas DataFrame.

Returns:

pd.DataFrame

get_groups(record_id=None)[source]#

Get the record groups.

Parameters:

record_id (int | None) – Get only the group containing the record with this record_id.

Returns:

list[tuple[int, int]] – List of tuples (group_id, record_id) ordered by group id. The tuple values are also accessible by attribute name (so tuple.group_id and tuple.record_id).
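
The return shape described above behaves much like a list of named tuples. The following sketch imitates it with `collections.namedtuple`; the names `GroupRow` and `group_containing` are hypothetical, and the actual row type returned by the data store may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for the rows returned by get_groups().
GroupRow = namedtuple("GroupRow", ["group_id", "record_id"])


def group_containing(rows, record_id):
    """Return every row of the group that contains record_id, ordered by group id."""
    group_ids = {row.group_id for row in rows if row.record_id == record_id}
    return sorted(row for row in rows if row.group_id in group_ids)


rows = [GroupRow(0, 0), GroupRow(0, 2), GroupRow(1, 1)]

# Record 2 belongs to group 0, so both members of group 0 are returned.
selected = group_containing(rows, record_id=2)

# Values are reachable by position or by attribute name.
first = selected[0]
same_value = first[1] == first.record_id
```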

get_records(record_id=None)[source]#

Get the records with the given record identifiers.

Parameters:

record_id (int | list[int] | None) – Record identifier or list of record identifiers. If None, get all records.

Returns:

asreview.data.record.Record | list[asreview.data.record.Record] | None
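
The three return shapes follow the type of the `record_id` argument. The sketch below mimics that dispatch with a plain dict standing in for the store. It is not ASReview's code, and it makes two assumptions the documentation above does not confirm: a single unknown identifier yields None, and unknown identifiers in a list are skipped.

```python
def get_records(store, record_id=None):
    """Mimic the documented return shapes using a plain dict as the store."""
    if record_id is None:
        # No identifier given: return every record.
        return list(store.values())
    if isinstance(record_id, list):
        # Assumption: identifiers missing from the store are skipped.
        return [store[i] for i in record_id if i in store]
    # Assumption: a single unknown identifier yields None.
    return store.get(record_id)


store = {0: "record 0", 1: "record 1", 2: "record 2"}

single = get_records(store, 1)
several = get_records(store, [0, 2])
everything = get_records(store)
missing = get_records(store, 99)
```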

is_empty()[source]#

property pandas_dtype_mapping#

Mapping {column name: pandas data type}

set_groups(groups)[source]#

Add record group information to the data store.

Parameters:

groups (list[tuple[int, int]]) – List of tuples (group_id, record_id). This data is added to each record as the duplicate_of attribute. The data store normalizes these values: one record in each group is chosen as the root and gets root.duplicate_of = None, and every other record in the group gets record.duplicate_of = root.record_id.
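
The normalization can be illustrated with a small stand-alone function. This is a sketch under a stated assumption, not the data store's implementation: the documentation does not say how the root is chosen, so the hypothetical helper `normalize_groups` simply picks the smallest record_id in each group.

```python
from collections import defaultdict


def normalize_groups(groups):
    """Map each record_id to its duplicate_of value.

    Assumption: the root of a group is its smallest record_id.
    """
    # Collect the record ids belonging to each group.
    members = defaultdict(list)
    for group_id, record_id in groups:
        members[group_id].append(record_id)

    duplicate_of = {}
    for record_ids in members.values():
        root = min(record_ids)
        for record_id in record_ids:
            # The root points nowhere; every other member points at the root.
            duplicate_of[record_id] = None if record_id == root else root
    return duplicate_of


result = normalize_groups([(0, 3), (0, 1), (0, 5), (1, 2)])
```

In this example, group 0 contains records 1, 3 and 5, so record 1 becomes the root and records 3 and 5 point to it. Record 2 is alone in group 1 and is therefore its own root.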