asreview.Database#

class asreview.Database(fp=':memory:', record_cls=<class 'asreview.data.record.Record'>, read_only=False)[source]#

Bases: object

Database containing the input data and results.

Database contains two parts: the input and the results. For more information on the input, see asreview.database.store.py. For more information on the results, see asreview.database.sqlstate.py.

Variables:

user_version (str) – Return the version number of the database.

Methods

__init__([fp, record_cls, read_only])

Initialize the Database.

add_last_ranking(ranked_record_ids, ...[, ...])

Save the ranking of the last iteration of the model.

close()

Close the database and release all resources.

create_tables()

delete_result(record_id)

get_decision_changes()

Get the record ids for any decision changes.

get_last_ranking_table()

Get the ranking from the state.

get_pending([user_id])

Get pending records from the results table.

get_pool()

Get the unlabeled, not-pending records in ranking order.

get_priors()

Get the record ids of the priors.

get_results_record(record_id)

Get the data of a specific record from the results table.

get_results_table([columns, priors, ...])

Get a subset from the results table.

get_unlabeled([groups])

Get the unlabeled record ids in ranking order.

label_record(record_id, label[, tags, user_id])

query_top_ranked([user_id])

update_note(record_id[, note])

Change the note of an already labeled or pending record.

update_result(record_id[, label, tags, user_id])

Attributes

exist_new_labeled_records

Return True if there are new labeled records.

record_table_name

user_version

Version number of the state.

add_last_ranking(ranked_record_ids, classifier, querier, balancer, feature_extractor, training_set=None)[source]#

Save the ranking of the last iteration of the model.

Save the ranking of the last iteration of the model, in the ranking order, so the record on row 0 is ranked first by the model.

Parameters:
  • ranked_record_ids (list, numpy.ndarray) – A list of record ids in the order in which they were ranked.

  • classifier (str) – Name of the classifier of the model.

  • querier (str) – Name of the query strategy of the model.

  • balancer (str) – Name of the balance strategy of the model.

  • feature_extractor (str) – Name of the feature extraction method of the model.

  • training_set (int) – Number of labeled records available at the time of training.
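The row order described above can be sketched in plain Python. This is an illustration only, not the asreview implementation: the helper `build_ranking_rows` is hypothetical, the column names are borrowed from the table described under get_last_ranking_table(), and the model names passed in are placeholder strings.

```python
from datetime import datetime, timezone


def build_ranking_rows(ranked_record_ids, classifier, querier, balancer,
                       feature_extractor, training_set=None):
    """Hypothetical helper: one row per record, with row 0 ranked first."""
    # One shared timestamp for the whole ranking (an assumption of this sketch).
    timestamp = datetime.now(timezone.utc).timestamp()
    return [
        {
            "record_id": int(record_id),
            "ranking": position,  # 0 means ranked first by the model
            "classifier": classifier,
            "querier": querier,
            "balancer": balancer,
            "feature_extractor": feature_extractor,
            "training_set": training_set,
            "time": timestamp,
        }
        for position, record_id in enumerate(ranked_record_ids)
    ]


# Placeholder model names; any strings would do here.
rows = build_ranking_rows(
    [42, 7, 19], "clf", "query", "balance", "features", training_set=5
)
```

In this sketch the position in `ranked_record_ids` becomes the `ranking` value, so record 42 ends up on row 0.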

close()[source]#

Close the database and release all resources.

For in-memory databases this will destroy the database. Safe to call multiple times.
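The behavior of in-memory databases can be illustrated with the standard-library `sqlite3` module. This sketch assumes that the Database is backed by SQLite, which the ':memory:' default suggests but this page does not state; the table name `results` and its columns are made up for the example.

```python
import sqlite3


def count_tables(conn):
    """Return the number of user tables visible on this connection."""
    query = "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
    return conn.execute(query).fetchone()[0]


# An in-memory database lives only as long as its connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (record_id INTEGER PRIMARY KEY, label INTEGER)")
conn.execute("INSERT INTO results VALUES (1, 1)")
conn.commit()
tables_before = count_tables(conn)

conn.close()
conn.close()  # closing an already closed sqlite3 connection does nothing

# A new in-memory connection starts from scratch: the table is gone.
fresh = sqlite3.connect(":memory:")
tables_after = count_tables(fresh)
fresh.close()
```

Data that must survive close() therefore needs a file path rather than ':memory:'.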

create_tables()[source]#

delete_result(record_id)[source]#

property exist_new_labeled_records#

Return True if there are new labeled records.

Return True if there are any record labels added since the last time the model ranking was added to the state. Also returns True if no model was trained yet, but priors have been added.
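The rule above can be written as a small predicate. This is a sketch under an assumption: it compares the current number of labeled records with the `training_set` value stored with the last ranking, which may differ from how asreview actually computes the property. The function name `has_new_labels` is hypothetical.

```python
def has_new_labels(n_labeled, last_training_set=None):
    """Hypothetical predicate mirroring the documented rule.

    n_labeled: number of labeled records currently in the results table.
    last_training_set: number of labeled records available when the last
        ranking was saved, or None if no model has been trained yet.
    """
    if last_training_set is None:
        # No ranking yet: any labeled records (the priors) count as new.
        return n_labeled > 0
    # A ranking exists: labels are new if there are more than at training time.
    return n_labeled > last_training_set


no_model_no_priors = has_new_labels(0)
no_model_with_priors = has_new_labels(2)
unchanged_since_training = has_new_labels(10, last_training_set=10)
labeled_since_training = has_new_labels(12, last_training_set=10)
```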

get_decision_changes()[source]#

Get the record ids for any decision changes.

Get the record ids of the records whose labels have been changed after the original labeling action.

Returns:

pd.DataFrame – Dataframe with columns ‘record_id’, ‘label’, ‘time’, and ‘user_id’ for each record of which the labeling decision was changed.

get_last_ranking_table()[source]#

Get the ranking from the state.

Returns:

pd.DataFrame – Dataframe with columns ‘record_id’, ‘ranking’, ‘classifier’, ‘querier’, ‘balancer’, ‘feature_extractor’, ‘training_set’ and ‘time’. It has one row for each record in the dataset, and is ordered by ranking.

get_pending(user_id=None)[source]#

Get pending records from the results table.

Parameters:

user_id (int) – User id of the user who labeled the records.

Returns:

pd.DataFrame – DataFrame with pending results records.

get_pool()[source]#

Get the unlabeled, not-pending records in ranking order.

Returns:

pd.Series – Series containing the record_ids of the unlabeled, not-pending records, in the order of the last available ranking. If the state does not yet contain a last ranking, the returned Series is empty. If multiple records are in the same group, only the base record of the group is returned.
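The selection described above amounts to filtering the last ranking. The following is a minimal sketch in plain Python, not the asreview implementation: the function `pool_from_ranking` is hypothetical, it works on lists and sets instead of a pandas Series, and it ignores record groups.

```python
def pool_from_ranking(ranking, labeled, pending):
    """Hypothetical helper: keep ranked ids that are neither labeled nor pending.

    ranking: record ids in the order of the last ranking (best first).
    labeled: ids that already have a label.
    pending: ids that are waiting for a labeling decision.
    """
    excluded = set(labeled) | set(pending)
    # Iterating over the ranking preserves the ranking order.
    return [record_id for record_id in ranking if record_id not in excluded]


pool = pool_from_ranking([5, 3, 8, 1], labeled={3}, pending={8})

# Without a ranking there is nothing to order, so the pool is empty.
empty_pool = pool_from_ranking([], labeled={3}, pending=set())
```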

get_priors()[source]#

Get the record ids of the priors.

Returns:

pd.DataFrame – The result records of the priors in the order they were added. If multiple records are in the same group, only the base record of the group is returned.

get_results_record(record_id)[source]#

Get the data of a specific record from the results table.

Parameters:

record_id (int) – Record id of which you want the data.

Returns:

pd.DataFrame – Dataframe containing the data from the results table for the given record_id.

get_results_table(columns=None, priors=True, pending=False, groups=False)[source]#

Get a subset from the results table.

Can be used to get any column subset from the results table. Most other get functions use this one, except some that use a direct SQL query for efficiency.

Parameters:
  • columns (list, str) – List of column names of the results table, or a string containing one column name.

  • priors (bool) – Whether to keep the records containing the prior knowledge.

  • pending (bool) – Whether to keep the records which are pending a labeling decision.

  • groups (bool) – If True, return all the records of each group. By default, only the base record of each group is returned.

Returns:

pd.DataFrame – Dataframe containing the data of the specified columns of the results table.

get_unlabeled(groups=False)[source]#

Get the unlabeled record ids in ranking order.

Records that have no label or no entry in the results table are considered unlabeled.

Parameters:

groups (bool) – If True, return all records in each unlabeled group. If False, return only group representatives (record_id == group_id).

Returns:

pd.Series – Series of record_ids of unlabeled records ordered by ranking.
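The effect of the groups flag can be sketched in plain Python. This is an illustration, not the asreview implementation: the function `unlabeled_ids` and the sample data are hypothetical, and the sketch assumes that a group counts as labeled once its base record (record_id == group_id) has a label.

```python
def unlabeled_ids(records, labeled_ids, groups=False):
    """Hypothetical helper returning unlabeled record ids in ranking order.

    records: (record_id, group_id) pairs in ranking order.
    labeled_ids: ids of records that already have a label.
    groups: if True, return every record of each unlabeled group;
        if False, return only the base record of each unlabeled group.
    """
    # Assumption: labeling the base record labels the whole group.
    labeled_groups = {
        group_id
        for record_id, group_id in records
        if record_id == group_id and record_id in labeled_ids
    }
    result = []
    for record_id, group_id in records:
        if group_id in labeled_groups:
            continue
        if groups or record_id == group_id:
            result.append(record_id)
    return result


# Records 4 and 9 share group 4; records 6 and 7 share group 6.
records = [(4, 4), (9, 4), (2, 2), (6, 6), (7, 6)]
representatives = unlabeled_ids(records, labeled_ids={2})
all_members = unlabeled_ids(records, labeled_ids={2}, groups=True)
```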

label_record(record_id, label, tags=None, user_id=None)[source]#

query_top_ranked(user_id=None)[source]#

property record_table_name#

update_note(record_id, note=None)[source]#

Change the note of an already labeled or pending record.

Parameters:
  • record_id (int) – Id of the record whose note should be changed.

  • note (str) – Note to add to the record.

update_result(record_id, label=None, tags=None, user_id=None)[source]#

property user_version#

Version number of the state.