asreview.simulation.Simulate

class asreview.simulation.Simulate(as_data, project, classifier=<asreview.models.classifiers.nb.NaiveBayesClassifier object>, query_model=<asreview.models.query.max.MaxQuery object>, balance_model=<asreview.models.balance.simple.SimpleBalance object>, feature_model=<asreview.models.feature_extraction.tfidf.Tfidf object>, n_prior_included=0, n_prior_excluded=0, prior_indices=None, n_papers=None, n_instances=1, stop_if=None, start_idx=None, init_seed=None, write_interval=None, **kwargs)

ASReview Simulation mode class.

Parameters:
  • as_data (asreview.Dataset) – The data object which contains the text, labels, etc.

  • classifier (BaseModel) – Initialized model to fit the data during active learning. See asreview.models.utils.py for possible models.

  • query_model (BaseQueryModel) – Initialized model to query new instances for review, such as random sampling or max sampling. See asreview.query_strategies.utils.py for query models.

  • balance_model (BaseBalanceModel) – Initialized model to redistribute the training data during the active learning process. It may either resample or undersample specific papers.

  • feature_model (BaseFeatureModel) – Feature extraction model that converts texts and keywords to feature matrices.

  • n_prior_included (int) – Number of prior included papers to sample.

  • n_prior_excluded (int) – Number of prior excluded papers to sample.

  • prior_indices (int) – Indices of the prior records, given as row numbers.

  • n_instances (int) – Number of papers to query at each step in the active learning process.

  • stop_if (int) – Number of steps/queries to perform. Set to None for no limit.

  • start_idx (numpy.ndarray) – Start the simulation/review with these indices. These records are assumed to be labeled already; passing unlabeled records might result in bad behaviour.

  • init_seed (int) – Seed for sampling the prior indices when the --prior_idx option is not used. If prior_idx is given with one or more indices, this option is ignored.

  • state_file (str) – Path to state file.

  • write_interval (int) – Number of labeled records after which the simulation data is written to the state.
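
The interplay of n_prior_included, n_prior_excluded, init_seed, n_instances, stop_if and write_interval can be illustrated with a small standalone sketch. It does not call ASReview: sample_priors and toy_simulation are hypothetical helpers, the relevance scores are fixed rather than re-estimated by a classifier after each query, and stop_if is interpreted here as a number of query steps, which is an assumption.

```python
import random


def sample_priors(labels, n_prior_included, n_prior_excluded, init_seed=None):
    """Pick row numbers of prior records: some labeled 1, some labeled 0."""
    rng = random.Random(init_seed)  # seeded, so the draw is reproducible
    included = [i for i, y in enumerate(labels) if y == 1]
    excluded = [i for i, y in enumerate(labels) if y == 0]
    return sorted(
        rng.sample(included, n_prior_included)
        + rng.sample(excluded, n_prior_excluded)
    )


def toy_simulation(labels, scores, prior_indices, n_instances=1,
                   stop_if=None, write_interval=None):
    """Label records in batches of n_instances, highest score first."""
    labeled = list(prior_indices)
    writes = []      # size of the labeled set at each simulated write
    unwritten = 0    # records labeled since the last simulated write
    n_queries = 0
    while len(labeled) < len(labels):
        if stop_if is not None and n_queries >= stop_if:
            break
        seen = set(labeled)
        pool = [i for i in range(len(labels)) if i not in seen]
        # "max"-style query: take the records with the highest scores
        batch = sorted(pool, key=lambda i: scores[i], reverse=True)[:n_instances]
        labeled.extend(batch)
        n_queries += 1
        unwritten += len(batch)
        if write_interval is not None and unwritten >= write_interval:
            writes.append(len(labeled))
            unwritten = 0
    return labeled, writes


# Made-up data: 1 = included, 0 = excluded; scores stand in for model output.
labels = [1, 0, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.7, 0.4, 0.05]

priors = sample_priors(labels, n_prior_included=1, n_prior_excluded=1,
                       init_seed=42)
order, writes = toy_simulation(labels, scores, priors, n_instances=2,
                               stop_if=2, write_interval=2)
```

With two priors, two query steps and two records per step, six records end up labeled, and a write is recorded after each step because write_interval equals the batch size.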

Attributes

settings

Get an ASReview settings object.

Methods

review()

train()

Train a new model on the labeled data.
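
As a rough illustration of training on the labeled data only, the sketch below fits a toy word-weight scorer on the labeled rows and uses it to rank the unlabeled ones. It is not the ASReview implementation: per the signature above the defaults are a naive Bayes classifier on TF-IDF features, whereas this example swaps in a simple word-count model, and the train and score functions are hypothetical.

```python
from collections import Counter


def train(texts, labels, labeled_indices):
    """Fit a toy word-weight model using only the labeled rows."""
    weights = Counter()
    for i in labeled_indices:
        sign = 1 if labels[i] == 1 else -1
        # Each distinct word votes once per record: +1 included, -1 excluded
        for word in set(texts[i].lower().split()):
            weights[word] += sign
    return weights


def score(weights, text):
    """Sum the learned weights of the distinct words in a text."""
    return sum(weights[word] for word in set(text.lower().split()))


# Made-up records; only rows 0 and 1 are treated as labeled.
texts = [
    "active learning screening",
    "protein folding dynamics",
    "screening with active learning models",
    "folding of membrane protein",
]
labels = [1, 0, 1, 0]

weights = train(texts, labels, labeled_indices=[0, 1])
unlabeled = [2, 3]
ranked = sorted(unlabeled, key=lambda i: score(weights, texts[i]), reverse=True)
```

Row 2 shares words with the included record and is ranked first; in an active learning loop the top-ranked records would be labeled next and the model trained again.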