asreview.data.statistics.n_duplicates

asreview.data.statistics.n_duplicates(data, pid='doi')

Number of duplicates.

Duplicate detection can be a challenging task. Several algorithms exist, and they can produce different results for the same data.
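To illustrate why results can vary, the sketch below counts repeated identifiers in the same list in two ways: with exact string matching, and after a light normalization step. It is not the asreview implementation. The helpers `count_duplicates` and `normalize_doi` are hypothetical, and the normalization rules (lowercasing, trimming whitespace, dropping a resolver prefix) are assumptions chosen for the example.

```python
def count_duplicates(values, normalize=None):
    """Count values that repeat an earlier value (the first occurrence is not counted)."""
    seen = set()
    duplicates = 0
    for value in values:
        if value is None:  # records without an identifier are never counted
            continue
        key = normalize(value) if normalize else value
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def normalize_doi(doi):
    """Hypothetical normalizer: trim, lowercase, and drop a resolver prefix."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


dois = [
    "10.1000/xyz123",
    "10.1000/XYZ123",
    "https://doi.org/10.1000/xyz123",
    "10.1000/abc456",
    None,
]

print(count_duplicates(dois))                 # exact matching      → 0
print(count_duplicates(dois, normalize_doi))  # normalized matching → 2
```

The same three references to one DOI count as zero duplicates under exact matching and as two under normalized matching. This is the kind of difference that makes duplicate counts depend on the method used.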

Parameters:
  • data (asreview.Dataset) – A Dataset object containing the records.

  • pid (string) – Which persistent identifier (PID) to use for deduplication. Default is ‘doi’.

Returns:

int – Number of duplicates
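The following is a minimal sketch of what counting duplicates by a persistent identifier can look like. It assumes the records are plain dictionaries, that every record after the first with a given `pid` value counts as one duplicate, and that records with a missing or empty identifier are skipped. These are assumptions for illustration, not documented asreview behavior. The function `count_duplicates_by_pid` is hypothetical.

```python
from collections import Counter


def count_duplicates_by_pid(records, pid="doi"):
    """Hypothetical sketch: count records whose `pid` value repeats an earlier one.

    Records with a missing or empty identifier are skipped, so they are
    never counted as duplicates (an assumption, not documented behavior).
    """
    counts = Counter(record.get(pid) for record in records if record.get(pid))
    # An identifier that appears k times contributes k - 1 duplicates.
    return sum(k - 1 for k in counts.values())


records = [
    {"title": "A", "doi": "10.1000/a"},
    {"title": "A (copy)", "doi": "10.1000/a"},
    {"title": "B", "doi": "10.1000/b"},
    {"title": "C", "doi": None},
    {"title": "C (copy)", "doi": None},
]

print(count_duplicates_by_pid(records))               # → 1
print(count_duplicates_by_pid(records, pid="title"))  # → 0
```

Switching the `pid` argument changes which field defines a duplicate. Under the skip-missing assumption, the two "C" records are not flagged, even though they are likely the same publication.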