asreview.Dataset.drop_duplicates

Dataset.drop_duplicates(pid='doi', inplace=False, reset_index=True)

Drop duplicate records.

Duplicates are detected based on titles and abstracts and, if available, on a persistent identifier (PID) such as the Digital Object Identifier (DOI).
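The detection rule described above can be sketched in plain pandas. This is a minimal illustration under stated assumptions, not ASReview's implementation. The `find_duplicates` helper and the `title`, `abstract` and `doi` column names are hypothetical. The normalization (lowercasing and stripping non-alphanumeric characters) is one reasonable choice. The sketch also assumes a record counts as a duplicate if either its normalized text or its non-empty PID repeats an earlier record.

```python
import pandas as pd


def find_duplicates(df: pd.DataFrame, pid: str = "doi") -> pd.Series:
    """Return a boolean Series that is True for every repeat of an earlier record.

    Hypothetical helper: assumes text lives in 'title' and 'abstract' columns.
    """
    # Normalize title + abstract: lowercase, then keep only letters and digits.
    text = (df["title"].fillna("") + " " + df["abstract"].fillna("")).str.lower()
    text = text.str.replace(r"[^a-z0-9]", "", regex=True)
    text = text.mask(text == "")  # records without text are never flagged
    dups = text.duplicated() & text.notna()

    # Assumption: if the PID column exists, a repeated non-empty PID also
    # marks a duplicate (combined with the text rule using OR).
    if pid in df.columns:
        ids = df[pid].fillna("").astype(str).str.strip().str.lower()
        ids = ids.mask(ids == "")  # empty identifiers are ignored
        dups |= ids.duplicated() & ids.notna()

    return dups
```

If the requested PID column is missing, the sketch falls back to comparing titles and abstracts only, which matches the "if available" wording above.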

Parameters:
  • pid (string, default 'doi') – Name of the persistent identifier column to use for deduplication, in addition to titles and abstracts.

  • inplace (boolean, default False) – Whether to modify the dataset in place rather than returning a new DataFrame.

  • reset_index (boolean, default True) – If True, the existing index is replaced by the default integer index (0, 1, 2, ...) after duplicates are removed.

Returns:

pandas.DataFrame or None – DataFrame with duplicates removed, or None if inplace=True.
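How `inplace` and `reset_index` determine the return value can be illustrated with a small pandas helper. The `drop_flagged` function below is hypothetical and takes an already computed boolean duplicate mask. It mirrors the documented behavior (a new DataFrame by default, None when `inplace=True`) rather than reproducing ASReview's code.

```python
from typing import Optional

import pandas as pd


def drop_flagged(
    df: pd.DataFrame,
    dup_mask: pd.Series,
    inplace: bool = False,
    reset_index: bool = True,
) -> Optional[pd.DataFrame]:
    """Drop rows where dup_mask is True (hypothetical helper, not ASReview's code)."""
    if inplace:
        # Mutate the caller's DataFrame and signal this by returning None.
        df.drop(index=df.index[dup_mask.to_numpy()], inplace=True)
        if reset_index:
            df.reset_index(drop=True, inplace=True)
        return None

    # Otherwise leave the input untouched and return a filtered copy.
    out = df.loc[~dup_mask].copy()
    return out.reset_index(drop=True) if reset_index else out
```

Returning None for in-place calls follows the usual pandas convention, so chaining a further method onto an in-place call raises an error instead of silently operating on a stale object.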