DedupEndNote: Test results - details

Own test database: portal vein thrombosis (52,828 records)

The following queries were executed in 8 bibliographic databases between May 30 and mid-June 2020; the date of each search is listed with the query.

| Database | Query | Number of records |
| --- | --- | --- |
| CINAHL Plus with full text (EBSCOHost) | (MH "Hepatic Vein Thrombosis"); Date: 12-6-2020; Export: Share > E-mail a link to download exported results. RIS format. | 358 |
| Cochrane Library (Trials) | Title Abstract Keywords: (portal vein thrombosis) OR (portal venous thrombosis) OR (portal vein obstruction) OR (portal venous obstruction); Date: 30-5-2020 | 482 |
| EMBASE (Ovid) | exp portal vein thrombosis/; Date: 13-6-2020; Export: Format RIS, Fields: Complete reference (Not: Format: EndNote because Article Number information is not imported) | 11,452 |
| EMBASE.com | portal vein thrombosis/; Date: 16-6-2020; Export: ??? | 11,300 |
| Medline (Ovid) | exp Thrombosis/; limit 1 to (clinical trial, all or consensus development conference or guideline or randomized controlled trial or "review" or "systematic review"); limit 2 to "core clinical journals (aim)"; Query is intentionally different from PubMed query to create some, but not complete overlap; Date: 30-5-2020 | 3,593 |
| PsycINFO (Ovid) | exp thromboses/; Export: Complete reference; Date: 12-6-2020 | 903 |
| PubMed | (portal vein thrombosis) OR (portal venous thrombosis) OR (portal vein obstruction) OR (portal venous obstruction); Date: 30-5-2020 | 10,433 |
| Scopus | Article title, Abstract, Keywords: (portal vein thrombosis) OR (portal venous thrombosis) OR (portal vein obstruction) OR (portal venous obstruction); Date: 1-6-2020; Export: (The export uses all Citation information fields, and from the Bibliographic information fields: Serial identifiers (e.g. ISSN) and Abbreviated source title. The Abstract field was not downloaded); Imported into EndNote as RIS format, not with the Scopus import filter (does not import C7 field (Article number)) | 16,942 |
| Web of Science (1975-...) | Topic: (portal vein thrombosis) OR (portal venous thrombosis) OR (portal vein obstruction) OR (portal venous obstruction); Date: 1-6-2020; Export: Full record | 8,665 |
| Total | | 52,828 |
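
As a quick consistency check, the stated total can be compared with the sum of the per-database counts listed above. The sketch below is only an arithmetic illustration (the dictionary and variable names are made up here). It suggests that the total of 52,828 covers the eight databases that were imported, without the EMBASE.com set.

```python
# Record counts as listed per database above.
counts = {
    "CINAHL": 358,
    "Cochrane Library (Trials)": 482,
    "EMBASE (Ovid)": 11_452,
    "EMBASE.com": 11_300,
    "Medline (Ovid)": 3_593,
    "PsycINFO (Ovid)": 903,
    "PubMed": 10_433,
    "Scopus": 16_942,
    "Web of Science": 8_665,
}

total_all = sum(counts.values())
total_without_embase_com = total_all - counts["EMBASE.com"]

print(total_all)                 # 64128
print(total_without_embase_com)  # 52828
```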

All records were imported into an EndNote X9.3.3 (Bld 13966) database. If the files were imported manually (File > Import ...), the following options were used:

  • Import option: Reference Manager (RIS)
  • Duplicates: Import all
  • Text translation: No translation

The following fields were cleared or filled in the EndNote database, per source database:

  • CINAHL: Fields cleared: Abstract, Accession number, Author address, Keywords, Name of database, Notes, URL. Fields filled: Database provider: CINAHL
  • Cochrane Library (Trials): Fields cleared: Abstract, Accession number, Keywords, URL. Fields filled: Database provider: Cochrane
  • EMBASE OVID: Fields cleared: Abstract, Author address, Keywords, Place published, Publisher, URL. Fields filled: Database provider: EMBASE_OVID
  • Medline: Fields cleared: Abstract, Accession number, Author address, Keywords, Name of database, Notes, URL. Fields filled: Database provider: Medline_OVID
  • PsycINFO: Fields cleared: Abstract, Author address, Keywords, Place published, Publisher, Secondary author, URL. Fields filled: Database provider: PsycINFO_OVID
  • PubMed: Fields cleared: Abstract, Accession number, Author address, Keywords, Notes. Fields filled: Database provider: PubMed
  • Scopus: Fields cleared: Abstract, Author address, Name of database, Notes, URL. Fields filled: Database provider: Scopus
  • Web of Science: Fields cleared: Abstract, Accession number, Author address, Keywords, Name of database, Notes, URL. Fields filled: Database provider: WoS
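
The clearing itself was done in EndNote. Purely as an illustration of the rules listed above, the following sketch applies the same kind of per-source clearing to records held as Python dictionaries. The record layout, the `CLEAR_RULES` table and the `clear_fields` helper are hypothetical, and only two of the sources are shown.

```python
# Hypothetical representation: one dict per record, keyed by EndNote field name.
CLEAR_RULES = {
    "PubMed": {
        "clear": ["Abstract", "Accession number", "Author address", "Keywords", "Notes"],
        "provider": "PubMed",
    },
    "Scopus": {
        "clear": ["Abstract", "Author address", "Name of database", "Notes", "URL"],
        "provider": "Scopus",
    },
}


def clear_fields(record, source):
    """Return a copy of `record` with the fields for `source` emptied
    and the Database provider field filled in."""
    rule = CLEAR_RULES[source]
    cleaned = dict(record)
    for field in rule["clear"]:
        if field in cleaned:
            cleaned[field] = ""
    cleaned["Database provider"] = rule["provider"]
    return cleaned


example = {
    "Title": "Portal vein thrombosis",
    "Abstract": "Some text",
    "URL": "https://example.org",
}
cleaned = clear_fields(example, "Scopus")
print(cleaned)
```

The helper returns a copy, so the original record stays available for comparison.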

The databases were imported in the following order, each database sorted on Record number:

  • PubMed
  • EMBASE OVID
  • Scopus
  • Web of Science
  • Cochrane Library (Trials)
  • CINAHL
  • PsycINFO
  • Medline

The complete EndNote database was then sorted (see: Tools / Sort library ...) by Author, Year and Record number. Because of the import order, this generally puts the PubMed record of a publication before its EMBASE OVID record, and so on.
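
The effect of this ordering can be illustrated with a small sketch. The records and field names below are invented, and plain Python string comparison stands in for EndNote's own sort rules, which may differ. Because PubMed was imported first, its records have the lowest record numbers and come first among records with the same author and year.

```python
# Hypothetical records; record numbers reflect the import order (PubMed first).
records = [
    {"author": "Smith, J.", "year": 2015, "record_number": 10500, "source": "EMBASE_OVID"},
    {"author": "Jones, A.", "year": 2018, "record_number": 42, "source": "PubMed"},
    {"author": "Smith, J.", "year": 2015, "record_number": 17, "source": "PubMed"},
]

# Sort by Author, then Year, then Record number.
ordered = sorted(records, key=lambda r: (r["author"], r["year"], r["record_number"]))
sources = [r["source"] for r in ordered]
print(sources)  # ['PubMed', 'PubMed', 'EMBASE_OVID']
```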

| field | special cases | deduplication results |
| --- | --- | --- |
| Authors | (empty): 150 records | 96 merged successfully |
| Authors | "Anonymous,": 78 records | 22 merged successfully |
| Publication year | (empty): 71 records | 44 merged successfully |
| Title | (empty): 3 records | 1 merged |
| Journal | (empty): 29 records | 4 merged |
| Pages | (empty): 1542 records | 977 merged successfully |
| Pages | (empty) AND Article Number: 569 | 490 merged successfully |
| Article number | NOT empty: 1112 records | 945 merged successfully |
| Article number | NOT empty AND without Pages: 569 records | 490 merged successfully |
| Reference Type | Book: 3 | Book: 0 merged |
| Reference Type | Book Section: 143 | Book Section: 6 merged |
| Reference Type | Conference Proceedings: 6 | Conference Proceedings: 0 merged |
| Reference Type | Generic: 13 | Generic: 2 |
| Reference Type | Journal Article: 52460 | Journal Article: 38401 |
| Reference Type | Serial: 23 | Serial: 12 |

Validating

Checking for false positives outside the manually checked validation set (1585 records):

The false positive that was found had the same year, the same authors, the same journal, a similar title, and different start pages but the same DOI, and the DOI referred to the whole conference abstracts issue.

Assuming that:

  • Start page is present
  • Start page contains a letter ("A51", "51A", "S51", ...)

the markMode database was searched for records:

  • with a label (i.e. duplicate found)
  • with Pages containing "S" or "A"
  • with a DOI

The results were ordered by label and the titles were checked manually. No other cases were found.
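
The selection described above can be sketched as follows. This is a minimal illustration, not the procedure that was actually used: the record dictionaries, the field names and the `select_for_review` helper are assumptions, and the DOIs are made up. It keeps records that have a label, a DOI and a Pages value containing "S" or "A", and orders them by label so that the titles within each duplicate group can be compared by eye.

```python
def has_letter_page(pages):
    """True if the Pages value contains "S" or "A" (e.g. "A51", "51A", "S51")."""
    return any(letter in pages for letter in "SA")


def select_for_review(records):
    """Keep labelled records with a DOI and a lettered Pages value, ordered by label."""
    selected = [
        r for r in records
        if r["label"] and r["doi"] and has_letter_page(r["pages"])
    ]
    return sorted(selected, key=lambda r: r["label"])


# Invented example records.
records = [
    {"label": "205", "doi": "10.1000/issue.b", "pages": "51A", "title": "Third abstract"},
    {"label": "101", "doi": "10.1000/issue.a", "pages": "S51", "title": "First abstract"},
    {"label": "", "doi": "10.1000/issue.a", "pages": "A12", "title": "No duplicate found"},
    {"label": "101", "doi": "10.1000/issue.a", "pages": "S52", "title": "Second abstract"},
    {"label": "300", "doi": "10.1000/article.7", "pages": "101-110", "title": "Regular article"},
    {"label": "310", "doi": "", "pages": "A5", "title": "No DOI"},
]

for r in select_for_review(records):
    print(r["label"], r["pages"], r["doi"], r["title"])
```

Labels are compared as strings here; if the labels are numeric record IDs, converting them to integers before sorting would give a numeric order.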

TODO: The test is incomplete: the start page could also be empty. But that would mean checking 29,000 records! (Maybe putting the records in a relational DB could help. Or: check the records with DOIs which are not unique: 1880 DOIs, see Excel.)
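
One way the relational-database idea from the TODO could look is sketched below, using Python's built-in `sqlite3` module. The table layout, column names and sample rows are assumptions for illustration only. The query lists DOIs that occur on more than one record, together with the number of distinct Pages values per DOI, so that DOIs shared by records with different (or empty) pages can be reviewed first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE record ("
    "id INTEGER PRIMARY KEY, label TEXT, doi TEXT, pages TEXT, title TEXT)"
)

# Invented sample rows; empty strings stand for empty fields.
rows = [
    (1, "1", "10.1000/issue.2020", "S51", "Abstract one"),
    (2, "1", "10.1000/ISSUE.2020", "S52", "Abstract two"),
    (3, "3", "10.1000/article.7", "101-110", "An article"),
    (4, "3", "10.1000/article.7", "101-110", "An article"),
    (5, "", "10.1000/unique.9", "", "No duplicate"),
]
conn.executemany("INSERT INTO record VALUES (?, ?, ?, ?, ?)", rows)

# Non-unique DOIs (compared case-insensitively), most varied Pages values first.
query = """
    SELECT LOWER(doi) AS doi,
           COUNT(*) AS n_records,
           COUNT(DISTINCT pages) AS n_pages
    FROM record
    WHERE doi <> ''
    GROUP BY LOWER(doi)
    HAVING COUNT(*) > 1
    ORDER BY n_pages DESC, doi
"""
non_unique = conn.execute(query).fetchall()
for doi, n_records, n_pages in non_unique:
    print(doi, n_records, n_pages)
```

A DOI with more than one distinct Pages value is only a candidate for manual checking; records of the same publication can legitimately differ in how the pages were recorded.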

Cannabis database (15,216 records)

See Test results - details for a description of this test database.

No details were found about the content. Most records seem to have been imported from EMBASE (OVID), PubMed and Web of Science. Some records may have been imported more than once from the same database.

Fields cleared: Abstract, Accession number, Author address, Keywords, Notes

| field | special cases | deduplication results |
| --- | --- | --- |
| Authors | (empty): 3 records | 1 (100%) merged successfully |
| Authors | "Anonymous,": 127 records | 1 merged successfully |
| Publication year | (empty): 21 records | 18 merged successfully |
| Title | (empty): 1 record | 0 merged |
| Journal | (empty): 5 records | 1 merged |
| Pages | (empty): 1323 | 534 merged successfully |
| Article number | NOT empty: 242 | 186 merged successfully |
| Article number | NOT empty AND without Pages: 206 | 155 |
| Reference Type | Book: 13 | Book: 0 merged |
| Reference Type | Book Section: 16 | Book Section: 9 merged |
| Reference Type | Journal Article: 15187 | Journal Article: 6978 |

Datasets: see also

  • https://osf.io/vgr2p/files/
  • https://osf.io/trwaf/
  • https://osf.io/8ftzw/
  • https://osf.io/fpj54/files/

Deduplication tools

  • https://cran.r-project.org/web/packages/synthesisr/index.html
  • https://onlinelibrary.wiley.com/doi/abs/10.1002/jrsm.1374
  • http://systematicreviewtools.com/tool.php?ref=Systematic%20Review%20Accelerator