A relational database was used for validation.
The RDBMS used was MS Access (MS Office 2016).
The table for a validation set contains the following fields:
Field | Type | Default | Content |
---|---|---|---|
id | INTEGER | | The original ID in the EndNote DB. PRIMARY KEY |
dedupid | INTEGER | NULL | Content of the Label field in Mark mode, i.e. the ID of the first record in a duplicate set |
correction | INTEGER | NULL | Manually set for the False Positive (FP) and False Negative (FN) results (see below) |
validated | BOOLEAN | FALSE | Manually set to TRUE if the DedupEndNote result is validated |
tp | BOOLEAN | FALSE | Manually set to TRUE if record is indeed a duplicate of the record with DedupID |
tn | BOOLEAN | FALSE | Manually set to TRUE if record has no duplicates |
fp | BOOLEAN | FALSE | Manually set to TRUE if DedupEndNote has wrongly identified the record as a duplicate of record with DedupID. If the record has no duplicates, Correction contains the ID, otherwise the ID of the true duplicate. |
fn | BOOLEAN | FALSE | Manually set to TRUE if DedupEndNote has not identified the record as a duplicate. The ID of the missed duplicate is stored in Correction. If the record is a False Positive but also has duplicates, it is only marked as False Positive: otherwise TP + TN + FP + FN would be greater than the size of the validation set. |
unsolvable | BOOLEAN | FALSE | ??? |
authors_truncated | TEXT | | Authors joined with '; ', truncated at 254 characters. In an MS Access DB: SHORT TEXT (i.e. max. 255 characters), to make the field sortable and searchable. |
authors | TEXT | | Authors joined with '; '. In an MS Access DB: LONG TEXT (a.k.a. MEMO), not sortable or searchable. |
publ_year | TEXT | | Publication Year |
title_truncated | TEXT | | Title, truncated at 254 characters. In an MS Access DB: SHORT TEXT (i.e. max. 255 characters), to make the field sortable and searchable. |
title | TEXT | | Title. In an MS Access DB: LONG TEXT (a.k.a. MEMO), not sortable or searchable. |
title2 | TEXT | | Journal Title / Book Title |
volume | TEXT | | Volume |
issue | TEXT | | Issue |
pages | TEXT | | Starting Page |
article_number | TEXT | | Article Number |
dois | TEXT | | DOIs joined with '; ' |
publ_type | TEXT | | Type of publication. 'type' is a SQL reserved word |
database | TEXT | | Database Provider |
number_authors | INTEGER | | Number of authors |
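
The bookkeeping rule in the table (each record counts as exactly one of TP, TN, FP or FN, so that the four counts add up to the size of the validation set) can be checked mechanically. The following is a minimal sketch, not the setup actually used: it assumes SQLite as a stand-in for MS Access, creates only a subset of the fields, and uses a hypothetical table name `validation` with made-up rows. The helper names (`truncate_authors`, `outcome_counts`, `records_with_wrong_outcome_count`) are illustrative only.

```python
import sqlite3

# Assumption: SQLite replaces MS Access; booleans are stored as 0/1 integers.
SCHEMA = """
CREATE TABLE validation (
    id INTEGER PRIMARY KEY,
    dedupid INTEGER,
    correction INTEGER,
    validated INTEGER NOT NULL DEFAULT 0,
    tp INTEGER NOT NULL DEFAULT 0,
    tn INTEGER NOT NULL DEFAULT 0,
    fp INTEGER NOT NULL DEFAULT 0,
    fn INTEGER NOT NULL DEFAULT 0,
    authors_truncated TEXT,
    authors TEXT
)
"""


def truncate_authors(authors, limit=254):
    """Join authors with '; ' and cut the result at `limit` characters."""
    return "; ".join(authors)[:limit]


def outcome_counts(conn):
    """Return (tp, tn, fp, fn, total) for the validation table."""
    row = conn.execute(
        "SELECT SUM(tp), SUM(tn), SUM(fp), SUM(fn), COUNT(*) FROM validation"
    ).fetchone()
    return tuple(value or 0 for value in row)


def records_with_wrong_outcome_count(conn):
    """IDs of validated records that do not have exactly one outcome flag set."""
    rows = conn.execute(
        "SELECT id FROM validation "
        "WHERE validated = 1 AND tp + tn + fp + fn <> 1 ORDER BY id"
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO validation (id, dedupid, correction, validated, tp, tn, fp, fn) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [  # made-up rows, one outcome flag per record
        (1, 1, None, 1, 1, 0, 0, 0),
        (2, 1, None, 1, 1, 0, 0, 0),
        (3, None, None, 1, 0, 1, 0, 0),
        (4, 1, 5, 1, 0, 0, 1, 0),
        (5, None, 4, 1, 0, 0, 0, 1),
    ],
)

tp, tn, fp, fn, total = outcome_counts(conn)
print(tp + tn + fp + fn == total, records_with_wrong_outcome_count(conn))
```

A record that is both a False Positive and has a missed duplicate would violate the one-flag rule if both `fp` and `fn` were set, which is why the table description marks such a record as False Positive only.
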
java -Dlogging.level.edu.dedupendnote.services.IOService=DEBUG -jar DedupEndNote-0.9.5-SNAPSHOT.jar

If everything works, the log should end with "Records read: ". If not, the log will show the last record that was read successfully.

java -Dlogging.level.edu.dedupendnote.services.IOService=DEBUG -Dlogging.level.edu.dedupendnote.domain.Record=DEBUG -jar DedupEndNote-0.9.5-SNAPSHOT.jar

java -Dlogging.level.edu.dedupendnote.services=DEBUG -jar DedupEndNote-0.9.5-SNAPSHOT.jar
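
The "Records read: " check on the debug log can be automated with a few lines of Python. This is only a sketch under assumptions: the log has been captured as text (for example by redirecting the program's output to a file), the marker appears on the last non-empty line, and the sample log lines below are invented for illustration and do not reproduce the real log format. The function name `read_completed` is hypothetical.

```python
def read_completed(log_text, marker="Records read: "):
    """Return True if the last non-empty log line contains the marker."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    return bool(lines) and marker in lines[-1]


# Invented sample logs, not actual DedupEndNote output.
complete_log = "DEBUG reading record 1\nDEBUG reading record 2\nRecords read: 2\n"
broken_log = "DEBUG reading record 1\nDEBUG reading record 2\n"

print(read_completed(complete_log), read_completed(broken_log))
```

When the check fails, the last lines of the log are the place to look for the last record that was read successfully.
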