Digital Scriptorium OpenRefine documentation and JSON recipes for data reconciliation
Notes on editing file name variables (use all lowercase letters where applicable):
- DATE = the date the file/dataset was generated/created/extracted in `YYYYMMDD` format
- DATATYPE = the type of encoding standard or technical format of the metadata source, such as `marcxml` or `mets` or `csv`
- INSTITUTION = the code for the name of the institutional source for the data, such as `penn` or `kansas` or `csl`
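
The naming pattern above can be sketched as a small helper. This is only an illustration: the function name `build_file_name`, its parameters, and the sample date are assumptions made for this example and are not part of the repository.

```python
from datetime import date


def build_file_name(created, field, datatype, institution, suffix=""):
    """Build a name such as DATE-names-DATATYPE-INSTITUTION[-suffix].csv.

    `created` is a datetime.date; every other part is lowercased, following
    the "use all lowercase letters" note. (Hypothetical helper.)
    """
    parts = [created.strftime("%Y%m%d"), field, datatype, institution]
    if suffix:
        parts.append(suffix)
    return "-".join(part.strip().lower() for part in parts) + ".csv"


# The date below is an arbitrary sample value.
print(build_file_name(date(2022, 1, 31), "names", "marcxml", "penn"))
# 20220131-names-marcxml-penn.csv
print(build_file_name(date(2022, 1, 31), "names", "MARCXML", "Penn", "enriched"))
# 20220131-names-marcxml-penn-enriched.csv
```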
- Load `DATE-names-DATATYPE-INSTITUTION.csv` into OpenRefine; rename `DATE-names-DATATYPE-INSTITUTION-enriched.csv`
- Add workflow columns: JSON (On the left, go to `Undo/Redo`, `Apply` and paste the JSON code)
- Copy `name_as_recorded` column and reconcile new `recon-human` column against human type (Q5): JSON
- Apply list of previously reconciled or known human names: 0. JSON, 1. JSON, 2. JSON
- Manually reconcile and update known human names: edit JSON
- Add `human-label`, `instance-of-human`, and `human-qid` columns; rename reconciliation column to `recon-organization` to reconcile against organization type (Q43229): JSON
- Apply list of previously reconciled or known organization names: JSON
- Manually reconcile and update known organization names: edit JSON
- Add `organization-label`, `instance-of-organization`, and `organization-qid` columns; consolidate `authorized_label`, `instance_of`, and `structured_value` columns; finalize workflow: JSON
- Do not forget to close all facets
- Export three versions from OpenRefine as CSV files: 1) full document (retain file name), 2) facet by `structured_value` blank (null/empty) = `true` and rename it `DATE-names-DATATYPE-INSTITUTION-unreconciled.csv`, 3) facet by `structured_value` blank (null/empty) = `false` and rename it `DATE-names-DATATYPE-INSTITUTION-reconciled.csv`
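
The export step is done with facets inside OpenRefine. For checking the result outside OpenRefine, the same split can be approximated with a short script. This is a minimal sketch, assuming the full enriched export is a UTF-8 CSV with a `structured_value` column; the function name `split_by_structured_value` is hypothetical.

```python
import csv
from pathlib import Path


def split_by_structured_value(enriched_csv, column="structured_value"):
    """Write -reconciled.csv and -unreconciled.csv files next to the enriched CSV.

    A row counts as unreconciled when `column` is empty, which approximates
    the "blank (null/empty)" facet. Returns a dict of output paths.
    """
    src = Path(enriched_csv)
    tail = "-enriched.csv"
    base = src.name[: -len(tail)] if src.name.endswith(tail) else src.stem

    with src.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    subsets = {
        "reconciled": [row for row in rows if (row.get(column) or "") != ""],
        "unreconciled": [row for row in rows if (row.get(column) or "") == ""],
    }

    outputs = {}
    for label, subset in subsets.items():
        out_path = src.with_name(f"{base}-{label}.csv")
        with out_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)
        outputs[label] = out_path
    return outputs
```

Calling `split_by_structured_value("DATE-names-DATATYPE-INSTITUTION-enriched.csv")` leaves the enriched file untouched and writes the two subsets beside it.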
json/name/010-name-workflow.json
json/name/030-name-recon-human.json
json/name/040-name-known-human.json
json/name/041-name-known-human.json
json/name/042-name-known-human.json
json/name/050-name-recon-org.json
json/name/060-name-known-org.json
json/name/090-name-finalize.json
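
For readers unfamiliar with what gets pasted into `Undo/Redo` > `Apply`, the sketch below generates a one-step operation list that copies `name_as_recorded` into a new `recon-human` column. It is not taken from the recipe files listed above; the field names follow OpenRefine's exported operation history as commonly seen and should be checked against an export from your own OpenRefine version. The helper name `copy_column_operation` is hypothetical.

```python
import json


def copy_column_operation(base_column, new_column, insert_index):
    """Describe a "core/column-addition" step that copies one column into a new one.

    Assumption: the keys below match OpenRefine's operation-history JSON.
    """
    return {
        "op": "core/column-addition",
        "engineConfig": {"facets": [], "mode": "row-based"},
        "baseColumnName": base_column,
        "expression": "grel:value",  # GREL expression: keep the cell value as-is
        "onError": "set-to-blank",
        "newColumnName": new_column,
        "columnInsertIndex": insert_index,
        "description": f"Create column {new_column} based on column {base_column}",
    }


# An operation history is a JSON array of such steps.
recipe = [copy_column_operation("name_as_recorded", "recon-human", 1)]
print(json.dumps(recipe, indent=2))
```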
- Load `DATE-genres-DATATYPE-INSTITUTION.csv` into OpenRefine; rename `DATE-genres-DATATYPE-INSTITUTION-enriched.csv`
- Add workflow columns: JSON
- Copy filtered `genre_as_recorded` column and reconcile new `recon-genre` column against AAT vocabulary: JSON
- Apply list of previously reconciled or known AAT terms: JSON
- Manually reconcile and update known AAT genre terms: edit JSON
- Add `aat-label` and `genre-aat` columns: JSON
- Copy filtered `genre_as_recorded` column and reconcile new `recon-genre` column against FAST terms: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST genre terms: edit JSON
- Add `fast-label` and `genre-fast` columns: JSON
- Finalize workflow; consolidate `authorized_label` and `structured_value` columns: JSON
- Export three versions from OpenRefine as CSV files: 1) full document (retain file name), 2) facet by `structured_value` blank (null/empty) = `true` and rename it `DATE-genres-DATATYPE-INSTITUTION-unreconciled.csv`, 3) facet by `structured_value` blank (null/empty) = `false` and rename it `DATE-genres-DATATYPE-INSTITUTION-reconciled.csv`
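
The "apply list of previously reconciled or known terms" steps amount to a lookup from the recorded string to an already agreed term. The sketch below shows that idea outside OpenRefine. It is an assumption-laden illustration: the function `apply_known_terms`, the in-memory row format, and the placeholder label and URI are all hypothetical, and the real known-term lists live in the JSON recipes.

```python
def apply_known_terms(rows, known_terms, source_column="genre_as_recorded"):
    """Fill authorized_label and structured_value for rows with a known term.

    `rows` is a list of dicts (one per CSV row); `known_terms` maps the
    recorded string to a (label, identifier) pair. Rows without a match are
    left unchanged. Returns the number of rows that were filled.
    """
    matched = 0
    for row in rows:
        hit = known_terms.get(row.get(source_column, ""))
        if hit is None:
            continue
        row["authorized_label"], row["structured_value"] = hit
        matched += 1
    return matched


# Placeholder data only: neither the label nor the URI is a real AAT or FAST term.
known = {"recorded genre A": ("EXAMPLE LABEL", "https://example.org/term/1")}
rows = [
    {"genre_as_recorded": "recorded genre A", "authorized_label": "", "structured_value": ""},
    {"genre_as_recorded": "recorded genre B", "authorized_label": "", "structured_value": ""},
]
print(apply_known_terms(rows, known))  # 1
```

Rows that remain without a `structured_value` after this kind of pass are the ones left for manual reconciliation.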
json/genre/010-genre-workflow.json
json/genre/aat/030-genre-aat-recon.json
json/genre/aat/040-genre-aat-known.json
json/genre/aat/050-genre-aat.json
json/genre/fast/030-genre-fast-recon.json
json/genre/fast/040-genre-fast-known.json
json/genre/fast/050-genre-fast.json
json/genre/090-genre-finalize.json
- Load `DATE-named-subjects-combined.csv` into OpenRefine; rename `DATE-named-subjects-combined-enriched.csv`
- Add workflow columns: JSON
- Copy `subject_as_recorded` column and reconcile new `recon-subject` column against FAST terms: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `named-subject-label-1` and `named-subject-fast-1` columns, reconcile next `recon-subject` column: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `named-subject-label-2` and `named-subject-fast-2` columns; consolidate `authorized_label` and `structured_value` columns; finalize workflow: JSON
- Export three versions from OpenRefine as CSV files: 1) full document, 2) facet by `structured_value` blank (null/empty) = `true`, 3) facet by `structured_value` blank (null/empty) = `false`
- Load `DATE-subjects-combined.csv` into OpenRefine; rename `DATE-subjects-combined-enriched.csv`
- Add workflow columns: JSON
- Copy `subject_as_recorded` column and reconcile new `recon-subject` column against FAST terms: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `subject-label-1` and `subject-fast-1` columns, reconcile next `recon-subject` column: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `subject-label-2` and `subject-fast-2` columns, reconcile next `recon-subject` column: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `subject-label-3` and `subject-fast-3` columns; consolidate `authorized_label` and `structured_value` columns; finalize workflow: JSON
- Export three versions from OpenRefine as CSV files: 1) full document (retain file name), 2) facet by `structured_value` blank (null/empty) = `true` and rename it `DATE-subjects-DATATYPE-INSTITUTION-unreconciled.csv`, 3) facet by `structured_value` blank (null/empty) = `false` and rename it `DATE-subjects-DATATYPE-INSTITUTION-reconciled.csv`
json/subject/010-subject-workflow.json
json/subject/named/030-named-subject-recon-1.json
json/subject/named/040-named-subject-known.json
json/subject/named/060-named-subject-recon-2.json
json/subject/named/090-named-subject-finalize.json
json/subject/topic/030-subject-recon-1.json
json/subject/topic/040-subject-known.json
json/subject/topic/060-subject-recon-2.json
json/subject/topic/090-subject-recon-3.json
json/subject/topic/120-subject-finalize.json
Language reconciliation instructions