Digital Scriptorium OpenRefine documentation and JSON recipes for data reconciliation
To apply the JSON instructions (also known as recipes) found in this repository to DS data in OpenRefine, find the left column, select the Undo/Redo tab, select Apply, paste the JSON code, and then select Perform operations. This executes the prewritten commands, which perform various actions on the data for the reconciliation process.
Facets and filters can also be applied to the data through the drop-down menu on each column header; they are displayed in the left column when the Facet/Filter tab is selected.
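For readers who have not seen a recipe before, the sketch below shows the general shape of the JSON that the Apply dialog accepts: a list of operation objects. It is a hypothetical one-step illustration built in Python, not one of the recipes in this repository; the column names are borrowed from the name workflow only as an example, and the real recipes contain many more operations.

```python
import json

# Hypothetical single-step recipe for illustration only.
# It copies one column into a new column using OpenRefine's
# "core/column-addition" operation; the repository's recipes differ.
recipe = [
    {
        "op": "core/column-addition",
        "engineConfig": {"facets": [], "mode": "row-based"},
        "baseColumnName": "name_as_recorded",
        "expression": "grel:value",
        "onError": "set-to-blank",
        "newColumnName": "recon-human",
        "columnInsertIndex": 1,
        "description": "Create column recon-human based on column name_as_recorded",
    }
]

# This text is what would be pasted into the Apply dialog.
recipe_text = json.dumps(recipe, indent=2)
print(recipe_text)
```

Because a recipe is a JSON list, several steps can be pasted at once and OpenRefine runs them in list order.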
The following notes apply to file naming conventions for editing file name variables found in the instructions in this repository (use all lowercase letters where applicable):
- DATE = the date the file/dataset was generated/created/extracted in `YYYYMMDD` format
- INSTITUTION = the code for the name of the institutional source for the data, such as `penn` or `kansas` or `csl`
- DATATYPE = the type of encoding standard or technical format of the metadata source, such as `marcxml` or `mets` or `csv`
- One or more DIFFERENTIATORS may also be added to the file name to disambiguate files, using source names of collections or databases, such as `bibliophilly`, or batch numbers, such as `batch-1`, `batch-2`, etc.
Examples of correctly formatted file names:
- `20230518-materials-rome-mets-legacy-enriched.csv`
- `20230630-genres-penn-marcxml-bibliophilly-enriched.csv`
- `20230715-names-kansas-marc-enriched.csv`
- `20230816-languages-princeton-marcxml-batch-3-enriched.csv`
- `20230901-places-hrc-csv-fragments-batch-1-enriched.csv`
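The naming convention can be sketched as a small helper that assembles a file name from its parts. This is an illustrative sketch only: the function name `build_file_name`, the parameter name `term` (for the segment such as `genres` or `names`), and the default `enriched` suffix are assumptions made for the example, not part of the repository.

```python
import re

def build_file_name(date, term, institution, datatype,
                    differentiators=(), suffix="enriched"):
    """Assemble a lowercase, hyphen-separated CSV file name.

    Hypothetical helper: parameter names are illustrative only.
    """
    # DATE must be eight digits (YYYYMMDD).
    if not re.fullmatch(r"\d{8}", date):
        raise ValueError("DATE must be in YYYYMMDD format")
    parts = [date, term, institution, datatype, *differentiators]
    if suffix:
        parts.append(suffix)
    # The convention asks for all lowercase letters.
    return "-".join(part.lower() for part in parts) + ".csv"

print(build_file_name("20230630", "genres", "penn", "marcxml", ["bibliophilly"]))
# 20230630-genres-penn-marcxml-bibliophilly-enriched.csv
```

Passing `suffix=""` yields the name of the file before it is renamed for enrichment.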
Language reconciliation instructions
Material reconciliation instructions
Place reconciliation instructions
- Load `DATE-names-INSTITUTION-DATATYPE.csv` into OpenRefine; rename `DATE-names-INSTITUTION-DATATYPE-enriched.csv`
- Add workflow columns: JSON (On the left, go to `Undo/Redo`, `Apply` and paste the JSON code, and `Perform operations`)
- Copy `name_as_recorded` column and reconcile new `recon-human` column against human type (Q5): JSON
- Apply list of previously reconciled or known human names: 0. JSON, 1. JSON, 2. JSON
- Manually reconcile and update known human names: edit JSON and submit pull request
- Add `human-label`, `instance-of-human`, and `human-qid` columns; rename reconciliation column to `recon-organization` to reconcile against organization type (Q43229): JSON
- Apply list of previously reconciled or known organization names: JSON
- Manually reconcile and update known organization names: edit JSON
- Add `organization-label`, `instance-of-organization`, and `organization-qid` columns; add `instance_of_add`, `authorized_label_add`, and `structured_value_add` columns; finalize workflow: JSON
- Remove any facets and filters
- Export full CSV from OpenRefine (retain file name)
json/name/010-name-workflow.json
json/name/030-name-recon-human.json
json/name/040-name-known-human.json
json/name/041-name-known-human.json
json/name/042-name-known-human.json
json/name/050-name-recon-org.json
json/name/060-name-known-org.json
json/name/090-name-finalize.json
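The numeric prefixes in the recipe file names appear to follow the order of the steps in the name workflow. Assuming that reading is correct, a minimal sketch for putting a set of recipe paths into application order could look like this; the helper name `step_number` is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Recipe paths for the name workflow, deliberately out of order.
name_recipes = [
    "json/name/090-name-finalize.json",
    "json/name/041-name-known-human.json",
    "json/name/010-name-workflow.json",
    "json/name/060-name-known-org.json",
    "json/name/030-name-recon-human.json",
    "json/name/042-name-known-human.json",
    "json/name/050-name-recon-org.json",
    "json/name/040-name-known-human.json",
]

def step_number(path):
    """Return the leading number of a recipe file name as an integer."""
    match = re.match(r"(\d+)-", PurePosixPath(path).name)
    if match is None:
        raise ValueError(f"no numeric prefix in {path!r}")
    return int(match.group(1))

# Assumption: a lower prefix means an earlier step.
ordered = sorted(name_recipes, key=step_number)
for path in ordered:
    print(path)
```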
- Load `DATE-genres-DATATYPE-INSTITUTION.csv` into OpenRefine; rename `DATE-genres-DATATYPE-INSTITUTION-enriched.csv`
- Add workflow columns: JSON
- Copy filtered `genre_as_recorded` column and reconcile new `recon-genre` column against AAT vocabulary: JSON
- Apply list of previously reconciled or known AAT terms: JSON
- Manually reconcile and update known AAT genre terms: edit JSON
- Add `aat-label` and `genre-aat` columns: JSON
- Copy filtered `genre_as_recorded` column and reconcile new `recon-genre` column against FAST terms: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST genre terms: edit JSON
- Add `fast-label` and `genre-fast` columns: JSON
- Finalize workflow; consolidate `authorized_label` and `structured_value` columns: JSON
- Export three versions from OpenRefine as CSV files: 1) full document (retain file name), 2) facet by `structured_value` blank (null/empty) = `true` and rename it `DATE-genres-DATATYPE-INSTITUTION-unreconciled.csv`, 3) facet by `structured_value` blank (null/empty) = `false` and rename it `DATE-genres-DATATYPE-INSTITUTION-reconciled.csv`
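The final export step splits the data on whether `structured_value` is blank. The workflow does this inside OpenRefine with a facet; the sketch below only illustrates the same partition outside OpenRefine, on made-up sample rows. The function name `split_by_structured_value` and the sample values are hypothetical, and treating whitespace-only cells as blank is an assumption of this example.

```python
import csv
import io

def split_by_structured_value(rows):
    """Partition rows into (reconciled, unreconciled) lists.

    A row counts as unreconciled when structured_value is missing,
    empty, or whitespace only (an assumption of this sketch).
    """
    reconciled, unreconciled = [], []
    for row in rows:
        value = (row.get("structured_value") or "").strip()
        (reconciled if value else unreconciled).append(row)
    return reconciled, unreconciled

# Made-up sample data standing in for an enriched export.
sample = io.StringIO(
    "genre_as_recorded,structured_value\n"
    "sample genre a,example-value\n"
    "sample genre b,\n"
)

reconciled, unreconciled = split_by_structured_value(csv.DictReader(sample))
print(len(reconciled), len(unreconciled))
```

Each list could then be written with `csv.DictWriter` to the `-reconciled.csv` and `-unreconciled.csv` file names described in the export step.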
json/genre/010-genre-workflow.json
json/genre/aat/030-genre-aat-recon.json
json/genre/aat/040-genre-aat-known.json
json/genre/aat/050-genre-aat.json
json/genre/fast/030-genre-fast-recon.json
json/genre/fast/040-genre-fast-known.json
json/genre/fast/050-genre-fast.json
json/genre/090-genre-finalize.json
- Load `DATE-named-subjects-combined.csv` into OpenRefine; rename `DATE-named-subjects-combined-enriched.csv`
- Add workflow columns: JSON
- Copy `subject_as_recorded` column and reconcile new `recon-subject` column against FAST terms: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `named-subject-label-1` and `named-subject-fast-1` columns, reconcile next `recon-subject` column: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `named-subject-label-2` and `named-subject-fast-2` columns; consolidate `authorized_label` and `structured_value` columns; finalize workflow: JSON
- Export three versions from OpenRefine as CSV files: 1) full document, 2) facet by `structured_value` blank (null/empty) = `true`, 3) facet by `structured_value` blank (null/empty) = `false`
- Load `DATE-subjects-combined.csv` into OpenRefine; rename `DATE-subjects-combined-enriched.csv`
- Add workflow columns: JSON
- Copy `subject_as_recorded` column and reconcile new `recon-subject` column against FAST terms: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `subject-label-1` and `subject-fast-1` columns, reconcile next `recon-subject` column: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `subject-label-2` and `subject-fast-2` columns, reconcile next `recon-subject` column: JSON
- Apply list of previously reconciled or known FAST terms: JSON
- Manually reconcile and update known FAST terms: edit JSON
- Add `subject-label-3` and `subject-fast-3` columns; consolidate `authorized_label` and `structured_value` columns; finalize workflow: JSON
- Export three versions from OpenRefine as CSV files: 1) full document (retain file name), 2) facet by `structured_value` blank (null/empty) = `true` and rename it `DATE-subjects-DATATYPE-INSTITUTION-unreconciled.csv`, 3) facet by `structured_value` blank (null/empty) = `false` and rename it `DATE-subjects-DATATYPE-INSTITUTION-reconciled.csv`
json/subject/010-subject-workflow.json
json/subject/named/030-named-subject-recon-1.json
json/subject/named/040-named-subject-known.json
json/subject/named/060-named-subject-recon-2.json
json/subject/named/090-named-subject-finalize.json
json/subject/topic/030-subject-recon-1.json
json/subject/topic/040-subject-known.json
json/subject/topic/060-subject-recon-2.json
json/subject/topic/090-subject-recon-3.json
json/subject/topic/120-subject-finalize.json