
Digital Scriptorium Data Reconciliation Process through OpenRefine

Digital Scriptorium OpenRefine documentation and JSON recipes for data reconciliation

General instructions

To apply one of the JSON instructions (also known as recipes) in this repository to DS data in OpenRefine, open the Undo/Redo tab in the left panel, select Apply, paste the JSON code, and select Perform operations. OpenRefine then runs the prewritten commands, which carry out the data transformations needed for the reconciliation process.
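
Before pasting a recipe into the Apply dialog, it can help to confirm that the text is well-formed. The sketch below assumes the usual shape of an OpenRefine operation history, a JSON array of objects that each carry an `op` field. The function name `check_recipe` and the sample recipe are hypothetical and are not part of this repository.

```python
import json


def check_recipe(text):
    """Return the operation names in a recipe, or raise ValueError if it is malformed."""
    operations = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(operations, list):
        raise ValueError("a recipe should be a JSON array of operations")
    names = []
    for position, operation in enumerate(operations):
        if not isinstance(operation, dict) or "op" not in operation:
            raise ValueError(f"entry {position} has no 'op' field")
        names.append(operation["op"])
    return names


# Hypothetical one-step recipe, for illustration only.
sample = '[{"op": "core/column-rename", "oldColumnName": "a", "newColumnName": "b"}]'
print(check_recipe(sample))
```

A recipe that passes this check can still fail in OpenRefine, for example when a column it expects is missing; the check only catches copy-and-paste damage.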

Facets and filters can also be applied to the data through the drop-down menu on each column header; active facets and filters are displayed in the left panel under the Facet/Filter tab.

The instructions in this repository use the following variables in file names; replace each variable with the appropriate value (use all lowercase letters where applicable):

DATE = the date the file/dataset was generated/created/extracted in YYYYMMDD format
INSTITUTION = the code for the name of the institutional source for the data, such as penn or kansas or csl
DATATYPE = the type of encoding standard or technical format of the metadata source, such as marcxml or mets or csv
One or more DIFFERENTIATORS may also be added to the file name to disambiguate files, using source names of collections or databases, such as bibliophilly, or batch numbers, such as batch-1, batch-2, etc.

Examples of correctly formatted file names:

20230518-materials-rome-mets-legacy-enriched.csv
20230630-genres-penn-marcxml-bibliophilly-enriched.csv
20230715-names-kansas-marc-enriched.csv
20230816-languages-princeton-marcxml-batch-3-enriched.csv
20230901-places-hrc-csv-fragments-batch-1-enriched.csv
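
The naming convention above can be sketched as a small helper. This is an illustration only: the function `build_file_name` and its parameter names are hypothetical, and the `element` position (names, genres, places, and so on) is inferred from the examples rather than defined in the variable list.

```python
import re


def build_file_name(date, element, institution, datatype,
                    differentiators=(), suffix="enriched"):
    """Assemble DATE-element-INSTITUTION-DATATYPE[-DIFFERENTIATORS][-suffix].csv."""
    if not re.fullmatch(r"\d{8}", date):
        raise ValueError("DATE must be in YYYYMMDD format")
    parts = [date, element, institution, datatype, *differentiators]
    if suffix:
        parts.append(suffix)
    # The convention asks for all lowercase letters where applicable.
    return "-".join(parts).lower() + ".csv"


print(build_file_name("20230630", "genres", "penn", "marcxml", ["bibliophilly"]))
print(build_file_name("20230715", "names", "kansas", "marc", suffix=""))
```

Passing an empty `suffix` yields the name of the file as first loaded, before it is renamed with `-enriched`.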

Reconciliation instructions by metadata element / authority type

Reconciling names to Wikidata

Load DATE-names-INSTITUTION-DATATYPE.csv into OpenRefine; rename DATE-names-INSTITUTION-DATATYPE-enriched.csv
Add workflow columns: JSON (in the left panel, go to Undo/Redo, select Apply, paste the JSON code, and select Perform operations)
Copy name_as_recorded column and reconcile new recon-human column against human type (Q5): JSON
Apply list of previously reconciled or known human names: 0. JSON, 1. JSON, 2. JSON
Manually reconcile and update known human names: edit JSON and submit pull request
Add human-label, instance-of-human, and human-qid columns; rename reconciliation column to recon-organization to reconcile against organization type (Q43229): JSON
Apply list of previously reconciled or known organization names: JSON
Manually reconcile and update known organization names: edit JSON
Add organization-label, instance-of-organization, and organization-qid columns; add instance_of_add, authorized_label_add, and structured_value_add columns; finalize workflow: JSON
Remove any facets and filters
Export full CSV from OpenRefine (retain file name)
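
To illustrate what reconciling a column against a type such as human (Q5) or organization (Q43229) involves, the sketch below builds a query batch in the shape used by the Reconciliation Service API that OpenRefine speaks. It makes no network call and names no endpoint. The function `build_queries`, the `limit` value, and the sample names are assumptions for illustration; the recipes in this repository handle this step inside OpenRefine.

```python
import json


def build_queries(names, type_id, limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ... for the given type."""
    return {
        f"q{index}": {"query": name, "type": type_id, "limit": limit}
        for index, name in enumerate(names)
    }


# Hypothetical name_as_recorded values, reconciled first against the human type.
batch = build_queries(["Cicero, Marcus Tullius", "Augustine, of Hippo, Saint"], "Q5")
print(json.dumps(batch, indent=2))
```

Restricting each query to one type mirrors the two passes in the steps above: first human, then organization for the names that remain unmatched.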

json/name/010-name-workflow.json
json/name/030-name-recon-human.json
json/name/040-name-known-human.json
json/name/041-name-known-human.json
json/name/042-name-known-human.json
json/name/050-name-recon-org.json
json/name/060-name-known-org.json
json/name/090-name-finalize.json
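
The numeric prefixes on the recipe files appear to encode the order in which they are applied; this is an inference from the file names, not something the repository states. Under that assumption, a list of recipe paths can be put into working order as follows (the function `recipe_order` is hypothetical).

```python
import re
from pathlib import PurePosixPath


def recipe_order(paths):
    """Sort recipe paths by the numeric prefix of their file names."""
    def step(path):
        match = re.match(r"(\d+)-", PurePosixPath(path).name)
        if match is None:
            raise ValueError(f"no numeric prefix in {path!r}")
        return int(match.group(1))
    return sorted(paths, key=step)


unordered = [
    "json/name/090-name-finalize.json",
    "json/name/040-name-known-human.json",
    "json/name/010-name-workflow.json",
    "json/name/050-name-recon-org.json",
]
for path in recipe_order(unordered):
    print(path)
```

This works for the linear names workflow; it would not be enough for genres, where the AAT and FAST branches reuse the same prefixes.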

Reconciling genres

Load DATE-genres-INSTITUTION-DATATYPE.csv into OpenRefine; rename DATE-genres-INSTITUTION-DATATYPE-enriched.csv
Add workflow columns: JSON

to AAT

Copy filtered genre_as_recorded column and reconcile new recon-genre column against AAT vocabulary: JSON
Apply list of previously reconciled or known AAT terms: JSON
Manually reconcile and update known AAT genre terms: edit JSON
Add aat-label and genre-aat columns: JSON

to FAST

Copy filtered genre_as_recorded column and reconcile new recon-genre column against FAST terms: JSON
Apply list of previously reconciled or known FAST terms: JSON
Manually reconcile and update known FAST genre terms: edit JSON
Add fast-label and genre-fast columns: JSON

TBD instructions for other genres as needed

all genre terms: finalize

Finalize workflow; consolidate authorized_label and structured_value columns: JSON
Export three versions from OpenRefine as CSV files: 1) full document (retain file name), 2) facet by structured_value blank (null/empty) = true and rename it DATE-genres-INSTITUTION-DATATYPE-unreconciled.csv, 3) facet by structured_value blank (null/empty) = false and rename it DATE-genres-INSTITUTION-DATATYPE-reconciled.csv
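
The three-way export above amounts to partitioning the full document on whether structured_value is blank. A minimal sketch of the same split outside OpenRefine, using only the Python standard library, is shown here as a way to double-check an export. The function name and the sample rows (including the placeholder URI) are hypothetical.

```python
import csv
import io


def split_by_structured_value(rows):
    """Partition rows into (reconciled, unreconciled) by a null/empty structured_value."""
    reconciled, unreconciled = [], []
    for row in rows:
        value = row.get("structured_value") or ""
        (reconciled if value else unreconciled).append(row)
    return reconciled, unreconciled


# Hypothetical two-row export; in practice, open the exported CSV file instead.
sample = io.StringIO(
    "genre_as_recorded,structured_value\n"
    "Book of hours,EXAMPLE-URI\n"
    "Miscellany,\n"
)
reconciled, unreconciled = split_by_structured_value(csv.DictReader(sample))
print(len(reconciled), "reconciled;", len(unreconciled), "unreconciled")
```

The row counts of the two parts should add up to the row count of the full document.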

json/genre/010-genre-workflow.json
json/genre/aat/030-genre-aat-recon.json
json/genre/aat/040-genre-aat-known.json
json/genre/aat/050-genre-aat.json
json/genre/fast/030-genre-fast-recon.json
json/genre/fast/040-genre-fast-known.json
json/genre/fast/050-genre-fast.json
json/genre/090-genre-finalize.json

Reconciling subjects to FAST

Reconciling named subjects

Load DATE-named-subjects-combined.csv into OpenRefine; rename DATE-named-subjects-combined-enriched.csv
Add workflow columns: JSON
Copy subject_as_recorded column and reconcile new recon-subject column against FAST terms: JSON
Apply list of previously reconciled or known FAST terms: JSON
Manually reconcile and update known FAST terms: edit JSON
Add named-subject-label-1 and named-subject-fast-1 columns, reconcile next recon-subject column: JSON
Apply list of previously reconciled or known FAST terms: JSON
Manually reconcile and update known FAST terms: edit JSON
Add named-subject-label-2 and named-subject-fast-2 columns; consolidate authorized_label and structured_value columns; finalize workflow: JSON
Export three versions from OpenRefine as CSV files: 1) full document, 2) facet by structured_value blank (null/empty) = true, 3) facet by structured_value blank (null/empty) = false

Reconciling subjects (topical, etc.)

Load DATE-subjects-combined.csv into OpenRefine; rename DATE-subjects-combined-enriched.csv
Add workflow columns: JSON
Copy subject_as_recorded column and reconcile new recon-subject column against FAST terms: JSON
Apply list of previously reconciled or known FAST terms: JSON
Manually reconcile and update known FAST terms: edit JSON
Add subject-label-1 and subject-fast-1 columns, reconcile next recon-subject column: JSON
Apply list of previously reconciled or known FAST terms: JSON
Manually reconcile and update known FAST terms: edit JSON
Add subject-label-2 and subject-fast-2 columns, reconcile next recon-subject column: JSON
Apply list of previously reconciled or known FAST terms: JSON
Manually reconcile and update known FAST terms: edit JSON
Add subject-label-3 and subject-fast-3 columns; consolidate authorized_label and structured_value columns; finalize workflow: JSON
Export three versions from OpenRefine as CSV files: 1) full document (retain file name), 2) facet by structured_value blank (null/empty) = true and rename it DATE-subjects-INSTITUTION-DATATYPE-unreconciled.csv, 3) facet by structured_value blank (null/empty) = false and rename it DATE-subjects-INSTITUTION-DATATYPE-reconciled.csv
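
The consolidation step in the workflow above gathers the numbered label and FAST columns into authorized_label and structured_value. The finalize recipe defines how this is actually done; the sketch below only illustrates the idea. The `;` separator, the function name `consolidate`, and the sample values are assumptions and are not taken from the recipe.

```python
def consolidate(row, count=3, separator=";"):
    """Join non-empty subject-label-N / subject-fast-N values into the two final columns."""
    labels, values = [], []
    for number in range(1, count + 1):
        label = row.get(f"subject-label-{number}") or ""
        value = row.get(f"subject-fast-{number}") or ""
        if label:
            labels.append(label)
        if value:
            values.append(value)
    merged = dict(row)  # leave the input row untouched
    merged["authorized_label"] = separator.join(labels)
    merged["structured_value"] = separator.join(values)
    return merged


# Hypothetical row with two of three reconciliation passes filled in.
row = {
    "subject-label-1": "Label A", "subject-fast-1": "ID-A",
    "subject-label-2": "Label B", "subject-fast-2": "ID-B",
    "subject-label-3": "", "subject-fast-3": "",
}
result = consolidate(row)
print(result["authorized_label"], "|", result["structured_value"])
```

A row with no reconciled terms ends up with an empty structured_value, which is what the blank facet in the export step selects as unreconciled.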

json/subject/010-subject-workflow.json
json/subject/named/030-named-subject-recon-1.json
json/subject/named/040-named-subject-known.json
json/subject/named/060-named-subject-recon-2.json
json/subject/named/090-named-subject-finalize.json
json/subject/topic/030-subject-recon-1.json
json/subject/topic/040-subject-known.json
json/subject/topic/060-subject-recon-2.json
json/subject/topic/090-subject-recon-3.json
json/subject/topic/120-subject-finalize.json
