Reconciliation and Exporting


Reconciliation

OpenRefine is designed to let users reconcile data with external databases by matching names, subjects or entities to unique identifiers in those databases. This standardizes terms against specific vocabularies such as the Getty Art & Architecture Thesaurus or the Library of Congress Subject Headings.

However, there are complications due to Application Programming Interface (API) compatibility. An API is a set of rules and protocols that allow different software applications to communicate with each other. It acts as a bridge, enabling one application to access the functionality or data of another application in a structured way. Because reconciliation transfers data between your computer and an external server, security settings can block the connection, especially under the network restrictions of an academic institution like ours.
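
To make that data transfer concrete, here is a minimal Python sketch of the kind of request a reconciliation client sends. It assumes the service follows the common Reconciliation Service API convention of a batch of queries keyed q0, q1 and so on, sent as a JSON-encoded "queries" form field. The helper name and the sample values are illustrative and are not taken from OpenRefine itself.

```python
import json
from urllib.parse import urlencode


def build_reconciliation_payload(values, limit=3):
    """Encode cell values as a batch of reconciliation queries.

    Assumption: the service accepts a form field named "queries" that
    holds a JSON object keyed q0, q1, ... (one entry per cell value).
    build_reconciliation_payload is a hypothetical helper name.
    """
    queries = {
        f"q{i}": {"query": value, "limit": limit}
        for i, value in enumerate(values)
    }
    # URL-encode the JSON so it can be sent as an HTTP form body.
    return urlencode({"queries": json.dumps(queries)})


# Sample column values, invented for illustration.
payload = build_reconciliation_payload(["Paris", "Lagos"])
print(payload)
```

Each cell value becomes one small query, which is why every value in the column has to travel to the external server and back before candidates can appear.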

For example, the connection failed when I tried to reconcile with Getty’s databases, as shown below. The Library of Congress does not even have a listed service in OpenRefine, likely because it lacks the standardized API endpoints needed for this type of integration.

Gif of the OpenRefine Interface Failing to Connect with the Getty Vocabulary Database
Failing to Connect with the Getty Vocabulary Database
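
One way to narrow down a failure like this is to request the service address directly from the same computer and network, outside OpenRefine. The Python sketch below is a rough check under two assumptions: that a plain GET on the service address returns a JSON description of the service with a "name" field, and that the address you pass in is the one you entered in OpenRefine. The function name is hypothetical, and the opener parameter exists only so the check can be tried without a network.

```python
import json
import urllib.request


def check_service(url, opener=urllib.request.urlopen, timeout=10):
    """Return the service's name from its JSON description, or None.

    Assumption: a GET on the endpoint returns JSON containing "name".
    `opener` is injectable so the function can be exercised offline.
    """
    try:
        with opener(url, timeout=timeout) as response:
            manifest = json.load(response)
    except (OSError, ValueError) as error:
        # OSError covers refused, blocked or timed-out connections;
        # ValueError covers a reply that is not valid JSON.
        print(f"Could not reach {url}: {error}")
        return None
    return manifest.get("name")
```

If the function returns None for an address that works from another network, the block is more likely on the institutional side than in OpenRefine.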

That said, I was able to reconcile successfully with other databases that may be useful for standardization, including GeoNames for verifying location names, ORCID for researcher identifiers in bibliometric data and Open Library for general subject terms.

Gif of the OpenRefine Interface Successfully Connecting with the Geonames Vocabulary Database
Successfully Connecting with the Geonames Vocabulary Database

To reconcile data:

  • Select the column that you would like to reconcile
  • Open that column’s dropdown menu and select Reconcile > Start Reconciling
  • Click Add Standard Service and enter the service URL from the Reconciliation Service Test Bench
  • On the intermediary screen that follows, choose which areas of the database to use, the level of filtering and the maximum number of “candidates” to return for each entity
  • After a good deal of processing, candidates will appear in each cell. You can accept a candidate for just that one cell or apply it to all identical values in that column.

Gif of the OpenRefine Interface Displaying Output of Reconciled Data, Displaying Multiple Candidates Next to Checkbox Options to Adopt Words Either Individually or Across the Field
Output of Reconciled Data, Displaying Multiple Candidates Next to Checkbox Options to Adopt Words Either Individually or Across the Field
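
The candidate lists above can be pictured as scored results returned for each query. The Python sketch below shows one plausible way to pick a top candidate from such a response, assuming each candidate carries "id", "name", "score" and "match" fields as in the common reconciliation response format. It is not how OpenRefine decides internally; the function name, the identifiers and the scores are made up for illustration.

```python
def best_candidates(response, min_score=0.0):
    """Map each query key to its top candidate, or None if there is none."""
    chosen = {}
    for key, entry in response.items():
        candidates = [
            c for c in entry.get("result", [])
            if c.get("score", 0) >= min_score
        ]
        # Prefer candidates the service flagged as a confident match,
        # then fall back to the highest score.
        candidates.sort(
            key=lambda c: (bool(c.get("match")), c.get("score", 0)),
            reverse=True,
        )
        chosen[key] = candidates[0] if candidates else None
    return chosen


# Invented sample response: two candidates for q0, none for q1.
sample = {
    "q0": {"result": [
        {"id": "A1", "name": "Paris", "score": 82.0, "match": False},
        {"id": "A2", "name": "Paris", "score": 95.5, "match": True},
    ]},
    "q1": {"result": []},
}

picked = best_candidates(sample)
print(picked["q0"]["id"])  # A2
print(picked["q1"])        # None
```

An empty result, as for q1 here, corresponds to a cell where no candidate is offered and the value has to be reviewed by hand.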


Export the Standardized Data

FINALLY

After standardizing, export your project by going to Export > Comma-separated value (CSV).

Gif of the OpenRefine Interface Exporting CSV
Exporting CSV
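
As an optional check after exporting, a short Python sketch like the one below can count the distinct values left in a column, which makes any variants that escaped standardization easy to spot. The column name and the sample rows are invented; for a real export you would open your own CSV file instead of the in-memory sample.

```python
import csv
import io
from collections import Counter


def count_values(csv_file, column):
    """Count how often each value appears in one column of a CSV file."""
    reader = csv.DictReader(csv_file)
    # Strip surrounding whitespace so "Paris " and "Paris" count together.
    return Counter(row[column].strip() for row in reader)


# In-memory stand-in for an exported file (made-up data).
sample_export = io.StringIO(
    "title,place\n"
    "Map A,Paris\n"
    "Map B,Paris\n"
    "Map C,paris \n"
)

counts = count_values(sample_export, "place")
for value, total in counts.most_common():
    print(value, total)
```

In this sample the lowercase "paris" shows up as its own entry, which signals a value that still needs to be standardized before the data is reused.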