Installing and Importing
Install OpenRefine
b. Install and launch OpenRefine, following the instructions on the site for your device.
Note: When you launch OpenRefine, it opens in your default web browser, but it is actually running locally on your machine, similar to running a Git site locally in development.
✺
Load Your CSV File
a. Select “Create Project” on the OpenRefine homepage
b. Select “Choose Files” and select your CSV file from your computer
c. Select “Next” to load the data. OpenRefine will display a preview of the data for you to verify that it looks correct
d. On the next, intermediary screen, set the character encoding to UTF-8, set column separation to CSV, make sure Parse next 1 line as headers is selected, and adjust the project name as needed. Store Blank Rows, Store Blank Cells as Nulls, and Use Character " to Enclose Cells Containing Column Separators should be selected by default. These options protect data integrity: blank rows and cells are kept rather than discarded, and a comma inside an enclosed cell, such as Smith, John, is not treated as a column separator
e. Then, select “Create Project”
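
To make the import options in step d more concrete, here is a minimal Python sketch of the same ideas outside OpenRefine. It is not how OpenRefine works internally; it only illustrates, using the standard csv module, why enclosing cells in " keeps a value like Smith, John in one cell, and what keeping blank rows and storing blank cells as nulls looks like. The sample data and the blank_to_none helper are hypothetical.

```python
import csv
import io

# Hypothetical sample: a quoted cell containing a comma, a fully blank row,
# and a row whose last cell is empty.
raw = 'creator,pubtype\n"Smith, John",Master Gardener\n\nJane Doe,\n'

# quotechar='"' plays the role of "Use Character " to Enclose Cells
# Containing Column Separators".
reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')
header = next(reader)  # "Parse next 1 line as headers"
rows = list(reader)    # blank lines come back as empty lists, so they are kept


def blank_to_none(row, width):
    """Pad a row to the header width and turn empty cells into None (null)."""
    padded = row + [""] * (width - len(row))
    return [cell if cell != "" else None for cell in padded]


records = [blank_to_none(row, len(header)) for row in rows]

for record in records:
    print(record)
```

The comma inside the quoted creator value does not split the cell, the blank row survives as a row of nulls, and the empty pubtype cell is stored as a null rather than dropped.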
✺
Standardization Example
To give this tutorial a practical example, I’ll be using metadata for a digital collection of scholarly publications that needs standardizing. Actions needed:
- In the creator field, author names need to be standardized and then given “last name, first name (or first initial)” formatting
- In the pubtype field, “Master Gardener” needs to change to “Master Gardener Program Handbook” and “Pacific Northwest Extension” needs to change to “Pacific Northwest Extension Publications”
- The Subjects field needs to be standardized and verified against Getty standards (CDIL follows Getty, SPECS follows LCSH… might want to sort that out at some point)
- The Publisher field needs basic standardization
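
As a rough illustration of the first two actions, the sketch below expresses them as plain Python functions. This is not the OpenRefine workflow itself, only a way to pin down the intended result. The function names are hypothetical, the pubtype mapping comes from the list above, and the name logic assumes a simple “First Last” input; multi-word surnames, suffixes, and multiple authors in one cell would need extra handling.

```python
import re

# Replacement values taken from the pubtype action described above.
PUBTYPE_MAP = {
    "Master Gardener": "Master Gardener Program Handbook",
    "Pacific Northwest Extension": "Pacific Northwest Extension Publications",
}


def standardize_pubtype(value):
    """Trim whitespace and swap in the full publication type if one is mapped."""
    cleaned = value.strip()
    return PUBTYPE_MAP.get(cleaned, cleaned)


def to_last_first(name):
    """Reformat 'First Last' as 'Last, First'.

    Assumption: the final word is the last name. Names that already
    contain a comma, or consist of a single word, are only tidied.
    """
    cleaned = re.sub(r"\s+", " ", name).strip()
    if not cleaned or "," in cleaned:
        return cleaned
    parts = cleaned.split(" ")
    if len(parts) == 1:
        return cleaned
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(to_last_first("John  Smith"))
print(to_last_first("J. Smith"))
print(standardize_pubtype(" Master Gardener "))
```

Values that are not in the mapping pass through unchanged, so the same function can be run over the whole pubtype column without touching other publication types.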