Name Formatting
Remove dates and Unwanted Characters from Name Fields
One item that came up in this example was that some items had birth and death ranges that we needed to remove. While you can find and replace specific punctuation in Google Sheets, random sets of yyyy date ranges is a little trickier. Using OpenRefine:
- Select the column with the author names
- Select the dropdown menu >
Edit Cells
>Transform
- Enter the following GREL expression:
value.replace(/,\s?\d{4}-?/, "")
Explanation of the Formula
value
: Refers to the content of each cell in the column..replace(...)
: Applies a regular expression to find and replace the birth date portion./,\s?\d{4}-?/
: This regular expression matches:,
: A comma\s?
: An optional space\d{4}
: Exactly four digits (representing the year)-?
: An optional hyphen (to account for cases where the year might or might not be followed by a dash)
This formula will remove any pattern like , 1957-
or ,1957
from the names, leaving just the author’s name without the birth date.
✺
Standardizing Author Names to “Last Name, First Name” Format
While the names in the example metadata we are using were already formatted correctly, the following steps will show you how to transform names from “First Last” to “Last, First” format, using GREL to split and reorder the names.
a. Select Dropdown of Author Column
b. “Edit cells” > “Transform…”
c. Enter the Transformation Expression:
```
if(value.split(" ").length() == 3, value.split(" ")[2] + " " + value.split(" ")[1] + ", " + value.split(" ")[0], value.split(" ")[1] + ", " + value.split(" ")[0] )
```
- Click “OK” to apply the transformation.
Explanation of the Formula
value.split(" ").length() == 3
: Checks if there are three parts in the name.- For Three-Part Names: If the name has three parts (e.g.,
"first name middle initial last name"
), it reorders to"last name middle initial, first name"
. - For Two-Part Names: If there are only two parts (e.g.,
"first name last name"
), it reorders to"last name, first name"
.