Metadata Clean-up

Both of these projects had been dormant for a few years before I started to look at combining them. The fifteen videos that made up the oral history element needed to be transcribed, and the metadata for the archive elements had major issues. If ‘data’ are ‘pieces of information,’ then metadata ‘are sets of data which describe that information.’ In this case, that means literally just fields in a spreadsheet, which is what CollectionBuilder uses to build a site. No digital scholarship work can start until the data we are working with has been “cleaned” and formatted correctly.

Some of this work involved “standardizing” language. For instance, a particular researcher might be labeled as Jim Peele, Jim Peale, and Jim P. in the original spreadsheet, and all of these instances need to use the same name for that data to be useful. One of the best ways I’ve found to identify these slight naming variations is a tool called OpenRefine. I won’t go into depth on it here, but I would recommend my colleague Evan Williamson’s digital workshop on the platform, found here.
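To give a rough sense of what spotting these variations looks like, here is a minimal Python sketch. It is not how OpenRefine works internally; it simply compares names with the standard library's difflib and groups the ones that are nearly identical. The function name group_variants and the 0.8 threshold are my own assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_variants(names, threshold=0.8):
    """Group names that look like spelling variants of one another.

    Each name is compared against the first member of every existing
    group; it joins the first group that is similar enough, otherwise
    it starts a new group. Only groups with more than one spelling are
    returned, since those are the ones that need a human decision.
    The threshold is an illustrative assumption, not a recommended value.
    """
    groups = []
    for name in sorted(set(names)):
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return [group for group in groups if len(group) > 1]


names = ["Jim Peele", "Jim Peale", "Jim P.", "Mary Stone", "Jim Peele"]
print(group_variants(names))
```

At this threshold the two full spellings are grouped together, but an abbreviation like “Jim P.” is too short to match and still has to be caught by eye. Lowering the threshold would catch it, at the cost of more false matches to review.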

Other work involved surveying the collection for duplicates, where the same document had been accidentally digitized two or even three times. This happened because the accession was never formally processed by an archivist, who would have caught these redundancies. An archivist would also have caught sensitive information, such as applications and resumes that contained student social security numbers and the addresses of applicants to the station.
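A first pass at finding duplicate candidates can be sketched in Python as below. This is only an illustration under stated assumptions: the column names objectid, title, and date and the sample rows are hypothetical, and the script only flags rows whose title and date match after normalization. It cannot tell whether two scans are really the same document, so each flagged group still needs to be checked by a person.

```python
import csv
import io
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and collapse punctuation and spacing differences."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_duplicate_candidates(rows, keys=("title", "date")):
    """Return lists of object IDs whose normalized key fields all match.

    The default key fields are assumptions about the spreadsheet layout.
    """
    seen = defaultdict(list)
    for row in rows:
        fingerprint = tuple(normalize(row.get(key) or "") for key in keys)
        seen[fingerprint].append(row["objectid"])
    return [ids for ids in seen.values() if len(ids) > 1]


# Hypothetical sample data standing in for the metadata spreadsheet.
sample = """objectid,title,date
doc001,Station Budget Memo,1978-03-02
doc002,"Station budget memo ",1978-03-02
doc003,Program Guide,1979-01-15
"""

rows = list(csv.DictReader(io.StringIO(sample)))
print(find_duplicate_candidates(rows))
```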