Findings

Figure: Pre and Post Process Tagging Visualization. Example of a recording's tagging visualization before and after processing, with the post-process version dramatically denser.

One of my main concerns during the testing phase was whether transcribers would find the Apps Script coding element confusing or anxiety-inducing. This was not the case. By breaking down and repairing elements of the code during our weekly meetings, I was able to demystify the process and explain the purpose of each component. Additionally, I received excellent feedback from transcribers based on their experience, which led to many improved iterations.

From what I can gather, this approach has generated a significant increase in the volume, accuracy, and detail of oral history transcription work. Beyond the tools discussed in this article, factors contributing to this progress may include:

A more dynamic, interactive workflow leading to greater transcriber productivity
Less repetitive labeling and formatting work
Supplementary documentation helping transcribers navigate the more technical aspects of the workflow

In addition to helping us meet our department’s accessibility standards, this process enabled us to complete our first non-English oral history collection in the form of the Hispanic Oral History Project, an initiative from 1991 copyedited by student worker Daniel Olortegui Vargas. In ongoing work, we will use this material to enhance the OHD item-level interface, allowing listeners to toggle between English and non-English transcriptions. This new feature in the open-source OHD framework aims to promote the digitization of more diverse oral history collections both within and beyond the institution.

Regarding the limitations of data-driven, human-edited automated tagging, program managers must communicate that automated tags are only a starting point. Tags may be incorrectly applied, missing, or in need of broader application across transcripts. Even when these measures are taken, the amount of detail this process adds is substantial and easy to see in OHD’s tagging visualization in the image above. One could argue that the density of the data now makes it difficult for researchers to navigate, especially on mobile devices, and this continues to be part of the conversation as we refine these processes.
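
As a rough illustration of why automated tags are only a starting point, the minimal Python sketch below assumes a simple keyword-to-tag lookup. The tag names, keywords, sample lines, and the `suggest_tags` function are all hypothetical and are not the project's actual code. It shows how a literal keyword match can both over-apply a tag and miss a relevant passage.

```python
import re

# Hypothetical tag dictionary: each tag maps to keywords that trigger it.
TAG_KEYWORDS = {
    "agriculture": ["farm", "harvest", "crop"],
    "education": ["school", "teacher", "class"],
}

def suggest_tags(line):
    """Return tags whose keywords appear as whole words in a transcript line."""
    words = set(re.findall(r"[a-z']+", line.lower()))
    return sorted(
        tag for tag, keywords in TAG_KEYWORDS.items()
        if words & set(keywords)
    )

# Hypothetical transcript lines chosen to show three outcomes.
lines = [
    "We brought in the harvest every fall.",        # tagged as expected
    "He was in a class of his own as a fiddler.",   # false positive: education
    "She taught us to read at the kitchen table.",  # missed: no keyword present
]

for line in lines:
    print(suggest_tags(line), "|", line)
```

In this sketch, a human editor would still need to remove the tag on the second line and add one to the third, which is the kind of review the paragraph above describes.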

✺

Conclusion

In discussing grant funding for digital initiatives, a colleague pointed out that the time-intensive nature of oral history projects often leads to their neglect. As they put it:

“Would you rather present ten oral histories or 500 photographs?”

These quantity-focused selection criteria ultimately pose an existential threat to oral histories, leaving audio materials physically vulnerable as they languish in the archives. Bicentennial and community oral history initiatives, rich in non-academic perspectives, offer a uniquely biographical account of places and provide valuable contrast and context to the accepted historical record. By utilizing machine learning, Python, and JavaScript approaches, this process aims to make digitizing these resources more efficient and accessible, promoting their preservation and availability to the public.