Prepping Materials

To use this annotation workflow, you will need a digital version of your text in PDF format with an OCR (Optical Character Recognition) layer.

Visualization of a digital book's OCR layer

The optical character recognition process involves scanning pages, segmenting the different areas of text (for instance, recognizing each column in a newspaper), extracting features, and finally recognizing individual characters. This interpretation of the text is then overlaid onto the page image, which makes the document keyword-searchable and usable with text-to-speech assistive technology. If you are using downloadable ebooks or request digitization of physical books through our interlibrary loan, these items will come with an OCR layer.
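One way to confirm that a PDF already carries an OCR layer is to check whether text can be extracted from each page. The sketch below is illustrative only: the function names and the 20-character threshold are assumptions, and the extraction step assumes the third-party pypdf library is installed. A page that yields almost no text is probably an image without OCR, though a genuinely blank page will be flagged as well.

```python
def pages_missing_text(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is shorter than min_chars.

    page_texts is a list with one string per page. The threshold is an
    arbitrary assumption; adjust it for your material.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Extract the text of every page using pypdf (assumed to be installed)."""
    # Imported here so that pages_missing_text works without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Calling pages_missing_text(extract_page_texts("book.pdf")), where "book.pdf" is a placeholder filename, returns the pages that likely still need OCR; an empty list suggests the whole document is searchable.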

If you need to generate an OCR layer yourself, you can do so with Adobe Acrobat, which you have a subscription to through the university. For alternative approaches, see the Notes section. While Adobe's OCR function is adequate, you may encounter errors where the OCR interprets a two-page spread as a single column.



Demo of Book Splitter Python Tool

With this in mind, I built a Book Splitter Python tool that splits each of your pages into its left and right sides and reorders these pages in the output. Instructions for setting up the tool are included in the Appendix; they are very similar to the setup for the Annotation Extraction tool we will walk through in a moment.
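To make the idea of splitting a spread concrete, here is a minimal sketch of the underlying geometry. It is not the Book Splitter tool itself: the function names are hypothetical, and it assumes the spine sits exactly in the middle of each scan and that the book reads left to right. Each page is described by a box (left, bottom, right, top) in PDF points, and the result is a plan listing which half of which source page goes where in the output.

```python
def split_spread(box):
    """Split a page box (left, bottom, right, top) into left and right halves.

    Assumes the spine is at the horizontal midpoint of the scan.
    """
    left, bottom, right, top = box
    middle = (left + right) / 2
    return (left, bottom, middle, top), (middle, bottom, right, top)


def plan_output(page_boxes):
    """Return (source_page_index, crop_box) pairs in reading order.

    For every scanned spread, the left half comes first and the right
    half second (left-to-right reading order is assumed).
    """
    plan = []
    for index, box in enumerate(page_boxes):
        left_half, right_half = split_spread(box)
        plan.append((index, left_half))
        plan.append((index, right_half))
    return plan
```

For a single 1000 by 700 point spread, plan_output([(0, 0, 1000, 700)]) yields two entries for source page 0: the box from x = 0 to 500, then the box from x = 500 to 1000. A PDF library would then crop each source page to its box and write the pages out in that order; that step is omitted here.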


Now that you have an OCR layer, you will need to read and highlight your material. This can be done either in Adobe Acrobat or in a freely available option that I like called Okular, where you can easily build out an unlimited number of custom highlight sets. To change the default PDF reader in Zotero to your preferred platform, go to Zotero > General > Open PDFs Using and select the software.