Uncovering the Handwritten Text in the Margins
This work presents an end-to-end framework for the automatic detection and recognition of handwritten marginalia, and leverages data augmentation and transfer learning to overcome the scarcity of training data. The detection phase investigates R-CNN and Faster R-CNN networks. The recognition phase uses an attention-based sequence-to-sequence model, with ResNet feature extraction, bidirectional LSTM-based sequence modeling, and attention-based prediction of marginalia. The effectiveness of the proposed framework has been empirically evaluated on data from early book collections in the Uppsala University Library in Sweden.
Liang Cheng, Jonas Frankemölle, Adam Axelsson and Ekta Vats, Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition. In the Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, co-located with the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024). Paper
Image source: Uppsala University Library, Alvin portal.
Why are marginalia important?
- Preservation and access: Marginalia, notes, and annotations found in historical documents, manuscripts, and books provide valuable insights into the thoughts, reactions, and interpretations of past readers.
- Textual scholarship: Marginalia often include corrections, annotations, and commentary that shed light on variant readings, textual discrepancies, and the transmission history of texts.
- Understanding reader reception: By analyzing the content, language, and placement of marginalia, researchers can better understand how texts were received, interpreted, and appropriated by readers over time. This can provide valuable insights into changing literary tastes, cultural norms, and intellectual movements.
- Interdisciplinary research: Marginalia data are of interest to scholars working in diverse fields such as literary studies, history, sociology, linguistics, and cognitive science, and can therefore support research that spans these disciplines.