Research Aim

Build innovative data-driven AI solutions using expertise in text recognition (OCR, HTR), document image analysis, computer vision, AI and machine learning. My background spans both traditional image processing and computer vision methods (Hidden Markov Models, keypoint-based word spotting, etc.) and current state-of-the-art Recurrent Neural Network (RNN) and attention-based deep learning approaches.

Research Themes

AttentionHTR: Handwritten Text Recognition Based on Attention Encoder-Decoder Networks

This work proposes an attention-based sequence-to-sequence model for handwritten word recognition and explores transfer learning for data-efficient training of HTR systems. To overcome training data scarcity, it uses models pre-trained on scene text images as a starting point for tailoring the handwriting recognition models. The ResNet feature extraction and bidirectional LSTM-based sequence modeling stages together form the encoder. The prediction stage consists of a decoder and a content-based attention mechanism. The proposed end-to-end HTR system has been empirically evaluated on the novel multi-writer Imgur5K dataset and the IAM dataset. The experimental results quantify the performance of the HTR framework and are supported by an in-depth analysis of the error cases.
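The content-based attention step in the prediction stage can be illustrated with a minimal sketch. It assumes the common additive scoring form, score_i = v · tanh(W_enc h_i + W_dec s); the exact scoring function used in AttentionHTR is not stated here, and the names `attention_step`, `W_enc`, `W_dec` and `v` are hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def attention_step(encoder_states, decoder_state, W_enc, W_dec, v):
    """One additive content-based attention step (illustrative assumption).

    encoder_states: one feature vector per image column (encoder output).
    decoder_state:  current hidden state of the decoder.
    Returns the attention weights and the weighted context vector.
    """
    dec_proj = matvec(W_dec, decoder_state)
    scores = []
    for h in encoder_states:
        enc_proj = matvec(W_enc, h)
        hidden = [math.tanh(a + b) for a, b in zip(enc_proj, dec_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

# Toy example: three encoder positions with two features each.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attention_step(
    encoder_states=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    decoder_state=[0.5, -0.5],
    W_enc=identity, W_dec=identity, v=[1.0, 1.0],
)
print(weights, context)
```

At each decoding step the weights indicate which encoder positions the decoder attends to when predicting the next character; in a trained model the projection matrices are learned rather than fixed as in this toy example.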

AttentionHTR is simple, modular, and reproducible. More data can easily be added to the pipeline, which can further strengthen the model’s accuracy.

Source code, a demo and pre-trained models are available on GitHub.

AttentionHTR poster presented at the 15th IAPR International Workshop on Document Analysis Systems (DAS).


Marginalia and Machine Learning

AttentionHTR is also being studied further in the ongoing project Marginalia and Machine Learning, funded by the Matariki Network Initiation Grant, which focuses on the automatic detection and recognition of handwritten marginalia in a printed book collection. Faster R-CNN is used to detect the marginalia, and AttentionHTR is used for word recognition.

Image source: Uppsala University Library, Alvin portal.


Optical Character Recognition and Natural Language Processing

I have several ongoing projects that involve OCR and NLP, listed below:

  • Acting out Disease: How Patient Organizations Shaped Modern Medicine.
      • The project involves Optical Character Recognition (OCR), Deep learning, layout analysis, NLP, Named Entity Recognition (NER), BERT, text classification.
  • Analyses of printed Saami texts from the 18th century.
      • The project involves image quality enhancement, text detection, Optical Character Recognition (OCR) and text mining (NLP).
      • Ongoing
  • The Labour’s memory project.
      • The project involves digitisation and transcription of the ancient material, machine learning, Optical Character Recognition (OCR), Handwritten Text Recognition (HTR), layout analysis.
      • Aug. – Dec. 2021.
  • Automatic identification of archival paradata using artificial intelligence techniques.
      • The project investigates how AI-based text and image analysis techniques can be used for mining paradata from archival records pertaining to archaeological excavations.
      • YOLO object detector (to identify excavation site objects) and Named Entity Recognition (for paradata).
  • BerryBERT. Text Mining Commodification: The Geography Of the Nordic Lingonberry Rush, 1860-1910.
      • The project focuses on BERT-based text classification of Finnish OCR texts to study the commodification of wild lingonberries.
  • Exploring Covid-19 and non-Covid vaccine reports from VAERS2020 database using NLP.
      • Explores a spaCy Named Entity Recognition (NER) model trained on a medical corpus with two entity types: disease and chemical.
      • Data exploration and analysis

  • Communicating Medicine: Digitalisation of Swedish Medical Periodicals, 1781–2011 (SweMPer).
      • This project involves:
          • Optical Character Recognition (OCR),
          • Deep learning based document layout analysis,
          • Object detection – identifying and classifying objects found in the documents/manuscripts,
          • Semantic modelling – investigating BERT models for classifying texts based on context, performing sentiment analysis, and extracting word meaning,
          • Named Entity Recognition (NER)

Example of layout detection (Image source: SweMPer Project)

Early detection of human actions

Early detection of human actions is essential in a wide spectrum of applications, ranging from video surveillance to healthcare. While human action recognition has been studied extensively, little attention has been paid to the problem of detecting an ongoing human action early, i.e. detecting an action as soon as it begins but before it finishes. This study aims to train a detector that can recognize a human action when only a partial action sample has been seen. To do so, this work proposes a hybrid technique that combines the benefits of computer vision with fuzzy set theory, based on Bandler and Kohout’s fuzzy sub-triangle product (BK subproduct). The novelty lies in the construction of a frame-by-frame membership function for each kind of possible movement. Detection is triggered when a pre-defined threshold is reached. Experimental results on a publicly available dataset demonstrate the benefits and effectiveness of the proposed method.
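The BK subproduct and the threshold-based trigger can be sketched as follows. This is an illustrative sketch only, not the implementation used in the study: it assumes the standard definition of the subproduct as the minimum over y of I(R(x, y), S(y, z)), uses the Łukasiewicz implication as one possible choice of I, and treats the rows of R as frames and the columns of S as actions. The function names and the toy membership values are hypothetical.

```python
def lukasiewicz_implication(a, b):
    """Lukasiewicz fuzzy implication: I(a, b) = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)

def bk_subproduct(R, S, implication=lukasiewicz_implication):
    """Bandler-Kohout subproduct of two fuzzy relations.

    R[x][y]: degree to which frame x exhibits feature y (assumed layout).
    S[y][z]: degree to which feature y belongs to action z (assumed layout).
    Returns T with T[x][z] = min over y of implication(R[x][y], S[y][z]).
    """
    n_features = len(S)
    n_actions = len(S[0])
    return [
        [min(implication(row[y], S[y][z]) for y in range(n_features))
         for z in range(n_actions)]
        for row in R
    ]

def first_detection(frame_scores, threshold):
    """Index of the first frame whose score reaches the threshold, else None."""
    for index, score in enumerate(frame_scores):
        if score >= threshold:
            return index
    return None

# Toy example: two frames, two features, two actions (made-up values).
R = [[0.2, 0.9],
     [0.8, 0.4]]
S = [[1.0, 0.3],
     [0.5, 0.9]]
T = bk_subproduct(R, S)
scores_action_0 = [row[0] for row in T]  # per-frame membership for action 0
print(first_detection(scores_action_0, threshold=0.7))
```

Evaluating the per-frame score as frames arrive lets detection fire before the action has finished, which is the point of early detection; the choice of implication operator and threshold would need to be tuned for real data.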

Industrial Projects

  • “Scale Reading Recognition”, CV/OCR project, Silo AI, 2021.
        • Deep learning (PyTorch), YOLO object detection, attention-based OCR, scene text recognition benchmark, transfer learning.
  • “Veneer project”, Silo AI, 2021.
        • Video analysis, camera calibration, data acquisition, implementation of image processing algorithms for width measurement.