Optical Character Recognition and Natural Language Processing

  • Acting out Disease: How Patient Organizations Shaped Modern Medicine.
      • The project involved Optical Character Recognition (OCR), deep learning, layout analysis, NLP, Named Entity Recognition (NER), BERT, and text classification.
      • Jun 2022 – May 2023.
  • Analyses of printed Saami texts from the 18th century.
      • The project involved image quality enhancement, text detection, Optical Character Recognition (OCR) and text mining (NLP).
      • Feb 2023 – Apr 2023.
  • The Labour’s memory project.
      • The project involved digitisation and transcription of ancient material, machine learning, Optical Character Recognition (OCR), Handwritten Text Recognition (HTR), and layout analysis.
      • Aug 2021 – Dec 2021.
  • Automatic identification of archival paradata using artificial intelligence techniques.
      • The project investigated how AI-based text and image analysis techniques can be used for mining paradata from archival records pertaining to archaeological excavations.
      • Methods: a YOLO object detector (to identify excavation site objects) and Named Entity Recognition (for paradata).
      • Mar 2022 – Jun 2022.
  • BerryBERT. Text Mining Commodification: The Geography Of the Nordic Lingonberry Rush, 1860-1910.
      • The project focused on BERT-based text classification of Finnish OCR texts to study the commodification of wild lingonberries.
      • Dec 2021 – Feb 2022.
  • Exploring Covid-19 and non-Covid vaccine reports from VAERS2020 database using NLP.
      • Explored a spaCy Named Entity Recognition (NER) model trained on a medical corpus with two entity types: disease and chemical.
      • Data exploration and analysis.

  • Communicating Medicine: Digitalisation of Swedish Medical Periodicals, 1781–2011 (SweMPer).
      • This project involves:
          • Optical Character Recognition (OCR),
          • Deep learning based document layout analysis,
          • Object detection – identifying and classifying objects found in the documents/manuscripts,
          • Semantic modelling – investigating BERT models for classifying texts based on context, performing sentiment analysis, and extracting word meaning,
          • Named Entity Recognition (NER).


Example of layout detection (Image source: SweMPer Project)

Industrial project

“Scale Reading Recognition”, CV/OCR project.

Deep learning (PyTorch), YOLO object detection, Attention-based OCR, Scene text recognition benchmark, Transfer learning.