Optical Character Recognition and Natural Language Processing
- Acting out Disease: How Patient Organizations Shaped Modern Medicine.
- The project involved Optical Character Recognition (OCR), deep learning, layout analysis, NLP, Named Entity Recognition (NER), BERT, and text classification.
- Jun 2022 – May 2023.
- Analyses of printed Saami texts from the 18th century.
- The project involved image quality enhancement, text detection, Optical Character Recognition (OCR) and text mining (NLP).
- Feb 2023 – Apr 2023.
- The Labour’s memory project.
- The project involved digitisation and transcription of historical material, machine learning, Optical Character Recognition (OCR), Handwritten Text Recognition (HTR), and layout analysis.
- Aug 2021 – Dec 2021.
- Automatic identification of archival paradata using artificial intelligence techniques.
- The project investigated how AI-based text and image analysis techniques can be used for mining paradata from archival records pertaining to archaeological excavations.
- The project used a YOLO object detector to identify excavation site objects and Named Entity Recognition (NER) to extract paradata.
- Mar 2022 – Jun 2022.
- BerryBERT. Text Mining Commodification: The Geography Of the Nordic Lingonberry Rush, 1860-1910.
- The project focused on BERT-based text classification of Finnish OCR texts to study the commodification of wild lingonberries.
- Dec 2021 – Feb 2022.
- Exploring Covid-19 and non-Covid vaccine reports from VAERS2020 database using NLP.
- Explored a spaCy Named Entity Recognition (NER) model trained on a medical corpus with two entity types: disease and chemical.
- Data exploration and analysis.
- Communicating Medicine: Digitalisation of Swedish Medical Periodicals, 1781–2011 (SweMPer).
- This project involves:
- Optical Character Recognition (OCR),
- Deep learning based document layout analysis,
- Object detection – identifying and classifying objects found in the documents/manuscripts,
- Semantic modelling – investigating BERT models for classifying texts based on context, performing sentiment analysis, and extracting word meaning,
- Named Entity Recognition (NER).
Example of layout detection (Image source: SweMPer Project)
Industrial project
“Scale Reading Recognition”, CV/OCR project.
Deep learning (PyTorch), YOLO object detection, Attention-based OCR, Scene text recognition benchmark, Transfer learning.