Multimodal Deep Learning and Vision-Language Models

Our PhD student Robin Hollifeldt and postdoc Tianru Zhang are working on multimodal deep learning and vision-language models. We will study how combining language with images and videos can lead to richer representations of the events around us, making AI models more capable and intelligent.

The project is part of the Beijer Laboratory for Artificial Intelligence Research, funded by the Kjell och Märta Beijer Foundation.

Robin Hollifeldt is working in close collaboration with supervisors Asst. Prof. Ekta Vats and Prof. Thomas Schön.

Tianru Zhang is working in close collaboration with Asst. Prof. Prashant Singh and the SciML group at UU.

Related Publication: 

  • Li Ju, Max Andersson, Stina Fredriksson, Edward Glöckner, Andreas Hellander, Ekta Vats and Prashant Singh, Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere. Accepted at NeurIPS 2025. arXiv preprint (NeurIPS poster link)


Multispectral Imaging

In collaboration with the MISHA team at the Rochester Institute of Technology, Ekta Vats and her group have built a cost-effective multispectral imaging (MSI) system in the lab that will enable research studies on degraded manuscripts. This pre-study was partially supported by the Kjell och Märta Beijer Foundation, and our heartfelt gratitude goes to Prof. Thomas Schön for his encouragement and support. Together with the National Library of Sweden, we aim to take a leading role in MSI-based digitisation and research on historical manuscripts in Sweden.

Photo: The MISHA system in our lab.
Photo: My 5-year-old capturing a picture of his favorite book!