Explainable and efficient large multimodal models for downstream video recognition tasks
Despite the remarkable advances in deep learning over the past decade, in-depth video understanding remains a challenge. eXeLMM will address this challenge by harnessing the power of Large Language Models (LLMs) and Large Multimodal Models (LMMs). Powerful open models of this kind will be selected, adapted and employed to achieve in-depth video understanding. As part of this work, eXeLMM will make considerable advances on critical issues that arise when exploiting LLMs/LMMs in video tasks, such as computational efficiency and explainability.
Dr. Vasileios Mezaris, Research Director (Researcher A), eXeLMM Principal Investigator
Head of the Intelligent Digital Transformation (IDT) Laboratory of ITI-CERTH
PhD Electrical and Computer Engineering, BSc Electrical and Computer Engineering
Dr. Evlampios Apostolidis, Postdoctoral Researcher
PhD Electronic Engineering, MSc Information Systems, BSc Electrical and Computer Engineering
Andreas Goulas, PhD Candidate
BSc Electrical and Computer Engineering
Links to the project’s publications, public deliverables (upon approval) and other research materials will be published here.
eXeLMM: “Explainable and efficient large multimodal models for downstream video recognition tasks” is a 36-month research project (Oct. 2025 - Sept. 2028), funded by the Hellenic Foundation for Research and Innovation (H.F.R.I.). The project is implemented in the framework of the H.F.R.I. call “3rd Call for H.F.R.I.’s Research Projects to Support Faculty Members & Researchers” (H.F.R.I. Project Number: 25957).