Overview
Despite the remarkable advances in deep learning over the past decade, in-depth video understanding remains a challenge. eXeLMM will address this challenge by harnessing the power of Large Language Models (LLMs) and Large Multimodal Models (LMMs). Powerful open models of this kind will be selected, adapted and employed to achieve in-depth video understanding. As part of this work, eXeLMM will make considerable advances on critical issues that arise when LLMs/LMMs are exploited for video tasks, such as computational efficiency and explainability.