Research

I study multimodal retrieval and reasoning over large-scale video data. The work centers on indexing, evidence retrieval, and evaluation for long-form collections.

What I work on

The central problem is to retrieve the right evidence from long-form video and use it for reasoning. This is difficult because video is temporally distributed, weakly structured, and queried through language, metadata, and temporal constraints. I build systems that index video into searchable units and connect retrieval outputs to grounded reasoning pipelines.

A second problem is evaluation. Reported progress on video tasks can change with frame sampling, segmentation, and benchmark construction. I study evaluation protocols that expose those dependencies and support comparison under real operational constraints.

Research pillars

Video understanding

Long-form video carries meaning across time, modalities, and context. I build shot detection, chaptering, and multimodal representation pipelines that turn raw footage into structured, machine-understandable units.

Video retrieval

Multimodal video queries require systems to align language, vision, and time. I design retrieval and reasoning pipelines that ground outputs in relevant video segments.

AI fairness

Models inherit and amplify the biases present in their training data. I study how gaps in the representation of disability and minority groups propagate through vision and language systems, and how to evaluate and mitigate them.

Selected work

This selection is curated for the core agenda of multimodal retrieval and reasoning over video data. The complete list is below.


Reviewing activities

I reviewed for ICASSP 2026, ICPRAI 2026, and ICME 2025. I also served on the scientific committee of JETSAN 2025.

Teaching

  • Speaker Diarization — Guest lecture

    2024 · Graduate course on multimodal speech and speaker recognition. Covered diarization pipelines, multi-stream voice activity detection, evaluation, and fairness in real-world conditions.

Publications

Recent work centers on multimodal retrieval and reasoning over video data. Earlier publications cover multimodal speech processing, robustness, and fairness, all of which inform the current research agenda. For authoritative citation data, see Google Scholar.

2025

  • Video understanding

    Frame Sampling Strategies Matter: A Benchmark for small vision language models

    Marija Brkic, Anas Filali Razzouki, Yannis Tevissen, Khalil Guetari, Mounim A. El Yacoubi · arXiv:2509.14769 · arXiv

  • Video retrieval

    Computer-based platforms and methods for efficient AI-based digital video shot indexing

    Frédéric Petitpont, Philippe Petitpont, Yannis Tevissen, Khalil Guetari · US Patent 12,288,377 · Apr 2025

2024

  • Video understanding

    Systems and methods for AI generation of image captions enriched with multiple AI modalities

    Frédéric Petitpont, Yannis Tevissen, Khalil Guetari · US Patent 12,148,233 · Nov 2024

  • Video retrieval

    Towards Retrieval Augmented Generation over Large Video Libraries

    Yannis Tevissen, Khalil Guetari, Frédéric Petitpont · HSI 2024 · Best Presentation Paper

  • Fairness

    Disability Representations: Finding Biases in Automatic Image Generation

    Yannis Tevissen · CVPR 2024 Workshop AVA · arXiv

  • Video understanding

    Multimodal Chaptering for Long-Form TV Newscast Video

    Khalil Guetari, Yannis Tevissen, Frédéric Petitpont · 2024 · arXiv

  • Video understanding

    Inserting Faces inside Captions: Image Captioning with Attention Guided Merging

    Yannis Tevissen, Khalil Guetari, Marine Tassel, Erwan Kerleroux, Frédéric Petitpont · arXiv:2405.02305 · arXiv

  • Speech processing

    Privacy Preserving Personal Assistant with On-Device Diarization and Spoken Dialogue System for Home and Beyond

    Gérard Chollet et al. · IHIET 2024

2023

  • Speech processing

    Diarisation multimodale: vers des modèles robustes et justes en contexte réel

    Yannis Tevissen · Institut Polytechnique de Paris

  • Speech processing

    Détection d'activité vocale Multi-flux pour la Diarisation du locuteur

    Yannis Tevissen, Jérôme Boudy, Gérard Chollet, Frédéric Petitpont · GRETSI 2023

  • Speech processing

    Home monitoring for frailty detection through sound and speaker diarization analysis

    Yannis Tevissen et al. · JETSAN 2023

  • Fairness

    Towards measuring and scoring speaker diarization fairness

    Yannis Tevissen, Jérôme Boudy, Gérard Chollet, Frédéric Petitpont · arXiv:2302.09991 · arXiv

2022

  • Speech processing

    Multi-stream voice activity detection for robust speaker diarization

    Yannis Tevissen, Jérôme Boudy, Gérard Chollet · GDR ISIS 2022

  • Speech processing

    The Newsbridge-Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description

    Yannis Tevissen, Jérôme Boudy, Frédéric Petitpont · VoxCeleb SRC 2022 Task 4