Head of Research · Moments Lab

Unraveling the science of videos.

Large video collections are difficult to search because evidence is distributed across time, modalities, and incomplete metadata. I design indexing, retrieval, and reasoning pipelines that connect queries to relevant video segments and grounded outputs. The goal is to make broadcast and archival video usable for search, analysis, and decision support.


Research focus

Video understanding

Long-form video carries meaning across time, modalities, and context. I build shot detection, chaptering, and multimodal representation pipelines that turn raw footage into structured, machine-understandable units.

Video retrieval

Video queries combine language, vision, and time, which makes evidence retrieval ambiguous and expensive. I design multimodal retrieval systems that ground reasoning in relevant segments rather than coarse video summaries.

AI fairness

Models inherit and amplify the biases present in their training data. I study how gaps in the representation of people with disabilities and minority groups propagate through vision and language systems, and how to evaluate and mitigate them.

Selected work

Representative outputs on multimodal retrieval, reasoning, and indexing for video data. The full curated list is on the Research page.

Paper
PEEK: Picking Essential frames via Efficient Knowledge distillation

Introduces a lightweight dynamic frame selector that distills caption-conditioned relevance from a teacher model for efficient low-budget video captioning.

arXiv Code Hugging Face Project page

Paper
Towards Retrieval Augmented Generation over Large Video Libraries

Addresses how to ground answers in large video libraries by combining retrieval over video segments with generation conditioned on retrieved evidence.

arXiv Hugging Face Research note DOI

Paper
Frame Sampling Strategies Matter: A Benchmark for small vision language models

Shows that video reasoning results depend strongly on frame sampling and provides a benchmark for evaluating small vision-language models under controlled settings.

arXiv Code Hugging Face Research note

Paper
Multimodal Chaptering for Long-Form TV Newscast Video

Studies how to segment long-form news video into retrievable chapters and builds a multimodal chaptering pipeline for downstream search.

arXiv Hugging Face Colab Research note

Patent
Computer-based platforms and methods for efficient AI-based digital video shot indexing

Describes an industrial method for shot-level video indexing that allows large collections to be searched at scale.

Patent record


Bio

I am Head of Research at Moments Lab, where I lead research on multimodal retrieval and reasoning over large-scale video data. I build video indexing systems, retrieval pipelines, and evaluation methods for long-form broadcast and archival collections.

I completed a PhD at Institut Polytechnique de Paris (advised by Jérôme Boudy and Gérard Chollet) on multimodal speaker diarization, with a focus on robustness and fairness in real-world conditions.

Outside of Moments Lab, I co-founded VocaCoach, a speech training platform (VivaTech award, covered by Le Parisien). I also built UpToCure, an AI-powered rare disease research platform, and serve on the board of Universal Wings Mobility (accessible travel AI). I contributed to DiverseSpectrum, an open-source dataset for minority representation in AI.

Publications · Moments Lab research · Hugging Face

Press & Talks

Selected public interventions and press coverage.

How AI Video Understanding is Rewriting the Rules of Media Production and Discovery
November 2025 · Ad World News

Generation AI (FOST)
December 2025 · Speaker · Future of Software Technologies

OECD panel: AI and disability
December 2024

TEDx: Handicap, Société et IA (Disability, Society and AI)
March 2024 · TEDxTélécom SudParis
