VLM-Driven Data and Context Curation for Visual Understanding

Virtual Seminar (Zoom). Speaker: Ahmet Iscen, staff research scientist at Google.

By NAVER LABS Europe

Location

Online

About this event

Event lasts 1 hour

Abstract: This presentation explores how to improve visual understanding by using modern Vision-Language Models (VLMs) to curate their own input for both training and inference. First, we will examine a VLM-driven data curation approach for web-scale visual entity recognition, in which a VLM verifies and corrects noisy image-entity pairs using external context such as Wikipedia, yielding a high-quality dataset that enables smaller models to achieve state-of-the-art performance. We then introduce "Temporal Chain of Thought," an inference strategy that addresses long-video understanding by prompting the VLM to first identify and select only the most relevant frames before answering a question, thereby mitigating distractors and overcoming context limitations. Together, these works show how models can actively refine their own supervision and context in complex visual tasks.

About the speaker: Ahmet Iscen is a staff research scientist at Google. During his time there, he has worked across diverse research domains, including large-scale vision-language models, multimodal understanding, 3D scene understanding, and image recognition. He has contributed to Google products such as Lens and Gemini. Prior to joining Google, Ahmet worked as a postdoctoral researcher at Czech Technical University in Prague. He completed his Ph.D. at Université de Rennes I and Inria Rennes, and his Ph.D. thesis received the Fondation Rennes 1 Best Thesis Award 2017 in the field of Mathematics, Sciences and Information and Communication Technologies.

Please join us on Zoom: https://naverlabs.zoom.us/j/93133743758?pwd=FWHbyTfgRNJmfFdndVwr21QyYNZsbU.1

Organized by

NAVER LABS Europe

NAVER LABS Europe (Grenoble, France) is the biggest industrial AI research lab in France, with fundamental and applied research in the fields of machine learning and optimization, computer vision, natural language processing and human-robot interaction. The main area of application is AI for Robotics, where it works closely with NAVER LABS Korea, the R&D subsidiary of NAVER, Korea’s leading internet company. NAVER LABS is responsible for creating future technology in robotics, AI, autonomous driving, 3D/HD mapping and AR.


Free · Jul 8 · 2:00 AM PDT
