Characterizing Hallucinations: A Two-Dimensional Analysis with Adi Simhi

By Oxford Martin AI Governance Initiative

This talk investigates hallucinations that occur despite models possessing correct information.

Date and time

Sep 5 · 2:00 PM GMT+1

Location

Seminar Room 1, Oxford Martin School (University of Oxford)

34 Broad Street, Oxford OX1 3BD, United Kingdom

Lineup

Adi Simhi (Technion)

Good to know

Highlights

  • 1 hour
  • In person

About this event

Hallucinations in Large Language Models (LLMs) present challenges beyond simple knowledge gaps. This talk investigates hallucinations that occur despite models possessing correct information, examining two key studies: "Distinguishing Ignorance from Error in LLM Hallucinations" and "Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs".

We show how hallucinations can manifest even when models know the correct answers, and develop a methodology to isolate hallucination-despite-knowledge cases from ignorance-based errors. This distinction proves crucial for mitigation. We then investigate whether hallucinations despite knowledge can occur with high certainty, a phenomenon named CHOKE, and find that such high-certainty hallucinations exist and occur consistently. These findings challenge current uncertainty-based detection and mitigation methods. Next, we provide a novel way to evaluate mitigation methods on the CHOKE phenomenon. Finally, we introduce a probing-based mitigation that outperforms existing methods at mitigating CHOKE.
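By way of illustration only, and not the method presented in the talk, a probing-based approach can be sketched as a lightweight classifier trained on an LLM's internal hidden states to flag generations that hallucinate even though the model encodes the correct answer. In the sketch below the hidden-state features, labels, and dimensionality are synthetic placeholders; in practice they would be extracted from a model's intermediate layers on annotated examples.

```python
# Minimal sketch of a linear probe for hallucination-despite-knowledge detection.
# NOTE: the data here is randomly generated as a placeholder; real use would
# substitute hidden-state vectors extracted from an LLM and human/automatic labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder features: 1,000 examples of 256-dim "hidden states" with binary labels
# (1 = hallucination despite knowledge, 0 = correct generation).
hidden_states = rng.normal(size=(1000, 256))
labels = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, labels, test_size=0.2, random_state=0
)

# A linear probe: logistic regression over the frozen hidden representations.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

print(f"Held-out probe accuracy: {probe.score(X_test, y_test):.3f}")
```

With random placeholder data the accuracy hovers around chance; the point of the sketch is only the shape of the pipeline: extract representations, train a simple probe, and use its predictions to detect or intervene on hallucinations.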

Adi Simhi is a PhD candidate at the Technion, advised by Prof. Yonatan Belinkov. Her research focuses on hallucinations, safety, and interpretability in LLMs. She received her Master's degree from the Technion under Prof. Shaul Markovitch in 2022. Adi also received the Council for Higher Education (VATAT) Scholarship for PhD students in data science and artificial intelligence.

Organized by

Oxford Martin AI Governance Initiative

Tickets

Free