Chain of Thought Monitorability: An Opportunity for AI Safety

By NAVER LABS Europe

Virtual Seminar (Zoom). Speaker: Tomek Korbak, Senior Research Scientist, UK AI Security Institute

Location

Online

About this event

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Abstract: AI systems that "think" in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed. Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.

About the speaker: Tomek Korbak is a Senior Research Scientist at the UK AI Security Institute.

Join Zoom Meeting
https://naverlabs.zoom.us/j/92349447243?pwd=K5OCBFDytPWkZBLyqv1zPfuN4uqIlW.1
Meeting ID: 923 4944 7243
Passcode: 981288

Organized by

NAVER LABS Europe

On Sale Sep 10 at 5:00 PM