news

releases, accepted papers, talks, organizing — most recent first.

May 11, 2026 Three τ-Bench family papers accepted to ICML 2026: τ²-Bench (dual-control evaluation, spotlight), τ-Knowledge (knowledge retrieval), and τ-Voice (full-duplex voice agents). See you in July!
May 01, 2026 τ-Voice — first benchmark to measure full-duplex voice agents on realistic, grounded customer-service tasks. Voice agents have closed most of the gap to non-reasoning text models in ~8 months.
Apr 20, 2026 μ-Bench released — an open multilingual transcription benchmark covering 5 locales, 5 ASR providers, and 4,270 human-annotated utterances from real customer calls.
Mar 18, 2026 τ³-Bench released — extending τ-Bench with a knowledge-retrieval domain (τ-Knowledge), full-duplex voice evaluation (τ-Voice), and community-contributed task fixes. Live leaderboard at taubench.com.
Mar 02, 2026 Organizing and judging the Sierra τ²-Bench Custom Track of the AgentX–AgentBeats Competition (Berkeley RDI, Fall 2025 – Spring 2026).
Jun 10, 2025 τ²-Bench released — a benchmark for evaluating conversational agents in a dual-control environment, where both the agent and the user can take actions on the world.