news
releases, accepted papers, talks, organizing — most recent first.
| date | news |
|---|---|
| May 11, 2026 | Three τ-Bench family papers accepted to ICML 2026: τ²-Bench (dual-control evaluation, spotlight), τ-Knowledge (knowledge retrieval), and τ-Voice (full-duplex voice agents). See you in July! |
| May 01, 2026 | τ-Voice: the first benchmark to measure full-duplex voice agents on realistic, grounded customer-service tasks. In roughly 8 months, voice agents have closed most of the gap with non-reasoning text models. |
| Apr 20, 2026 | μ-Bench released — an open multilingual transcription benchmark covering 5 locales, 5 ASR providers, and 4,270 human-annotated utterances from real customer calls. |
| Mar 18, 2026 | τ³-Bench released — extending τ-Bench with a knowledge-retrieval domain (τ-Knowledge), full-duplex voice evaluation (τ-Voice), and community-contributed task fixes. Live leaderboard at taubench.com. |
| Mar 02, 2026 | Organizing and judging the Sierra τ²-Bench Custom Track of the AgentX–AgentBeats Competition (Berkeley RDI, Fall 2025 – Spring 2026). |
| Jun 10, 2025 | τ²-Bench released — a benchmark for evaluating conversational agents in a dual-control environment, where both the agent and the user can take actions on the world. |