Victor Barres
AI Researcher at Sierra.
I’m a researcher at Sierra, where I build and study conversational AI agents and the benchmarks that tell us when they actually work — the τ-Bench family (live leaderboard at taubench.com). My focus is on agents that must execute complex tool-using tasks while sustaining long, coherent conversational interactions — the two things real-world deployments demand at once.
My background is in computational cognitive science and cognitive linguistics, and I’ve spent years building real-world conversational systems across several startups. More on how I think about the work →
current work
I lead ongoing work on the τ-Bench family of agent benchmarks (originally introduced at Sierra in 2024), covering the code repository, the public leaderboard, and a sequence of extensions:
- τ²-Bench — extends τ-Bench to a dual-control setting where both the agent and the user can act on the world.
- τ-Knowledge — knowledge-retrieval domain.
- τ-Voice — first benchmark to measure full-duplex voice agents on realistic, grounded customer-service tasks.
- τ³-Bench — combines τ-Knowledge and τ-Voice with community-contributed task fixes and code improvements.
news
| Date | News |
|---|---|
| May 11, 2026 | Three τ-Bench family papers accepted to ICML 2026: τ²-Bench (dual-control evaluation, accepted as a spotlight), τ-Knowledge (knowledge retrieval), and τ-Voice (full-duplex voice agents). See you in July! |
| May 01, 2026 | τ-Voice — first benchmark to measure full-duplex voice agents on realistic, grounded customer-service tasks. Voice agents have closed most of the gap to non-reasoning text models in ~8 months. |
| Apr 20, 2026 | μ-Bench released — an open multilingual transcription benchmark covering 5 locales, 5 ASR providers, and 4,270 human-annotated utterances from real customer calls. |
| Mar 18, 2026 | τ³-Bench released — extending τ-Bench with a knowledge-retrieval domain (τ-Knowledge), full-duplex voice evaluation (τ-Voice), and community-contributed task fixes. Live leaderboard at taubench.com. |
| Mar 02, 2026 | Organizing and judging the Sierra τ²-Bench Custom Track of the AgentX–AgentBeats Competition (Berkeley RDI, Fall 2025 – Spring 2026). |
selected publications
- NAACL. From Generating Answers to Building Explanations: Integrating Multi-Round RAG and Causal Modeling for Scientific QA. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Industry Track), 2025.