Victor Barres

AI Researcher at Sierra.

I’m a researcher at Sierra, where I build and study conversational AI agents and the benchmarks that tell us when they actually work — the τ-Bench family (live leaderboard at taubench.com). My focus is on agents that must execute complex tool-using tasks while sustaining long, coherent conversational interactions — the two things real-world deployments demand at once.

My background is in computational cognitive science and cognitive linguistics, and I’ve spent years building real-world conversational systems across several startups. More on how I think about the work →

current work

I lead the current work on the τ-Bench family of agent benchmarks (originally introduced at Sierra in 2024), covering the code repository, the public leaderboard, and a sequence of extensions:

  • τ²-Bench — extends τ-Bench to a dual-control setting where both the agent and the user can act on the world.
  • τ-Knowledge — knowledge-retrieval domain.
  • τ-Voice — first benchmark to measure full-duplex voice agents on realistic, grounded customer-service tasks.
  • τ³-Bench — combines τ-Knowledge and τ-Voice with community-contributed task fixes and code improvements.

news

May 11, 2026 Three τ-Bench family papers accepted to ICML 2026: τ²-Bench (dual-control evaluation, accepted as a spotlight), τ-Knowledge (knowledge retrieval), and τ-Voice (full-duplex voice agents). See you in July!
May 01, 2026 τ-Voice — first benchmark to measure full-duplex voice agents on realistic, grounded customer-service tasks. Voice agents have closed most of the gap to non-reasoning text models in ~8 months.
Apr 20, 2026 μ-Bench released — an open multilingual transcription benchmark covering 5 locales, 5 ASR providers, and 4,270 human-annotated utterances from real customer calls.
Mar 18, 2026 τ³-Bench released — extending τ-Bench with a knowledge-retrieval domain (τ-Knowledge), full-duplex voice evaluation (τ-Voice), and community-contributed task fixes. Live leaderboard at taubench.com.
Mar 02, 2026 Organizing and judging the Sierra τ²-Bench Custom Track of the AgentX–AgentBeats Competition (Berkeley RDI, Fall 2025 – Spring 2026).

selected publications

  1. ICML
    τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge
    Quan Shi, Alexandra Zytek, Pedram Razavi, and 2 more authors
    arXiv preprint arXiv:2603.04370, 2026
    Accepted at ICML 2026.
  2. ICML
    τ-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains
    Victor Barres*, Soham Ray*, Keshav Dhandhania*, and 1 more author
    2026
    Accepted at ICML 2026.
  3. ICML
    τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
    Victor Barres*, Honghua Dong*, Soham Ray, and 2 more authors
    2025
    Spotlight at ICML 2026.
  4. NAACL
    From Generating Answers to Building Explanations: Integrating Multi-Round RAG and Causal Modeling for Scientific QA
    Victor Barres*, Clifton James McFate*, Aditya Kalyanpur, and 4 more authors
    In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Industry Track), 2025
  5. arXiv
    LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic
    Aditya Kalyanpur, Kailash Karthik Saravanakumar, Victor Barres, and 3 more authors
    2024
  6. AAAI-SS
    Template Construction Grammar: A Schema-Theoretic Computational Construction Grammar
    Victor J. Barres
    In AAAI Spring Symposium on Computational Construction Grammar and Natural Language Understanding, 2017

See all publications →