Announcement_6

τ²-Bench released: a benchmark for evaluating conversational agents in a dual-control environment, where both the agent and the user can take actions in a shared world.