Announcement_6
τ²-Bench released — a benchmark for evaluating conversational agents in a dual-control environment, where both the agent and the user can take actions on the world.