
SalesBench: Testing Adaptive Social Intelligence in Extended Goal-Oriented Interactions

This was a hackathon project I built during the summer of 2025.


How do agents perform in high-pressure sales scenarios?

We answer this by letting AI agents navigate complex sales conversations. The agents need to handle objections, read buyer signals, and close deals to maximize revenue.

Can AI actually sell me this pen?

AI agents have gained significant traction in the developer community because they can quickly generate and iterate on code. This has made engineering far more efficient and enabled smaller, smarter teams to move faster than incumbents. But engineering is typically only half of a business: you still have to sell the product to generate revenue.

Introducing SalesBench, a benchmark of AI agents' adaptive social intelligence in extended goal-oriented interactions.

How do agents navigate complex human dynamics over extended conversations? We answer this by having agents conduct cold-call life insurance sales to diverse simulated buyers. The agents must build rapport, handle objections, and adapt their strategy across long conversational horizons to successfully close deals.

We've evaluated some of the top foundation models to see exactly how persuasive models can really be.

LEADERBOARD

Comprehensive evaluation of AI agents across extended goal-oriented sales interactions. Ranked by overall performance across all difficulty levels and metrics.


The Eval

SalesBench is a simulated environment that tests how well AI models can navigate complex social interactions in a goal-oriented context: conducting life insurance sales calls. The AI agent must build trust, identify needs, handle objections, and adapt its approach to the buyer's personality. We break this into individually manageable tasks that, over extended conversations, reveal an AI's ability to maintain coherent social intelligence and to manage context over long time spans.

We simulate one day in which the Agents try to close as many life insurance sales as possible. The 100 custom buyer personas are generated by combining a randomized assortment of circumstances. See more here.
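To make the persona generation concrete, here is a minimal sketch of combining randomized circumstances into buyer personas. Every attribute name and value below is a hypothetical placeholder, not the set SalesBench actually uses, and `generate_personas` is an illustrative helper rather than the project's code.

```python
import random
from dataclasses import dataclass

# Hypothetical circumstance pools; the real SalesBench attributes are not listed in this post.
CIRCUMSTANCES = {
    "age_band": ["25-34", "35-44", "45-54", "55-64"],
    "family": ["single", "married, no kids", "married with kids"],
    "income": ["low", "middle", "high"],
    "temperament": ["friendly", "skeptical", "busy", "hostile"],
    "existing_coverage": ["none", "employer plan", "private policy"],
}


@dataclass
class BuyerPersona:
    persona_id: int
    traits: dict


def generate_personas(n, seed=0):
    """Build n personas by drawing one value per circumstance at random."""
    rng = random.Random(seed)  # seeded so the same personas can be regenerated
    personas = []
    for i in range(n):
        traits = {key: rng.choice(values) for key, values in CIRCUMSTANCES.items()}
        personas.append(BuyerPersona(persona_id=i, traits=traits))
    return personas


personas = generate_personas(100)
```

Seeding the generator is a design choice of this sketch: it keeps the buyer pool identical across models, which matters if results are to be compared on a leaderboard.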

How it works:

At a high level, a Sales-Operator Agent orchestrates the entire sales process.

Tools

We've added memory tools to assist with context quality. Previous methods of context maintenance relied on the system intelligently deciding what to keep in context via summarization. We instead use a sliding-window context combined with memory tools, similar to Vending Bench.

The most recent 30,000 tokens are kept in context before inference, and the Agent is responsible for deciding what should be saved to memory using tools. Memories are saved as embeddings, and memory reads are done with cosine similarity.
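The sliding window and the memory tools can be sketched roughly as follows. This is an illustration under loud assumptions, not the SalesBench implementation: `count_tokens` uses a whitespace word count in place of a real tokenizer, `embed` uses bag-of-words counts in place of a learned embedding model, and the names `trim_to_window` and `MemoryStore` are hypothetical.

```python
import math
import re
from collections import Counter

MAX_CONTEXT_TOKENS = 30_000  # window size stated in the post


def count_tokens(text):
    # Assumption: whitespace word count as a crude stand-in for a real tokenizer.
    return len(text.split())


def trim_to_window(messages, max_tokens=MAX_CONTEXT_TOKENS):
    """Keep the most recent messages whose combined token count fits the window."""
    kept, total = [], 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept


def embed(text):
    # Assumption: bag-of-words counts stand in for a learned embedding vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class MemoryStore:
    """Hypothetical memory tool: save text as an embedding, read by similarity."""

    def __init__(self):
        self._items = []  # list of (text, embedding) pairs

    def save(self, text):
        self._items.append((text, embed(text)))

    def read(self, query, top_k=3):
        q = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(q, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:top_k]]


store = MemoryStore()
store.save("Lead 12 has two kids and wants term coverage")
store.save("Lead 40 asked to be called back after 5pm")
best = store.read("when should I call back lead 40", top_k=1)
```

In this sketch anything that falls out of the window is simply dropped, so the only way information survives is if the agent chose to call `save` while it was still in context. That is the trade-off the sliding-window approach makes compared with automatic summarization.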

The call_lead tool starts a subagent that conducts the actual sales call with the prospective buyer agent. We delegate the call to a subagent to use context more efficiently. The subagent passes a summary of the important information from the sales call up to the Sales-Operator, which can then choose to save to memory, update the CRM, etc.
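The delegation pattern might look something like the sketch below, where the full transcript stays inside the subagent and only a compact summary is returned to the operator. The `CallSummary` fields, the `DEAL` and `HANG_UP` signals, the outcome labels, and the `seller_reply` and `buyer_reply` callables (stand-ins for model calls) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallSummary:
    lead_id: int
    outcome: str        # hypothetical labels: "sold", "rejected", "unresolved"
    notes: str
    response_loops: int


def run_sales_call(lead_id, seller_reply, buyer_reply, max_loops=20):
    """Run the call inside the subagent and return only a summary."""
    transcript = []  # stays local; never enters the operator's context
    outcome = "unresolved"
    buyer_message = "Hello?"
    loops = 0
    while loops < max_loops:
        loops += 1
        seller_message = seller_reply(buyer_message)
        buyer_message = buyer_reply(seller_message)
        transcript.append((seller_message, buyer_message))
        if buyer_message == "DEAL":      # hypothetical end-of-call signal
            outcome = "sold"
            break
        if buyer_message == "HANG_UP":   # hypothetical end-of-call signal
            outcome = "rejected"
            break
    notes = f"{len(transcript)} exchanges; last buyer message: {buyer_message}"
    return CallSummary(lead_id, outcome, notes, loops)


# Usage with scripted stand-ins for the seller and buyer models.
scripted_buyer = iter(["Tell me more", "DEAL"])
summary = run_sales_call(
    lead_id=7,
    seller_reply=lambda buyer_message: "Here is how the policy works...",
    buyer_reply=lambda seller_message: next(scripted_buyer),
)
```

In a real system the notes would presumably be written by the subagent model rather than templated. The point of the sketch is only that the operator receives a few fields instead of the whole conversation.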

We simulate time by assigning a fixed number of minutes to each tool call. The exception is sales calls, whose duration is the total number of Agent response loops multiplied by 30 seconds. The Agent can also wait until the next working day, or sleep until a certain time if it sees fit.
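A simulated clock along these lines could be sketched as below. The 30 seconds per response loop comes from the post. The per-tool minute costs, the tool names other than call_lead, the 09:00 workday start, and the weekend-skipping rule are assumptions made for illustration.

```python
from datetime import datetime, time, timedelta

# Hypothetical per-tool costs in minutes; the post does not list the real values.
TOOL_MINUTES = {"read_memory": 1, "save_memory": 1, "update_crm": 5}
SECONDS_PER_CALL_LOOP = 30   # stated in the post
WORKDAY_START = time(9, 0)   # assumption


class SimClock:
    def __init__(self, start):
        self.now = start

    def charge_tool(self, tool_name):
        """Advance by the fixed cost assigned to a tool call."""
        self.now += timedelta(minutes=TOOL_MINUTES[tool_name])

    def charge_sales_call(self, response_loops):
        """Advance by 30 seconds for every agent response loop in the call."""
        self.now += timedelta(seconds=response_loops * SECONDS_PER_CALL_LOOP)

    def sleep_until(self, target):
        """Jump to the next occurrence of the given time of day."""
        candidate = datetime.combine(self.now.date(), target)
        if candidate <= self.now:
            candidate += timedelta(days=1)
        self.now = candidate

    def wait_until_next_working_day(self):
        next_day = self.now.date() + timedelta(days=1)
        while next_day.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
            next_day += timedelta(days=1)
        self.now = datetime.combine(next_day, WORKDAY_START)


clock = SimClock(datetime(2025, 7, 4, 9, 0))  # a Friday
clock.charge_sales_call(response_loops=12)    # 12 loops * 30 s = 6 minutes
clock.charge_tool("update_crm")               # 5 more minutes
after_work = clock.now
clock.wait_until_next_working_day()
```

Charging time per action rather than using wall-clock time keeps runs reproducible and makes a slow model no worse off than a fast one, as long as it takes the same number of steps.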


Conclusion

SalesBench reveals both the promise and limitations of AI in sales. While Claude’s 86% close rate shows AI can build relationships and close deals, every model’s failure with difficult customers highlights the gap between current AI and human social intelligence.

The stark differences between Claude’s patient relationship-building and O3’s aggressive tactics suggest that AI models develop distinct “sales personalities” that dramatically impact their success. As businesses increasingly explore AI for customer interactions, understanding these differences becomes crucial.

Can AI actually sell? Yes—but only to customers who want to buy. The art of converting skeptics, handling complex objections, and building long-term relationships remains uniquely human, at least for now.

Reference:
By Kyle Jeong, Sameel Arif, & Hamza Mostafa

Heavily Inspired by Vending Bench https://andonlabs.com/evals/vending-bench