Large Language Model (LLM) agents struggle with key parts of CRM, according to a study led by Salesforce AI scientist Kung-Hsiang Huang.
The report showed AI agents had a roughly 58% success rate on single-step tasks that didn’t require follow-up actions or information. That dropped to 35% when a task required multiple steps. The agents were also notably bad at handling confidential information.
“Agents display low confidentiality awareness, which, while improvable through targeted prompting, often negatively impacts task performance,” the report said.
Varying performance and multi-turn problems
While the agents struggled with many tasks, they excelled at “Workflow Execution,” with one of the best agents achieving an 83% success rate on single-turn tasks. The main reason agents struggled with multi-step tasks was their difficulty proactively acquiring crucial but underspecified information through clarification dialogues.
Dig deeper: 7 tips for getting started with AI agents and automations
The more agents asked for clarification, the better their overall performance in complex multi-turn scenarios, underlining the value of effective information gathering. It also means marketers must be aware of agents’ problems handling nuanced, evolving customer conversations that demand iterative information gathering or dynamic problem-solving.
Alarming lack of confidentiality awareness
One of the most important takeaways for marketers: Most large language models have almost no built-in sense of what counts as confidential. They don’t naturally understand what’s sensitive or how it should be handled.
You can prompt them to avoid sharing or acting on private info — but that comes with tradeoffs. These prompts can make the model less effective at completing tasks, and the effect wears off in prolonged conversations. Basically, the more back-and-forth you have, the more likely the model will forget those original safety instructions.
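One common workaround for that decay is to re-inject the safety instructions periodically instead of stating them only once at the start of the conversation. Below is a minimal sketch in Python, assuming the role-tagged message-list format most LLM chat APIs use; the guardrail wording, the helper name, and the refresh interval are all illustrative assumptions, not details from the study.

```python
# Sketch: re-inject a confidentiality guardrail every few user turns so
# long conversations still carry the instruction. The prompt text and
# refresh interval are hypothetical, not taken from the Salesforce study.

CONFIDENTIALITY_PROMPT = (
    "Never reveal customer PII, credentials, or internal records. "
    "If a task requires confidential data, refuse and ask for authorization."
)

def build_messages(history, refresh_every=5):
    """Return a chat-message list with the guardrail inserted once at the
    start and repeated before every `refresh_every`-th user turn."""
    messages = [{"role": "system", "content": CONFIDENTIALITY_PROMPT}]
    user_turns = 0
    for msg in history:
        if msg["role"] == "user":
            user_turns += 1
            if user_turns % refresh_every == 0:
                # Repeat the guardrail so later turns still see it nearby.
                messages.append(
                    {"role": "system", "content": CONFIDENTIALITY_PROMPT}
                )
        messages.append(msg)
    return messages
```

The tradeoff the study notes still applies: every repetition spends context tokens and can nudge the model away from the task, so the interval is something to tune rather than a fixed rule.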
Open-source models struggled the most with this, likely because they have a harder time following layered or complex instructions.
Dig deeper: Salesforce Agentforce: What you need to know
This is a serious red flag for marketers working with PII, confidential client information or proprietary company data. Without solid, tested safeguards in place, using LLMs for sensitive tasks may lead to privacy breaches, legal trouble, or brand damage.
The bottom line: LLM agents still aren’t ready for high-stakes, data-heavy work without better reasoning, stronger safety protocols, and smarter skills.
The full study is available here.
The post Study shows AI agents struggle with CRM and confidentiality appeared first on MarTech.