
Microsoft’s Jeff Hollan on What Makes an AI Agent Enterprise-Ready


In this Q&A, Jeff Hollan, Partner Director of Product at Microsoft, discusses what separates true AI agents from chat interfaces with tools, where enterprise teams are finding real traction with agents, and what he believes will determine which agent strategies succeed over the next 12 to 24 months.

Jeff leads the Agent Platform in Microsoft Foundry. His team builds the Agent Service, Agent Framework and SDKs, workflows, and developer experience that enable any developer to build, deploy, and manage enterprise AI agents at scale. Before returning to Microsoft, Jeff was Director of Product at Snowflake, where he launched Snowflake Intelligence and led the Cortex AI, Snowpark, and Developer Platform efforts. Earlier at Microsoft, he directed Azure’s serverless and PaaS business, including Azure Functions, Container Apps, App Services, and Static Web Apps, which grew into some of Azure’s most widely adopted services. He is also a co-creator and maintainer of KEDA, an open-source, CNCF-graduated project for event-driven autoscaling in Kubernetes.

In practical engineering terms, what separates an “AI agent” from a chat interface with tools, and what minimum capabilities make something truly agentic?

The lines definitely blur, and we get excited to call everything an “AI agent” when a new trend emerges. In practical terms, a chat interface is reactive — it responds to a prompt and maybe calls a tool once or twice. What makes something truly agentic is the reasoning capability and ability to work toward a concrete goal. An agent breaks work into steps, reasons through decisions, tracks its progress, and continues until it feels it has adequately met that goal — or knows it should stop and ask for help.
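That plan-act-track loop can be sketched in a few lines. This is a minimal illustration, not Microsoft's implementation: the "tools" are plain Python callables, and the loop stands in for the reasoning a real model would do.

```python
# Minimal sketch of an agentic loop (illustrative only, no real LLM):
# the agent works through steps toward a goal, tracks progress, and
# stops either when the goal is met or when it must ask a human for help.

def run_agent(goal_steps, tools, max_iterations=10):
    """Work through goal_steps, calling one tool per step.

    Returns ("done", log) when every step succeeds, or
    ("needs_help", log) when a step has no matching tool.
    """
    log = []
    for step in goal_steps[:max_iterations]:
        if step not in tools:
            log.append(f"no tool for '{step}', asking a human")
            return "needs_help", log          # stop and escalate
        result = tools[step](step)            # act
        log.append(f"{step} -> {result}")     # track progress
    return "done", log                        # goal adequately met

if __name__ == "__main__":
    tools = {
        "search": lambda s: "3 results",
        "summarize": lambda s: "summary ready",
    }
    status, log = run_agent(["search", "summarize"], tools)
    print(status)  # done
```

The contrast with a chat interface is the control flow: a chat call returns once, while the loop keeps going, records what it did, and has an explicit escalation path instead of guessing.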

What are the most credible enterprise agent use cases you’re seeing move from pilot to production, and what’s making those succeed?

The ones that ship are the ones that automate tasks requiring a lot of effort that are repeatable and well-bounded. Triaging support cases, performing first-level analysis, doing research, preparing for meetings, or prepping for sales engagements — these are things that require significant manual labor today and where agents deliver real results. What’s consistent across these examples is that the scope is clear, the agent is anchored to trusted data, and the team designs a safe “handoff to human” path from the start. It’s not about replacing a role; it’s about reliably taking chunks of repetitive work off people’s plates.


What are the top blockers preventing agents from reaching production at scale, and what are the most effective ways teams are getting past them?

For an agent to work in the enterprise, it needs access to the right context. That’s often where things get hard. Work context, data, and knowledge can be scattered across office docs, knowledge bases, and data lakes. Pulling all of that together on top of a platform that meets security and compliance needs, like private data handling and private connectivity, is a real challenge. It’s easy to build a prototype, but doing it in a way that’s effective and production-ready is difficult. This is why I encourage teams to leverage platforms that pull these things together for you. Beyond data access, teams that succeed treat evaluation and quality as a priority from the get-go, not a side project. They start with “recommend and draft” before “take action,” putting identity, access, and logging in place early. This defines what “good” looks like so they can ship improvements steadily.

Where do agent deployments fail most often in the real world — and what’s the best “first fix” you recommend?

Without a clearly outlined problem to solve, the agent lacks needed context, returns mixed signals, and you end up with confident-sounding output that’s hard to trust. The first fix is tightening the definition of the task and what “done” means. Give the agent a concrete objective, make sure it’s pulling from the right sources, and normalize it pausing to request missing information rather than guessing.

When should teams choose a single generalist agent vs. a multi-agent system vs. an agent plus deterministic workflows — and what decision criteria do you use?

Start as simple as possible. A single generalist agent is great when you have a human heavily in the loop who can guide and monitor an agent that acts as an assistant or copilot. Generalist agents fall short when the task is highly consequential and you don’t have a human as closely in the loop. That’s where purpose-built agents playing defined roles in a multi-agent system, potentially with deterministic workflows, become a better fit. The distinction comes down to how much human oversight is present and how critical the outcomes are.

What does “good” evaluation and monitoring look like for agents in production?

Good evaluation means you can look at real use cases and confirm that as your models evolve, prompts change, and you add new tools, you’re not regressing. A healthy set of evaluations, especially around tool selection and answer correctness, is critical. This is core to Microsoft Foundry: allowing teams to continuously measure and observe their agents while staying in control. Speed matters, but most importantly, agents need to run safely, predictably, and better each day.
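A regression-style evaluation over the two axes mentioned, tool selection and answer correctness, can be sketched simply. This is a hypothetical harness, not Foundry's evaluation API: each case pins the expected tool choice and answer, and the agent under test is any callable you supply.

```python
# Hedged sketch of a regression-style evaluation harness: each case pins
# the expected tool choice and expected answer, so model, prompt, or tool
# changes can be checked for regressions before shipping.

def evaluate(agent, cases):
    """Run agent over cases; report tool-selection and answer accuracy."""
    tool_hits = answer_hits = 0
    for case in cases:
        tool, answer = agent(case["input"])
        tool_hits += tool == case["expected_tool"]        # bool adds as 0/1
        answer_hits += answer == case["expected_answer"]
    n = len(cases)
    return {"tool_accuracy": tool_hits / n, "answer_accuracy": answer_hits / n}

# Example with a stub agent that always picks the "search" tool
def stub_agent(query):
    return "search", "42"

cases = [
    {"input": "q1", "expected_tool": "search", "expected_answer": "42"},
    {"input": "q2", "expected_tool": "calculator", "expected_answer": "42"},
]
print(evaluate(stub_agent, cases))  # tool_accuracy 0.5, answer_accuracy 1.0
```

Running a suite like this on every change is what makes "not regressing" a measurable claim rather than a hope.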

If an agent can take actions, what are the non-negotiable controls that enable trust and compliance without slowing delivery?

Identity must be strictly managed, and you need to be able to track and trace every action, so you understand who did what and where. Explicit approval gates and enforceable guardrails are essential, and over time we’ll trust agents to do more on their own. One constraint I see is bounding an agent too tightly, which limits the return. There’s a balance to giving agents enough flexibility while maintaining the right guardrails. This is exactly why we built the Foundry control plane: a single place to define and enforce policies so teams can move fast without losing control.
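The two controls named here, traceable identity and explicit approval gates, fit naturally into one action-execution path. The sketch below uses illustrative names (it is not the Foundry control plane): every action is appended to an audit log with its actor, and high-risk actions are blocked until an approver signs off.

```python
# Illustrative sketch of non-negotiable action controls: an audit log
# tying every action to an identity, plus an approval gate that blocks
# high-risk actions until a human approver signs off.

audit_log = []  # (actor, action, outcome) records for track-and-trace

def execute(actor, action, risk, approved_by=None):
    """Run an action only if its risk level permits; log everything."""
    if risk == "high" and approved_by is None:
        audit_log.append((actor, action, "blocked: approval required"))
        return "pending_approval"
    audit_log.append((actor, action, f"executed (approved_by={approved_by})"))
    return "executed"

if __name__ == "__main__":
    print(execute("agent-1", "send_summary_email", "low"))            # executed
    print(execute("agent-1", "wire_funds", "high"))                   # pending_approval
    print(execute("agent-1", "wire_funds", "high", approved_by="alice"))  # executed
```

Note the balance point from the answer above: the gate applies only to high-risk actions, so routine work stays fast while consequential actions remain under human control.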

Looking ahead 12–24 months, what’s the most important shift you expect in agent platforms or developer practice?

The biggest shift is that both how you build agents and how agents interact with your organization will become far more goal-oriented. Instead of hand-crafting every behavior, you’ll be coordinating things like setting the right goals and specifications while leveraging platforms that can do more optimization on their own, and often, improve themselves over time. We’re seeing this already in software engineering, where there’s far less manual effort and more focus on intent and outcomes. The winning teams will be the ones who can run agents like dependable systems because the models will keep getting better, but the differentiator will be the engineering discipline around them.

This Q&A is excerpted from “Generative AI: From Prototypes to Production, Operationalizing AI at Scale,” published by TechRepublic’s sister site, DZone. Download the free report to read more expert insights on scaling generative AI, deploying agents in production, and building the governance, data, and engineering foundations needed for enterprise adoption, or explore our DZone archives.


