I went into this week expecting a clean story:
“Define an agent once in Microsoft Foundry, connect it to a knowledge base, publish a stable endpoint… and still get per-user security trimming for enterprise data.”
After a lot of hands-on testing, I’m not there yet.
If your knowledge source requires document-level ACL trimming (Azure AI Search ACL trimming, SharePoint-backed sources, Foundry IQ / Knowledge Bases that depend on user context), there’s a practical gap today:
Microsoft Foundry agents (preview) don’t give you a supported way to inject request-scoped user auth context per run when using knowledge retrieval through the agent platform.
This post is the guidance I’m giving customers right now.
Scope note: this reflects behavior I observed in January 2026 while the relevant agent capabilities are still in preview. Expect this area to change quickly.
Terminology
Microsoft’s official docs talk about Foundry agents / Foundry Agent Service (preview) and Agent Applications.
In this post, I’ll use a shorthand to make the comparison unambiguous:
- Foundry Agents 2.0 = the new Microsoft Foundry agents (preview) experience (this is my shorthand, not an official product name)
- Foundry Agents 1.0 (classic) = the non-preview / older agent experience (this is my shorthand)
- Responses API (model + tools per request) = the request-scoped call surface you can use in both cases: to invoke agents (including Agent Applications) and to call Foundry models directly; the difference is whether tools/auth are defined on the agent version or injected per request
This matters because the security behavior is different depending on whether you’re configuring an agent version (agent-defined tools) or composing tools per request (Responses API tool injection).
At a glance: when to use what
- If you need per-user ACL trimming: use the Responses API (Foundry model + tools per request) so you can attach user context per call.
- If shared access is fine: use agents (shared identity) for reuse and a managed surface.
- If you want a stable endpoint: publish as an Agent Application, but assume RBAC-based access and client-managed conversation history.
Why per-user security trimming is non-negotiable
In enterprise, “knowledge retrieval” isn’t just about finding the right chunk — it’s about not leaking the wrong chunk.
If Alice can see DocA and Bob can’t, your agent must pass Alice’s identity to retrieval on every run so the source can trim results.
In Foundry + MCP + Search, that often means passing a request-scoped header like:
x-ms-query-source-authorization: <user-token>
The 3 different surfaces people mix up (I did too)
This is where confusion starts. There are (at least) three distinct interaction models:
- Direct OpenAI-compatible Responses API (model + tools per request)
- Agents inside a Foundry project (Foundry Agents 1.0 and Foundry Agent Service (preview))
- Agent Applications (published, stable endpoint with isolation and its own identity)
Each has different capabilities and security assumptions.
Agents vs “model + tools per request”: what’s actually different?
This is the practical difference I keep seeing teams miss.
When you use a Foundry agent
You’re opting into an agent definition + versioning surface:
- Tools are attached to the agent version (reusable configuration).
- You get a named asset you can iterate on (agent versions) and (when you publish) a stable endpoint via an Agent Application.
- It’s the right shape if you want reuse (agents, and workflows where applicable) and a platform-managed serving surface.
But you also inherit platform constraints: if there’s no request-scoped tool auth surface, you can’t “just pass a header” per run.
When you use a Foundry model directly
You’re using a request-scoped composition surface:
- Tools (including MCP tools) are injected per responses.create() call.
- Headers can be request-scoped, which is exactly what per-user trimming needs.
But: this bypasses agent reuse. The tool isn’t part of an agent definition, so it can’t be reused via agents/workflows or exposed via an Agent Application without reintroducing the same gap.
More context (what I observed): portal vs SDK + how docs map to behavior
One reason this problem is easy to miss is that the portal experience can make secure retrieval feel like it “just works”.
In my testing:
- In the Foundry portal, querying a knowledge base that enforces ACL trimming can succeed as an interactive user.
- When I try to reproduce the same pattern from code via agents/knowledge retrieval, retrieval results can come back empty unless the knowledge source receives request-scoped user context.
These are the conclusions from my hands-on testing, mapped to what Microsoft documents today:
- Responses API supports request-scoped MCP headers (this is the only model where request-scoped headers naturally fit).
- Agent-defined tools are agent-version scoped (headers behave like configuration, not per-run inputs).
- Publishing creates an Agent Application, but inbound auth is RBAC by default.
  - Microsoft documents this in Publish and share agents / Agent Applications: default inbound authentication is Azure RBAC (/applications/invoke/action), and Azure AI User is listed as the minimum role to chat with a published agent.
If you read only one thing in this post: the friction comes from trying to apply “per-request user context” patterns to an “agent version configuration” surface.
What works today: per-request MCP headers with the Responses API
If you bypass agents and call the Responses API directly, you can attach headers per request.
mcp_tool = {
    "type": "mcp",
    "server_label": "kb_acl_test",
    "server_url": KB_MCP_URL,
    "project_connection_id": PROJECT_CONNECTION_ID,
    "require_approval": "never",
    "allowed_tools": ["knowledge_base_retrieve"],
    # Request-scoped header: this user's token travels with THIS call only
    "headers": {
        "x-ms-query-source-authorization": user_token,
    },
}

response = openai_client.responses.create(
    model=MODEL_DEPLOYMENT,
    input=USER_QUERY,
    tools=[mcp_tool],
)

Expected outcome:
- The knowledge source trims results to the caller’s permissions.
- Different users can ask the same question and receive different (correct) citations.
If you need true per-user ACL trimming today, this is the most straightforward path: do Entra auth + OBO in your app, then call Responses with request-scoped tool headers.
Why this matches your mental model
This is the same idea you’d use for any enterprise integration:
- Your app authenticates the user with Microsoft Entra ID.
- Your app obtains a delegated token (OBO) for the downstream resource.
- Your app calls the model/tool surface with that token attached per request (sketched below).
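Here's a minimal sketch of those three steps using MSAL Python. The scope mirrors the Search scope from the docs example; APP_CLIENT_ID, APP_CLIENT_SECRET, TENANT_ID, and incoming_user_token are placeholders for your app registration and the token your app received from the user.

import msal

# Confidential client for YOUR app registration (placeholder IDs/secret)
cca = msal.ConfidentialClientApplication(
    client_id=APP_CLIENT_ID,
    client_credential=APP_CLIENT_SECRET,
    authority=f"https://login.microsoftonline.com/{TENANT_ID}",
)

# OBO: exchange the user's incoming token for a delegated token
# to the downstream resource (here: Azure AI Search)
result = cca.acquire_token_on_behalf_of(
    user_assertion=incoming_user_token,
    scopes=["https://search.azure.com/.default"],
)
if "access_token" not in result:
    raise RuntimeError(result.get("error_description"))

user_token = result["access_token"]  # attach per request, as shown above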
The Knowledge Retrieval docs example is a security footgun (and should say so)
This is not stated clearly in the docs today, and it should be.
Reference: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/tools/knowledge-retrieval
The current example is misleading from a security standpoint because it effectively:
- Evaluates a token provider at agent creation time
- Bakes a user-bound, expiring token into the agent definition
Here’s the risky pattern (note the final () on the token provider):
from azure.identity import get_bearer_token_provider

# Create MCP tool with SharePoint authorization header
mcp_kb_tool = MCPTool(
    server_label="knowledge-base",
    server_url=mcp_endpoint,
    require_approval="never",
    allowed_tools=["knowledge_base_retrieve"],
    project_connection_id=project_connection_name,
    headers={
        "x-ms-query-source-authorization": get_bearer_token_provider(
            credential,
            "https://search.azure.com/.default"
        )()  # ← This () evaluates the token NOW
    }
)

# Create agent with MCP tool
agent = project_client.agents.create_version(
    agent_name=agent_name,
    definition=PromptAgentDefinition(
        model=agent_model,
        instructions=instructions,
        tools=[mcp_kb_tool]  # ← Tool + token are now baked into this agent version
    )
)

Why this is a problem:
- The token will eventually expire (you ship an agent version that later fails retrieval).
- The identity used is not the invoking user, but whoever created the agent (or whatever service identity was used during creation).
- Developers may incorrectly assume this provides per-user ACL trimming.
- It conflates service credentials with user security context.
What the docs should say explicitly:
- This is a service credential pattern.
- It’s not request-scoped.
- It’s not suitable for per-user security trimming.
If you need per-user security trimming, you need a surface where user context can be attached per request / per run (for example via request-scoped headers when calling the Responses API).
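And if what you actually want is the service-credential pattern (shared access, no per-user trimming), the safer shape is to evaluate the token provider per request instead of at agent creation. A minimal sketch, reusing the placeholder names from the Responses API example above; note this fixes only token expiry, not per-user ACL trimming:

from azure.identity import DefaultAzureCredential, get_bearer_token_provider

credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(
    credential, "https://search.azure.com/.default"
)

def build_kb_tool() -> dict:
    # Evaluate the provider HERE, per request, so the token is fresh.
    # This is still a *service* identity, not the invoking user's.
    return {
        "type": "mcp",
        "server_label": "kb_acl_test",
        "server_url": KB_MCP_URL,
        "project_connection_id": PROJECT_CONNECTION_ID,
        "require_approval": "never",
        "allowed_tools": ["knowledge_base_retrieve"],
        "headers": {"x-ms-query-source-authorization": token_provider()},
    }

response = openai_client.responses.create(
    model=MODEL_DEPLOYMENT,
    input=USER_QUERY,
    tools=[build_kb_tool()],
)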
Publishing doesn’t solve it (and introduces a second enterprise concern)
Publishing creates an Agent Application with its own identity and a stable endpoint.
Reference: Publish and share agents / Agent Applications
Two details from the docs matter a lot for architecture:
- Inbound auth (default) is Azure RBAC
  - Callers must have the Azure RBAC permission /applications/invoke/action on the application resource.
  - The docs also state users need at least the Azure AI User role on the Agent Application scope to chat with a published agent.
- The application endpoint is intentionally limited
  - Only POST /responses is available.
  - Other APIs like /conversations, /files, /vector_stores are not available, and the call forces store=false.
  - For multi-turn, the client must store conversation history.
That’s not “bad” — it’s just a very different production surface than many people expect.
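To make “client-managed conversation history” concrete, here’s a minimal multi-turn sketch against a published application’s POST /responses endpoint. APP_ENDPOINT, the token scope, and the response payload shape are assumptions for illustration, not verified contracts:

import requests
from azure.identity import DefaultAzureCredential

# Caller needs the invoke permission; the scope/audience is an assumption
token = DefaultAzureCredential().get_token("https://ai.azure.com/.default").token

history = []  # store=false is forced, so the client keeps the transcript

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    resp = requests.post(
        f"{APP_ENDPOINT}/responses",  # only POST /responses is exposed
        headers={"Authorization": f"Bearer {token}"},
        json={"input": history},  # resend the full history every turn
    )
    resp.raise_for_status()
    # Payload shape is an assumption; inspect your actual response
    answer = resp.json()["output"][0]["content"][0]["text"]
    history.append({"role": "assistant", "content": answer})
    return answer

ask("Summarize our travel policy.")
ask("And what changes for contractors?")  # context lives client-side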
My enterprise concern here is the combination:
- “End users need Azure RBAC to invoke” + “Azure AI User is broader than invoke-only”
In a lot of real customer environments, security teams treat Azure RBAC for end users as infrastructure access — and won’t allow it.
Docs say “application-scope invoke”. My tests required more.
This is the most important nuance from my tests:
- The docs clearly describe the intended model: callers need /applications/invoke/action on the Agent Application resource, and at least Azure AI User on the Agent Application scope to chat with a published agent.
- In my testing, that was not sufficient for end-user invocation in the way the docs suggest. Even after granting the application-scope permission, I still hit access denied until the caller had broader access tied back to the Foundry project (what many orgs would treat as “Foundry project RBAC / Azure AI User”).
So the post’s guidance is deliberately conservative:
- Treat “publish = enterprise-ready invoke-only sharing for end users” as not reliable yet for strict enterprise RBAC expectations.
- If your environment cannot grant Azure roles to end users, plan for an app-layer front door (Entra app roles, OBO, your own authZ) and call the Responses API (see the sketch below).
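A minimal FastAPI sketch of that front door, assuming a hypothetical KnowledgeBase.Query Entra app role and the OBO + Responses helpers from the earlier sketches; token validation is deliberately simplified:

import jwt  # PyJWT
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()
# Keys for validating user tokens issued to YOUR app registration
jwks = jwt.PyJWKClient(
    f"https://login.microsoftonline.com/{TENANT_ID}/discovery/v2.0/keys"
)

@app.post("/chat")
def chat(body: dict, authorization: str = Header(...)):
    raw = authorization.removeprefix("Bearer ")
    claims = jwt.decode(
        raw,
        jwks.get_signing_key_from_jwt(raw).key,
        algorithms=["RS256"],
        audience=APP_CLIENT_ID,
    )
    # AuthZ via Entra app roles, not Azure RBAC for end users
    if "KnowledgeBase.Query" not in claims.get("roles", []):
        raise HTTPException(status_code=403, detail="missing app role")
    # Hypothetical helpers: OBO exchange + request-scoped Responses call
    user_token = exchange_on_behalf_of(raw)  # see the OBO sketch above
    return answer_with_user_context(body["question"], user_token)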
“Can’t we just use OAuth identity passthrough?”
Microsoft Foundry does support OAuth identity passthrough for MCP authentication, but not for Foundry IQ / knowledge retrieval tools yet.
Reference: https://learn.microsoft.com/en-us/azure/ai-foundry/agents/how-to/mcp-authentication?view=foundry
But the docs include a very explicit constraint:
- To use OAuth identity passthrough, users interacting with your agent need at least the Azure AI User role.
So yes, passthrough is a supported model — but it’s still a blocker for organizations that cannot grant Azure roles to end users.
This is the key enterprise mismatch:
- Security teams want: “Application auth + least privilege + no infra-level role assignments to end users.”
- The current platform guidance leans on: “End users have Azure roles so the platform can do passthrough.”
These two models are incompatible.
What I recommend to customers (right now)
Here’s the decision tree I’m using in real projects:
If you need per-user ACL trimming
- Prefer: App-layer Entra auth + OBO + Responses API (request-scoped MCP headers per call).
- Avoid: Agent definitions that embed a token at creation time.
- Treat “agent-based knowledge retrieval with per-user context” as preview-risk until there’s a request-scoped mechanism.
If you want agents/workflows reuse
- Use agents when your knowledge retrieval can operate with a shared identity (no per-user trimming) or when your org is willing to grant end users the required Azure roles for identity passthrough.
- Don’t use agents if your security model requires: “end users have no Azure RBAC assignments” + “per-request user token must be injected by your app”.
If shared access is acceptable (no per-user trimming)
- Use a shared identity (Agent Identity / Foundry project managed identity / key-based auth), and design your knowledge source accordingly.
If you want a stable endpoint for broader distribution
- Use Agent Applications, but assume:
  - inbound access is RBAC-based by default
  - conversation state is client-managed
The feature I’m waiting for (to make this enterprise-clean)
This is what would unblock a lot of production scenarios:
- A supported way to pass request-scoped tool auth context per agent run (for example, via a tool_resources-style mechanism) so that user context is never persisted in the agent definition.
- A truly least-privilege “invoke-only” model that fits common enterprise patterns (ideally Entra app roles + app-layer OBO).
Resources (official docs)
- Knowledge retrieval with agents
- MCP authentication (preview)
- Publish and share agents / Agent Applications
I’m also actively sharing these findings with folks on the Microsoft Foundry team so we can close the gaps faster — if you’re running into the same constraints in production, I’d love to compare notes.
If you’re building with Microsoft Foundry agents (or searching for Azure AI Foundry) and you’re hitting this exact security trimming wall: what’s your target architecture?
Are you forced into app-layer OBO too, or did you find a clean agent-native pattern that I missed?
If you want to discuss real-world enterprise patterns (and what you’re seeing in your tenant), reach out on LinkedIn.