Designing the AI-native enterprise, part 3: Securing your AI agents
This article outlines how to secure and govern AI agents in an AI-native enterprise by extending zero-trust principles and embedding secure-by-design workflows, while emphasizing the critical role of human oversight to manage AI's inherent limitations.
Enterprises are moving fast to integrate AI agents into both digital and physical workflows. These agents already write code, manage customer interactions, and control physical access systems [1], [3]. As their span of responsibility grows, the challenge is no longer whether AI will play a role, but how to secure and govern this new digital workforce without creating overhead or slowing innovation [2], [5].
The answer lies in extending proven security and governance principles into the AI-native context. Instead of inventing new committees or governance layers, enterprises should embed zero-trust and continuous reasoning directly into the design, while relying on existing risk and compliance functions to carry the oversight mandate forward. However, this must be balanced with the fundamental limitations of current AI technology and clear boundaries for human oversight [4].
This article is a follow-up to our two previous installments, Designing the AI-native enterprise, part 1 [6] and part 2 [7].
Zero-Trust as a foundation
Zero-trust has become a cornerstone in cybersecurity: never trust, always verify, least privilege, and assume breach. In an AI-native enterprise, these principles must be applied not just to human identities and devices but also to agents acting on behalf of the enterprise [8].

“ Enterprises don’t need new governance bureaucracies for AI. They need to embed zero-trust, continuous reasoning, and human oversight into design from the start, so their digital workforce remains secure, accountable, and resilient. ”
In an AI-native enterprise, two fundamental access patterns for AI agents emerge:
- Inherited access from a human: In this model, an AI agent acts on behalf of a human user and inherits their permissions. For example, an email management agent may have access to a user's inbox, calendar, and contacts because it's operating with the user's delegated authority. This is a common and convenient model but introduces significant security risks.
- Specific AI-agent access: This pattern grants the AI agent its own distinct identity and a set of permissions defined by its role. The agent's permissions are limited to the minimum necessary to perform its specific function, adhering to the principle of least privilege. For instance, an invoice approval agent would have a specific role with permissions only to access financial transaction data and the ability to approve or deny invoices, separate from any human user's permissions. The sketch after this list contrasts the two patterns.
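To make the contrast concrete, here is a minimal Python sketch of the two patterns. The class names, scope strings, and expiry handling are illustrative assumptions, not the API of any specific identity product.

```python
# Minimal sketch (not a vendor API): contrasting inherited vs. dedicated agent access.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class InheritedAccess:
    """Agent acts with a human's delegated authority and inherits their scopes."""
    delegating_user: str
    user_scopes: set[str]          # everything the human can do
    expires_at: datetime

    def allowed(self, scope: str) -> bool:
        # Risk: the agent can do anything the user can, until the token expires.
        return scope in self.user_scopes and datetime.now(timezone.utc) < self.expires_at

@dataclass
class DedicatedAgentIdentity:
    """Agent has its own identity and a role-scoped, least-privilege permission set."""
    agent_id: str
    role: str
    role_scopes: set[str] = field(default_factory=set)  # only what the role needs

    def allowed(self, scope: str) -> bool:
        return scope in self.role_scopes

# An invoice-approval agent with its own narrow role ...
invoice_agent = DedicatedAgentIdentity(
    agent_id="agent:invoice-approver-01",
    role="invoice_approval",
    role_scopes={"finance.transactions.read", "finance.invoices.approve"},
)
# ... versus an email agent that simply inherits a user's broad permissions.
email_agent = InheritedAccess(
    delegating_user="alice@example.com",
    user_scopes={"mail.read", "mail.send", "calendar.write", "crm.customers.read"},
    expires_at=datetime.now(timezone.utc) + timedelta(hours=1),
)

print(invoice_agent.allowed("crm.customers.read"))  # False: outside its role
print(email_agent.allowed("crm.customers.read"))    # True: inherited from the user
```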

Every AI agent, whether a language model scheduling meetings or a robotic system controlling physical doors, should be treated as an untrusted entity until verified in context. Verification is not a one-time check but a continuous process, ensuring that actions remain consistent with the agent's authorization and the environment it operates in. The application of zero-trust tenets to the AI agent ecosystem provides a security framework [8] (a sketch of a per-action verification gate follows the list below):
- Explicit identity verification: Every interaction, whether initiated by a human user or an AI agent, must be strongly authenticated [9]. This extends beyond simple user credentials to a more complex system that validates both the identity of the human user delegating authority and the identity of the AI agent itself [10].
- Least privilege access: AI agents should operate with the bare minimum permissions required to perform their specific functions. Granular access controls, time-limited authorizations, and function-specific permissions are essential to containing potential damage from a compromised agent.
- Micro-segmentation: This critical component of ZTA involves creating granular, secure zones that isolate individual workloads, hosts, or containers [11]. For an agentic system, this means isolating critical components to prevent unauthorized data leakage between different parts of the system.
- Continuous monitoring: ZTA necessitates complete and transparent visibility into all agent activities through detailed logging, real-time monitoring for anomalous behaviors, and robust audit trails.
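Below is a hedged sketch of how these four tenets might be enforced on every agent action. The request fields, scope strings, and `authorize` function are assumptions for illustration only.

```python
# Hedged sketch of a per-action zero-trust gate for agent requests.
# Field and function names are illustrative, not a product API.
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent-audit")

@dataclass
class AgentRequest:
    agent_id: str
    authenticated: bool       # result of strong authentication (e.g. signed token, mTLS)
    scope: str                # the permission this action requires
    granted_scopes: set[str]  # what this agent's identity is allowed to do
    segment: str              # domain segment the agent operates in
    target_segment: str       # segment the requested resource lives in

def authorize(req: AgentRequest) -> bool:
    """Never trust, always verify: every action is checked, and every decision is logged."""
    checks = {
        "explicit_identity":  req.authenticated,                   # verify identity explicitly
        "least_privilege":    req.scope in req.granted_scopes,     # minimum necessary scopes
        "micro_segmentation": req.segment == req.target_segment,   # no silent boundary crossing
    }
    decision = all(checks.values())
    audit_log.info("agent=%s scope=%s checks=%s decision=%s",
                   req.agent_id, req.scope, checks, decision)      # continuous monitoring
    return decision

# Example: an agent in the "support" segment reaching for a "finance" resource is denied.
print(authorize(AgentRequest("agent:support-bot", True, "finance.invoices.read",
                             {"crm.cases.read"}, "support", "finance")))  # False
```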
Malicious access via inherited permissions
The inherited access model is particularly vulnerable to prompt injection attacks, as demonstrated in a security test where a customer service bot was successfully tricked into leaking sensitive data.
In this scenario, security researchers demonstrated an ethical hack against a customer service AI agent connected to a Salesforce CRM. The agent had broad, inherited access to customer records in the CRM to assist with customer inquiries. The researchers crafted a malicious prompt and injected it into a piece of data that the AI agent was designed to process; for example, a customer case or a hidden instruction within a customer's email.
Without any human intervention, the AI agent's "inherited" access allowed it to be hijacked. The prompt injection overrode its intended instructions and forced it to perform an unauthorized action: exfiltrating complete Salesforce records, including sensitive customer data, in bulk. This attack highlights how easily an agent with inherited permissions can be weaponized into a data extraction tool, underscoring the critical need for fine-grained access controls and least privilege principles.
Read more in the full write-up of the attack [12].
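One illustrative mitigation, sketched below in Python, is to scope the agent's data-access tool to the case it is handling and cap how many records a single call can return, so a prompt-injected "export everything" instruction fails at the tool boundary. The repository interface, scope, and cap are assumptions for illustration, not the researchers' findings or Salesforce's actual controls.

```python
# Illustrative guard around an agent's data-access tool: scope to the case at hand
# and cap record counts, so a prompt-injected bulk export fails.
# The repository below is a hypothetical stand-in, not the Salesforce API.
MAX_RECORDS_PER_CALL = 5

class ScopedCustomerRepository:
    def __init__(self, backend: dict[str, list[dict]], case_id: str):
        self._backend = backend
        self._case_id = case_id          # the only case this tool instance may touch

    def fetch_records(self, case_id: str, limit: int) -> list[dict]:
        if case_id != self._case_id:
            raise PermissionError("agent tool is scoped to a single case")
        if limit > MAX_RECORDS_PER_CALL:
            raise PermissionError("bulk export exceeds the per-call record cap")
        return self._backend.get(case_id, [])[:limit]

# A hijacked agent asking for all records across cases is stopped by the tool boundary.
repo = ScopedCustomerRepository({"case-42": [{"name": "Ada"}]}, case_id="case-42")
print(repo.fetch_records("case-42", limit=1))    # allowed: within scope and cap
try:
    repo.fetch_records("case-99", limit=10_000)  # denied: out of scope, bulk-sized
except PermissionError as err:
    print(err)
```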
Continuous reasoning in practice: Promise and limitations
The concept of continuous reasoning extends zero-trust with the ability for systems to reason dynamically about intent, context, and risk [3]. For example, an agent tasked with approving invoices should not only verify access credentials but also evaluate the transaction against patterns, policies, and expected behaviors in real time. However, current AI systems face significant constraints that must inform governance design:
- Hallucination and inconsistency: Large Language Models (LLMs) can generate plausible but incorrect outputs, particularly when reasoning about novel situations [14]. Real-world examples demonstrate the severe legal and reputational risks [15]. These failures occur because models are often overconfident, prioritizing fluency over factual accuracy, which makes it difficult for users to recognize errors without outside confirmation.
- Context length limitations: The first generation of LLMs struggled with persistent memory, making it difficult to track context over time or operate coherently across extended interactions [2].
- Reasoning fragility: While LLMs can appear to reason in sophisticated ways, that reasoning can be brittle [16]. A Carnegie Mellon University study found that AI agents failed to complete nearly 70% of routine office tasks in a simulated environment, often getting lost or taking erroneous shortcuts.
- Training data boundaries: AI agents operate within the constraints of their training data, which may not cover edge cases, recent developments, or organization-specific contexts that are critical for sound decision-making [17], [13]. A misaligned architectural approach can result in an agent attempting to operate outside its intended domain [19].
These limitations mean that while agents can continuously interpret environments and apply policies, their reasoning should be treated as probabilistic rather than deterministic, requiring robust validation and human oversight protocols [15].
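Returning to the invoice-approval example above, here is a minimal sketch of what treating agent reasoning as probabilistic rather than deterministic can look like: the agent's judgement carries a confidence score and contextual signals, and anything low-confidence, out-of-pattern, or high-stakes is routed to a human instead of being auto-approved. The thresholds and field names are illustrative assumptions.

```python
# Sketch: treat the agent's judgement as probabilistic and route low-confidence
# or anomalous invoices to a human reviewer. Thresholds and names are illustrative.
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.90        # below this, never auto-approve
AMOUNT_CEILING = 10_000.00     # above this, always require a human

@dataclass
class InvoiceAssessment:
    invoice_id: str
    amount: float
    model_confidence: float      # the agent's own (ideally calibrated) confidence
    matches_known_pattern: bool  # e.g. known vendor, expected amount range

def decide(a: InvoiceAssessment) -> str:
    if a.amount > AMOUNT_CEILING:
        return "escalate_to_human"       # high stakes: human-in-the-loop
    if a.model_confidence < CONFIDENCE_FLOOR or not a.matches_known_pattern:
        return "escalate_to_human"       # probabilistic output, not trusted blindly
    return "auto_approve"                # low-risk, high-confidence, in-pattern

print(decide(InvoiceAssessment("INV-001", 450.0, 0.97, True)))     # auto_approve
print(decide(InvoiceAssessment("INV-002", 450.0, 0.71, True)))     # escalate_to_human
print(decide(InvoiceAssessment("INV-003", 25_000.0, 0.99, True)))  # escalate_to_human
```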
Secure-by-design workflows
The most effective way to govern a digital workforce is to design security into the architecture from the start. This approach moves beyond the reactive, ad-hoc governance that currently prevails in many fragmented enterprise AI initiatives [2].
Secure-by-design workflows enforce least privilege, separation of duties, and escalation paths before an agent is deployed [8]. This foundation ensures that security considerations are embedded from the initial design phase rather than retrofitted after deployment. Micro-segmentation assigns agents to narrow, well-defined domains of responsibility, preventing them from crossing boundaries without explicit authorization [11]. A key component of this approach involves defining clear domain boundaries that align with business processes while maintaining security isolation. Policy-as-code translates governance rules into executable checks, and an AI bill of materials provides a foundational inventory of all AI assets, including models, datasets, and APIs. Together, these give the visibility needed to apply appropriate controls and manage potential vulnerabilities.
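As a hedged illustration of policy-as-code, the snippet below expresses a few governance rules as executable checks run over entries in a hypothetical AI bill of materials. The BOM schema and rule names are assumptions; a real deployment would more likely use a dedicated policy engine.

```python
# Policy-as-code sketch: governance rules expressed as executable checks over
# an AI bill of materials. The BOM schema and rules below are illustrative.
AI_BOM = [
    {"asset": "invoice-approval-agent", "type": "agent",
     "owner": "finance-platform", "data_classes": ["financial"],
     "scopes": ["finance.transactions.read", "finance.invoices.approve"],
     "human_escalation_path": True},
    {"asset": "support-summarizer", "type": "agent",
     "owner": None, "data_classes": ["customer_pii"],
     "scopes": ["crm.cases.read", "crm.customers.export"],
     "human_escalation_path": False},
]

POLICIES = {
    "must_have_owner": lambda a: a["owner"] is not None,
    "pii_needs_escalation_path": lambda a: ("customer_pii" not in a["data_classes"]
                                            or a["human_escalation_path"]),
    "no_bulk_export_scope": lambda a: not any(s.endswith(".export") for s in a["scopes"]),
}

def evaluate(bom: list[dict]) -> list[tuple[str, str]]:
    """Return (asset, policy) pairs for every violated governance rule."""
    return [(a["asset"], name) for a in bom
            for name, check in POLICIES.items() if not check(a)]

for asset, policy in evaluate(AI_BOM):
    print(f"VIOLATION: {asset} fails '{policy}'")
```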

“ The real risk isn’t that AI agents fail, it’s that they succeed without the right boundaries. By giving every agent its own identity, least-privilege access, and continuous verification, we make security part of the architecture rather than an afterthought. ”
Defining domain boundaries in practice
Effective micro-segmentation requires carefully considering business process flows to create clear domain boundaries. This design-first approach avoids the trap of endless assessments and inventories by embedding protections directly into the workflow from the start.
First, process mapping is used to identify where natural handoffs occur in business workflows. These points serve as the ideal segmentation boundaries for isolating workloads into custom, workload-specific zones. Next, risk-based boundaries are applied to these domains; for instance, higher-risk areas like financial transactions or customer data access require stricter isolation and more limited agent capabilities. In financial services, an AI agent initiating a transaction must present multiple forms of verification, including its delegation token and any applicable spending limits. Finally, organizations must establish clear cross-domain protocols and exception handling procedures, which typically require human authorization or elevated verification when agents need to interact across different domains [22].
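A hedged sketch of such a cross-domain check follows. The delegation-token fields, spending limit, and domain names are illustrative assumptions rather than a specific protocol.

```python
# Sketch of a cross-domain verification step for a payments-initiating agent:
# the request must carry a valid delegation token, stay under its spending limit,
# and obtain human authorization when it crosses into a higher-risk domain.
from dataclasses import dataclass
from datetime import datetime, timezone

HIGH_RISK_DOMAINS = {"financial_transactions", "customer_data"}

@dataclass
class DelegationToken:
    issued_to: str            # agent identity
    delegated_by: str         # human or service that granted authority
    expires_at: datetime
    spending_limit: float

def verify_cross_domain(token: DelegationToken, source_domain: str,
                        target_domain: str, amount: float,
                        human_approved: bool) -> bool:
    if datetime.now(timezone.utc) >= token.expires_at:
        return False                          # expired delegation
    if amount > token.spending_limit:
        return False                          # exceeds delegated spending limit
    if source_domain != target_domain and target_domain in HIGH_RISK_DOMAINS:
        return human_approved                 # cross-domain exception handling
    return True

token = DelegationToken("agent:payments-01", "cfo@example.com",
                        datetime(2030, 1, 1, tzinfo=timezone.utc), 5_000.0)
print(verify_cross_domain(token, "procurement", "financial_transactions",
                          1_200.0, human_approved=False))  # False: needs a human
```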
Instead of cataloguing every agent up front, enterprises can use this method to build protections directly into the workflow, allowing inventories to emerge naturally from the telemetry of live systems.
Essential safeguards: Human-in-the-Loop and Human-on-the-Loop
Given the limitations of current AI technology, human oversight isn't optional; it's a fundamental design requirement. This oversight comes in two primary forms: Human-in-the-Loop (HITL) and Human-on-the-Loop (HOTL) [15].
HITL places a human directly in the decision-making process, especially for high-stakes decisions. For example, in healthcare, professionals must validate AI-suggested diagnoses before treatment. Similarly, in financial fraud detection, human analysts review transactions flagged by an AI to confirm their validity. This direct intervention is also crucial in novel situations where an AI encounters an edge case it wasn't trained for: a human can provide the correct input, which in turn helps improve the model over time. HITL is also essential for cross-domain interactions, ensuring consistency and preventing errors as an agent moves between different domains.
HOTL, on the other hand, involves continuous monitoring with the capability to intervene. This includes using real-time monitoring dashboards to surface agent decisions and flag anomalies. Tools like Maxim AI and Langfuse provide end-to-end tracing for AI agent actions, monitoring key metrics such as latency, costs, and error rates. HOTL systems can also be equipped with circuit breakers that automatically pause agent operations when predefined risk thresholds are exceeded, as well as escalation triggers based on confidence levels, decision magnitude, or pattern deviations.
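The sketch below shows, under assumed thresholds and metric names, how HOTL circuit breakers and escalation triggers might be combined with HITL routing. It is a conceptual sketch, not the API of the monitoring tools mentioned above.

```python
# Sketch of HOTL safeguards: a circuit breaker that pauses an agent when a risk
# threshold is breached, plus escalation triggers that route individual decisions
# to a human (HITL). Thresholds and metric names are illustrative.
ERROR_RATE_BREAKER = 0.05       # pause the agent above 5% recent error rate
CONFIDENCE_ESCALATION = 0.85    # route low-confidence decisions to a human
MAGNITUDE_ESCALATION = 50_000   # route high-impact decisions to a human

class AgentSupervisor:
    def __init__(self):
        self.paused = False

    def observe(self, recent_error_rate: float) -> None:
        """Circuit breaker: automatically pause operations past the threshold."""
        if recent_error_rate > ERROR_RATE_BREAKER:
            self.paused = True

    def route(self, confidence: float, decision_magnitude: float) -> str:
        if self.paused:
            return "halted_pending_review"      # HOTL intervention
        if confidence < CONFIDENCE_ESCALATION or decision_magnitude > MAGNITUDE_ESCALATION:
            return "escalate_to_human"          # HITL checkpoint
        return "proceed_autonomously"

supervisor = AgentSupervisor()
print(supervisor.route(0.95, 1_000))            # proceed_autonomously
supervisor.observe(recent_error_rate=0.12)      # anomaly surfaces in monitoring
print(supervisor.route(0.95, 1_000))            # halted_pending_review
```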

Designing effective human oversight
The key is designing oversight that adds value rather than creating bottlenecks [22]. A significant challenge is the "human-in-the-loop productivity paradox," where AI can actually slow down human employees, particularly experienced ones, due to the time required to fix errors and correct plausible but factually incorrect outputs. MIT highlighted this as a “verification tax” that undermines the ROI of AI agents [23]. To counter this, businesses should:
- Implement asynchronous user authorization: This model decouples the authorization request from the agent's workflow. The agent sends a request to an authorization server and can continue with other tasks while waiting for a response, so it never sits idle (see the sketch after this list).
- Elevate the human role: Human operators should be seen as "custodians" of the AI system, providing strategic guidance and ensuring compliance. They should be trained to understand the AI's strengths and limitations. This collaborative approach can deliver more business value than fully automated solutions in complex domains.
- Use human interventions as training data: By tracking and learning from human feedback, enterprises can continuously improve their AI systems [23], [24]. Every AI decision should be treated like an employee decision under review: if mistakes aren't caught and corrected early, bad habits will compound over time.
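As a sketch of the asynchronous authorization pattern from the first point above, the agent below files an approval request, keeps working through its other queued tasks, and only completes the gated action once the (simulated) human response arrives. The function names and timings are illustrative assumptions.

```python
# Sketch of asynchronous user authorization: the agent requests approval and
# keeps processing other tasks instead of blocking. Interfaces are illustrative.
import asyncio

async def request_authorization(action: str) -> bool:
    """Stand-in for an authorization server call; a human responds later."""
    await asyncio.sleep(2)          # simulated human review latency
    return True                     # approved, for the sake of the example

async def do_routine_task(name: str) -> None:
    await asyncio.sleep(0.5)
    print(f"completed routine task: {name}")

async def agent_workflow() -> None:
    # File the authorization request without blocking on it.
    approval = asyncio.create_task(request_authorization("approve_invoice INV-007"))

    # The agent stays productive while the human decision is pending.
    for task in ("triage inbox", "draft supplier reply", "update CRM notes"):
        await do_routine_task(task)

    # Complete the gated action only once the asynchronous response arrives.
    if await approval:
        print("authorization received: executing approve_invoice INV-007")
    else:
        print("authorization denied: action dropped and logged")

asyncio.run(agent_workflow())
```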
This approach transforms human oversight from a mere safety net into a dynamic feedback loop that actively improves the system's performance and accuracy. By focusing on collaboration and continuous learning, enterprises can leverage human judgment to overcome the inherent limitations of AI, making their digital workforce more robust and reliable.
Re-using existing governance channels
A key principle is that AI does not (and should not) require new governance bodies. Instead, oversight should flow into the channels enterprises already trust, such as risk committees, compliance reviews, and security audits, depending on the type of organization [25]. These existing forums are already in place to enforce accountability and can absorb AI oversight as part of their span of responsibility. This approach avoids duplication and ensures that AI oversight is aligned with established accountability structures.
However, existing governance structures may need enhancement to handle AI-specific risks. One such enhancement is shifting ownership, where the primary responsibility for AI risk management is moving to specialized strategic leaders like the General Counsel or Chief Information Security Officer (CISO). These leaders are uniquely positioned to operationalize AI risk management as a component of the broader enterprise risk strategy and to interpret emerging regulatory requirements. Additionally, enterprises are adopting formal frameworks. Major financial institutions like Morgan Stanley and Citi are developing collective AI governance frameworks, often leveraging existing, widely recognized guides like the NIST AI Risk Management Framework. This ensures a structured and consistent approach to managing AI risks [10].
Right design first: An operational roadmap
Securing and governing the digital workforce starts not with inventories, but with the right design principles. This is not just theory. At Algorithma, we have seen again and again that projects fail when control is bolted on after deployment, rather than embedded from the outset. Whether it has been in industrial edge environments, customer service automation, or AI sustainment programs, the same lesson holds: resilience comes from architecture and governance baked into the design, not patched on later.
A practical roadmap looks like this:
- Embed zero-trust from the outset – Treat every agent as untrusted until verified, and build continuous verification into workflows while acknowledging the probabilistic nature of AI reasoning.
- Define protect surfaces and domain boundaries in the design stage – Focus on what matters most: data, systems, and decisions that carry risk. Map business processes to identify natural segmentation points and cross-domain interaction requirements.
- Integrate identity, context, and capability checks – Use a cryptographic delegation model with explicit tokens to verify not just who or what the agent is, but whether the action makes sense in context and falls within the agent's demonstrated competency boundaries.
- Design mandatory human oversight touchpoints – Identify decision types, risk levels, and situational contexts that require human-in-the-loop or human-on-the-loop intervention, and build these requirements into the architecture from the beginning.
- Account for AI limitations in design – Build systems that assume hallucination, inconsistency, and reasoning failures will occur, with appropriate fallback and validation mechanisms. Implement a continuous, auditable feedback loop to correct AI behavior.
- Leverage existing oversight – Assign clear ownership of AI risk management to strategic leaders and align agent oversight with current security and compliance reviews, while enhancing these structures to handle AI-specific governance challenges.
- Iterate and scale through telemetry – Use monitoring data from purpose-built observability tools to refine controls, improve human oversight efficiency, and allow natural inventories of agents and behaviors to emerge, as sketched below.
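A hedged sketch of that last step: deriving a simple agent inventory from audit-log telemetry under an assumed log schema. In practice this data would come from the observability tooling mentioned earlier.

```python
# Sketch: let the agent inventory emerge from telemetry rather than an up-front
# catalogue. The audit-log schema below is an illustrative assumption.
from collections import defaultdict

audit_events = [
    {"agent_id": "agent:invoice-approver-01", "scope": "finance.invoices.approve", "allowed": True},
    {"agent_id": "agent:invoice-approver-01", "scope": "crm.customers.read",       "allowed": False},
    {"agent_id": "agent:support-bot",         "scope": "crm.cases.read",           "allowed": True},
]

def build_inventory(events: list[dict]) -> dict[str, dict]:
    """Aggregate observed agents, the scopes they used, and their denial counts."""
    inventory: dict[str, dict] = defaultdict(lambda: {"scopes_used": set(), "denials": 0})
    for e in events:
        entry = inventory[e["agent_id"]]
        entry["scopes_used"].add(e["scope"])
        if not e["allowed"]:
            entry["denials"] += 1   # repeated denials flag boundary-crossing attempts
    return dict(inventory)

for agent, profile in build_inventory(audit_events).items():
    print(agent, profile)
```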
This roadmap reflects a shift from control-after-deployment to control-by-design, where resilience is baked in rather than bolted on, and where human judgment remains central to high-stakes decision-making.
A fundamental change in how to operate
The rise of AI agents marks a fundamental change in how enterprises operate. Securing and governing this digital workforce requires acknowledging both the promise and limitations of current AI technology. While agents can augment human decision-making and automate routine processes, their probabilistic reasoning, potential for hallucination, and context limitations mean they cannot replace human oversight for critical decisions.
Effective governance does not require new bureaucracy or exhaustive mapping exercises. It requires embedding zero-trust, continuous reasoning, and secure-by-design principles into enterprise workflows while maintaining clear boundaries for human involvement. By designing mandatory human oversight touchpoints, defining clear domain boundaries, and building on existing governance structures, organizations can create resilient AI-native enterprises where agents serve as trusted tools, monitored, accountable, and operating within carefully defined parameters that respect both their capabilities and limitations.
Sources:
- [1] https://www.multimodal.dev/post/useful-ai-agent-case-studies
- [2] https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage
- [3] Zero-trust, infinite reasoning: Securing AI-native physical security operations
- [4] https://www.plugandplaytechcenter.com/insights/how-fortune-500s-lead-in-generative-ai-and-ai-governance
- [5] When the agent takes over: Measuring enterprise AI by work owned, not math done
- [6] Designing the AI-native enterprise: protocols, digital colleagues, and the new stack
- [7] Designing the AI-native enterprise, part 2: Leveraging AI agents to offset increasing cost of doing business
- [8] https://www.getmonetizely.com/articles/zero-trust-architecture-for-agentic-ai-how-can-we-design-security-first-systems
- [9] https://medium.com/@chandanbilvaraj/incorporating-zero-trust-into-ai-workloads-and-ai-agents-9c2be0e8091d
- [10] https://guptadeepak.com/the-future-of-ai-agent-authentication-ensuring-security-and-privacy-in-autonomous-systems/
- [11] https://www.paloaltonetworks.com/cyberpedia/what-is-microsegmentation
- [12] https://www.cxtoday.com/crm/a-customer-service-ai-agent-spits-out-complete-salesforce-records-in-an-attack-by-security-researchers/
- [13] Why agentic AI projects fail: 10 Learnings and fixes (for those already past the co-pilot phase)
- [14] https://research.aimultiple.com/ai-hallucination/
- [15] https://www.cloudsine.tech/mitigating-llm-hallucinations-and-false-outputs-in-enterprise-settings/
- [16] https://www.cxnetwork.com/artificial-intelligence/articles/how-to-use-agentic-ai-in-line-with-the-eu-ai-act
- [17] https://gaper.io/10-critical-mistakes-startups-make-when-deploying-ai-agents/
- [19] https://arxiv.org/html/2506.01438v1
- [22] https://www.forbes.com/councils/forbestechcouncil/2025/07/17/ai-agents-vs-human-oversight-the-case-for-a-hybrid-approach/
- [23] https://lnkd.in/ep7j_duD
- [24] https://medium.com/sadasant/measuring-the-real-economic-impact-of-ai-agents-3f2b4296577c
- [25] https://www.diligent.com/resources/blog/nist-ai-risk-management-framework