Risk Model & Protection Classes

Full agent capability does not mean an agent may do everything. It means the system can describe every one of its states and capabilities in a structured way, including the capabilities that remain fundamentally off-limits to an agent. The risk model is the tool that draws this boundary: precise, machine-readable, and without exceptions.

Sensitive areas

Some parts of a system require a higher level of protection, not only against external attackers, but also against internal agents. A common misconception states:

The user is allowed to see it, so the AI is allowed to see it too.

That is not correct. Effective AI access results from the intersection of user permission, agent permission, client trust level, resource sensitivity, purpose, and confirmation status. Each of these elements is a barrier in its own right; the user role is only one of them.
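The intersection can be sketched as a conjunction of independent checks. The Python below is only an illustration under stated assumptions: the `AccessRequest` shape, its field names, and the numeric 0-3 trust scale are hypothetical and do not come from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical request shape; every field name is illustrative."""
    user_permitted: bool        # the user's role allows the resource
    agent_permitted: bool       # the agent's own scope allows the tool
    client_trust: int           # trust level of the calling client (assumed 0-3)
    resource_sensitivity: int   # minimum client trust the resource demands (assumed 0-3)
    purpose_declared: bool      # a purpose is attached to the request
    needs_confirmation: bool    # the resource's class demands confirmation
    confirmed: bool             # the user confirmed this specific request


def effective_access(req: AccessRequest) -> bool:
    """Grant access only if every barrier passes on its own."""
    barriers = [
        req.user_permitted,
        req.agent_permitted,
        req.client_trust >= req.resource_sensitivity,
        req.purpose_declared,
        req.confirmed or not req.needs_confirmation,
    ]
    return all(barriers)


# The user may see the data, but the agent's scope does not cover it.
request = AccessRequest(
    user_permitted=True, agent_permitted=False, client_trust=3,
    resource_sensitivity=2, purpose_declared=True,
    needs_confirmation=False, confirmed=False,
)
print(effective_access(request))  # False
```

Because the result is a plain conjunction, a single failing barrier is enough to deny access, regardless of how permissive the user role is.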

The sensitive areas that must be particularly protected include:

Financial data: bank details, payment data, billing details
Health data: medical documents, medical records, certificates
Salary data: wages, bonuses, salary history, payroll runs
Personnel files: contracts, application documents, internal evaluations
Credentials & API keys: passwords, private keys, access tokens
Deletion and bulk functions: delete user, delete tenant, bulk export
Permission management: granting and revoking permissions, changing roles
Security logs: audit trails, security events, active sessions
Private communications: confidential notes, internal assessments
Tax data and contracts: tax-relevant documents, legally binding contracts
Personal data under GDPR: all data that identifies a natural person

Six protection classes

Every resource and every tool is assigned to one of six protection classes. The class determines who gets access, whether an agent even sees the tool in discovery, and what approval is required before execution.

Public

Low

Publicly accessible content without restriction. No authentication, no special role, no confirmation required. Suitable for help articles, product information, and documented system capabilities.

help.article.read
public.product_info.read

Internal

Low

Only for logged-in users. Authentication is sufficient; no special privileges required. Typical for list queries, searches, and general work views of one’s own context.

project.list
contact.search

Confidential

Medium

Only accessible with an explicit role and assigned scope. A simple login is not enough. The data is internal but not intended for all employees.

contract.read
customer_private_note.read

Sensitive

High

Only with additional approval or restricted context sharing. An agent must not load this data into its context unchecked. When the data is passed to an AI, the scope must be narrowly limited and the purpose documented.

salary.read
bank_account.read
health_document.read
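Restricted context sharing for the Sensitive class might look like the following sketch. It assumes a hypothetical helper `share_with_agent` that passes on only an explicit allow-list of fields and refuses to run without a stated purpose; the record contents are made-up placeholder values.

```python
def share_with_agent(record: dict, allowed_fields: set, purpose: str) -> dict:
    """Return only the allow-listed fields, tagged with the documented purpose."""
    if not purpose.strip():
        raise ValueError("a documented purpose is required for sensitive data")
    shared = {key: value for key, value in record.items() if key in allowed_fields}
    return {"purpose": purpose, "data": shared}


# Placeholder record; none of these values are real.
employee = {"employee_id": "E-1", "salary": 1000, "iban": "EXAMPLE-IBAN"}

context = share_with_agent(
    employee,
    allowed_fields={"employee_id", "salary"},
    purpose="answer a salary question for the employee's own record",
)
print(sorted(context["data"]))  # ['employee_id', 'salary']
```

The bank detail never reaches the agent's context, and the purpose travels with the data so that it can be logged alongside the access.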

Critical

The AI may not act autonomously in this class. Every execution requires explicit user approval, and often step-up authentication as well. The actions produce external effects, are difficult to undo, or directly affect third parties.

payment.execute
user.delete
contract.send
emails.send_external
security.change_permissions

Forbidden for AI

This class is a hard boundary, not a policy decision. Tools and resources of this class are completely hidden from tool discovery for agents. An agent can neither read, call, nor reference them.

password.read
private_key.read
full_database_export
raw_access_token.read

What an agent must not see, it must not find. Forbidden tools do not appear in discovery responses.
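A discovery filter along these lines could enforce that rule. This is a minimal sketch: the registry layout and the function names are assumptions, and the choice to answer a forbidden tool exactly like an unknown one is one possible design, not something the text prescribes. Role and scope checks for the other classes are left out.

```python
from enum import Enum


class ProtectionClass(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    SENSITIVE = "sensitive"
    CRITICAL = "critical"
    FORBIDDEN_FOR_AI = "forbidden_for_ai"


# Hypothetical registry: tool name -> protection class (tool names from the text).
TOOL_REGISTRY = {
    "help.article.read": ProtectionClass.PUBLIC,
    "project.list": ProtectionClass.INTERNAL,
    "contract.read": ProtectionClass.CONFIDENTIAL,
    "salary.read": ProtectionClass.SENSITIVE,
    "payment.execute": ProtectionClass.CRITICAL,
    "password.read": ProtectionClass.FORBIDDEN_FOR_AI,
    "full_database_export": ProtectionClass.FORBIDDEN_FOR_AI,
}


def discover_tools_for_agent(registry: dict) -> list:
    """Return the tools an agent may see; forbidden ones are omitted entirely."""
    return sorted(
        name for name, protection in registry.items()
        if protection is not ProtectionClass.FORBIDDEN_FOR_AI
    )


def call_tool_as_agent(registry: dict, name: str) -> str:
    """Reject forbidden and unknown tools identically so nothing leaks."""
    protection = registry.get(name)
    if protection is None or protection is ProtectionClass.FORBIDDEN_FOR_AI:
        raise LookupError(f"unknown tool: {name}")
    return f"dispatched {name}"


print(discover_tools_for_agent(TOOL_REGISTRY))
```

Filtering at discovery alone would not be enough; the call path repeats the check so that a guessed tool name fails the same way as a nonexistent one.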

The risk model

The risk model translates the protection classes into operative rules for every execution case. It determines whether autonomy is permitted, whether confirmation is required, and what type of approval is demanded.

Low

Purely read operations or actions without lasting external effect. Agents may call these tools autonomously.

Low Risk: Examples

list_projects Low
get_current_user Low
search_contacts Low
get_help_article Low
create_reminder Low

Medium

Data or states are created or modified, but within a narrowly defined scope and without external effect. Autonomous execution is permitted when scope and context are clear from the workflow.

Medium Risk: Examples

create_note Medium
update_task_status Medium
generate_summary Medium
create_draft Medium

High

Actions that change relevant system states or affect other users and external resources. Explicit confirmation is generally required.

High Risk: Examples

change_project_status High
invite_calendar_attendee High
share_file_link High
update_customer_data High

Critical

External communication, payments, deletions, and permission changes. Confirmation is always required, often together with step-up authentication. Autonomous execution is not permitted.

Critical Risk: Examples

emails.send_external Critical
payment.execute Critical
user.delete Critical
security.change_permissions Critical
payroll.export Critical
contract.send Critical

Forbidden

Forbidden for AI

Completely blocked for AI agents, neither reading nor writing. Not visible in discovery, not callable, not usable as a reference.

Forbidden: Examples

password.read Forbidden for AI
private_key.read Forbidden for AI
raw_access_token.read Forbidden for AI
disable_audit_log Forbidden for AI
full_database_export Forbidden for AI
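Taken together, the five risk levels can be expressed as a small decision function. The sketch below is an interpretation under assumptions: the `Decision` values and the parameters `scope_is_clear` and `step_up_required` are hypothetical names, High is treated as always requiring confirmation even though the text says "generally", and Medium falls back to asking the user when the scope is unclear.

```python
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4
    FORBIDDEN = 5


class Decision(Enum):
    EXECUTE = "execute"                    # agent may act autonomously
    ASK_USER = "ask_user"                  # explicit confirmation required
    ASK_USER_STEP_UP = "ask_user_step_up"  # confirmation plus step-up auth
    DENY = "deny"                          # blocked for AI agents


def decide(risk: Risk, scope_is_clear: bool = False,
           step_up_required: bool = False) -> Decision:
    """Map a risk level to the handling described in the risk model."""
    if risk is Risk.FORBIDDEN:
        return Decision.DENY
    if risk is Risk.CRITICAL:
        # Never autonomous; step-up depends on the individual tool.
        return Decision.ASK_USER_STEP_UP if step_up_required else Decision.ASK_USER
    if risk is Risk.HIGH:
        return Decision.ASK_USER
    if risk is Risk.MEDIUM:
        # Assumption: an unclear scope downgrades to a confirmation request.
        return Decision.EXECUTE if scope_is_clear else Decision.ASK_USER
    return Decision.EXECUTE


print(decide(Risk.LOW).value)                               # execute
print(decide(Risk.CRITICAL, step_up_required=True).value)   # ask_user_step_up
```

No combination of arguments lets a Critical or Forbidden action reach `EXECUTE`, which mirrors the rule that autonomy is never permitted at those levels.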

Approval UX

The technical risk model only takes full effect when the user interface also does its part. A confirmation is only as good as the information on which it is based.

Before a human confirms a critical agent action, they must be able to clearly see seven things:

What does the agent want to do? The concrete action, not the agent’s intent in its own words.
Why does the agent want to do it? The intent or assignment that triggered this action.
Which data is being used? Recipients, attachments, referenced entities.
What external effects will occur? What changes outside the system: an email is sent, a payment is triggered, a document is transmitted.
Who is affected? Persons, companies, tenants, external parties.
Can the action be undone? A clear statement: reversible or irreversible.
What happens upon confirmation? A complete description of the next system states.
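The seven items above can be modeled as a record that the interface refuses to render while any of them is empty. The following is a sketch only: `ApprovalCard`, its field names, and `missing_fields` are hypothetical, and the sample values are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ApprovalCard:
    """One field per question the user must be able to answer."""
    action: str            # what the agent wants to do
    reason: str            # why it wants to do it
    data_used: str         # recipients, attachments, referenced entities
    external_effects: str  # what changes outside the system
    affected_parties: str  # who is affected
    reversible: bool       # can the action be undone?
    on_confirmation: str   # what happens once the user confirms


def missing_fields(card: ApprovalCard) -> list:
    """Return the names of text fields that were left blank."""
    return [
        f.name for f in fields(card)
        if isinstance(getattr(card, f.name), str) and not getattr(card, f.name).strip()
    ]


card = ApprovalCard(
    action="emails.send_external",
    reason="Following up on project Havelblick based on the last interaction.",
    data_used="Recipient and a download link that expires after 14 days",
    external_effects="Email will be sent and become part of communication history.",
    affected_parties="Max Müller, Müller GmbH",
    reversible=False,
    on_confirmation="",  # deliberately left blank to show the check
)
print(missing_fields(card))  # ['on_confirmation']
```

A card with a non-empty result from `missing_fields` would not be shown for confirmation, since the user could not make an informed decision.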

The following example shows what a complete approval card for the tool emails.send_external looks like:

Sales Assistant

emails.send_external

Critical

Following up on project Havelblick based on the last interaction.

Recipient: Max Müller, Müller GmbH <[email protected]>
Subject: Follow-up on project Havelblick
Attachment: No direct attachment, download link expires after 14 days.
External effect: Email will be sent and become part of communication history.
Reversible: No, cannot be undone after sending.

Reason: External communication with project-related information and download link. Irreversible after sending.

The agent waits for the user’s decision. It must not anticipate the confirmation, set a default action, or base execution on the timestamp of a previous approval.
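One way to enforce this is to bind each approval to the exact action it was given for and to consume it on use. The sketch below assumes a hypothetical in-memory `ApprovalStore`; the fingerprinting scheme (SHA-256 over canonical JSON) is an illustrative choice, not something the text specifies.

```python
import hashlib
import json


def action_fingerprint(tool: str, arguments: dict) -> str:
    """Hash the tool name and its arguments into a stable identifier."""
    payload = json.dumps({"tool": tool, "arguments": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ApprovalStore:
    """Hypothetical store of single-use approvals, keyed by action fingerprint."""

    def __init__(self) -> None:
        self._pending = set()

    def record_user_approval(self, tool: str, arguments: dict) -> None:
        """Called only from the UI after the user explicitly confirms."""
        self._pending.add(action_fingerprint(tool, arguments))

    def consume(self, tool: str, arguments: dict) -> bool:
        """Return True once per approval; there is no default and no reuse."""
        fingerprint = action_fingerprint(tool, arguments)
        if fingerprint in self._pending:
            self._pending.remove(fingerprint)
            return True
        return False


store = ApprovalStore()
args = {"subject": "Follow-up on project Havelblick"}
store.record_user_approval("emails.send_external", args)

print(store.consume("emails.send_external", args))  # True
print(store.consume("emails.send_external", args))  # False
```

Because the approval is keyed to the action's content rather than to a time, an earlier confirmation cannot be replayed for a later or modified request.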

100% controllable does not mean 100% autonomous.