← Blog

Function calling and tool use: how agents actually act

How models go from text to action through typed tool definitions, and why the quality of your tool schemas decides how reliable your agent is.

A language model, left to itself, produces text. Text is useful, but text alone does not send an email, query a database, or move money. The mechanism that closes that gap is called function calling, sometimes tool use. It is the single most consequential feature in practical agent architecture, and it is widely misunderstood.

The mechanics: four steps, repeated

The loop is simple enough to describe in a sentence: you describe a set of tools to the model, the model selects one and fills in its arguments, your code runs the tool and returns the result, then the model continues. Repeat until the task is done.

In practice it looks like this:

  1. Describe, you send the model a list of tool schemas alongside the conversation.
  2. Select, the model replies not with prose but with a structured tool call: a name and a JSON argument object.
  3. Execute, your runtime calls the actual function, catches errors, and feeds the result back.
  4. Continue, the model sees the result and decides whether to call another tool or produce a final answer.

That’s the whole protocol. What separates a reliable agent from a brittle one is almost entirely determined in step one, the quality of the schemas you supply.

Schemas are the contract

A tool schema is not documentation for a human reader. It is a formal contract the model uses to decide whether to call the tool, what to pass it, and what to expect back. Treat it that way.

A minimal, well-typed schema for a calendar slot lookup might look like this:

{
  "name": "find_free_slot",
  "description": "Returns the next available meeting slot for a given attendee within a date range.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "attendee_email": { "type": "string", "format": "email" },
      "range_start":    { "type": "string", "format": "date" },
      "range_end":      { "type": "string", "format": "date" },
      "duration_min":   { "type": "integer", "minimum": 15, "maximum": 480 }
    },
    "required": ["attendee_email", "range_start", "range_end", "duration_min"]
  }
}

Notice what this schema does: it constrains every input to a type and a range, marks required fields explicitly, and uses a clear, unambiguous name. The model cannot guess whether duration_min takes seconds or minutes, the name and the constraint leave no room for ambiguity.

What goes wrong with vague tools

When a schema is loose, generic names, untyped or over-broad inputs, no description of what the tool actually does, the model is forced to infer intent. Inference is probabilistic. Some of the time it gets it right. Some of the time it calls the wrong tool, passes an argument in the wrong format, or silently interprets an ambiguous field incorrectly.

These failures are especially hard to catch because the model does not error out, it acts. A missing type constraint on a date field might cause the model to pass a natural-language string like "next Tuesday" where your backend expects "2025-11-25". The agent looks confident. The call fails downstream, perhaps silently.

Common schema mistakes that cause agent failures
  • Generic names get_data, do_thing, model cannot distinguish tools
  • Untyped inputs any string accepted; model invents its own format
  • Missing required/optional distinction model omits fields or over-fills them
  • No error schema partial failures are invisible to the agent
  • No description model falls back to guessing from the name alone

Error schemas matter as much as input schemas

Most tool definitions spend all their care on inputs and ignore outputs and errors. That is a mistake. If a tool can partially succeed, returning three of five requested records, for example, the model needs to know that. If a tool can fail with a retryable error versus a hard stop, the agent needs to distinguish them to decide what to do next.

A well-designed tool schema includes at minimum: what a successful response looks like, what a partial result looks like, and which error codes are safe to retry. Without this, the model either gives up too early or retries something that will never succeed.

Risk is part of the contract

Critical Not all tools are equal. Reading a calendar slot carries different stakes than deleting a record or sending a message on behalf of a user. The model cannot infer risk from a schema alone, you have to declare it.

In MCP-first architecture, every tool in the capability layer carries explicit metadata: its input/output/error contract, the permissions it requires, and a risk level. That risk level is machine-readable. It lets the runtime apply confirmation gates before irreversible actions, filter tool visibility by principal and scope, and build a complete audit trail without relying on the model to self-police.

The shift in how you think about tools

Function calling changes what software design means. The old question was: “what screens does a user need?” The new question is: “what typed, permissioned, auditable actions does the system expose?” A tool is not a helper function wrapped in JSON. It is a capability with a contract, inputs, outputs, errors, risk, and permissions all declared upfront.

Build the contract first. The reliability of everything the agent does flows directly from the precision of the contract you give it.

The quality of your tool schemas is the quality of your agent.

The core principle