New in version 2.0.0
LLM sampling allows your MCP tools to request text generation from an LLM during execution. This enables tools to leverage AI capabilities for analysis, generation, reasoning, and more—without the client needing to orchestrate multiple calls.
By default, sampling requests are routed to the client’s LLM. You can also configure a fallback handler to use a specific provider (like OpenAI) when the client doesn’t support sampling, or to always use your own LLM regardless of client capabilities.
Overview
The simplest use of sampling is passing a prompt string to ctx.sample(). The method sends the prompt to the LLM, waits for the complete response, and returns a SamplingResult. You can access the generated text through the .text attribute.
SamplingResult also provides .result (identical to .text for plain text responses) and .history containing the full message exchange—useful if you need to continue the conversation or debug the interaction.
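For example, a minimal tool that forwards a prompt to the client's LLM might look like the following sketch (the server and tool names are illustrative):

```python
from fastmcp import FastMCP, Context

mcp = FastMCP("Sampling Demo")

@mcp.tool
async def summarize(text: str, ctx: Context) -> str:
    """Summarize text using the client's LLM."""
    # Send a plain prompt string and wait for the complete response.
    result = await ctx.sample(f"Summarize this in one sentence:\n\n{text}")
    # The generated text is on the .text attribute of the SamplingResult.
    return result.text
```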
System Prompts
System prompts let you establish the LLM’s role and behavioral guidelines before it processes your request. This is useful for controlling tone, enforcing constraints, or providing context that shouldn’t clutter the user-facing prompt. The temperature parameter controls randomness—higher values (up to 1.0) produce more varied outputs, while lower values make responses more deterministic. The max_tokens parameter limits response length.
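A sketch combining a system prompt with the sampling parameters described above (the keyword names shown follow the ctx.sample() API; adjust if your version differs):

```python
@mcp.tool
async def write_haiku(topic: str, ctx: Context) -> str:
    """Generate a haiku with a fixed persona and a short length limit."""
    result = await ctx.sample(
        f"Write a haiku about {topic}.",
        system_prompt="You are a terse poet. Reply with the poem only.",
        temperature=0.9,  # favor more varied phrasing
        max_tokens=100,   # cap the response length
    )
    return result.text
```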
Model Preferences
Model preferences let you hint at which LLM the client should use for a request. You can pass a single model name or a list of preferences in priority order. These are hints rather than requirements—the actual model used depends on what the client has available.
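For instance, the following sketch passes a priority-ordered list of hints (the model names are illustrative, not requirements):

```python
@mcp.tool
async def review_code(code: str, ctx: Context) -> str:
    """Ask for a code review, hinting at preferred models."""
    result = await ctx.sample(
        f"Review this code for bugs:\n\n{code}",
        # Hints in priority order; the client picks the closest model it has.
        model_preferences=["claude-sonnet-4-5", "gpt-4o"],
    )
    return result.text
```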
Multi-Turn Conversations
For requests that need conversational context, construct a list of SamplingMessage objects representing the conversation history. Each message has a role (“user” or “assistant”) and content (a TextContent object).
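A sketch of a multi-turn request; the import path shown follows the MCP Python SDK and is an assumption, so adjust it for your installation:

```python
from mcp.types import SamplingMessage, TextContent

@mcp.tool
async def follow_up(question: str, ctx: Context) -> str:
    """Continue an existing conversation with the LLM."""
    messages = [
        SamplingMessage(
            role="user",
            content=TextContent(type="text", text="What is the capital of France?"),
        ),
        SamplingMessage(
            role="assistant",
            content=TextContent(type="text", text="The capital of France is Paris."),
        ),
        SamplingMessage(
            role="user",
            content=TextContent(type="text", text=question),
        ),
    ]
    result = await ctx.sample(messages)
    return result.text
```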
Fallback Handlers
Client support for sampling is optional—some clients may not implement it. To ensure your tools work regardless of client capabilities, configure a sampling_handler that sends requests directly to an LLM provider.
FastMCP provides built-in handlers for OpenAI and Anthropic APIs. These handlers support the full sampling API including tools, automatically converting your Python functions to each provider’s format.
Install handlers with pip install fastmcp[openai] or pip install fastmcp[anthropic]. The sampling_handler_behavior parameter controls when the handler is used:
"fallback"(default): Use the handler only when the client doesn’t support sampling. This lets capable clients use their own LLM while ensuring your tools still work with clients that lack sampling support."always": Always use the handler, bypassing the client entirely. Use this when you need guaranteed control over which LLM processes requests—for cost control, compliance requirements, or when specific model characteristics are essential.
Structured Output
New in version 2.14.1
When you need validated, typed data instead of free-form text, use the result_type parameter. FastMCP ensures the LLM returns data matching your type, handling validation and retries automatically.
The result_type parameter accepts Pydantic models, dataclasses, and basic types like int, list[str], or dict[str, int]. When you specify a result type, FastMCP automatically creates a final_response tool that the LLM calls to provide its response. If validation fails, the error is sent back to the LLM for retry.
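A sketch using a Pydantic model as the result type (the model and prompt are illustrative):

```python
from pydantic import BaseModel

class Sentiment(BaseModel):
    label: str         # e.g. "positive", "negative", "neutral"
    confidence: float  # 0.0 to 1.0

@mcp.tool
async def classify(text: str, ctx: Context) -> Sentiment:
    """Classify sentiment and return validated, typed data."""
    result = await ctx.sample(
        f"Classify the sentiment of this text:\n\n{text}",
        result_type=Sentiment,
    )
    return result.result  # a validated Sentiment instance
```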
The validated object is available as result.result, while result.text contains the JSON representation.
Structured Output with Tools
Combine structured output with tools for agentic workflows that return validated data. The LLM uses your tools to gather information, then returns a response matching your type.
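A sketch combining both features; the helper function and the Pydantic model are illustrative:

```python
from pydantic import BaseModel

class WeatherReport(BaseModel):
    city: str
    temperature_c: float
    summary: str

def get_temperature(city: str) -> float:
    """Look up the current temperature for a city, in Celsius."""
    return 21.5  # stand-in for a real weather API call

@mcp.tool
async def weather_report(city: str, ctx: Context) -> WeatherReport:
    """The LLM may call get_temperature, then returns a validated report."""
    result = await ctx.sample(
        f"Produce a weather report for {city}.",
        tools=[get_temperature],
        result_type=WeatherReport,
    )
    return result.result
```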
Structured output with automatic validation only applies to sample(). With sample_step(), you must manage structured output yourself.
Tool Use
New in version 2.14.1
Sampling with tools enables agentic workflows where the LLM can call functions to gather information before responding. This implements SEP-1577, allowing the LLM to autonomously orchestrate multi-step operations.
Pass Python functions to the tools parameter, and FastMCP handles the execution loop automatically—calling tools, returning results to the LLM, and continuing until the LLM provides a final response.
Defining Tools
Define regular Python functions with type hints and docstrings. FastMCP extracts the function’s name, docstring, and parameter types to create tool schemas that the LLM can understand.
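For example, two plain functions passed to the tools parameter might look like this sketch (the functions are stubs standing in for real lookups):

```python
def search_docs(query: str) -> str:
    """Search the documentation and return matching snippets."""
    return f"Results for '{query}': ..."  # stand-in for a real search

def get_page(page_id: str) -> str:
    """Fetch the full text of a documentation page by ID."""
    return f"Contents of page {page_id} ..."  # stand-in for a real fetch

@mcp.tool
async def answer_question(question: str, ctx: Context) -> str:
    """Answer a question; the LLM may call the functions above as needed."""
    result = await ctx.sample(
        f"Use the available tools to answer: {question}",
        tools=[search_docs, get_page],
    )
    return result.text
```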
Custom Tool Definitions
For custom names or descriptions, use SamplingTool.from_function():
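A sketch, assuming from_function() accepts name and description overrides and that SamplingTool is importable from the sampling module (both are assumptions; check your version):

```python
# Import path is an assumption; check where SamplingTool lives in your version.
from fastmcp.server.sampling import SamplingTool

def lookup(symbol: str) -> str:
    """Fetch the latest quote for a ticker symbol."""
    return f"{symbol}: 101.25"  # stand-in for a real market-data call

quote_tool = SamplingTool.from_function(
    lookup,
    name="get_stock_quote",
    description="Get the most recent price for a stock ticker symbol.",
)

@mcp.tool
async def market_brief(symbol: str, ctx: Context) -> str:
    result = await ctx.sample(
        f"Write a one-paragraph brief on {symbol}.",
        tools=[quote_tool],
    )
    return result.text
```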
Error Handling
By default, when a sampling tool raises an exception, the error message (including details) is sent back to the LLM so it can attempt recovery. To prevent sensitive information from leaking to the LLM, use the mask_error_details parameter:
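A sketch, assuming mask_error_details is passed to ctx.sample() alongside tools (where the parameter is accepted may differ in your version):

```python
def search(query: str) -> str:
    """Search an internal index."""
    # This exception would normally leak infrastructure details to the LLM.
    raise RuntimeError(f"connection to db-internal-01:5432 refused ({query})")

@mcp.tool
async def guarded_answer(question: str, ctx: Context) -> str:
    result = await ctx.sample(
        question,
        tools=[search],
        mask_error_details=True,  # LLM sees only a generic error message
    )
    return result.text
```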
With mask_error_details=True, tool errors become generic messages like "Error executing tool 'search'" instead of exposing stack traces or internal details.
To intentionally provide specific error messages to the LLM regardless of masking, raise ToolError:
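For example, raising ToolError from fastmcp.exceptions inside a sampling tool:

```python
from fastmcp.exceptions import ToolError

def divide(a: float, b: float) -> float:
    """Divide a by b."""
    if b == 0:
        # This message reaches the LLM even when mask_error_details=True.
        raise ToolError("Cannot divide by zero; ask the user for a non-zero divisor.")
    return a / b
```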
ToolError messages always pass through to the LLM, making it the escape hatch for errors you want the LLM to see and handle.
Concurrent Tool Execution
By default, tools execute sequentially — one at a time, in order. When your tools are independent (no shared state between them), you can execute them in parallel with tool_concurrency:
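A sketch, assuming tool_concurrency is passed to ctx.sample() alongside tools (the stub functions are illustrative):

```python
def search_web(query: str) -> str:
    """Search the web (stub)."""
    return f"Web results for '{query}'"

def search_news(query: str) -> str:
    """Search recent news (stub)."""
    return f"News results for '{query}'"

@mcp.tool
async def compare_sources(topic: str, ctx: Context) -> str:
    """The two lookups share no state, so they can safely run in parallel."""
    result = await ctx.sample(
        f"Compare what the web and the news say about {topic}.",
        tools=[search_web, search_news],
        tool_concurrency=2,  # run up to two tool calls at once
    )
    return result.text
```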
The tool_concurrency parameter controls how many tools run at once:
- None (default): Sequential execution
- 0: Unlimited parallel execution
- N > 0: Execute at most N tools concurrently
If a specific tool isn’t safe to run in parallel, mark it as sequential when creating the SamplingTool:
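A sketch, assuming SamplingTool.from_function() accepts a sequential flag and the import path shown (both are assumptions):

```python
# Import path is an assumption, as is the sequential flag on from_function().
from fastmcp.server.sampling import SamplingTool

def append_to_log(entry: str) -> str:
    """Append an entry to a shared audit log; ordering matters."""
    with open("audit.log", "a") as f:
        f.write(entry + "\n")
    return "ok"

# sequential=True forces ordered execution whenever this tool is in a batch.
audit_tool = SamplingTool.from_function(append_to_log, sequential=True)
```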
When any tool in a batch has sequential=True, the entire batch executes sequentially regardless of tool_concurrency. This is a conservative guarantee — if one tool needs ordering, all tools in that batch respect it.
Client Requirements
Sampling with tools requires the client to advertise the sampling.tools capability. FastMCP clients do this automatically. For external clients that don’t support tool-enabled sampling, configure a fallback handler with sampling_handler_behavior="always".
Advanced Control
New in version 2.14.1
While sample() handles the tool execution loop automatically, some scenarios require fine-grained control over each step. The sample_step() method makes a single LLM call and returns a SampleStep containing the response and updated history.
Unlike sample(), sample_step() is stateless—it doesn’t remember previous calls. You control the conversation by passing the full message history each time. The returned step.history includes all messages up through the current response, making it easy to continue the loop.
Use sample_step() when you need to:
- Inspect tool calls before they execute
- Implement custom termination conditions
- Add logging, metrics, or checkpointing between steps
- Build custom agentic loops with domain-specific logic
Basic Loop
By default, sample_step() executes any tool calls and includes the results in the history. Call it in a loop, passing the updated history each time, until a stop condition is met.
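A sketch of such a loop; how the initial prompt and the growing history are passed to sample_step() here is an assumption based on the description above, and the stub tool is illustrative:

```python
def search_web(query: str) -> str:
    """Search the web (stub)."""
    return f"Web results for '{query}'"

@mcp.tool
async def investigate(topic: str, ctx: Context) -> str:
    """Drive the sampling loop manually, one LLM call at a time."""
    messages = f"Investigate this topic and summarize your findings: {topic}"
    for _ in range(10):  # custom termination: cap the number of steps
        step = await ctx.sample_step(messages, tools=[search_web])
        if not step.is_tool_use:
            return step.text  # final answer; no more tool calls requested
        # Tool calls were executed automatically; continue from the updated history.
        messages = step.history
    return "Stopped after 10 steps without a final answer."
```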
SampleStep Properties
Each SampleStep provides information about what the LLM returned:
| Property | Description |
|---|---|
| step.is_tool_use | True if the LLM requested tool calls |
| step.tool_calls | List of tool calls requested (if any) |
| step.text | The text content (if any) |
| step.history | All messages exchanged so far |
The contents of step.history depend on execute_tools:
- execute_tools=True (default): Includes tool results, ready for the next iteration
- execute_tools=False: Includes the assistant’s tool request, but you add results yourself
Manual Tool Execution
Set execute_tools=False to handle tool execution yourself. When disabled, step.history contains the user message and the assistant’s response with tool calls—but no tool results. You execute the tools and append the results as a user message.
To report a tool failure to the LLM, set isError=True on the tool result.
Method Reference
ctx.sample()
Request text generation from the LLM, running to completion automatically.
ctx.sample_step()
Make a single LLM sampling call. Use this for fine-grained control over the sampling loop.

