New in version 2.0.0

Use this when you need to respond to server requests for LLM completions.

MCP servers can request LLM completions from clients during tool execution. This enables servers to delegate AI reasoning to the client, which controls which LLM is used and how requests are made.
```python
from fastmcp import Client
from fastmcp.client.sampling import SamplingMessage, SamplingParams, RequestContext

async def sampling_handler(
    messages: list[SamplingMessage],
    params: SamplingParams,
    context: RequestContext,
) -> str:
    """
    Handle server requests for LLM completions.

    Args:
        messages: Conversation messages to send to the LLM
        params: Sampling parameters (temperature, max_tokens, etc.)
        context: Request context with metadata

    Returns:
        Generated text response from your LLM
    """
    # Extract message content
    conversation = []
    for message in messages:
        content = message.content.text if hasattr(message.content, "text") else str(message.content)
        conversation.append(f"{message.role}: {content}")

    # Use the system prompt if provided
    system_prompt = params.systemPrompt or "You are a helpful assistant."

    # Integrate with your LLM service here
    return "Generated response based on the messages"

client = Client(
    "my_mcp_server.py",
    sampling_handler=sampling_handler,
)
```
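The handler above returns a placeholder string where a real integration would call an LLM. Below is a minimal sketch of one possible integration, assuming the `openai` package and an `OPENAI_API_KEY` environment variable; the model name and default values are illustrative and not part of FastMCP:

```python
import os

from openai import AsyncOpenAI
from fastmcp.client.sampling import SamplingMessage, SamplingParams, RequestContext

# Assumption: OpenAI is the backing LLM; any provider with an async API works.
openai_client = AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def openai_sampling_handler(
    messages: list[SamplingMessage],
    params: SamplingParams,
    context: RequestContext,
) -> str:
    # Convert MCP sampling messages into OpenAI chat messages
    chat_messages = [
        {"role": "system", "content": params.systemPrompt or "You are a helpful assistant."}
    ]
    for message in messages:
        text = message.content.text if hasattr(message.content, "text") else str(message.content)
        chat_messages.append({"role": message.role, "content": text})

    # Forward the request to the LLM and return the generated text.
    # temperature/maxTokens are read from the sampling params when provided.
    response = await openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=chat_messages,
        temperature=params.temperature if params.temperature is not None else 1.0,
        max_tokens=params.maxTokens or 512,
    )
    return response.choices[0].message.content or ""
```

This handler is passed to the client exactly like the one above: `Client("my_mcp_server.py", sampling_handler=openai_sampling_handler)`.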
When you provide a `sampling_handler`, FastMCP automatically advertises full sampling capabilities to the server, including tool support. To disable tool support for simpler handlers:
```python
from mcp.types import SamplingCapability

client = Client(
    "my_mcp_server.py",
    sampling_handler=basic_handler,
    sampling_capabilities=SamplingCapability(),  # No tool support
)
```
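The `basic_handler` referenced above is any ordinary text-only sampling handler; it is not something FastMCP provides. A hypothetical example, suitable when the backing LLM never needs to see or call tools:

```python
from fastmcp.client.sampling import SamplingMessage, SamplingParams, RequestContext

async def basic_handler(
    messages: list[SamplingMessage],
    params: SamplingParams,
    context: RequestContext,
) -> str:
    # Text-only handler: it never forwards tools to an LLM, so there is
    # no reason to advertise tool support to the server.
    last = messages[-1]
    text = last.content.text if hasattr(last.content, "text") else str(last.content)
    return f"Processed: {text}"
```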
Tool execution happens on the server side. The client’s role is to pass tools to the LLM and return the LLM’s response (which may include tool use requests). The server then executes the tools and may send follow-up sampling requests with tool results.
To implement a custom sampling handler, refer to the handler source code.