Design Effective Tool Interfaces with Clear Descriptions and Boundaries
Why Tool Descriptions Are the Most Important Lever
When an LLM needs to decide which tool to invoke, the tool description is the primary signal it uses to make that selection. Unlike prompt instructions that nudge behavior probabilistically, tool descriptions directly determine how the model understands each tool's purpose, scope, and appropriate use cases. If two tools have vague or overlapping descriptions, the model will routinely misroute requests between them — no amount of prompt tuning will compensate for poorly differentiated tool metadata.
The Problem with Minimal Descriptions
Consider two tools with terse descriptions: get_customer — "Retrieves customer information"
and lookup_order — "Gets order details." Both accept similar identifier formats (names,
IDs, order numbers). When a user says "check on order #12345," the model may call
get_customer instead because both descriptions are too sparse to establish
clear boundaries. The model lacks the contextual information needed to reliably distinguish
which tool handles which class of request.
What a High-Quality Tool Description Contains
- Accepted input formats — specify exactly what identifiers, data types, and patterns the tool expects
- Example queries — include 2-3 representative requests that should trigger this tool
- Edge cases and limitations — describe what the tool does not do, preventing misuse
- Boundary explanations — explicitly distinguish this tool from similar ones ("Use this for order lookups by order ID or tracking number, NOT for customer profile retrieval")
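Concretely, a description with all four ingredients might look like this. This is a sketch using the JSON-schema tool-definition shape common to LLM tool-calling APIs; the tool name, ID formats, and example queries are invented for illustration:

```python
# Sketch of a well-differentiated tool definition. All identifiers, formats,
# and example queries are illustrative, not from a real system.
lookup_order = {
    "name": "lookup_order",
    "description": (
        "Retrieves the status, items, and shipping details of a single order. "
        "Accepts an order ID (e.g., 'ORD-12345') or a carrier tracking number. "
        "Example requests: 'check on order #12345', 'where is my package', "
        "'has order ORD-98765 shipped yet'. "
        "Does NOT retrieve customer profiles, payment methods, or account "
        "history - use get_customer for anything about the customer record. "
        "Limitation: only returns orders from the last 24 months."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID (ORD-xxxxx) or carrier tracking number",
            }
        },
        "required": ["order_id"],
    },
}
```

Each sentence of the description earns its place: formats, examples, a negative constraint naming the sibling tool, and a scope limitation.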
Ambiguous and Overlapping Descriptions Cause Misrouting
A common failure pattern arises when two tools share overlapping semantic territory.
For example, analyze_content and analyze_document without
clear differentiation will confuse the model. The fix is not to add more prompt instructions —
it is to rename tools so their purpose is unambiguous, or to split a single generic tool
into multiple purpose-specific tools with narrow, non-overlapping descriptions.
Hidden Risks: Keyword-Sensitive System Prompt Instructions
Including tool-selection hints in the system prompt (e.g., "always use analyze_content when the user mentions 'review'") can create unintended associations. The model may over-index on the keyword and trigger the tool even when the user's intent clearly points elsewhere. Tool selection should be driven by the tool definitions themselves, not fragile keyword rules injected into the prompt.
Practical Skills for This Task
- Rename tools to eliminate semantic overlap (e.g., rename analyze_content to analyze_text_sentiment)
- Split a generic multi-purpose tool into several purpose-specific tools with tight descriptions
- Write descriptions that include input formats, example queries, scope boundaries, and negative constraints
Tool descriptions are the #1 lever for reliable tool selection. When models misroute between tools, expanding and differentiating tool descriptions is the highest-impact, lowest-cost fix — before adding few-shot examples, routing layers, or model upgrades.
Wrong approach: Making tool descriptions longer without actually differentiating between similar tools. Simply adding more words to a description doesn't help if both tools still sound interchangeable. Length is not the goal — clear semantic boundaries between tools are what matters.
Your production agent keeps calling get_customer instead of
lookup_order when users ask about their orders. Both tools have
one-sentence descriptions and accept similar identifier formats. What is the
most effective first step to fix this?
Implement Structured Error Responses for MCP Tools
The MCP isError Pattern
The Model Context Protocol defines a standard way for tools to communicate failure
back to the calling agent: the isError flag. When a tool execution fails,
it should return a response with isError: true along with structured metadata
that enables the agent to understand what went wrong, why it failed,
and whether retrying could succeed. This is fundamentally different from simply
returning a text message like "Operation failed."
Error Categories You Must Distinguish
Not all errors are the same, and treating them uniformly prevents intelligent recovery. A well-designed MCP tool classifies errors into distinct categories:
- Transient errors — timeouts, rate limits, temporary network failures. These are retryable and may succeed on a subsequent attempt.
- Validation errors — malformed input, missing required fields, type mismatches. The agent needs to fix the input before retrying.
- Business rule violations — policy limits exceeded, unauthorized operations, logical constraints violated. Retrying with the same parameters will always fail.
- Permission errors — insufficient access rights, expired tokens, forbidden resources. Requires credential refresh or escalation, not retry.
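These categories can be made operational with a small classification layer. In this Python sketch, the exception classes are hypothetical stand-ins for whatever your transport and validation layers actually raise:

```python
# Sketch: map low-level failures to the four error categories described above.
# Exception names are invented stand-ins for your real transport/validation errors.
class RateLimited(Exception): pass
class InvalidInput(Exception): pass
class PolicyViolation(Exception): pass
class AccessDenied(Exception): pass

CATEGORY_MAP = {
    TimeoutError: ("transient", True),     # may succeed on a later attempt
    RateLimited: ("transient", True),
    InvalidInput: ("validation", False),   # retry only after fixing the input
    PolicyViolation: ("business", False),  # same parameters will always fail
    AccessDenied: ("permission", False),   # needs credential refresh, not retry
}

def categorize(exc: Exception) -> tuple[str, bool]:
    """Return (errorCategory, isRetryable) for a caught exception."""
    for exc_type, result in CATEGORY_MAP.items():
        if isinstance(exc, exc_type):
            return result
    return ("transient", False)  # unknown failure: surface it, don't blindly retry
```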
Why Generic Errors Are Dangerous
When every failure returns a uniform "Operation failed" message, the agent cannot make informed recovery decisions. Should it retry? Should it reformulate the request? Should it escalate to a human? Without error categorization, the agent either retries blindly (wasting resources on non-retryable failures) or gives up immediately (missing easy recoveries from transient issues).
Structured Error Metadata
An effective error response includes multiple pieces of actionable information:
- isError: true — the MCP standard flag signaling a failure occurred
- errorCategory — one of "transient", "validation", "business", "permission"
- isRetryable: boolean — explicitly states whether the agent should attempt again
- A human-readable description that explains the failure in enough detail for the agent to adjust
- Contextual metadata (which field failed validation, what policy was violated, how long to wait before retry)
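A minimal builder for such a response might look like this. Note that the MCP specification itself only defines isError; errorCategory, isRetryable, and the metadata fields are the conventions described in this section:

```python
def error_response(category: str, message: str, retryable: bool, **metadata):
    """Build a structured error payload. Field names follow the conventions
    above; 'metadata' carries contextual details such as the failing field
    or a retry-after hint."""
    return {
        "isError": True,
        "errorCategory": category,
        "isRetryable": retryable,
        "message": message,
        "metadata": metadata,
    }

resp = error_response(
    "validation",
    "Field 'order_id' must match the ORD-xxxxx format",
    retryable=False,
    failed_field="order_id",
)
```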
Error Recovery in Multi-Agent Systems
In hub-and-spoke architectures, error handling follows a locality principle: subagents should attempt local recovery for transient failures (retries, alternative queries) and only propagate errors to the coordinator when they cannot be resolved locally. When propagating, the subagent should include partial results alongside the error context, so the coordinator can decide whether to retry with a different strategy, use a different subagent, or proceed with what's available.
Structured error responses enable intelligent agent recovery decisions. The combination of
isError, errorCategory, and isRetryable gives
the agent a decision framework: retry transient failures, fix validation issues, escalate
permission problems, and report business rule violations.
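That decision framework reduces to a small dispatch function on the agent side (a sketch, assuming the structured error fields described earlier):

```python
def decide(error: dict) -> str:
    """Map a structured error response to a recovery action."""
    if not error.get("isError"):
        return "proceed"
    category = error.get("errorCategory")
    if category == "transient" and error.get("isRetryable"):
        return "retry"           # possibly after a backoff hint from metadata
    if category == "validation":
        return "fix_input"       # reformulate the tool call, then retry
    if category == "permission":
        return "escalate"        # credential refresh or human intervention
    return "report"              # business rule violation: surface to the user
```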
Wrong approach: Returning a generic "Operation failed" message for all error types. This prevents the agent from distinguishing between a temporary timeout (retry it) and a permanent permission denial (escalate it). Always categorize errors with structured metadata.
Wrong approach: Silently returning empty results when
an access failure occurs. This disguises a failure as a legitimate "no data found" response.
The agent will proceed as if the query returned nothing, when in reality the data was
inaccessible. Always distinguish access failures
(isError: true) from genuine empty results (isError: false).
A web search subagent times out while researching a complex topic. You need to design how failure information propagates back to the coordinator. Which error propagation approach best supports intelligent recovery?
Distribute Tools Across Agents and Configure Tool Choice
Why Too Many Tools Degrade Selection Reliability
When a single agent has access to a large number of tools (18 or more), its ability to reliably pick the right tool for each request drops significantly. The model must evaluate every tool description against the current request, and as the list grows, semantic overlap increases and selection confidence decreases. Studies of agentic systems consistently show that agents with focused tool sets of 4-5 tools dramatically outperform those with sprawling toolkits.
Agents Misuse Tools Outside Their Specialization
Giving an agent tools that fall outside its core role creates a temptation for misuse. A customer service agent equipped with database administration tools may attempt direct data modifications when it should be routing through safe, business-logic-aware endpoints. The principle of scoped tool access dictates that each agent should only have access to the tools that are necessary and appropriate for its designated role.
The Distribution Strategy
Rather than loading every tool onto one monolithic agent, distribute tools across a team of specialized subagents:
- A coordinator agent with 4-5 high-level tools (routing, delegation, aggregation)
- Specialized subagents, each with 4-5 tools focused on a narrow domain (e.g., order management, customer lookup, inventory queries)
- Context passed to each subagent should be minimal and task-specific — not the coordinator's full conversation history
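The distribution might be expressed as a simple assignment table (all agent and tool names here are invented for illustration):

```python
# Sketch of a hub-and-spoke tool layout: one coordinator, several specialists.
TOOL_ASSIGNMENT = {
    "coordinator": ["route_request", "delegate_task", "aggregate_results", "ask_user"],
    "order_agent": ["lookup_order", "cancel_order", "track_shipment", "list_orders"],
    "customer_agent": ["get_customer", "update_contact_info", "get_loyalty_status"],
    "inventory_agent": ["check_stock", "reserve_item", "list_warehouses"],
}

# Guardrail: keep every agent inside the 4-5 tool sweet spot.
for agent, tools in TOOL_ASSIGNMENT.items():
    assert len(tools) <= 5, f"{agent} has too many tools"
```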
Understanding tool_choice Configuration
The tool_choice parameter controls how the model decides whether and which
tool to call:
"auto"— the model decides on its own whether to call a tool or respond with text. It may choose not to use any tool if it determines the request can be answered directly."any"— the model must call a tool; it cannot respond with plain text. Use this when you need guaranteed tool invocation but want the model to choose which tool.- Forced / specific tool — the model must call one particular tool. Use this for workflows where the next step is deterministic (e.g., always validate output through a specific schema tool).
Keep 4-5 tools per agent for optimal selection reliability. When your system needs more tools, distribute them across specialized subagents rather than overloading a single agent. This is a structural solution — not a description-tuning problem.
Wrong approach: Assigning 18+ tools to a single agent and attempting to fix selection errors by writing longer or more detailed descriptions. Tool description quality matters, but it cannot overcome the fundamental problem of cognitive overload from too many options. The correct fix is to split tools across multiple focused agents.
Your developer productivity agent has 18 tools and frequently selects the wrong one. You've already ensured all tool descriptions are detailed and differentiated. What is the most effective next step?
Integrate MCP Servers into Claude Code and Agent Workflows
MCP Server Scoping: Project-Level vs User-Level
MCP servers can be configured at two distinct levels, and choosing the right scope matters for team collaboration and security:
- Project-level (.mcp.json) — lives in the project repository, committed to version control. Every developer who clones the repo gets the same MCP server configuration. Ideal for team-shared integrations like database connectors, CI systems, or project-specific APIs.
- User-level (~/.claude.json) — lives in the developer's home directory. Personal MCP servers that apply across all projects (personal productivity tools, individual API keys for development sandboxes).
Environment Variable Expansion for Credential Management
MCP configuration files support environment variable expansion using the
${VARIABLE_NAME} syntax. This is the correct way to reference secrets
and API keys in configuration files that will be committed to version control:
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}
The actual token value is stored in the developer's environment (e.g., via .env
files, shell profiles, or secret managers) — never in the configuration file itself.
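The expansion behavior itself is simple to emulate. This stdlib sketch shows roughly what a client does when loading such a config (not the actual Claude Code implementation):

```python
import json
import os
import re

def expand_env(text: str) -> str:
    """Replace ${VAR} references with values from the environment."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)

os.environ["GITHUB_TOKEN"] = "ghp_example"  # normally set via shell or secret manager
raw = '{"env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}}'
config = json.loads(expand_env(raw))        # expanded before the server starts
```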
MCP Resources: Reducing Exploratory Tool Calls
MCP servers can expose resources — read-only content catalogs that the agent can browse without making tool calls. Instead of the agent making repeated exploratory calls to discover what's available ("list all tables," "show all endpoints"), resources provide this information upfront as structured content. This reduces round-trips, lowers latency, and keeps the agent's context focused on its actual task.
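As a sketch, a resources/list result from a hypothetical database server might look like this. The uri, name, and mimeType field names follow the MCP resources schema; the URIs and tables are invented:

```python
# What a resources/list result might return for a database MCP server.
# With this catalog available upfront, the agent never needs a "list all
# tables" tool call before querying.
resources_result = {
    "resources": [
        {"uri": "db://tables/orders", "name": "orders table schema",
         "mimeType": "text/plain"},
        {"uri": "db://tables/customers", "name": "customers table schema",
         "mimeType": "text/plain"},
    ]
}
```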
Community Servers vs Custom Implementations
For standard integrations (GitHub, Slack, databases, file systems), prefer established community MCP servers over building custom implementations. Community servers are battle-tested, maintained by the ecosystem, and follow MCP best practices. Reserve custom server development for proprietary APIs and unique business logic that no community server addresses.
Always use ${ENV_VAR} expansion in .mcp.json for credentials.
Never hardcode secrets in configuration files that are committed to version control.
Project-level config (.mcp.json) is for shared team integrations;
user-level config (~/.claude.json) is for personal tools.
Wrong approach: Hardcoding API keys directly in
.mcp.json. Since this file is committed to version control, secrets
will be exposed to anyone with repository access.
Always use ${ENV_VAR} expansion and
store actual values in environment variables or a secret manager.
Your team needs to integrate a GitHub MCP server into a shared project repository. The server requires a personal access token for authentication. Which configuration approach is correct?
The ${ENV_VAR} pattern in .mcp.json provides the best of
both worlds: the server configuration is shared through version control, while actual
secrets stay in each developer's environment. Option A exposes secrets in the repo;
option B loses the benefit of shared configuration; option D introduces a non-standard
pattern that MCP doesn't natively support.
Select and Apply Built-in Tools Effectively
The Built-in Tool Suite
Claude Code and the Agent SDK provide a set of built-in tools that cover fundamental file and code operations. Knowing exactly when to use each tool — and when not to — is essential for building efficient, reliable agent workflows.
Grep: Content Search by Pattern
Use Grep when you need to search for content inside files. This is the right tool for finding function names, error messages, configuration values, import statements, API endpoints, or any text pattern within the codebase. Grep searches through file contents using patterns and regular expressions.
Glob: File Path Pattern Matching
Use Glob when you need to find files by their name, extension, or
directory structure — without looking inside them. Glob matches file paths
against patterns like *.config.js, src/components/**/*.tsx,
or **/test_*.py. It answers "which files exist that match this pattern?"
rather than "which files contain this text?"
Read and Write: Full File Operations
Read loads the full content of a file. Use it when you need to examine an entire file's contents, understand its structure, or review its code. Write creates a new file or completely overwrites an existing one. Be cautious with Write on existing files — anything not included in the new content will be permanently lost.
Edit: Targeted Modifications
Edit makes targeted changes to specific sections of a file using unique text matching. It identifies the exact location to modify by matching a unique string, then replaces it with the new content. The key requirement is that the text to be replaced must be unique within the file. When Edit fails because the match text isn't unique, the fallback strategy is to use Read to get the full file, then Write the complete modified version.
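The unique-match contract and its fallback can be sketched in a few lines (an emulation for illustration, not the actual Edit tool):

```python
from pathlib import Path

def targeted_edit(path: Path, old: str, new: str) -> None:
    """Emulate Edit's contract: replace 'old' with 'new' only if 'old'
    occurs exactly once in the file; otherwise signal that the caller must
    fall back to the Read-then-Write strategy described above."""
    content = path.read_text()
    count = content.count(old)
    if count == 1:
        path.write_text(content.replace(old, new))
    else:
        raise ValueError(f"match text occurs {count} times; "
                         "Read the file and Write the full modified version")
```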
Building Codebase Understanding Incrementally
Effective agents don't try to read an entire codebase at once. Instead, they build understanding incrementally using a disciplined tool chain:
- Start with Grep to find entry points (main functions, route definitions, exported symbols)
- Use Read to examine the entry point files and follow import chains
- Use Glob to discover related files by naming conventions or directory patterns
- Use Grep again to trace specific function calls or references across the codebase
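The division of labor in this chain can be made concrete with a standard-library emulation (an illustration only; the real Grep and Glob tools are built into the agent runtime):

```python
import re
from pathlib import Path

def grep(pattern: str, root: str = ".") -> list[Path]:
    """Steps 1 and 4: find files whose *contents* match a pattern."""
    rx = re.compile(pattern)
    return [p for p in Path(root).rglob("*.py")
            if p.is_file() and rx.search(p.read_text(errors="ignore"))]

def glob_paths(pattern: str, root: str = ".") -> list[Path]:
    """Step 3: find files by *path* pattern, without reading their contents."""
    return list(Path(root).glob(pattern))

# entry_points = grep(r"def main\(")        # 1. locate entry points
# source = entry_points[0].read_text()      # 2. Read one file, follow imports
# tests = glob_paths("**/test_*.py")        # 3. discover related files by name
```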
Know exactly when to use each built-in tool. Grep = content search inside files. Glob = file path pattern matching. Read = load file contents. Write = create or overwrite files. Edit = targeted modifications using unique text matching. Bash = system commands and processes (never use it when a dedicated tool exists for the task).
Wrong approach: Using Bash('cat config.json')
to read a configuration file. When a dedicated Read tool exists, always
prefer it over shell commands.
Reserve Bash for operations that genuinely require shell execution
(running tests, installing packages, git operations) — never for file reading, writing,
or searching when built-in tools exist.
You need to read the contents of a project's config.yaml file to
understand the application's database settings. Which tool should you use?
Bash('cat ...') (B) works but is an anti-pattern when a dedicated
tool exists. Grep (C) searches for patterns within files rather than reading a whole file.
Glob (D) finds files by path pattern — useful if you don't know the exact location,
but the question specifies the path is known.