# Add `examples` or similar to the standard schema #125
Thanks for the detailed proposal! I think our general guiding philosophy here could be simplistically described as, "LLMs are crazy good, and will only get better." In that world, we probably want to prefer fewer different fields and less explicit structure, and allow the LLM to analyze and extract whatever structure it needs on its own (i.e., putting everything in an unstructured `description`).

This particular case about examples isn't too contentious on its own, but we'd want to avoid a proliferation like this across the spec (see also: resource descriptions, top-level server `instructions`).

For these reasons, I'm inclined not to define additional fields here. Open to thoughts, though!

---
I agree that populating the spec with additional fields could get messy. With that in mind, here are some ideas.

**Proposed Change 1 (PC1): Change `description` to a structured `context` string**

```typescript
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: "some-tool",
        context: `
<DESCRIPTION>
This is a tool description
</DESCRIPTION>
<EXAMPLES>
  <CORRECT>
    <EX:1>some correct example</EX:1>
    <EX:2>second correct example</EX:2>
  </CORRECT>
  <INCORRECT>
    // something
  </INCORRECT>
</EXAMPLES>`,
        inputSchema: zodToJsonSchema(SomeArgsSchema) as ToolInput,
      },
    ],
  };
});
```

**Issues with PC1**

There are two considerations at play here:
Which combine into a single issue:

**Proposed Change 2 (PC2): `ToolContext`'s requirements are defined by model providers**

```typescript
const server = new Server(
  {
    name: "some-mcp",
    version: "0.0.1"
  },
  {
    capabilities: {
      resources: {},
      tools: {},
      prompts: {}
    },
    toolAdapter: anthropicAdapter // validates tool context requirements
  }
);

// Developer provides context according to model requirements
const toolContext = {
  description: "..."
  // other model-specific fields
};

const someTool = {
  name: "some-tool",
  context: toolContext,
  inputSchema: zodToJsonSchema(SomeArgsSchema) as ToolInput,
};

// Type safety in action
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      someTool // has description, no type error
    ],
  };
});
```

Here, the `toolAdapter` validates each tool's context against the model provider's requirements, so a tool whose context satisfies them (e.g., it has a `description`) produces no type error.
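As a rough sketch, the adapter's contract might look something like this (the `ToolAdapter` interface and `anthropicAdapter` implementation are hypothetical; nothing like this exists in the SDK today):

```typescript
// Hypothetical: each provider declares which context fields its models
// expect, and tool definitions are checked against that declaration.
interface ToolContext {
  description: string;
  // other model-specific fields
  [key: string]: unknown;
}

interface ToolAdapter {
  // Reports whether a tool's context satisfies the provider's
  // requirements (required fields, length limits, etc.).
  validateToolContext(context: ToolContext): boolean;
}

const anthropicAdapter: ToolAdapter = {
  validateToolContext: (context) =>
    typeof context.description === "string" && context.description.length > 0,
};
```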
**Benefits of PC2**

The additional benefit of allowing model providers to specify their own tool context requirements is that they can experiment with different information and tailor their models to work best with a specific set of tools. Over time, one might imagine that various providers will identify an effective information structure for tool consumption, and the information structures could converge to form a standard.

**Issues with PC2**

**Proposed Change 3 (PC3): Actually just PC1 again**

```typescript
// Example helper from model provider
const toolContext = anthropicAdapter.buildToolContext({
  description: "...",
  // other model-specific fields that get validated
});

/*
toolContext = `
<DESCRIPTION>
This is a tool description
</DESCRIPTION>`
*/

const someTool = {
  name: "some-tool",
  context: toolContext, // Returns formatted string
  inputSchema: zodToJsonSchema(SomeArgsSchema) as ToolInput,
};
```

This solution:
**Issues with PC3**

**Proposed Change 4 (PC4): Set the standard**

**Issues with PC4**

**Proposed Change 5 (PC5): Why not both?**

**Issues with PC5**

**Conclusion**
I would recommend PC5 or PC1. In a perfect world, PC4 is probably best, but maybe it is a bit too soon for that kind of bureaucratic decision making. I think that PC2 and PC3 both promote patterns that lead to some models performing better than others, and are therefore unviable. Of course, the last option is to not change anything at all; I find PC1 and PC5 both preferable to that.

**A final note about the "guiding philosophy"**

I hope the design of the protocol isn't banking on the continual improvement of models, but rather on providing the best possible interface to safely, securely, and effectively develop collections of tools for LLMs to use. That is to say, the protocol is a tool that helps us address our problems as the people who have to build these interactions. No amount of model improvement is going to move the needle on that; the structure of the information we decide on as a standard will. There is no reason to rely on compute to solve a problem that can be addressed with design.

---
We definitely don't want to couple anything in the spec or SDKs directly to model providers (or require their SDKs to inject information/structure for MCP to be usable). Thinking about it some more, perhaps the strongest case in favor of adding an `examples` field…

---
**Is your feature request related to a problem? Please describe.**
I frequently encounter issues where, even with a proper input schema, the model I am using (Sonnet 3.5) tries to use the tool incorrectly. As recommended by the MCP documentation, I have solved this problem by including usage examples in the tool description. However, this is a conflation of purpose. Unstructured prompts containing semantic information shouldn't be needed to create effective, usable tools.
**Describe the solution you'd like**
Instead of encouraging developers to include examples in the `description` field, there should be an `examples` field that contains instructive examples. I prefer to get even more specific and promote the use of `correct` and `incorrect` subfields, each containing positive and negative examples, respectively. That said, many structures could ultimately achieve the same goal.
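As a minimal sketch, mirroring the snippets elsewhere in this thread, a tool definition with such a field might look like the following (the `examples` structure and sample arguments are illustrative, not part of any spec):

```typescript
// Illustrative only: `examples`, `correct`, and `incorrect` are the
// fields proposed in this issue, not part of the MCP schema.
const someTool = {
  name: "some-tool",
  description: "Reads a file from the workspace",
  examples: {
    correct: [
      { arguments: { path: "/home/user/notes.txt" } }, // absolute path
    ],
    incorrect: [
      { arguments: { path: "notes.txt" } }, // relative paths fail
    ],
  },
  inputSchema: zodToJsonSchema(SomeArgsSchema) as ToolInput,
};
```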
**Describe alternatives you've considered**
**What is the issue with using the description field and including examples there?**
Mainly separation of concerns. LLMs should use the `description` field to decide what tools they should use. By providing an `examples` field to define how each tool should be used, we can separate the semantic information of the description from the syntactic information of how it should be used.
Additionally, having a standard, structured means of nudging models toward correct usage has benefits. It serves as self-documenting code, indicating that the model has trouble accomplishing certain outcomes. It also allows for iterative squashing of LLM errors: any given LLM mistake can be captured as a negative example, and developers can provide a corrected positive example alongside it.
**Is there a better place to put this information?**
**Zod schema**
In some ways, it would make sense to provide this as field descriptions on the Zod schema, since field errors can provide feedback to the LLM. But that violates the independence of the schema in the same way. The Zod schema is for guaranteeing that the request is well-formed, not for directing the LLM on how to form its response well.
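For reference, a sketch of what that would look like with Zod's `.describe()` (which `zodToJsonSchema` carries into the generated JSON Schema as each property's `description`; the `path` parameter is made up):

```typescript
import { z } from "zod";

// Usage guidance attached to an individual field rather than the tool:
// .describe() text becomes the property's "description" in the JSON
// Schema produced by zodToJsonSchema().
const SomeArgsSchema = z.object({
  path: z
    .string()
    .describe("Absolute path to the file, e.g. /home/user/notes.txt"),
});
```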
**Generic `instructions` field that takes a string**

While this would be fine, it is also more loosely structured than would necessarily be helpful. In my current workflow for developing MCPs, I write the code, run a build, restart the Claude desktop app, run a prompt to test all the tools, and document any errors. The most frequent error, outside of my own, is the LLM incorrectly providing parameters. When I find an error of this kind, I check what the LLM generated, produce a fixed version to serve as a correct example, and then provide the original as an incorrect example. Standardizing a way of providing examples as part of the tool structure would make this easier and potentially programmatic. One could imagine using LLMs to create structured outputs for correct and incorrect examples with context.
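That programmatic loop could look something like the following sketch, assuming the hypothetical `examples` field proposed above (none of these names exist in the SDK):

```typescript
// Hypothetical helper: fold an observed LLM mistake back into the tool
// definition as a negative example, paired with its corrected version.
interface ToolExample {
  arguments: Record<string, unknown>;
  note?: string;
}

function recordToolError(
  tool: { examples: { correct: ToolExample[]; incorrect: ToolExample[] } },
  observed: ToolExample,  // what the LLM actually generated
  corrected: ToolExample, // the fixed version the developer produced
): void {
  tool.examples.incorrect.push(observed);
  tool.examples.correct.push(corrected);
}
```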
A generic `instructions` field kicks the can down the road; a structured solution is necessary.

**Additional context**
My suggested solution is prescriptive, but I don't know the best solution at this moment. The primary issue is that LLMs can have difficulty using tools correctly without some extra context. This can be solved by providing examples in a description field. But if omitting examples regularly results in errors, and providing examples is recommended, then it makes sense to integrate example provision into the protocol.