Add examples or similar to the standard schema #125

Open
StevenStavrakis opened this issue Dec 22, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@StevenStavrakis

StevenStavrakis commented Dec 22, 2024

Is your feature request related to a problem? Please describe.

I frequently encounter issues where, even with a proper input schema, the model I am using (Sonnet 3.5) tries to use the tool incorrectly. As recommended by the MCP documentation, I have solved this problem by including usage examples in the tool description. However, this is a conflation of purpose. Unstructured prompts containing semantic information shouldn't be needed to create effective, usable tools.

Describe the solution you'd like

Instead of encouraging developers to include examples in the description field, there should be an examples field that contains instructive examples. I prefer to get even more specific and promote the use of correct and incorrect subfields, each containing positive and negative examples, respectively.

// ... other parts
examples: {
  correct: [
    {
      context: "simple",
      output: `{ "files": ["note.md"], "tags": ["project", "status"] }`
    },
    {
      context: "with hierarchy",
      output: `{ "files": ["note.md"], "tags": ["work/active", "priority/high"] }`
    }
  ],
  incorrect: [
    {
      context: "don't include # symbol",
      output: `{ "tags": ["#project"] }`
    }
  ]
}

Or alternatively:

examples: [
  {
    context: "Some context here",
    correct: "correct output here",
    incorrect: "incorrect output here"
  }
]

Though many structures could ultimately achieve the same goal.
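One practical upside of a structured field is that tooling can check the examples mechanically. The sketch below assumes the first shape proposed above; the type names `ToolExample`/`ToolExamples` and the helper `findUnparseableExamples` are hypothetical, not part of MCP. It reports every example whose `output` is not valid JSON:

```typescript
// Hypothetical shape for the proposed `examples` field (first variant above).
// These names are illustrative; none of this is in the MCP specification.
interface ToolExample {
  context: string; // short label for the situation the example covers
  output: string;  // tool-call arguments serialized as JSON
}

interface ToolExamples {
  correct: ToolExample[];
  incorrect: ToolExample[];
}

// Because the examples are structured, they can be validated automatically.
// Here: return the context label of every example whose output fails to parse.
function findUnparseableExamples(examples: ToolExamples): string[] {
  const bad: string[] = [];
  for (const ex of [...examples.correct, ...examples.incorrect]) {
    try {
      JSON.parse(ex.output);
    } catch {
      bad.push(ex.context);
    }
  }
  return bad;
}

const tagExamples: ToolExamples = {
  correct: [
    { context: "simple", output: `{ "files": ["note.md"], "tags": ["project", "status"] }` },
    { context: "with hierarchy", output: `{ "files": ["note.md"], "tags": ["work/active", "priority/high"] }` },
  ],
  incorrect: [
    { context: "don't include # symbol", output: `{ "tags": ["#project"] }` },
  ],
};

console.log(findUnparseableExamples(tagExamples)); // []
```

Note that the incorrect example is still well-formed JSON; it is wrong only semantically, which is exactly the kind of mistake a schema check alone would not catch.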

Describe alternatives you've considered

What is the issue with using the description field and including examples there?
Mainly separation of concerns. LLMs should use the description field to decide what tools they should use. By providing an examples field to define how each tool should be used, we can separate the semantic information of the description from the syntactic information of how it should be used.

Additionally, having a standard, structured means of nudging models to correct usage has benefits. It serves as self-documenting code, indicating that the model has trouble accomplishing certain outcomes. It allows for iterative squashing of LLM errors; for any given LLM mistake, it can be used as a negative example, and developers can provide a positive example.

Is there a better place to put this information?
Zod schema
In some ways, it would make sense to provide it as field descriptions on the Zod schema, since field errors can provide feedback to the LLM. But that violates the independence of the schema in the same way. The Zod schema is for guaranteeing that the request is well formed, not for directing the LLM on how to form that request well.

Generic instructions field that takes a string
While this would be fine, it is also more loosely structured than would be helpful. In my current workflow for developing MCP servers, I write the code, run a build, restart the Claude desktop app, run a prompt to test all the tools, and document any errors. The most frequent error, outside of my own, is the LLM incorrectly providing parameters. When I find an error of this kind, I check what the LLM generated, produce a fixed version to serve as a correct example, and then provide the original as an incorrect example. Standardizing a way of providing examples as part of the tool structure would make this easier and potentially programmatic. One could imagine using LLMs to create structured outputs for correct and incorrect examples with context.

A generic instructions field kicks the can down the road. A structured solution is necessary.
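The correction loop described above could be made mechanical roughly as follows. This is a sketch that assumes the `examples` field takes the `correct`/`incorrect` shape proposed earlier; `recordCorrection` is a hypothetical helper, not an SDK function:

```typescript
// Hypothetical helper; the `examples` shape mirrors the proposal in this
// issue and is not part of the MCP specification or SDK.
interface ToolExample {
  context: string;
  output: string; // tool-call arguments serialized as JSON
}

interface ToolExamples {
  correct: ToolExample[];
  incorrect: ToolExample[];
}

// Turn one observed mistake into a pair of examples: the arguments the model
// produced become a negative example, the hand-fixed version a positive one.
// Returns a new object rather than mutating the input.
function recordCorrection(
  examples: ToolExamples,
  context: string,
  modelArgs: unknown,
  fixedArgs: unknown,
): ToolExamples {
  return {
    correct: [...examples.correct, { context, output: JSON.stringify(fixedArgs) }],
    incorrect: [...examples.incorrect, { context, output: JSON.stringify(modelArgs) }],
  };
}

let tagExamples: ToolExamples = { correct: [], incorrect: [] };
tagExamples = recordCorrection(
  tagExamples,
  "don't include # symbol",
  { tags: ["#project"] }, // what the model sent
  { tags: ["project"] },  // what it should have sent
);
console.log(tagExamples.incorrect.map((ex) => ex.output));
```

Serializing with `JSON.stringify` keeps the stored examples in the same form the model is expected to produce, so the pair can be appended straight to the tool definition.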

Additional context

My suggested solution is prescriptive, but I don't know the best solution at this moment. The primary issue is that LLMs can have difficulty using tools correctly without some extra context. This can be solved by providing examples in a description field. But if omitting examples regularly results in errors, and providing examples is recommended, then it makes sense to integrate example provision into the protocol.

@StevenStavrakis StevenStavrakis added the enhancement New feature or request label Dec 22, 2024
@jspahrsummers
Member

Thanks for the detailed proposal!

I think our general guiding philosophy here could be simplistically described as, "LLMs are crazy good, and will only get better." In that world, we probably want to prefer fewer different fields and less explicit structure, and allow the LLM to analyze and extract whatever structure it needs on its own (i.e., putting everything in an unstructured description string).

This particular case about examples isn't too contentious on its own, but we'd want to avoid a proliferation like this across the spec (see also: resource descriptions, top-level server instructions, etc). It also unfortunately makes client and server implementations that extra bit more work.

For these reasons, I'm inclined not to define additional fields here. Open to thoughts, though!
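One way to limit the extra implementation work would be for a server to fold structured examples back into `description` for clients that do not understand a dedicated field. A rough sketch follows; `examples`, `ToolWithExamples` and `foldExamples` are hypothetical names, and how a server would learn what a given client supports is left open:

```typescript
// Hypothetical types: `examples` is the proposed field, not part of MCP.
interface ToolExample {
  context: string;
  output: string;
}

interface ToolWithExamples {
  name: string;
  description: string;
  examples?: { correct: ToolExample[]; incorrect: ToolExample[] };
}

interface PlainTool {
  name: string;
  description: string;
}

// Append any structured examples to the description so the result only
// uses fields that existing clients already understand.
function foldExamples(tool: ToolWithExamples): PlainTool {
  if (!tool.examples) {
    return { name: tool.name, description: tool.description };
  }
  const lines = [tool.description, "", "Examples:"];
  for (const ex of tool.examples.correct) {
    lines.push(`- Correct (${ex.context}): ${ex.output}`);
  }
  for (const ex of tool.examples.incorrect) {
    lines.push(`- Incorrect (${ex.context}): ${ex.output}`);
  }
  return { name: tool.name, description: lines.join("\n") };
}

const folded = foldExamples({
  name: "tag-notes",
  description: "Add tags to notes.",
  examples: {
    correct: [{ context: "simple", output: `{ "tags": ["project"] }` }],
    incorrect: [{ context: "don't include # symbol", output: `{ "tags": ["#project"] }` }],
  },
});
console.log(folded.description);
```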

@StevenStavrakis
Author

I agree that populating the spec with additional fields could get messy. With that in mind, here are some ideas.

Proposed Change 1 (PC1): Change description to context
Right now, the description field contains more information than just the description. It could contain examples, additional considerations, etc. As per your comment, one benefit of having a single field is that developers retain agency over what information is provided as context, and how that information is structured.
I propose changing the description field to a context field. This would allow developers to structure their tool descriptions more flexibly while aligning with established prompt engineering best practices, particularly the use of XML tags.

server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      {
        name: "some-tool",
        context: `
<DESCRIPTION>
This is a tool description
</DESCRIPTION>
<EXAMPLES>
<CORRECT>
<EX:1>some correct example</EX:1>
<EX:2>second correct example</EX:2>
</CORRECT>
<INCORRECT>
<EX:1>some incorrect example</EX:1>
</INCORRECT>
</EXAMPLES>`,
        inputSchema: zodToJsonSchema(SomeArgsSchema) as ToolInput,
      },
    ],
  };
});

Issues with PC1
It occurs to me, however, that if this were the case, there would be no guarantee that a description is provided at all. Since the description is important enough to be part of MCP in the first place, I assume it is necessary for any given LLM to select the correct tool.
If PC1 is implemented, I could foresee inexperienced developers having serious problems getting their MCP servers to work because they never provided a description for the LLM to use during tool selection.
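One partial mitigation would be a lint, in an SDK or a build step, that rejects a `context` string without a non-empty description section. The sketch below assumes the `<DESCRIPTION>` tag convention from the PC1 example, which is not a standard; `hasDescription` is a hypothetical helper:

```typescript
// Hypothetical lint for the proposed PC1 `context` string. It only checks
// that a <DESCRIPTION>...</DESCRIPTION> section exists and is not blank.
function hasDescription(context: string): boolean {
  const match = /<DESCRIPTION>([\s\S]*?)<\/DESCRIPTION>/.exec(context);
  const body = match ? match[1] : undefined;
  return body !== undefined && body.trim().length > 0;
}

console.log(hasDescription("<DESCRIPTION>\nThis is a tool description\n</DESCRIPTION>")); // true
console.log(hasDescription("<EXAMPLES></EXAMPLES>")); // false
```

A check like this only works if everyone agrees on the tag name, which is part of the problem PC1 runs into.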

There are two considerations at play here:

  1. MCP server developers want a fluid experience developing tools and a generally uncluttered spec
  2. LLMs have requirements for what kind of information they need to be able to effectively make use of tools

Which combine into a single issue:
The information that a developer should provide alongside a tool is best determined by the model, not the protocol

Proposed Change 2 (PC2): ToolContexts
To address this, we could have companies that produce models provide context schemas for tool use. This would allow model providers to specify exactly what information they need to effectively use tools while keeping the protocol itself clean.

const server = new Server(
  {
    name: "some-mcp",
    version: "0.0.1",
  },
  {
    capabilities: {
      resources: {},
      tools: {},
      prompts: {},
    },
    toolAdapter: anthropicAdapter, // validates tool context requirements
  }
);

// Developer provides context according to model requirements
const toolContext = {
  description: "...",
  // other model-specific fields
};

const someTool = {
  name: "some-tool",
  context: toolContext,
  inputSchema: zodToJsonSchema(SomeArgsSchema) as ToolInput,
};

// Type safety in action
server.setRequestHandler(ListToolsRequestSchema, async () => {
  return {
    tools: [
      someTool, // has description, no type error
    ],
  };
});

Here, anthropicAdapter ensures type safety by validating that the toolContext object contains all required fields. If a developer forgets to include required information, they'll get immediate feedback through TypeScript errors.
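A sketch of how the compile-time part of PC2 might be expressed with TypeScript generics. `ToolAdapter`, `toolFactory` and `exampleAdapter` are hypothetical (no adapter API exists in the MCP SDK), and `exampleAdapter` stands in for a provider-specific adapter such as the `anthropicAdapter` imagined above:

```typescript
// Hypothetical adapter interface for PC2; the MCP SDK has no such API.
interface ToolAdapter<C> {
  name: string;
  // Runtime check for contexts the type checker never saw (e.g. parsed JSON).
  validate(context: unknown): context is C;
}

interface ToolDefinition<C> {
  name: string;
  context: C;
  inputSchema: object;
}

// Binding the adapter first fixes C, so a tool whose context lacks a
// required field fails to compile instead of inferring a looser type.
function toolFactory<C>(adapter: ToolAdapter<C>) {
  return (tool: ToolDefinition<C>): ToolDefinition<C> => {
    if (!adapter.validate(tool.context)) {
      throw new Error(`Tool "${tool.name}" does not meet ${adapter.name} requirements`);
    }
    return tool;
  };
}

// Illustrative adapter that only requires a non-empty description.
interface ExampleContext {
  description: string;
}

const exampleAdapter: ToolAdapter<ExampleContext> = {
  name: "example-adapter",
  validate(context: unknown): context is ExampleContext {
    if (typeof context !== "object" || context === null) return false;
    const description = (context as Record<string, unknown>).description;
    return typeof description === "string" && description.length > 0;
  },
};

const defineExampleTool = toolFactory(exampleAdapter);

const someTool = defineExampleTool({
  name: "some-tool",
  context: { description: "Adds tags to notes." },
  inputSchema: { type: "object" },
});

// defineExampleTool({ name: "bad", context: {}, inputSchema: {} });
// ^ rejected at compile time: property 'description' is missing
```

The factory is curried deliberately: if the adapter and the tool were passed to a single generic call, TypeScript might infer a looser context type from the tool literal and accept the missing field. The runtime `validate` covers contexts that bypass the type checker entirely.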

Benefits of PC2

  1. Supports both consumers of the protocol:
    • Model providers can specify exactly what information they need
    • Developers get clear guidance on what information to provide
  2. Keeps the protocol clean while ensuring necessary structure
  3. Provides compile-time safety through TypeScript integration

The additional benefit of allowing model providers to specify their own tool context requirements is that they can experiment with different information and tailor their models to work best with a specific set of tools. Over time, one might imagine that various providers will identify an effective information structure for tool consumption, and the information structures could converge to form a standard.

Issues with PC2
The API surface keeps growing, and the added structure means added mental overhead. And while you don't need to use any adapter for a given model, you might want to, which means going to a model provider's documentation for the best implementation. That means your context might work poorly with one model but well with another.

Proposed Change 3 (PC3): Actually just PC1 again
This is more of a combination of the previous two than a reversion back to PC1. But, instead of including tool context schemas as part of the server, there would be a single context field that takes a string. Then, model providers could still provide adapter functions that take JS objects, type-check them, and convert them to XML (or whatever other format).

// Example helper from model provider
const toolContext = anthropicAdapter.buildToolContext({
  description: "...",
  // other model-specific fields that get validated
});

/*
toolContext = `
<DESCRIPTION>
This is a tool description
</DESCRIPTION>`
*/

const someTool = {
  name: "some-tool",
  context: toolContext, // the formatted string returned by buildToolContext
  inputSchema: zodToJsonSchema(SomeArgsSchema) as ToolInput,
};
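A provider-side helper of this kind could be as simple as the following sketch. `buildToolContext`, `ContextFields` and the tag names are assumptions modeled on the PC1/PC3 examples (with a plain `<EXAMPLE>` tag in place of `<EX:1>`), not an existing API:

```typescript
// Hypothetical stand-in for a provider helper like the
// `anthropicAdapter.buildToolContext` imagined above; not a real API.
interface ContextFields {
  description: string; // required by this illustrative helper
  correctExamples?: string[];
  incorrectExamples?: string[];
}

// Escape characters that would otherwise be read as markup.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function wrap(tagName: string, body: string): string {
  return `<${tagName}>\n${body}\n</${tagName}>`;
}

function buildToolContext(fields: ContextFields): string {
  if (fields.description.trim() === "") {
    throw new Error("description must not be empty");
  }
  const sections = [wrap("DESCRIPTION", escapeXml(fields.description))];

  // Emit CORRECT/INCORRECT groups only when they have entries.
  const exampleGroups: string[] = [];
  const groups: Array<[string, string[] | undefined]> = [
    ["CORRECT", fields.correctExamples],
    ["INCORRECT", fields.incorrectExamples],
  ];
  for (const [groupTag, items] of groups) {
    if (items && items.length > 0) {
      const body = items.map((item) => wrap("EXAMPLE", escapeXml(item))).join("\n");
      exampleGroups.push(wrap(groupTag, body));
    }
  }
  if (exampleGroups.length > 0) {
    sections.push(wrap("EXAMPLES", exampleGroups.join("\n")));
  }
  return sections.join("\n");
}

console.log(buildToolContext({ description: "This is a tool description" }));
// <DESCRIPTION>
// This is a tool description
// </DESCRIPTION>
```

Because the helper only returns a string, the protocol sees nothing new: a developer who prefers to hand-write the `context` string can skip it entirely.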

This solution:

  1. Allows model providers to tailor custom tool context schemas and provide connectors to interact with the protocol (if they want)
  2. Still allows developers to provide custom context strings if they want
  3. Doesn't add anything to the protocol itself
  4. Remains un-opinionated as to what information should be provided alongside a tool
  5. Even if a particular model provider promotes a particular information architecture, an MCP server should work roughly the same across multiple models

Issues with PC3
In this case, model agnosticism is still lost. Inevitably the MCP will work better with some models than with others.

Proposed Change 4 (PC4): Set the standard
The three previous changes have been suggested with the idea of not being overly prescriptive. But it is worth considering whether that is a virtue in this case. There is value in defining a standard schema for tool-use information. There should probably be some kind of referendum among industry experts on what that schema would look like (as with any other standard).

Issues with PC4
You would have to try to make a lot of people happy at the same time, which probably wouldn't happen.

Proposed Change 5 (PC5): Why not both?
Just retain the current description field and add an optional context field. I'm not sure why it took me so long to come to this; I think it comes from a general hesitancy about adding fields. But this way there is a discrete, purpose-built field that ensures the model at least gets a description, plus an additional field that developers can do anything with, including the suggestions in PC3.
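Under PC5 a tool declaration might look like the sketch below. The optional `context` field, `ToolDeclaration` and `renderTool` are hypothetical; the point is that the type checker still requires `description`, while `context` stays free-form:

```typescript
// Hypothetical PC5 declaration; MCP does not define an optional `context` field.
interface ToolDeclaration {
  name: string;
  description: string; // always required, used for tool selection
  context?: string;    // free-form: examples, caveats, provider-specific tags
  inputSchema: object;
}

// A client that ignores `context` still has the description; a client that
// understands it can append it after a blank line.
function renderTool(tool: ToolDeclaration): string {
  return tool.context ? `${tool.description}\n\n${tool.context}` : tool.description;
}

const tagNotes: ToolDeclaration = {
  name: "tag-notes",
  description: "Add tags to one or more notes.",
  context: "Do not include the # symbol in tag names.",
  inputSchema: { type: "object" },
};

// const broken: ToolDeclaration = { name: "x", inputSchema: {} };
// ^ compile error: property 'description' is missing

console.log(renderTool(tagNotes));
```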

Issues with PC5
All in all, it's fine. It doesn't feel neat or elegant, but it does seem to solve all the problems mentioned, at the cost of adding a single field to the tool declaration schema.

Conclusion
That was a whole trip. I think the particularly insightful ideas were:

  • The protocol should not promote development patterns that favor particular models
  • The protocol should be designed with deference to both MCP developers and to model providers
  • There is some standard for what a model needs to use a tool (at least a description)

I would recommend PC5 or PC1. In a perfect world, PC4 is probably best, but it may be too soon for that kind of bureaucratic decision-making. I think PC2 and PC3 both promote patterns that lead to some models performing better, and are therefore unviable. Of course, the last option is to not change anything at all; I find both PC1 and PC5 preferable to that.

A final note about the "guiding philosophy"
I think the guiding philosophy you've put forward is somewhat troubling. I know it was probably a severely limited version of the view, but I think it is worth addressing.

I hope the design of the protocol isn't banking on the continual improvement of models, but rather, providing the best possible interface to safely, securely, and effectively develop collections of tools for LLMs to use.

That is to say, the protocol is a tool that helps us address our problems as the people that have to build interactions. No amount of model improvement is going to move the needle on that. The structure of the information we decide as a standard will. No reason to rely on compute to solve a problem that can be addressed with design.

@jspahrsummers
Member

We definitely don't want to couple anything in the spec or SDKs directly to model providers (or require their SDKs to inject information/structure for MCP to be usable).

Thinking about it some more, perhaps the strongest case in favor of adding an examples field would be to permit the client to strip parts or all of it out at will (e.g., if it can determine that the model doesn't need it).
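A sketch of that client-side stripping, assuming tools carried an optional structured `examples` field (hypothetical, not in the spec). `prepareTools` and its `needsExamples` callback are illustrative names:

```typescript
// Hypothetical: assumes tool declarations may carry an optional structured
// `examples` field. Neither the field nor this helper exists in MCP today.
interface ToolExample {
  context: string;
  output: string;
}

interface Tool {
  name: string;
  description: string;
  examples?: { correct: ToolExample[]; incorrect: ToolExample[] };
}

// Client-side pass before tools are shown to the model. `needsExamples` is a
// placeholder for whatever heuristic a client uses, e.g. the target model.
function prepareTools(tools: Tool[], needsExamples: (tool: Tool) => boolean): Tool[] {
  return tools.map((tool) => {
    if (tool.examples === undefined || needsExamples(tool)) return tool;
    const copy: Tool = { ...tool }; // shallow copy so the caller's list is untouched
    delete copy.examples;
    return copy;
  });
}

const tools: Tool[] = [
  {
    name: "tag-notes",
    description: "Add tags to notes.",
    examples: {
      correct: [{ context: "simple", output: `{ "tags": ["project"] }` }],
      incorrect: [{ context: "don't include # symbol", output: `{ "tags": ["#project"] }` }],
    },
  },
  { name: "search", description: "Search notes." },
];

// A client that has decided its model does not need examples at all.
const stripped = prepareTools(tools, () => false);
console.log(stripped.map((tool) => tool.examples === undefined)); // [ true, true ]
```

With examples folded into an unstructured description, the same decision would require the client to parse free text, which is the asymmetry this comment points at.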

2 participants