AI Agent Span Semantic Convention #1657
@lmolkova @lzchen @nirga @karthikscale3 @drewby this is a very early draft, and we may need a long discussion on it; I hope we can start from here. Please share your comments here. I do not know whether we want to put the AI agent semantic convention in the same folder as |
A few general points:
- We should not create attributes that would be the same as existing gen_ai attributes. By default we should use those instead of defining agent-specific ones.
- We need to define everything in YAML and stay compatible with the schema.
I have a draft here, microsoft#3, for an OpenAI assistant-like API, which covers a lot of similar things. PTAL.
docs/ai-agent/ai-agent-spans.md
Outdated
| Attribute | Type | Description | Example | Requirement Level | Stability |
| ------------------------------ | ------ | ------------------------------------------ | -------------------------------- | ----------------- | --- |
| `ai_agent.agent.name` | string | Name of the agent. | `Researcher Bot` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Those are still GenAI agents, so I think this one should be `gen_ai.agent.name`.
docs/ai-agent/ai-agent-spans.md
Outdated
| `ai_agent.agent.role` | string | Role assigned to the agent. | `Data Collector` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.backstory` | string | Background story or context for the agent. | `Specializes in web data mining` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.workflow_name` | string | Name of the workflow the agent is part of. | `Data Processing Pipeline` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.model` | string | Underlying model powering the agent. | `gpt-4` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
How is the agent model different from `gen_ai.request.model`?
+1
docs/ai-agent/ai-agent-spans.md
Outdated
| `ai_agent.agent.backstory` | string | Background story or context for the agent. | `Specializes in web data mining` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.workflow_name` | string | Name of the workflow the agent is part of. | `Data Processing Pipeline` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.model` | string | Underlying model powering the agent. | `gpt-4` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.tools` | array | List of tools available to the agent. | `["Web Scraper", "Analyzer"]` | Recommended | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
How would it be different from the generic gen_ai tool?
Hey there, I think this is a great starting point for a very hot topic right now. There is one major comment I have on that, and it's around the If that's correct and if I look at your examples, a workflow (and maybe even a task?) can be very long running, which is a currently unsolved piece of the otel specification, so maybe this need for AI Agents being modelled could help to be a driving force behind providing a better specification for that, because I additionally see |
@svrnm Yes, this is the case, at least from my point of view, the Let me review microsoft#3 from @lmolkova first, and I will try to update my PR soon after some discussion on microsoft#3
@lmolkova yes, let me consolidate the agent attributes into gen_ai, but let me go through your PR microsoft#3 first, thanks!
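For illustration only, the consolidation discussed above could be sketched as a simple key rename. This is a minimal sketch, not an agreed mapping: `gen_ai.request.model` is the existing attribute the reviewers point to, `gen_ai.agent.name` is the name suggested in review, and the `RENAMES` table and `consolidate` helper are hypothetical names introduced here. Keys with no suggested replacement are passed through unchanged.

```python
# Hypothetical rename table: draft ai_agent.* keys -> gen_ai.* keys.
# Only the two renames raised in this thread are listed; nothing else
# has been agreed on, so other keys are left as they are.
RENAMES = {
    "ai_agent.agent.name": "gen_ai.agent.name",
    "ai_agent.agent.model": "gen_ai.request.model",
}


def consolidate(attributes: dict) -> dict:
    """Return a copy of `attributes` with known ai_agent.* keys renamed."""
    return {RENAMES.get(key, key): value for key, value in attributes.items()}


draft = {
    "ai_agent.agent.name": "Researcher Bot",
    "ai_agent.agent.model": "gpt-4",
    "ai_agent.agent.role": "Data Collector",
}
consolidated = consolidate(draft)
```

Keeping the rename table as data makes it easy to extend once the remaining attributes have been discussed.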
@svrnm / @gyliu513 - Just wanted to add my 2 cents here. This is very much an issue, and thanks for bringing it up. The OTEL instrumentation we have (at Langtrace) for frameworks like CrewAI, DSPy, etc. runs into this issue from time to time, where traces on occasion have hundreds of spans as part of the same trace. An option to flush spans in progress would be ideal for these scenarios, so the user can see real-time feedback in the UI for ongoing agentic sessions. Having said that, the vast majority of the agents we are seeing (from our perspective) still work well with the existing capabilities. But we definitely need to think about this sooner rather than later.
docs/ai-agent/ai-agent-spans.md
Outdated
| ------------------------------ | ------ | ------------------------------------------ | -------------------------------- | ----------------- | --- |
| `ai_agent.agent.name` | string | Name of the agent. | `Researcher Bot` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.role` | string | Role assigned to the agent. | `Data Collector` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.agent.backstory` | string | Background story or context for the agent. | `Specializes in web data mining` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
AFAICT, `role` and `backstory` are very crewAI concepts that may or may not be applicable to other frameworks. So maybe we should consider making them specific to a `crewAI` namespace?
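One way to picture this suggestion is a small helper that routes framework-specific fields to their own prefix. This is a rough sketch under assumptions: the `crewai.agent.*` prefix, the `FRAMEWORK_SPECIFIC` set, and the `namespace_agent_attributes` function are hypothetical and not part of any convention; only `role` and `backstory` are named in the thread as crewAI concepts.

```python
# Fields the thread flags as crewAI-specific. The set is an assumption
# based on this comment, not an agreed list.
FRAMEWORK_SPECIFIC = {"role", "backstory"}


def namespace_agent_attributes(fields: dict, framework: str = "crewai") -> dict:
    """Prefix generic fields with gen_ai.agent and the rest with <framework>.agent."""
    attributes = {}
    for name, value in fields.items():
        if name in FRAMEWORK_SPECIFIC:
            prefix = f"{framework}.agent"  # hypothetical framework namespace
        else:
            prefix = "gen_ai.agent"
        attributes[f"{prefix}.{name}"] = value
    return attributes


namespaced = namespace_agent_attributes(
    {"name": "Researcher Bot", "role": "Data Collector"}
)
```

With this split, a framework that has no notion of `role` simply never emits the framework-prefixed keys, while the generic keys stay comparable across frameworks.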
docs/ai-agent/ai-agent-spans.md
Outdated
| Attribute | Type | Description | Example | Requirement Level | Stability |
| --------------------------- | ------- | ------------------------------------------------------------------------ | ---------------------------------- | ----------------- | --- |
| `ai_agent.task.name` | string | Name of the task. | `Data Collection` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Do we also need an `ai_agent.task.input` in addition to these?
I'm curious if we can consolidate it under `gen_ai.system|user|tool|assistant.message` events rather than attributes.
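To make the events-versus-attributes idea concrete, here is a minimal sketch that records a task input as a message event instead of a span attribute. It uses a plain dataclass rather than any SDK type; the `Event` class, the `message_event` helper, and the `{"content": ...}` body shape are assumptions made for illustration. Only the `gen_ai.<role>.message` naming pattern comes from the comment above.

```python
from dataclasses import dataclass, field

# Roles taken from the gen_ai.system|user|tool|assistant.message pattern.
ROLES = ("system", "user", "tool", "assistant")


@dataclass
class Event:
    """Stand-in for an event record; not an OpenTelemetry SDK type."""
    name: str
    body: dict = field(default_factory=dict)


def message_event(role: str, content: str) -> Event:
    """Build a gen_ai.<role>.message event carrying the given content."""
    if role not in ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    # The body layout is an assumption for this sketch.
    return Event(name=f"gen_ai.{role}.message", body={"content": content})


# A task input expressed as a user message event instead of an attribute.
task_input = message_event("user", "Collect pricing data from the listed sites")
```

Modelling inputs as events keeps potentially large or sensitive payloads off the span attributes, which is one reason the question is worth settling before adding `*.input` attributes.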
docs/ai-agent/ai-agent-spans.md
Outdated
| Attribute | Type | Description | Example | Requirement Level | Stability |
| ------------------------ | ------ | -------------------------------------------- | ----------------- | ----------------- | --- |
| `ai_agent.tool.name` | string | Name of the tool utilized by the agent. | `Web Scraper` | Required | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| `ai_agent.tool.function` | string | Specific function or capability of the tool. | `Data Extraction` | Recommended | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Do we also need an `ai_agent.tool.input` in addition to these?
Nice first draft, @gyliu513. Thanks for starting this.
@gyliu513 I'm giving you unsolicited advice and being intentionally not specific to a change here, because I think more research would lead you to your own changes. That's always the best (in my mind). Hope it helps. One of the main gains we had in re-organizing the llm now genai sig to have a space in otel-contrib python was to be able to practice specs before committing to it. I have seen this in practice done in java and it helps quite a bit.
Another thing to guide you: bookend timestamps in particular sound like a discussion that would have happened here in another domain (start_xx, end_xx). Certainly, it happened way back in the zipkin days with "cs" and "cr", though these were separate events. Can you research some prior work in otel where a spec like this was accepted or denied?
@codefromthecrypt good comment, thanks and happy holidays!
Can you please share more detail on your comment here? I am now reviewing microsoft#3, and that PR really helped a lot. I will probably update my PR soon after the new year based on microsoft#3.
@gyliu513 for this comment I made "How about an example PII washed feed from agentic cloud provider data, which would translate"
What I mean is that we most of the time assume the data is coming from the application. Like we instrument langchain or something and spans and metrics are collected directly from the app. While I don't know what services exist, another way is cloud integration, where a platform is generating the signals. One example is AWS Bedrock, where you can get data regardless of what the developers do https://aws.amazon.com/blogs/mt/monitoring-generative-ai-applications-using-amazon-bedrock-and-amazon-cloudwatch-integration/
So, for this PR, I mean that if its scope is only for application instrumentation, then we should look at which frameworks we are considering and maybe a draft/experiment/proof of concept that exercises the specs you are making. Beyond that, if you are thinking about a specific cloud integration (I don't know if you are), some sample data or documentation on what that agentic feed looks like could help us translate if the semantic conventions here are valid for it or not.
Does that help? If not you can also quiz me on slack, but anyway happy holidays!
Fixed part of #1530