Demo - remove root namespace from system-specific attributes and metrics #1711

Draft
wants to merge 1 commit into
base: main

Conversation

lmolkova
Contributor

@lmolkova lmolkova commented Dec 23, 2024

Demonstrates #1708 (comment) and #1618 (comment)

  • we're inconsistent when naming system-specific attributes and metrics. E.g. we have
    • jvm.* and not runtime.jvm.*
    • db.cosmosdb and not cosmosdb
    • messaging.kafka and not kafka
    • gcp and not cloud.gcp
    • ...
  • Any system, even if designed for a specific domain, is wider than this domain. E.g.
    • redis is mostly used as a cache, we call it a database, but it's also a messaging system - would we provide cache.redis, db.redis and messaging.redis?
    • Azure EventHubs is a messaging system, but we need metrics for connection management, load balancing, etc - having everything in eventhubs (or az.eventhubs) makes more sense.

This PR shows what the world would look like if we (in general) removed the domain prefix from all system-specific things.

The naming guidance would be:

  • if a metric/event/attribute describes a general concept applicable to multiple systems within a domain, it should start with this domain (e.g. db.namespace or db.client.operation.duration)
  • if a metric/event/attribute is specific to a certain system, it should start with this system name (e.g. jvm.cpu.time, cosmosdb.client.request_charge)
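
As a rough illustration of the guidance above, the renaming could be sketched as follows. The `DOMAINS` and `SYSTEMS` sets and the `strip_domain_prefix` helper are hypothetical names made up for this sketch; they are not part of any semantic conventions tooling, and the sets only cover the examples that appear in this PR.

```python
# Hypothetical sets for illustration only; a real list would come from the registry.
DOMAINS = {"db", "messaging", "gen_ai", "rpc"}
SYSTEMS = {"cosmosdb", "elasticsearch", "kafka", "servicebus", "openai", "grpc"}


def strip_domain_prefix(name: str) -> str:
    """Drop a leading domain segment when the next segment names a specific system.

    General-concept names such as `db.namespace` are returned unchanged.
    """
    parts = name.split(".")
    if len(parts) >= 3 and parts[0] in DOMAINS and parts[1] in SYSTEMS:
        return ".".join(parts[1:])
    return name


if __name__ == "__main__":
    for attr in (
        "db.cosmosdb.request_charge",
        "messaging.kafka.message.tombstone",
        "db.namespace",
        "jvm.cpu.time",
    ):
        print(attr, "->", strip_domain_prefix(attr))
```

Under these assumptions, only names whose second segment is a known system lose their first segment; everything else keeps its current name.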

| [`db.namespace`](/docs/attributes-registry/db.md) | string | The name of the Elasticsearch cluster which the client connects to. [10] | `customers`; `test.users` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.operation.batch.size`](/docs/attributes-registry/db.md) | int | The number of queries included in a batch operation. [11] | `2`; `3`; `4` | `Recommended` | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`db.query.text`](/docs/attributes-registry/db.md) | string | The request body for a [search-type query](https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html), as a json string. [12] | `"{\"query\":{\"term\":{\"user.id\":\"kimchy\"}}}"` | `Recommended` [13] | ![Release Candidate](https://img.shields.io/badge/-rc-mediumorchid) |
| [`elasticsearch.node.name`](/docs/attributes-registry/elasticsearch.md) | string | Represents the human-readable identifier of the node/instance to which a request was routed. [14] | `instance-0000000001` | `Recommended` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor Author

curious what Elastic folks think

@AlexanderWert @gregkalapos

Member

elasticsearch.node.name seems ok to me. I think for Elasticsearch the proposed change would not make a significant difference, because it naturally falls under the db domain - so there was no discrepancy there anyways. E.g. the redis example in the PR description shows the problem very well - I'd say we haven't had this issue with Elasticsearch - definitely not to that extent.

So, having said that, I'd say, for consistency (if we go with this proposal and drop it due to other use-cases, e.g. redis), it'd be ok to drop db. for the Elasticsearch specific attributes.

| [`db.cosmosdb.regions_contacted`](/docs/attributes-registry/db.md) | string[] | List of regions contacted during operation in the order that they were contacted. If there is more than one region listed, it indicates that the operation was performed on multiple regions i.e. cross-regional call. [3] | `["North Central US", "Australia East", "Australia Southeast"]` | `Conditionally Required` If available. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`db.cosmosdb.request_charge`](/docs/attributes-registry/db.md) | double | Request units consumed for the operation. | `46.18`; `1.0` | `Conditionally Required` when available | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`db.cosmosdb.sub_status_code`](/docs/attributes-registry/db.md) | int | Cosmos DB sub status code. | `1000`; `1002` | `Conditionally Required` when response was received and contained sub-code. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`cosmosdb.connection_mode`](/docs/attributes-registry/cosmosdb.md) | string | Cosmos client connection mode. | `gateway`; `direct` | `Conditionally Required` [1] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor Author

curious what CosmosDB folks would think about this (removing db in front of cosmosdb everywhere)

@jcocchi @sourabh1007

| [`messaging.batch.message_count`](/docs/attributes-registry/messaging.md) | int | The number of messages sent, received, or processed in the scope of the batching operation. [2] | `0`; `1`; `2` | `Conditionally Required` [3] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`messaging.destination.name`](/docs/attributes-registry/messaging.md) | string | The message destination name [4] | `MyQueue`; `MyTopic` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`messaging.kafka.message.tombstone`](/docs/attributes-registry/messaging.md) | boolean | A boolean that is true if the message is a tombstone. | | `Conditionally Required` [6] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`kafka.message.tombstone`](/docs/attributes-registry/kafka.md) | boolean | A boolean that is true if the message is a tombstone. | | `Conditionally Required` [2] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor Author

curious what messaging folks think (about removing messaging prefix from system-specific attributes and metrics)

@joaopgrassi @pyohannes @carlosalberto @trask

| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [9] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Conditionally Required` If available. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`server.address`](/docs/attributes-registry/server.md) | string | Server domain name if available without reverse DNS lookup; otherwise, IP address or Unix domain socket name. [8] | `example.com`; `10.1.2.80`; `/tmp/my.sock` | `Conditionally Required` If available. | ![Stable](https://img.shields.io/badge/-stable-lightgreen) |
| [`servicebus.disposition_status`](/docs/attributes-registry/servicebus.md) | string | Describes the [settlement type](https://learn.microsoft.com/azure/service-bus-messaging/message-transfers-locks-settlement#peeklock). | `complete`; `abandon`; `dead_letter` | `Conditionally Required` if and only if `messaging.operation` is `settle`. | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`servicebus.message.delivery_count`](/docs/attributes-registry/servicebus.md) | int | Number of deliveries that have been attempted for this message. | `2` | `Conditionally Required` [9] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor Author

@lmolkova lmolkova Dec 23, 2024

as the owner of the corresponding instrumentations, I'd be really excited about this change - I struggle with trying to fit everything about specific messaging lib/system into messaging namespace.

| [`gen_ai.openai.request.seed`](/docs/attributes-registry/gen-ai.md) | int | Requests with same seed value more likely to return same result. | `100` | `Conditionally Required` if the request includes a seed | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`gen_ai.openai.request.service_tier`](/docs/attributes-registry/gen-ai.md) | string | The service tier requested. May be a specific tier, default, or auto. | `auto`; `default` | `Conditionally Required` [4] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`gen_ai.openai.response.service_tier`](/docs/attributes-registry/gen-ai.md) | string | The service tier used for the response. | `scale`; `default` | `Conditionally Required` [5] | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`openai.request.response_format`](/docs/attributes-registry/openai.md) | string | The response format that is requested. | `json` | `Conditionally Required` if the request includes a response_format | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor Author

curious what gen_ai folks think about removing gen_ai prefix from system-specific conventions like openai.

E.g. what good does the gen_ai prefix do for something like file upload - would we want to report a gen_ai.client.openai.uploaded_files or an openai.client.uploaded_files metric?

@open-telemetry/semconv-genai-approvers @xrmx @aabmass @codefromthecrypt @trentm @lzchen

Contributor

hmm gut feel is that we want the dimension/correlation of the source type vs embed it inside a naming convention, right?

Contributor

above comment was about the attribute in general, but now noticing your question was limited to if it should be prefixed or not (nothing else changes except the prefixing).

Answer is similar, I think the attribute name should be owned by the blob upload component, not gen_ai. However, if this is about basically all openai related properties becoming top-level and not being under gen_ai, the main concern is that it crowds the top-level namespace.

  • This puts a lot more pressure on picking top-level names that are not ambiguous across domains. This might work better for trademarked names than non-trademarked ones.
  • This can help when other types of attributes are considered, like do we feel like saying GENAI_OPENAI_API_VERSION vs OPENAI_API_VERSION
  • semconv ownership is easier to figure out when it is prefixed, e.g. this team owns openai attributes, however, this is maybe not very useful to end users vs us

Gut feel is that removing the top-level namespaces for trademarked names, ones that don't have registry clash risk, sounds like it helps more than hurts.

I know it is a different issue, but in any text, most of these attributes are about the api or library in use, not necessarily the LLM provider (e.g. the xAI provider has openai compat support and would use these attributes for that traffic). We would need to keep that in mind when attributing to what's currently called "system". This clarification may not end up in this PR

Contributor Author

the question is whether the gen_ai prefix adds any value for an attribute that describes something specific to a certain library/system/api/provider/whatever. If something is specific to the openai API, we'd still use openai somewhere in the name.

Contributor

It does add some value in hinting that the span or event contains information pertinent to the topic of generative AI. Without the category prefix, it is harder to do things like "Show users all spans that involved the use of GenAI".

| [`rpc.grpc.status_code`](/docs/attributes-registry/rpc.md) | int | The [numeric status code](https://github.com/grpc/grpc/blob/v1.33.2/doc/statuscodes.md) of the gRPC request. | `0`; `1`; `2` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`rpc.grpc.request.metadata.<key>`](/docs/attributes-registry/rpc.md) | string[] | gRPC request metadata, `<key>` being the normalized gRPC Metadata key (lowercase), the value being the metadata values. [1] | `rpc.grpc.request.metadata.my-custom-metadata-attribute=["1.2.3.4", "1.2.3.5"]` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`rpc.grpc.response.metadata.<key>`](/docs/attributes-registry/rpc.md) | string[] | gRPC response metadata, `<key>` being the normalized gRPC Metadata key (lowercase), the value being the metadata values. [2] | `rpc.grpc.response.metadata.my-custom-metadata-attribute=["attribute_value"]` | `Opt-In` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
| [`grpc.status_code`](/docs/attributes-registry/grpc.md) | int | The [numeric status code](https://github.com/grpc/grpc/blob/v1.33.2/doc/statuscodes.md) of the gRPC request. | `0`; `1`; `2` | `Required` | ![Experimental](https://img.shields.io/badge/-experimental-blue) |
Contributor Author

@jsuereth what would be your thoughts on getting rid of rpc. prefix from rpc.grpc? (and other similar changes in PR)

@trask
Member

trask commented Jan 11, 2025

Any system, even if designed for a specific domain, is wider than this domain.

how big of a problem do we think this is, and are there other mitigations we could consider?

as we are looking at tradeoffs, it feels like there's some tradeoff to promoting all of these (many) database systems to top-level namespaces, over keeping them scoped under db.*

@lmolkova
Contributor Author

lmolkova commented Jan 11, 2025

as we are looking at tradeoffs, it feels like there's some tradeoff to promoting all of these (many) database systems to top-level namespaces, over keeping them scoped under db.*

I think consistent naming is a problem regardless of the top-level namespace. We need to make a choice between db.ibm.db2.* and db.db2.* anyway (and I'd prefer db2).

@michaelsafyan
Contributor

A potential concern with moving specific systems attributes outside the prefix of the general category to which they belong: it makes it difficult to know which attributes are topically relevant.

For example, let's suppose that you are implementing the display of telemetry collection data and you want to give users the ability to filter down to information that is focused on "Databases" or "GenAI" or etc.

Being able to map that to a prefix filter on "db." or a prefix filter on "gen_ai." makes it possible to provide such a topic-focused experience. If specific vendor implementations of a general concept are hoisted up a level where they are no longer prefixed with the category, then it becomes harder to match attributes related to the topic.

A potential way to address this concern might be to record in the OTel SemConv metadata information about the attributes that can connect them to the topics that they are about. For example:

 - key: `myvendorspecificdb.foo`
   categories:
    - databases

That would (albeit with significantly more effort) at least make it technically achievable to provide this kind of topical filtering. But it would still involve much more effort than having the category be a consistent prefix.
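
To make the extra effort concrete, here is a minimal sketch of category-based filtering. It assumes a hypothetical `REGISTRY` mapping built from such metadata; the `categories` field is only the idea floated above, not an existing field in the semantic conventions schema, and `attributes_in_category` is a made-up helper name.

```python
# Hypothetical lookup built from per-attribute `categories` metadata.
# Neither the mapping nor the field exists in the conventions today.
REGISTRY = {
    "myvendorspecificdb.foo": ["databases"],
    "db.namespace": ["databases"],
    "openai.request.seed": ["gen_ai"],
}


def attributes_in_category(attributes: dict, category: str) -> dict:
    """Keep only the attributes whose registry entry lists `category`."""
    return {
        key: value
        for key, value in attributes.items()
        if category in REGISTRY.get(key, [])
    }


if __name__ == "__main__":
    span_attributes = {
        "myvendorspecificdb.foo": "bar",
        "db.namespace": "customers",
        "server.address": "example.com",
    }
    print(attributes_in_category(span_attributes, "databases"))
```

Compared with a plain `key.startswith("db.")` check, the consumer has to load and keep up to date a registry of every attribute, which is the additional cost mentioned above.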

@lmolkova
Contributor Author

lmolkova commented Jan 16, 2025

@michaelsafyan

The filtering can also be applied in the following way:

  • look at gen_ai.system.name; let's say it's foo.bar
  • if you care deeply about gen_ai, you can also show all attributes starting with foo.bar
  • you can also decide if there are some systems you care more about and show them in a certain way.
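
A minimal sketch of the first two steps, assuming the value of `<domain>.system.name` is also the prefix of the system-specific attributes (nothing quoted here guarantees that). `topical_attributes` is a hypothetical helper, and the attribute values are made up.

```python
def topical_attributes(attributes: dict, domain: str) -> dict:
    """Select attributes under `<domain>.` plus those under the system named
    by `<domain>.system.name`, if that attribute is present."""
    prefixes = [domain + "."]
    system = attributes.get(f"{domain}.system.name")
    if isinstance(system, str) and system:
        # Assumption: system-specific attributes start with the system name.
        prefixes.append(system + ".")
    return {
        key: value
        for key, value in attributes.items()
        if key.startswith(tuple(prefixes))
    }


if __name__ == "__main__":
    span_attributes = {
        "gen_ai.system.name": "foo.bar",
        "gen_ai.request.model": "some-model",
        "foo.bar.request.seed": 100,
        "error.type": "timeout",
    }
    print(topical_attributes(span_attributes, "gen_ai"))
```

Note that `error.type` is not selected in this sketch, even though it can be important on a gen_ai telemetry item.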

Qualifying an attribute as relevant to a domain is not trivial (and which domains):

  1. when you upload a file to VertexAI, would you call it gen_ai, db or maybe rag?
  2. when you do a function call to search for embeddings in a database - how would you classify those?

The fact that an attribute name starts with gen_ai does not mean it's important (it can still describe some nitty-gritty detail about a specific system), and the opposite is also not true. E.g. the presence of error.type on a gen_ai telemetry item is very important.


To summarize pros and cons:

Having the {domain} prefix has the following cons:

  • names for things that don't clearly fit into one domain would be artificial. Would we use db.redis when talking to a DB, cache.redis when talking to a cache, messaging.redis when using redis as a messaging system, and finally redis when describing control-plane and cluster operations? Using just redis solves these problems
  • once we can have decentralized semantic conventions, root namespace becomes a governance entity (Google should be able to manage everything in gcp namespace instead of gen_ai.gcp.vertex_ai, messaging.gcp.pubsub and open set of other namespaces)

I'd argue that those are more important than topical filtering.

Given that we still provide consistency for *.system.name = foo and foo.* system-specific attributes, topical filtering can still be done.

@michaelsafyan
Contributor

@lmolkova is it guaranteed that the value in gen_ai.system.name will always match the prefix of the vendor-specific attributes? And is it guaranteed that the vendor-specific attributes will always be accompanied by one of the more general category-specific attributes? [Is this stated anywhere explicitly in the spec?]

My concern is mainly having to keep abreast of all possible vendor-specific systems that belong to some domain. I agree that categorization is hard, but it would be more scalable if contributors to the conventions self-categorize.

If there were metadata about each attribute specifying relevant categories, it may be possible to specify more than one category, which would eliminate the concerns about having to choose the ideal category.

I don't think it is necessarily a blocker to the proposed renaming, but I did want to raise it to attention that the naming prefixes do provide a useful way of topical filtering that is a tradeoff/loss as a result of this transformation (and that, out of band of this change, it may be worth thinking about how topical filtering might be enabled scalably).

5 participants