-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Seeking input on proposal for "reference"-type attribute values. #1428
Comments
Thanks for the proposal! I think we really need some way to capture large data that, somewhere in the pipeline, can be uploaded to a storage that's better designed for such data. A few feedback points on the details:
In any case, I'm really supportive of this proposal with a caveat that large data should not be populated on span attributes and we should not encourage people to do it. |
Thanks for the support of the general direction lmolkova! Event body
I emphatically agree. I would like it to be possible to perform this kind of write-aside solution with any relevant signal type (not just attributes, not just in spans). That would include uploading a subfield of the event body (targeted using a JSON Path, JMESPath, or similar). Ideally, this would be accomplished with the data model and implementation.
Our team is fully supportive of following OTel conventions and encouraging others to follow OTel conventions, so I don't think there is any conflict here in terms of what we recommend to our customers. Our team will aim to optimize the cases that follow OTel conventions and will encourage our customers to adopt said conventions, because they align with the "well lit path" in our systems which we prioritize. If the convention says that this data should be placed in an event body rather than in span attributes, then that is what we will encourage and do. Original value
I agree with you that it ought to be configurable. Potential options might be to drop it entirely before sending to the telemetry destination or to truncate it (possibly with redaction) before sending along to the telemetry endpoint. The latter would make it easier for a telemetry backend to provide users with a preview of the content before linking to the full uploaded document. Attribute namingI think that there are a lot of different valid naming schemes. Some things which I consider important to the naming are:
Between the two options you suggested ( However, I am somewhat concerned that an attribute ending in Event NamingI was assuming that event names would remain unchanged and that event handlers should be able to gracefully handle missing expected attributes (and optionally add support for handling reference attributes in place of the original attributes). But perhaps I am missing something. Can you share more regarding why the event name would need to be changed? Is there a particular breakage to existing backends that you would expect this to cause? (Shouldn't backends already gracefully handle the case where expected attributes are absent?) |
fwiw, while we do have some legacy span attributes that may contain large values, we have since clarified that the intention of attributes is for indexing, not for storing large blobs of data. It should be possible to store attributes in a column store without concern that this will create problems with values measured in megabytes. Any solution that can avoid putting large values in attributes should be avoided, and we should focus on making these values event payloads. |
Agreed. This will help us deal with "legacy span attributes that may contain large values", by making them into smaller values (just a URI and possibly other metadata like a content-type) so that backends don't need to deal with the possibility of very large attribute values. [But, indeed, we should encourage a different approach for new data].
Agreed. However, it is worth noting that this doesn't magically solve all of the problems for all backends. Some backends may also have limits on the size of such payloads, too. [Especially where GenAI use cases begin introducing things like entire PDFs, images, etc. in prompt/response logging]. Since event payloads and attributes use a common data model ( |
Just checking back in ... what are the next steps here? |
Area(s)
area:new
Is your change request related to a problem? Please describe.
Some attributes may may be very large such as:
http.request.body.content
http.response.body.content
gen_ai.prompt
gen_ai.completion
exception.stacktrace
exception.stacktrace
In order to provide a very tight SLO and to provide consistent performance, an operations backend might provide very tight limits on the size of payloads/content/attributes. However, it may still be desirable when this is the case to be able to index most of the properties/attributes as well as to cross-link to an alternative storage solution (with a looser SLO) for full, large data.
In the OTel Collector Contrib repository, there is an open issue for a new connector component ("New component: Blob Attribute Uploader Connector") which attempts to resolve the above issue by proposing a connector that will, in the instrumentation/collector, do the following:
Upload the full attribute data to a blob storage system (such as Google Cloud Storage, Amazon S3, Azure Blob storage, or -- in the future -- any other blob storage backend supported by the CDK or that chooses to contribute to the connector).
Remove the original attribute that would otherwise be too large.
Inject attribute(s) that contain references to where the data was uploaded.
It is with respect to this last one, that we need an established standard / data model for how to represent the uploaded data. The "SetInMap" function in the "foreign attribute" internal library of the draft connector shows the solution currently being implemented/pursued. However, I'd like to get more community feedback/input in order to ensure that there is agreement on the approach.
Describe the solution you'd like
Summary:
somekey
somekey.ref.uri
somekey.ref.content_type
[OPTIONAL]Formally:
A backend system that sees an attribute matching the pattern "${prefix}.ref.uri" should assume that "${prefix}.ref.uri" contains a URI that reports that location of the original, full value of "${prefix}". If a key named "${prefix}" also is present, it is likely that "${prefix}" contains a truncated and/or redacted copy of the original value, with "${prefix}.ref.uri" pointing to the location where the original, unadulterated, full version of the value has been recorded. If "${prefix}.ref.uri" exists, then "${prefix}.ref.content_type" may also optionally exist, containing a MIME type describing the data stored at the URI in "${prefix}.ref.uri".
Prerequisites / Interactions:
If this course of action were to be taken, we would need two changes to the "Attribute Naming" specification in OTel:
Reserve ".ref." so that it can only be used for this purpose (".ref." to be used only for information about reference-typed values and their metadata).
Exempt this usage from the rule that "Names SHOULD NOT coincide with namespaces"
With respect to the latter exemption, our understanding is that this rule exists to allow coaelescing the flat attributes into a structured object. We believe that ".ref." fits the spirit of this even while violating the literal rule as it exists now, because the ".ref." does not introduce a conceptually new attribute; rather, it replaces the existing attribute with a different representation. Thus if a backend were to implement a coalescing system to make attributes non-flat, they could combine all of the ".ref." attributes into a single "ReferenceValue" type as the value for the top-level attribute that does not contain any ".ref." value in it.
Describe alternatives you've considered
Use a compound data type
For example, we could use a "KeyValueList kvlist_value" to represent the reference not as a bunch of separate attributes but as a single complex attribute value. However, this would go contrary to the attribute specification as well as existing OTel libraries which do not provide direct access to the underlying data model in OTLP following this convention restriction.
Introduce a new data type
We could attempt to introduce into OTLP (and into other parts of the specification) a new "ReferenceValue" data type. However, this would require significant effort across multiple languages, libraries, backend systems, etc. to support and thus seems like a non-starter.
Just move the large attributes to be events/logs
This assumes that the event/logs backend can, itself, accept arbitrarily large data. This assumption may not hold true for every event/log backend system. There is an inherent tradeoff between reliabilty/latency and the amount of data that a backend can allow; to provide a very tight latency SLO that caps 99%-ile latency, it's important to guarantee low variance in the size of requests which, in turn, may limit the amount of data that an event/log backend can accept (while being able to provide such a tight time bound). When using such a system for indexing the metadata about the events and making them quickly available for searching, it would still be useful to have a client-side mechanism to route larger content that is not critical for indexing to a blob storage system more suitable for large object storage and to be able to make it easy to interlink and navigate between backend systems by referencing the storage location.
Replace/upload just the entire body of events/logs
While this might serve the purpose of "gen_ai.prompt" / "gen_ai.completion" if/when it moves from a span event to a more general event, it would still be necessary to provide a data model for representing that the event body had been uploaded/replaced with a reference. In addition, this would not cover many other cases (e.g. "http.request.body.content"), and -- beyond this -- it is desirable to upload a much more targeted/limited portion of the data in order to allow indexing / searching the other content that does fit within backend subsystem limits.
Additional context
Examples
Example 1:
http.response.body.content
[Span Attribute]Before
After
Example 2:
gen_ai.prompt
/gen_ai.completion
[Span Event Attribute]Before
After
The text was updated successfully, but these errors were encountered: