[POC] exporter batcher - byte size based batching #12017
Draft
Description
This is a POC of serialized-size-based batching.
Configuration is supported via an additional field in `MaxSizeConfig`. We will validate that at most one of the above fields is specified (TODO) and switch between item-count-based batching and byte-size-based batching accordingly.
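A minimal sketch of what the pending validation could look like, not the PR's actual code. The struct here is a simplified stand-in, and the field names `MaxSizeItems` and `MaxSizeBytes`, the zero-means-unset convention, and the `Mode` helper are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// MaxSizeConfig is a simplified stand-in for the batcher's size config.
// Both field names are assumptions for this sketch; 0 means "unset".
type MaxSizeConfig struct {
	MaxSizeItems int // item-count limit (assumed name)
	MaxSizeBytes int // serialized-size limit in bytes (hypothetical name)
}

var errBothSet = errors.New("at most one of the item-count and byte-size limits may be set")

// Validate rejects negative limits and configs that set both limits.
func (c MaxSizeConfig) Validate() error {
	if c.MaxSizeItems < 0 || c.MaxSizeBytes < 0 {
		return errors.New("size limits must be non-negative")
	}
	if c.MaxSizeItems > 0 && c.MaxSizeBytes > 0 {
		return errBothSet
	}
	return nil
}

// Mode reports which batching strategy the config selects (hypothetical helper).
func (c MaxSizeConfig) Mode() string {
	switch {
	case c.MaxSizeBytes > 0:
		return "bytes"
	case c.MaxSizeItems > 0:
		return "items"
	default:
		return "unlimited"
	}
}

func main() {
	for _, c := range []MaxSizeConfig{
		{MaxSizeItems: 100},
		{MaxSizeBytes: 4096},
		{MaxSizeItems: 100, MaxSizeBytes: 4096},
	} {
		fmt.Println(c.Mode(), c.Validate())
	}
}
```

Treating zero as "unset" keeps the existing item-count configs valid without changes; a pointer or an explicit mode field would be an alternative if zero ever needs to be a meaningful limit.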
To get the byte size of OTLP protos, this PR updates `pdata/internal/cmd/pdatagen/internal/templates/message.go.tmpl` to expose a `Size()` interface. This change will apply to all `pdatagen`-generated files.
Performance
The benchmark above tests two cases:
Case 1: merge-split 1000 logs, where each incoming log involves one merge and one split. Byte-size-based batching takes 70% more time in this case.
Case 2: merge-split a single log that splits into 100 logs. Byte-size-based batching takes 500% more time in this case.
The CPU pprof profile shows that the majority of the time is spent calculating the byte size.
I tried reducing the number of byte-size calculations by caching the results in integers, but that did not improve performance (it seems the compiler or the proto library is smart enough to reuse the previously calculated result).
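For reference, the caching idea can be sketched roughly as follows: call `Size()` once per record and keep a running total in an `int`, rather than re-measuring the accumulated batch. This is an illustration under assumptions, not the code in this PR. The `sizer` interface, `fakeLog` type, and `splitByBytes` function are hypothetical, and summing per-record sizes ignores the tag and length-prefix overhead that a real protobuf container adds.

```go
package main

import "fmt"

// sizer is a hypothetical stand-in for the Size() method exposed on
// pdatagen-generated types.
type sizer interface {
	Size() int
}

// fakeLog is an illustrative record; its "serialized size" is simply the
// length of its body, which is a simplification.
type fakeLog struct{ body string }

func (l fakeLog) Size() int { return len(l.body) }

// splitByBytes groups items into batches whose summed Size() stays at or
// below maxBytes. Size() is called once per item and accumulated in an int.
// An item larger than maxBytes ends up in a batch of its own.
func splitByBytes[T sizer](items []T, maxBytes int) [][]T {
	var batches [][]T
	var cur []T
	curBytes := 0
	for _, it := range items {
		sz := it.Size() // measured once, then reused below
		if len(cur) > 0 && curBytes+sz > maxBytes {
			batches = append(batches, cur)
			cur, curBytes = nil, 0
		}
		cur = append(cur, it)
		curBytes += sz
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	logs := []fakeLog{{"aaaa"}, {"bbbb"}, {"cc"}, {"dddddddddd"}, {"e"}}
	for i, b := range splitByBytes(logs, 8) {
		fmt.Printf("batch %d: %d record(s)\n", i, len(b))
	}
}
```

With an 8-byte limit, the five records above fall into four batches: the first two records together, then one record each.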
Optimization
Link to tracking issue
Fixes #
Testing
Documentation