*: refactor #14
Conversation
Here is an example of the new logging messages:
This is also an improvement: sometimes we'd get errors and have no idea which query they were coming from.
Force-pushed from c6f40d8 to aade067
Friendly ping
The old version of parca-load tried to figure out what data exists and then started querying randomized data from a Parca instance.
This version of parca-load implements a different approach, and it definitely suits us better for now. 👍
Left a bunch of comments in the review.
Great work!
querier.go
Outdated
log.Println("values: no labels to query") | ||
return | ||
} | ||
label := q.labels[len(q.labels)/2] |
This will always query values for the same label name if the labels are stable. We should add some randomness here, so that every run queries slightly different data in the system.
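A minimal sketch of what I mean, assuming the Querier keeps a seeded `*rand.Rand` around (the helper and field names here are made up):

```go
import "math/rand"

// pickRandomLabel returns a random label name, so each run queries
// slightly different data instead of always hitting the middle element.
func pickRandomLabel(rng *rand.Rand, labels []string) string {
	return labels[rng.Intn(len(labels))]
}
```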
Agreed, I'm going to add an RNG. The only problem is that we could end up reporting metrics for different queries. In this case it doesn't matter, but what about merging CPU vs. memory profiles? I'm not sure those should belong to the same time series. Maybe we should do a query per value and report them as different labels in the time series?
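For the metrics side, one option is a histogram with the queried profile type as a label (the metric and label names below are only illustrative, not what parca-load exposes today):

```go
import "github.com/prometheus/client_golang/prometheus"

// Illustrative: report latency per queried profile type, so randomized
// queries land on separate series instead of mixing CPU and memory
// latencies into one.
var queryDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name: "parca_load_query_duration_seconds",
		Help: "Latency of queries issued against Parca.",
	},
	[]string{"profile_type"},
)

// After a query returns:
// queryDuration.WithLabelValues(profileType).Observe(elapsed.Seconds())
```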
Yes, fair point. I think just randomness has served us well up until now, but we can always change that.
querier.go
Outdated
return
}

profileType := q.profileTypes[len(q.profileTypes)/2]
Same as above. We should make this configurable or randomized.
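Something along these lines, for example (the flag and helper are hypothetical):

```go
import (
	"flag"
	"math/rand"
)

// Hypothetical flag: pin the profile type to query, or pick one at
// random per run when the flag is left empty.
var profileType = flag.String("profile-type", "", "Profile type to query; random if empty.")

func chooseProfileType(rng *rand.Rand, types []string) string {
	if *profileType != "" {
		return *profileType
	}
	return types[rng.Intn(len(types))]
}
```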
}
}

func (q *Querier) querySingle(ctx context.Context) {
I don't think we need querySingle anymore. It is still available in Parca and Polar Signals Cloud; however, the UI never uses that API anymore. We should just remove it from parca-load.
How does the UI display a single profile or get a profile to diff? I think this is still a very important query to have, as it highlighted deficiencies in object storage filtering recently and I think we want to add it to our OKRs next quarter.
Instead of using this old API, the UI now does a merge request where the start and end timestamps are the same. That way we only need one API in the future but can still request individual profiles for a timestamp. This old querySingle was mostly a shim for that.
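Roughly how I read it (the types below are just illustrative, not the real Parca query API): a "single profile at time t" becomes a merge request whose start and end timestamps are both t.

```go
import "time"

// Illustrative only: requesting a single profile is expressed as a
// merge over a zero-length time range.
type mergeRequest struct {
	query string
	start time.Time
	end   time.Time
}

func singleProfileAt(query string, t time.Time) mergeRequest {
	return mergeRequest{query: query, start: t, end: t}
}
```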
👍🏼 I'll open a separate issue for this too
querier.go
Outdated
return
}

profileType := q.profileTypes[len(q.profileTypes)/2]
Here, the profileType should be the same one the previous QueryRange used. Then we can rely on the start and end timestamps to figure out which profiles to query for the merge.
This is something that I would like to discuss with you. I think that we don't need to rely on the timestamps returned by QueryRange.
We issue, for example, 15-minute query ranges on data that does not span the whole 15 minutes (e.g. we've only collected 5 minutes of data). This query is considered successful and its latency is reported through the metrics. Why not treat merges the same way? It reduces complexity. And yes, the latency might be a lot better when we have less data, but I think that's fine; it should be easy to attribute to data size.
Additionally, I think it's better to issue a merge query that might not cover the full time range than to silently not run merge queries at all. If you look at Pyrra for our cloud project, for example, there is no consistency in how often merge queries are run.
This commit refactors all querying logic into a new Querier struct. Since all queries are more or less dependent on each other, the execution logic is simplified to run the queries one at a time, with the option to add concurrent Queriers in the future if we want to increase load. This also exposes a single "query-interval" flag to make it easier for the user to choose a single interval in which to execute queries.

Other bits and bobs were also improved:

- Termination has been simplified.
- Expiring maps have been removed. Although it was a good approach to store state needed by future queries, it is simpler and more memory efficient to just store the last result for each query. Since queries are now run in the same goroutine one after another, there is also no need for synchronization.
- Queries are no longer run conditionally. For example, we would avoid running 15-minute merges if QueryRange didn't return a series with enough data. Anecdotally, this seemed broken and it was implemented in a leaky way, since we would always run a QueryRange and store the latency as valid even if the data QueryRange executed on spanned less than the expected range. This simplifies querying logic.
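Roughly the shape of the loop the commit message describes, as a sketch only; the Querier stub and its method names are placeholders, and the flag default is assumed rather than taken from the code:

```go
package main

import (
	"context"
	"flag"
	"time"
)

// Querier is a stand-in for the struct the commit describes; the real one
// holds the Parca client and the last result of each query.
type Querier struct{}

func (q *Querier) labels(ctx context.Context)       {}
func (q *Querier) profileTypes(ctx context.Context) {}
func (q *Querier) queryRange(ctx context.Context)   {}
func (q *Querier) queryMerge(ctx context.Context)   {}

// Single interval controlling how often the query sequence runs.
var queryInterval = flag.Duration("query-interval", 10*time.Second, "Interval between query runs.")

// run executes the queries one after another on every tick, so later
// queries can reuse the last results of earlier ones without locking.
func run(ctx context.Context, q *Querier) {
	ticker := time.NewTicker(*queryInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			q.labels(ctx)
			q.profileTypes(ctx)
			q.queryRange(ctx)
			q.queryMerge(ctx)
		}
	}
}
```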
Force-pushed from aade067 to 6a35df1
Updated with the discussed changes.