Developer documentation relation postgres <--> Ceph #2658
@thoth-station/devs I'd appreciate it if anyone has some insight or pointers regarding this. I see a number of |
/cc @harshad16 |
Let's gather the details that would help everyone. Ceph is our document storage: various components keep a lot of documents in our Ceph services. For example, when we run a solver, we keep its results in Ceph and sync them into the Postgres db with the help of the storages module. The list of these kinds of results can be found here: https://github.com/thoth-station/storages/blob/master/thoth/storages/__init__.py#L20-L42
In the case of the Python package index: we won't have a specific document in Ceph for each registered index, at least. Hopefully the comment above makes clear what kinds of results go into Ceph. As I mentioned, these docs are kept in Ceph so that they can be synced to the Postgres db if ever needed in the future. Although we won't have documents depending directly on the index, the solver result doc does keep track of the index.
Additional answers:
Ceph is an open source storage service that exposes an S3-compatible API, so our documentation sometimes refers to S3 calls or an S3 store. However, Ceph is the service that is actually deployed and used; since its calls are also S3 calls, both names show up in various places. |
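To make the solver example above more concrete, here is a minimal, self-contained sketch of the "store the result document, then sync it into the database" flow. It is only an illustration under stated assumptions: a plain dict stands in for the Ceph bucket, `sqlite3` stands in for Postgres, and the table name, field names and `sync_solver_documents` function are hypothetical, not the actual thoth-storages API.

```python
import json
import sqlite3

# Hypothetical stand-in for the Ceph bucket: object key -> JSON document.
object_store = {
    "solver/solver-fedora-1": json.dumps(
        {
            "index_url": "https://pypi.org/simple",
            "package": "flask",
            "version": "2.1.0",
        }
    ),
}


def sync_solver_documents(store, conn):
    """Copy the queryable fields of each stored document into the database.

    Returns the number of documents that were newly synced.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS solved_package ("
        "document_id TEXT PRIMARY KEY, index_url TEXT, package TEXT, version TEXT)"
    )
    synced = 0
    for key, raw in store.items():
        # Skip documents that the database already references.
        already = conn.execute(
            "SELECT 1 FROM solved_package WHERE document_id = ?", (key,)
        ).fetchone()
        if already:
            continue
        doc = json.loads(raw)
        conn.execute(
            "INSERT INTO solved_package VALUES (?, ?, ?, ?)",
            (key, doc["index_url"], doc["package"], doc["version"]),
        )
        synced += 1
    conn.commit()
    return synced


conn = sqlite3.connect(":memory:")  # stand-in for the Postgres connection
first = sync_solver_documents(object_store, conn)
second = sync_solver_documents(object_store, conn)
rows = conn.execute("SELECT * FROM solved_package").fetchall()
```

In this sketch the document remains the original data, and the database row only keeps the document key plus the fields worth querying (including the index the solver result refers to). Running the sync twice does not duplicate rows.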
Let's see if I can summarize what I understand, to see if I really **do**
understand. I'll see if I can work on a PR to add that in docs after
that.
We store documents in Ceph. Those are the results of various kinds of
operations.
Those documents are original data (they can't be reconstructed from the
Postgres db).
Postgres references those documents.
Follow-up questions:
- I'm not sure I understand what's going on in the sync process. If
something exists in Ceph, will it be referenced/created in Postgres ?
- Can we map one document in Ceph to only one entity in Postgres ? In
other words, do we have a many-to-one relation between Ceph documents
and Postgres entries ? (or another relation, or does that depend ?)
- Do the documents in Ceph reference back to Postgres entries ?
- Which one of them is the single source of truth, Postgres or Ceph ? Or
do they each participate in it ?
On Fri, Jul 15, 2022 at 01:12:07AM -0700, Harshad Reddy Nalla wrote:
> Additional answers:
> > Also, is it only used as an S3 store ? In that case, why reference Ceph particularly ?
> Ceph is an open source storage service that exposes an S3-compatible API, so our documentation sometimes refers to S3 calls or an S3 store. However, Ceph is the service that is actually deployed and used; since its calls are also S3 calls, both names show up in various places.
Yeah, I see what Ceph is. My question is more: do we use it exclusively
through the S3 API, or do we also use other features, like CephFS or
block storage ?
In the first case, we might drop references to Ceph in the docs and in the
code and simply work with an S3 API, which could be backed by any
service providing that S3 API (the fact that it's backed by Ceph would
be an operational detail).
|
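The idea of working against "an S3 API" and treating the backing service as an operational detail could be sketched as a small interface that the code depends on. This is a rough illustration only: `ObjectStore`, `InMemoryStore` and `store_result` are hypothetical names, not existing Thoth classes, and a real backend would wrap an S3 client pointed at whatever endpoint the operators deploy.

```python
from typing import Dict, Iterator, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface the code would depend on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...

    def list_keys(self, prefix: str) -> Iterator[str]: ...


class InMemoryStore:
    """Test backend; an S3-backed class would expose the same three methods."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self, prefix: str) -> Iterator[str]:
        return iter(sorted(k for k in self._objects if k.startswith(prefix)))


def store_result(store: ObjectStore, kind: str, document_id: str, payload: bytes) -> str:
    """Store a result document under '<kind>/<document_id>' and return its key."""
    key = f"{kind}/{document_id}"
    store.put(key, payload)
    return key


store = InMemoryStore()
key = store_result(store, "solver", "doc-1", b"{}")
listed = list(store.list_keys("solver/"))
```

With this shape, nothing in `store_result` mentions Ceph; which service answers the S3 calls is decided where the backend object is constructed.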
@mayaCostantini Any thoughts ? |
Yes, this is original result data; it can't be reconstructed from the Postgres db, only by re-running the operations.
Not true for all syncs; some of them are designed that way, for example:
The mapping would be more of a many-to-many.
Ceph doesn't reference back to Postgres, it's the other way around. The connotation of both is different, so saying one of them is the single source of truth would be right.
We use it through the S3 API or through packages with S3 support; I don't know what these packages use underneath in their architecture. |
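A small sketch of what "many-to-many, with Postgres pointing at Ceph and not the other way around" could look like, again using `sqlite3` as a stand-in for Postgres. The `entity` and `document_link` tables and the `link` helper are hypothetical and only illustrate the shape of the relation, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the Postgres connection
conn.executescript(
    """
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- The database side holds the document keys; documents hold no row ids.
    CREATE TABLE document_link (
        entity_id INTEGER REFERENCES entity(id),
        document_key TEXT,
        PRIMARY KEY (entity_id, document_key)
    );
    """
)


def link(conn, entity_name, document_key):
    """Record that a database entity was derived from a stored document."""
    conn.execute("INSERT OR IGNORE INTO entity (name) VALUES (?)", (entity_name,))
    (entity_id,) = conn.execute(
        "SELECT id FROM entity WHERE name = ?", (entity_name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO document_link VALUES (?, ?)", (entity_id, document_key)
    )


# One document can feed several entities, and one entity can come from several documents.
link(conn, "flask==2.1.0", "solver/doc-a")
link(conn, "flask==2.1.0", "solver/doc-b")
link(conn, "click==8.1.3", "solver/doc-a")

docs_for_flask = [
    row[0]
    for row in conn.execute(
        "SELECT document_key FROM document_link "
        "JOIN entity ON entity.id = entity_id "
        "WHERE name = ? ORDER BY document_key",
        ("flask==2.1.0",),
    )
]
entities_for_doc_a = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM entity "
        "JOIN document_link ON entity.id = entity_id "
        "WHERE document_key = ? ORDER BY name",
        ("solver/doc-a",),
    )
]
```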
Ok, I think I have a relatively good overview, I'll get started 👍
|
@VannTen I think Harshad provided a great explanation, I don't see any more details to add that could be useful. Thanks @harshad16 ! |
related: thoth-station/thoth-application#2539 |
Is this something we can extract and summarize into the docs? |
I'm not sure.
I think there are two audiences for this information, Thoth devs and Thoth
ops, and it's not exactly the same information (check out #2661)
Developer docs should stay in this repo I think, but I could see
operator docs regarding the storage models being centralized with the
rest of the operational documentation (which is the point of the
thoth-application issue if I read it correctly).
|
/remove-kind bug Related (closely) : #2691 |
/priority important-longterm |
/sig stack-guidance |
It's not very clear from the code or the documentation what Ceph is used for:
I can see here that we store task results, and here how to access it,
but there should be a high-level description of its role,
comparable to the Postgres schema that we can generate with
generate-schema.
Currently it's a bit hard to get a definite idea of what is inside (and
consequently, what should be deleted, see #2657).
Also, is it only used as an S3 store ? In that case, why reference Ceph
particularly ?
/kind documentation