Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API safety and soundness #29

Open
vi opened this issue Jul 10, 2023 · 3 comments
Open

API safety and soundness #29

vi opened this issue Jul 10, 2023 · 3 comments

Comments

@vi
Copy link

vi commented Jul 10, 2023

Description mentions that the library uses mmap, also links to "Single-level store" article, which suggests that jammdb may give users references leading to mmaped files.

I expected e.g. jammdb::DB::open to be unsafe fn, as full protection against unreasonable things that is needed to make it completely sound (or maybe even enough-for-practical-considerations sound) may not be viable gived the architecture.

But it seems to be not the case, the crate API does not hint at potentials undefined behaviours users may face with the library. For example, what worst could happen if the database file is mangled by external process while in use? What if multiple processes open the same database for writing (including using networked filesystem without locking)?

Shall the entry function be marked unsafe to make users commit to not doing unreasonable things to the database? Or is jammdb actually fully handles all possible complications of memory mapping, so the API is indeed sound even when misused?

@pjtatlow
Copy link
Owner

Hey @vi! Thanks for bringing this up. I don't know that it's realistic to try and handle all possible complications involving memory mapping, and it's definitely possible that by manipulating the underlying file you can mess up your database and cause a panic or other undefined behavior. However, I do believe that the steps I've taken (locking the file if possible) should be "safe enough" provided you're not doing something ridiculous, which is why I chose to not to make the function unsafe. unsafe is no "protection" as you've put it, so I don't see the point of marking the function that way. In my opinion it would just cause uncertainty in the library's stability rather than warning users to not do something dumb. I'm happy to debate this point though!

Perhaps some more documentation would be appropriate though? Some big warning that says not to mess with the file while it's open by another process?

@vi
Copy link
Author

vi commented Jul 11, 2023

unsafe is no "protection" as you've put it, so I don't see the point of marking the function that way.

unsafe is not a protection, but moving the responsibility from authors of the crate to its users.

I do believe that the steps I've taken (locking the file if possible) should be "safe enough" provided you're not doing something ridiculous, which is why I chose to not to make the function unsafe.

For example, would it be wise to use the app from privileged suid executable that accesses a database which is created by unprivileged users? Will straightforward usage of the API lead to a secure application? Users of the crate may think "if database is adversarial then it may return fake data, so we need e.g. to verify signatures of those data before using it" while in reality mere opening the database leads to code execution.

Perhaps some more documentation would be appropriate though?
Some big warning that says not to mess with the file while it's open by another process?

Documentation should be a starting point. The documentation should mention what worst can happen if the rules are not being followed. If this "worst" involves undefined behaviour, then it invites unsafe fn.

If, however, the function is a gateway to something large and potential unsoundness only reachable in really tricky scenarios, then the function is often not marked unsafe. For example, accessing debugging tools like /proc/self/mem from filesystem API or /usr/bin/gdb from process spawn API, executing Python code (which may involve unsafe ctypes module) are not typically marked unsafe.

But, for example, opening attacker-provided or corrupted database file should not be considered as too exotic scenario and should invite unsafe, unless it is guaranteed that problems would be encapsulated only to the database and data returned by it (i.e. OK ways to misbehave: infinite loop, deadlock, process abort, large memory allocation, returning arbitrary, but well-formed data to users, infinite recursion. Not OK ways to misbehave: misaligned/dangling references, returning non-UTF-8 byte blocks as strings to users; mangling other, neighbouring, unrelated and uncorrupted jammdb::DB instance).

Precedent: BTreeMap/HashMap in std and bad Ord/Hash implementations:

It is a logic error for a key to be modified ... The behavior resulting from such a logic error is not specified, but will be encapsulated to the BTreeMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.

This encapsulation allows BTreeMap to stay sound even when misbehaving. Can the same be stated for jammdb::DB?


(locking the file if possible)
if possible

There may be two functions for opening the database: usual pub fn open which enforces proper mandatory locking, overridable only with root access or debuggers and failing to open the database if it is not possible; and pub unsafe fn open_unchecked which skips locking or makes it optional.

@tokahuke
Copy link

tokahuke commented Dec 30, 2024

enforces proper mandatory locking

Well, mandatory locking, at least in Linux, is a lost cause, so...

In general, this is a problem involving all applications using mmap. So, "some big warning that says not to mess with the file" might be the best we have in this scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants