Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build, run: record hash or digest in image history for sources used in --mount #5691

Merged
merged 1 commit into from
Jan 25, 2025

Conversation

flouthoc
Copy link
Collaborator

When using --mount=type=bind or --mount=type=cache the hash or digest of source in these flags should be added to image history so buildah can burst cache if files on host or image which is being used as source is changed.

Closes

What type of PR is this?

/kind api-change
/kind bug
/kind cleanup
/kind deprecation
/kind design
/kind documentation
/kind failing-test
/kind feature
/kind flake
/kind other

What this PR does / why we need it:

How to verify it

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

build, run: record hash or digest in image history for sources used in `--mount`

@flouthoc flouthoc marked this pull request as draft August 18, 2024 17:01
@flouthoc
Copy link
Collaborator Author

Need to add tests, will undraft then.

@flouthoc
Copy link
Collaborator Author

Will rebase after this #5693

@nalind
Copy link
Member

nalind commented Sep 17, 2024

I'm surprised that we'd care about the contents of caches. I'd be inclined to archive the contents of a directory (and create a single-entry archive for a non-directory) to account for different permissions/ownership/datestamps/xattrs and to safely handle soft and hard links.

@flouthoc
Copy link
Collaborator Author

I'm surprised that we'd care about the contents of caches. I'd be inclined to archive the contents of a directory (and create a single-entry archive for a non-directory)

If i'm understanding this correctly, did you mean instead of computeDirectoryHash or computeFileHash separately and recursively I should just create a temporary archive of the context dir and write its hash in history ? And on everyrun I can do the same process again to check if hash is matching with what is written in the history ? The idea sounds good to me.

@nalind
Copy link
Member

nalind commented Sep 17, 2024

I'm surprised that we'd care about the contents of caches. I'd be inclined to archive the contents of a directory (and create a single-entry archive for a non-directory)

If i'm understanding this correctly, did you mean instead of computeDirectoryHash or computeFileHash separately and recursively I should just create a temporary archive of the context dir and write its hash in history?

I wouldn't expect the archive to be written anywhere, but the digest of an archive is something we already use as a way of describing contents, when handling COPY and ADD instructions. I don't know yet about doing this over the entire build context or additional build context if only a portion of it is being used at that point (i.e., if "src" is set to a subdirectory).

@flouthoc
Copy link
Collaborator Author

I'm surprised that we'd care about the contents of caches. I'd be inclined to archive the contents of a directory (and create a single-entry archive for a non-directory)

If i'm understanding this correctly, did you mean instead of computeDirectoryHash or computeFileHash separately and recursively I should just create a temporary archive of the context dir and write its hash in history?

I wouldn't expect the archive to be written anywhere, but the digest of an archive is something we already use as a way of describing contents, when handling COPY and ADD instructions. I don't know yet about doing this over the entire build context or additional build context if only a portion of it is being used at that point (i.e., if "src" is set to a subdirectory).

This sounds good to me, i will amend the PR.

@sanmai-NL
Copy link

@flouthoc

I'm seeing this behavior with podman build, will this be fixed by this change too?

ARG PATH_1=mydirectory
ARG SELINUXRELABEL=,z
ARG DISTRO=PATH_1
RUN --mount=type=bind,source=${PATH_1:?},target=/tmp/${PATH_1:?}${SELINUXRELABEL:?} \
  echo Nop

Subsequently, /tmp/mydirectory ends up in an image layer rather than being ephemeral and only mounted at build time. This bloats the image and leaks information.

@flouthoc
Copy link
Collaborator Author

flouthoc commented Oct 3, 2024

RUN --mount=type=bind,source=${PATH_1:?},target=/tmp/${PATH_1:?}${SELINUXRELABEL:?}

@sanmai-NL Are contents of PATH_1 ending up in your final built image as well ? A layer is created so that i can be used later but I don't think contents of RUN --mount are part of your image.

@sanmai-NL
Copy link

sanmai-NL commented Oct 4, 2024

Thanks for your response. Yes indeed, they do end up in there, which I find surprising. And which is the reason for my comment.

Some factors which may cause this in case the problem isn't general
This happens when running podman build from a containerized Podman. Maybe ID mapping also causes some sort of copy operation instead of a bind mount.

@sanmai-NL
Copy link

RUN --mount=type=bind,source=${PATH_1:?},target=/tmp/${PATH_1:?}${SELINUXRELABEL:?}

@sanmai-NL Are contents of PATH_1 ending up in your final built image as well ? A layer is created so that i can be used later but I don't think contents of RUN --mount are part of your image.

Re-reading your comment... So you do find this ending up in a layer by-design, do you? But this way, information leaks and the image bloats.

@flouthoc
Copy link
Collaborator Author

RUN --mount=type=bind,source=${PATH_1:?},target=/tmp/${PATH_1:?}${SELINUXRELABEL:?}

@sanmai-NL Are contents of PATH_1 ending up in your final built image as well ? A layer is created so that i can be used later but I don't think contents of RUN --mount are part of your image.

Re-reading your comment... So you do find this ending up in a layer by-design, do you? But this way, information leaks and the image bloats.

@sanmai-NL What you are describing is a different bug, would it be possible to create a small reproducer and open a new issue ?

Copy link

A friendly reminder that this PR had no activity for 30 days.

@flouthoc
Copy link
Collaborator Author

Since #5693 is merged, i'll get back to this.

@openshift-ci openshift-ci bot removed the approved label Jan 14, 2025
@flouthoc flouthoc marked this pull request as ready for review January 14, 2025 21:21
@flouthoc
Copy link
Collaborator Author

@nalind @containers/buildah-maintainers PTAL

@flouthoc flouthoc removed the stale-pr label Jan 14, 2025
@flouthoc flouthoc requested a review from nalind January 14, 2025 22:06
internal/parse/parse.go Outdated Show resolved Hide resolved
internal/util/util.go Outdated Show resolved Hide resolved
@flouthoc
Copy link
Collaborator Author

@nalind Could you PTAL again.

Copy link
Member

@nalind nalind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be very sure you want to use sha256 over an entire directory tree. It can be an expensive computation if the tree is large.

if imgID, ref, err = s.commit(ctx, s.getCreatedBy(nil, ""), emptyLayer, s.output, s.executor.squash || s.executor.confidentialWorkload.Convert, lastStage); err != nil {
createdBy, err := s.getCreatedBy(nil, "")
if err != nil {
return "", nil, false, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get in the habit of adding context to errors being passed up from calls to methods that returned them, to make it easier to follow the call stack when troubleshooting and debugging. Not just here, but throughout this PR.

imagebuildah/stage_executor.go Outdated Show resolved Hide resolved
internal/util/util.go Outdated Show resolved Hide resolved
imagebuildah/stage_executor.go Outdated Show resolved Hide resolved
internal/util/util.go Outdated Show resolved Hide resolved
internal/parse/parse.go Outdated Show resolved Hide resolved
return "", err
}

tarWriter.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to just above the if err != nil check, if its return value isn't going to be checked.

@sanmai-NL
Copy link

sanmai-NL commented Jan 16, 2025

@nalind
Copy link
Member

nalind commented Jan 16, 2025

In its current form, this considers caches where a "src" option is used to select a subdirectory, but not otherwise. It should either check the mount type and skip caches, or make arrangements to handle those cases.

@flouthoc flouthoc force-pushed the burst-cache-mount branch 3 times, most recently from 7718e85 to e88f782 Compare January 18, 2025 18:31
@flouthoc flouthoc requested a review from nalind January 18, 2025 22:14
Copy link
Member

@nalind nalind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure of whether or not we should be considering the contents of "cache" mounts here.

imagebuildah/util.go Show resolved Hide resolved
imagebuildah/util.go Outdated Show resolved Hide resolved
imagebuildah/util_test.go Outdated Show resolved Hide resolved
tests/bud.bats Outdated
Helloworld2
_EOF

# on third run since we have added new file `anotherfile` so cache must burst.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this comment.

imagebuildah/util.go Outdated Show resolved Hide resolved
imagebuildah/stage_executor.go Show resolved Hide resolved
@flouthoc flouthoc force-pushed the burst-cache-mount branch 3 times, most recently from 78cac13 to ef188af Compare January 23, 2025 18:36
Copy link
Member

@nalind nalind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better, but the arg parsing still has a potential crasher in it.

imagebuildah/stage_executor.go Show resolved Hide resolved
imagebuildah/stage_executor.go Outdated Show resolved Hide resolved
imagebuildah/stage_executor.go Outdated Show resolved Hide resolved
imagebuildah/util.go Outdated Show resolved Hide resolved
imagebuildah/util.go Outdated Show resolved Hide resolved
imagebuildah/util.go Outdated Show resolved Hide resolved
data := []byte("Hello, world!")
if _, err := tempFile.Write(data); err != nil {
t.Fatalf("Failed to write data to temp file: %v", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker: unless the filename being randomized is important to the test, using os.WriteFile() would be shorter.

imagebuildah/stage_executor.go Outdated Show resolved Hide resolved
imagebuildah/util_test.go Outdated Show resolved Hide resolved
When using `--mount=type=bind` or `--mount=type=cache` the hash or
digest of source in these flags should be added to image history so
buildah can burst cache if files on host or image which is being used as
source is changed.

Signed-off-by: flouthoc <[email protected]>
@flouthoc flouthoc requested a review from nalind January 23, 2025 23:44
@flouthoc
Copy link
Collaborator Author

@nalind Could you PTAL again.

@nalind
Copy link
Member

nalind commented Jan 24, 2025

/lgtm

@flouthoc
Copy link
Collaborator Author

@rhatdan @containers/build-maintainers PTAL

@rhatdan
Copy link
Member

rhatdan commented Jan 25, 2025

/approve

Copy link
Contributor

openshift-ci bot commented Jan 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flouthoc, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit e08082f into containers:main Jan 25, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants