Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bats tests - parallelize #5552

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

edsantiago
Copy link
Member

All bats tests run with custom root/runroot, so it should be
possible to parallelize them.

(As of this initial commit, tests fail on my laptop, and I expect them to fail here. I just want to get a sense for how things go.)

Signed-off-by: Ed Santiago [email protected]

None

Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@edsantiago edsantiago force-pushed the bats-parallel branch 2 times, most recently from 762e878 to a195865 Compare June 10, 2024 12:30
@edsantiago edsantiago force-pushed the bats-parallel branch 2 times, most recently from 4d4e9c9 to e653a16 Compare June 27, 2024 14:00
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume once we enabled parallel runs here we can port that over for the bud tests on podman? I like to get the speed up there too.

I haven't looked deeply into the prefetch logic changes but this looks much easier than podman so that is good.

@@ -19,4 +19,4 @@ function execute() {
TESTS=${@:-.}

# Run the tests.
execute time bats --tap $TESTS
execute time bats -j 4 --tap $TESTS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use $(nproc) here instead of hard coding any count, should make it much faster when run locally.

Copy link

A friendly reminder that this PR had no activity for 30 days.

@mtrmac
Copy link
Contributor

mtrmac commented Oct 10, 2024

(A brief look after a link from containers/skopeo#2437 )

zstd found even though push was without --force-compression: My first guess is that the FROM alpine; _EOF - built image (via build caching?) matches an image created in some other concurrently running test, and that causes BlobInfoCache to have a record about the zstd-compressed layer version.

For this purpose, it would be better to have the image’s layers be clearly independent from any other test — maybe a FROM scratch adding a file that contains the test name + timestamp.

@mtrmac
Copy link
Contributor

mtrmac commented Oct 10, 2024

… oh, and: debug-level logs could help.

@edsantiago
Copy link
Member Author

For this purpose, it would be better to have the image’s layers be clearly independent from any other test — maybe a FROM scratch adding a file that contains the test name + timestamp.

Thank you. I thought I had checked for conflicts, but must've missed something. I'll look into this again when time allows.

@edsantiago edsantiago force-pushed the bats-parallel branch 5 times, most recently from d17cb1b to 68722ca Compare November 4, 2024 13:35
@edsantiago
Copy link
Member Author

push test still flaking, and I'm giving up for the day. It is stumping me:

not ok 50 bud: build push with --force-compression

# # buildah build ...
...
# f0e6fc41f58b93d990cb24331c648bc84b066822d06782aec493d97bc2b7b263
# # buildah push [...]--compression-format gzip img-t50-nesrbrf5 docker://localhost:35311/img-t50-nesrbrf5
...
# # buildah push [...] --compression-format zstd --force-compression=false img-t50-nesrbrf5 docker://localhost:35311/img-t50-nesrbrf5
...
# # skopeo inspect img-t50-nesrbrf5
# {"schemaVersion":2,"mediaType":"application/vnd.oci.image.manifest.v1+json","config":{"mediaType":"application/vnd.oci.image.config.v1+json","digest":"sha256:f0e6fc41f58b93d990cb24331c648bc84b066822d06782aec493d97bc2b7b263","size":524},"layers":[{"mediaType":"application/vnd.oci.image.layer.v1.tar+zstd","digest":"sha256:05d36d22e1ad1534254c6965a3b43cf39f4dca9d5c95551eccf40108f076da2b","size":146}],"annotations":{"org.opencontainers.image.base.digest":"","org.opencontainers.image.base.name":""}}
# #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# #|     FAIL: zstd found even though push was without --force-compression
# #| expected: !~ 'zstd'

Where is that 05d36 layer coming from?

@mtrmac
Copy link
Contributor

mtrmac commented Nov 5, 2024

I can’t see any obvious reason. I’d suggest doing the pushes with --log-level=debug, that should include traces like Trying to reuse blob

edsantiago added a commit to edsantiago/buildah that referenced this pull request Nov 19, 2024
The _prefetch helper, introduced in containers#2036, is not parallel-safe: two
or more parallel jobs fetching the same image can step on each other
and produce garbage images.

Although we still can't run buildah tests in parallel (see containers#5552),
we can at least set up the scaffolding for that to happen. This
commit reworks _prefetch() such that the image work is wrapped
inside flock. It has been working fine for months in containers#5552,
and is IMO safe for production. This can then make it much
easier to flip the parallelization switch once the final zstd
bug is squashed.

Signed-off-by: Ed Santiago <[email protected]>
Copy link
Member Author

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is about as clean as I can leave this PR. Good luck!

tests/bud.bats Outdated
@@ -165,6 +165,7 @@ _EOF
# Helper function. push our image with the given options, and run skopeo inspect
function _test_buildah_push() {
run_buildah push \
--log-level=debug \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME! Remove this. It is only present in order to debug the zstd flake.

tests/mkcw.bats Outdated
@@ -22,6 +22,7 @@ function mkcw_check_image() {

# Decrypt, mount, and take a look around.
uuid=$(cryptsetup luksUUID "$mountpoint"/disk.img)
echo "# uuid=$uuid" >&3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't needed either.

@flouthoc
Copy link
Collaborator

@Luap99 Do you have any hints what happend last with this PR, I would like to take over this PR.

Copy link
Contributor

openshift-ci bot commented Jan 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: edsantiago
Once this PR has been reviewed and has the lgtm label, please assign rhatdan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Luap99
Copy link
Member

Luap99 commented Jan 21, 2025

Look at Ed's comments and fix them up, also given he said this is flaking repush several times to ensure this is running stable before we can consider merging.

@flouthoc flouthoc force-pushed the bats-parallel branch 8 times, most recently from e20800d to 42d2266 Compare January 24, 2025 23:13
@flouthoc
Copy link
Collaborator

Looks like last few commits fix the flake caused due to parallelize, I am gonna push a few times more.

@flouthoc flouthoc force-pushed the bats-parallel branch 2 times, most recently from 058e0e4 to ebad7b2 Compare January 25, 2025 05:27
All bats tests run with custom root/runroot, so it should be
possible to parallelize them.

Signed-off-by: Ed Santiago <[email protected]>
Signed-off-by: flouthoc <[email protected]>
flouthoc added a commit to edsantiago/buildah that referenced this pull request Jan 25, 2025
Address comment here containers#5552 (comment)

Signed-off-by: flouthoc <[email protected]>
@flouthoc flouthoc marked this pull request as ready for review January 25, 2025 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants