
Fix for WFCORE-7097 and Fix for WFCORE-7098 #6283

Open

wants to merge 2 commits into base: main
Conversation

jfdenise (Contributor)

@bstansberry (Contributor):

@jamezp Please review as you're kind of an SME on this.

@jfdenise @yersan @jamezp I put the 27.x label on this mostly to get your attention so you can think whether this needs to be in WF 35 or not. I suspect the only urgency around this is the intermittent failure WFCORE-7097 mentions, and then the bootable jar failures we are seeing in full WF in ts/int/elytron-oidc-client. But those don't force us to do something quickly if we don't think that's the right thing to do; those both may have workarounds.

Comment on lines +37 to +41
if (Files.notExists(cleanupMarker)) {
return;
}
Member:

I'm not sure I follow this. Can you explain why we were seeing an issue if the file does not exist here?

jfdenise (Contributor Author):

This covers the case where the process is started but the cleanup has already occurred (a timeout plus a previous process that was running).
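
A minimal sketch of that scenario, assuming `cleanupMarker` is the `java.nio.file.Path` field shown in the snippet above; the surrounding class name and the cleanup body are placeholders, not the PR's actual code:

```java
import java.nio.file.Files;
import java.nio.file.Path;

class CleanupGuardSketch {

    private final Path cleanupMarker;

    CleanupGuardSketch(Path cleanupMarker) {
        this.cleanupMarker = cleanupMarker;
    }

    void cleanupIfNeeded() {
        // A previous cleanup (e.g. one launched before a timeout fired) may
        // already have deleted the marker; in that case the installation is
        // gone and this run must do nothing.
        if (Files.notExists(cleanupMarker)) {
            return;
        }
        // ... otherwise delete the installation, then the marker ...
    }
}
```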

}
// Do a last cleanup, in case the cleanupMarker still exists (could have been deleted by running process).
if (Files.exists(cleanupMarker)) {
cleanup();
Member:

This could potentially launch another process. We should probably just invoke the deleteDirectory() at this point.

jfdenise (Contributor Author):

Yes, that is done on purpose. On Windows we need the external process. The cleanup waits until the process terminates (with a timeout).

Member:

Is the idea that if the cleanup process is running while the bootable JAR process is still running, we terminate that process and start a new one? I'm just a little confused about what we gain here.

jfdenise (Contributor Author):

That covers the case where the previous process didn't complete the deletion for some reason (a timeout, with the process forcibly terminated from the caller thread); we start a new process to finalize it.
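
Putting the two replies together, the intended flow looks roughly like the sketch below. Only `cleanupMarker` and the timeout idea come from the PR; the class and method names here are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;

abstract class CleanupFlowSketch {

    private final Path cleanupMarker;
    private final long timeoutSeconds;

    CleanupFlowSketch(Path cleanupMarker, long timeoutSeconds) {
        this.cleanupMarker = cleanupMarker;
        this.timeoutSeconds = timeoutSeconds;
    }

    void cleanupWithTimeout() throws IOException, InterruptedException {
        // Fork the external deleter (required on Windows, where the bootable
        // JAR's own file locks prevent in-process deletion).
        Process process = launchCleanerProcess();

        // Bound the wait so the caller is never blocked forever.
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
        }

        // If the forked process was cut short, the marker is still there:
        // launch one more cleanup to finalize the deletion.
        if (Files.exists(cleanupMarker)) {
            launchCleanerProcess();
        }
    }

    // Stand-in for the ProcessBuilder logic in the PR.
    abstract Process launchCleanerProcess() throws IOException;
}
```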

@wildfly-ci

Core -> Full Integration Build 14432 outcome was UNKNOWN using a merge of a7eede0
Summary: Canceled (Error while applying patch; cannot find commit 12f2330 in the https://github.com/wildfly/wildfly-core.git repository, possible reason: refs/pull/6283/merge branch was updated and the commit selected for the ... Build time: 00:00:40

@wildfly-ci

Core -> Full Integration Build 14131 outcome was UNKNOWN using a merge of a7eede0
Summary: Canceled (Error while applying patch; cannot find commit 12f2330 in the https://github.com/wildfly/wildfly-core.git repository, possible reason: refs/pull/6283/merge branch was updated and the commit selected for the ... Build time: 00:00:17

@wildfly-ci

Core -> WildFly Preview Integration Build 14213 outcome was UNKNOWN using a merge of a7eede0
Summary: Canceled (Error while applying patch; cannot find commit 12f2330 in the https://github.com/wildfly/wildfly-core.git repository, possible reason: refs/pull/6283/merge branch was updated and the commit selected for the ... Build time: 00:00:16

@wildfly-ci

Core -> Full Integration Build 14437 outcome was UNKNOWN using a merge of 5e6e784
Summary: Canceled (Error while applying patch; cannot find commit 36fce9b in the https://github.com/wildfly/wildfly-core.git repository, possible reason: refs/pull/6283/merge branch was updated and the commit selected for the ... Build time: 00:00:16

@wildfly-ci

Core -> Full Integration Build 14136 outcome was UNKNOWN using a merge of 5e6e784
Summary: Canceled (Error while applying patch; cannot find commit 36fce9b in the https://github.com/wildfly/wildfly-core.git repository, possible reason: refs/pull/6283/merge branch was updated and the commit selected for the ... Build time: 00:00:22

@wildfly-ci

Core -> Full Integration Build 14137 outcome was FAILURE using a merge of 5e6e784
Summary: Tests failed: 1 (1 new), passed: 4407, ignored: 55 Build time: 03:37:39

Failed tests

org.jboss.as.test.clustering.cluster.ejb.stateful.StatefulTimeoutTestCase.timeout: java.lang.AssertionError: expected:<4> but was:<0>
	at org.jboss.as.test.clustering.cluster.ejb.stateful.StatefulTimeoutTestCase.timeout(StatefulTimeoutTestCase.java:88)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
------- Stdout: -------
14:34:15,207 INFO  [org.jboss.modules] (main) JBoss Modules version 2.1.6.Final
14:34:16,223 INFO  [org.jboss.msc] (main) JBoss MSC version 1.5.5.Final
14:34:16,239 INFO  [org.jboss.threads] (main) JBoss Threads version 2.4.0.Final
14:34:16,416 INFO  [org.jboss.as] (MSC service thread 1-3) WFLYSRV0049: WildFly 35.0.0.Final-SNAPSHOT (WildFly Core 27.0.0.Final-SNAPSHOT) starting
14:34:18,321 INFO  [org.wildfly.security] (Controller Boot Thread) ELY00001: WildFly Elytron version 2.6.0.Final
14:34:19,723 INFO  [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0039: Creating http management service using socket-binding (management-http)
14:34:19,767 INFO  [org.xnio] (MSC service thread 1-2) XNIO version 3.8.16.Final
14:34:19,793 INFO  [org.xnio.nio] (MSC service thread 1-2) XNIO NIO Implementation Version 3.8.16.Final
14:34:19,844 INFO  [org.jboss.as.connector.subsystems.datasources] (ServerService Thread Pool -- 32) WFLYJCA0004: Deploying JDBC-compliant driver class org.h2.Driver (version 2.2)
14:34:19,968 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 40) WFLYCLINF0001: Activating Infinispan subsystem.
14:34:19,991 INFO  [org.jboss.remoting] (MSC service thread 1-4) JBoss Remoting version 5.0.30.Final
14:34:20,011 INFO  [org.wildfly.extension.io] (ServerService Thread Pool -- 41) WFLYIO001: Worker 'default' has auto-configured to 8 IO threads with 64 max task threads based on your 4 available processors
14:34:20,081 INFO  [org.jboss.as.jaxrs] (ServerService Thread Pool -- 42) WFLYRS0016: RESTEasy version 6.2.11.Final
14:34:20,101 INFO  [org.jboss.as.clustering.jgroups] (ServerService Thread Pool -- 44) WFLYCLJG0001: Activating JGroups subsystem. JGroups version 5.3.13
14:34:20,113 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-6) WFLYJCA0018: Started Driver service with driver-name = h2
14:34:20,125 INFO  [org.jboss.as.connector] (MSC service thread 1-4) WFLYJCA0009: Starting Jakarta Connectors Subsystem (WildFly/IronJacamar 3.0.10.Final)
14:34:20,134 INFO  [org.jboss.as.naming] (ServerService Thread Pool -- 48) WFLYNAM0001: Activating Naming Subsystem
14:34:20,217 WARN  [org.wildfly.extension.elytron] (MSC service thread 1-2) WFLYELY00023: KeyStore file '/opt/buildAgent/work/e8e0dd9c7c4ba60/full/testsuite/integration/clustering/target/wildfly-clustering-ejb-1/standalone/configuration/application.keystore' does not exist. Used blank.
14:34:20,279 WARN  [org.jboss.as.txn] (ServerService Thread Pool -- 53) WFLYTX0013: The node-identifier attribute on the /subsystem=transactions is set to the default value. This is a danger for environments running multiple servers. Please make sure the attribute value is unique.
14:34:20,295 INFO  [org.jboss.as.ejb3] (MSC service thread 1-5) WFLYEJB0482: Strict pool mdb-strict-max-pool is using a max instance size of 16 (per class), which is derived from the number of CPUs on this host.
14:34:20,298 INFO  [org.jboss.as.ejb3] (MSC service thread 1-7) WFLYEJB0481: Strict pool slsb-strict-max-pool is using a max instance size of 16 (per class), which is derived from thread worker pool sizing.
14:34:20,323 WARN  [org.wildfly.extension.elytron] (MSC service thread 1-5) WFLYELY01084: KeyStore /opt/buildAgent/work/e8e0dd9c7c4ba60/full/testsuite/integration/clustering/target/wildfly-clustering-ejb-1/standalone/configuration/application.keystore not found, it will be auto-generated on first use with a self-signed certificate for host localhost
14:34:20,426 INFO  [org.jboss.as.naming] (MSC service thread 1-5) WFLYNAM0003: Starting Naming Service
14:34:20,503 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-3) WFLYUT0003: Undertow 2.3.18.Final starting
14:34:20,760 WARN  [org.jboss.as.domain.http.api.undertow] (MSC service thread 1-4) WFLYDMHTTP0003: Unable to load console module for slot main, disabling console
14:34:20,793 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-6) WFLYUT0012: Started server default-server.
14:34:20,802 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-6) Queuing requests.
14:34:20,803 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-6) WFLYUT0018: Host default-host starting
14:34:20,857 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-1) WFLYUT0006: Undertow HTTP listener default listening on [::1]:8080
node-1 2024-12-17 14:34:21,005 INFO  [org.jboss.as.server.deployment.scanner] (MSC service thread 1-8) WFLYDS0013: Started FileSystemDeploymentService for directory /opt/buildAgent/work/e8e0dd9c7c4ba60/full/testsuite/integration/clustering/target/wildfly-clustering-ejb-1/standalone/deployments
node-1 2024-12-17 14:34:21,108 INFO  [org.jboss.as.ejb3] (MSC service thread 1-2) WFLYEJB0493: Jakarta Enterprise Beans subsystem suspension complete
node-1 2024-12-17 14:34:21,300 INFO  [org.jboss.as.connector.subsystems.datasources] (MSC service thread 1-4) WFLYJCA0001: Bound data source [java:jboss/datasources/ExampleDS]
node-1 2024-12-17 14:34:21,601 INFO  [org.jboss.as.server] (Controller Boot Thread) WFLYSRV0212: Resuming server
node-1 2024-12-17 14:34:21,611 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0060: Http management interface listening on http://[::1]:9990/management
node-1 2024-12-17 14:34:21,611 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0054: Admin console is not enabled
node-1 2024-12-17 14:34:21,612 INFO  [org.jboss.as] (Controller Boot Thread) WFLYSRV0025: WildFly 35.0.0.Final-SNAPSHOT (WildFly Core 27.0.0.Final-SNAPSHOT) started in 7345ms - Started 286 of 660 services (449 services are lazy, passive or on-demand) - Server configuration file in use: standalone-full-ha.xml - Minimum feature stability level: community


@wildfly-ci

Core -> WildFly Preview Integration Build 14220 outcome was UNKNOWN using a merge of 5e6e784
Summary: Canceled (Error while applying patch; cannot find commit 36fce9b in the https://github.com/wildfly/wildfly-core.git repository, possible reason: refs/pull/6283/merge branch was updated and the commit selected for the ... Build time: 00:00:39

@@ -45,28 +46,13 @@ class InstallationCleaner implements Runnable {
}

@Override
-    public void run() {
+    public synchronized void run() {
@yersan (Collaborator) commented Dec 18, 2024:

I am missing something; this task is submitted by a SingleThreadExecutor in the BootableJar shutdown hook.
If we are marking it as synchronized, that only makes sense if we could have more than one Bootable JAR instance for the same server home, right?
If that's true, the same marker file is also shared across the multiple Bootable JAR instances launched from the same home; wouldn't that be an issue after all?

yersan (Collaborator):

> this task is submitted by a SingleThreadExecutor in the BootableJar shutdown hook.
> If we are marking it as synchronized, that only makes sense if we could have more than one Bootable JAR instance for the same server home, right?

Well, even in that case, we are creating new instances of this InstallationCleaner on each shutdown hook, so I don't get why the synchronized is required (or is nice to have) at this point.

jfdenise (Contributor Author):

@yersan, we could have a timeout on the calling thread. The task (running in its own thread) is not yet done, and the calling thread will attempt to do a cleanup again. We need to synchronize at this point to avoid multiple cleanups running in parallel; synchronized enforces that.

yersan (Collaborator):

@jfdenise OK, so it is not to allow dealing with multiple Bootable JARs from the same server home.

OK, in that case, shouldn't the InstallationCleaner.cleanup() method be the one that is synchronized?

That's the method common to InstallationCleaner.run() and InstallationCleaner.cleanupTimeout(), which are the entry points for the submitted task and the explicit cleaner.cleanupTimeout() call.

jfdenise (Contributor Author):

@yersan The key piece is Files.notExists(cleanupMarker).
We need all threads to share a common view of it, so all entry points to it should be synchronized.
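
A sketch of that conclusion, using the method names discussed in this thread; both entry points are synchronized so only one thread at a time can observe and act on the marker (the cleanup body itself is a placeholder):

```java
import java.nio.file.Files;
import java.nio.file.Path;

class SynchronizedEntryPointsSketch implements Runnable {

    private final Path cleanupMarker;

    SynchronizedEntryPointsSketch(Path cleanupMarker) {
        this.cleanupMarker = cleanupMarker;
    }

    // Entry point for the task submitted from the shutdown hook.
    @Override
    public synchronized void run() {
        cleanup();
    }

    // Entry point for the calling thread after its timeout elapses.
    public synchronized void cleanupTimeout() {
        cleanup();
    }

    private void cleanup() {
        // Because both entry points hold the same monitor, the second caller
        // sees the marker already deleted and returns without re-cleaning.
        if (Files.notExists(cleanupMarker)) {
            return;
        }
        // ... delete the installation, then the marker ...
    }
}
```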

@jfdenise (Contributor Author):

FYI, I am repeatedly re-running the 2 bootable JAR jobs; my goal is to get 10 runs on the 2 platforms with no issues.

@jfdenise (Contributor Author):

@yersan, 10 green runs on each platform. I will stop testing.

@yersan yersan requested a review from jamezp December 18, 2024 16:12
@yersan (Collaborator) commented Dec 18, 2024:

@jamezp Can you review again? Thanks!

@jamezp (Member) left a comment:

I'm approving, but I think we might need to revisit this. I don't want to block fixing CI, though.

I don't think this is necessarily wrong; I think we're just overusing resources. We have a shutdown hook which attempts to clean up the resources. On Windows these typically can't be cleaned up until the server process has ended. However, we attempt to wait for the process to end, then we launch another process as a final cleanup. I think we could streamline this a bit, but I'd need to think it through a little more.

@@ -153,7 +159,9 @@ private void newProcess() throws IOException {
.redirectError(ProcessBuilder.Redirect.INHERIT)
.redirectOutput(ProcessBuilder.Redirect.INHERIT)
.directory(new File(System.getProperty("user.dir")));
-        builder.start();
+        process = builder.start();
+        process.waitFor(environment.getTimeout(), TimeUnit.SECONDS);
yersan (Collaborator):

This also sounds a bit inappropriate, since this method could be invoked directly from the shutdown hook thread. The shutdown hook API says that it is inadvisable to attempt any user interaction or to perform a long-running computation in a shutdown hook.

I guess my question would be: if this is somehow killed completely by the JVM, could the started process be left running around?

In any case, if we so decide, we can move on and see how it behaves in CI.

jfdenise (Contributor Author):

@yersan, I was thinking about it more, and I think that we shouldn't merge it. Although I am confident in the Linux fix, it requires more work on the Windows front.

@jfdenise jfdenise marked this pull request as draft December 19, 2024 09:59
@yersan yersan added the hold Do not merge this PR label Dec 19, 2024
@jfdenise (Contributor Author):

@jamezp, when testing a lot of corner cases (with complex scheduling scenarios) on Windows, I came to the conclusion that, in the forked process, we need to wait for the server process to terminate prior to deleting the installation. That is the only way to ensure that nothing is left behind and the installation is actually deleted.
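
In other words, the forked cleaner should block on the server JVM's exit before touching the files. A sketch of that idea using ProcessHandle; passing the server PID and installation path as program arguments is an assumption for illustration, not necessarily how the PR wires it:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;

public class ForkedCleanerSketch {

    public static void main(String[] args) throws Exception {
        long serverPid = Long.parseLong(args[0]); // assumed: PID handed to the forked JVM
        Path installation = Paths.get(args[1]);   // assumed: installation dir to delete

        // On Windows, file locks are released only once the owning process is
        // really gone, so wait for the server JVM to exit first.
        ProcessHandle.of(serverPid).ifPresent(handle -> handle.onExit().join());

        // Delete children before parents.
        try (var paths = Files.walk(installation)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    // best effort: anything left behind indicates a lingering lock
                }
            });
        }
    }
}
```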

@jfdenise jfdenise marked this pull request as ready for review December 20, 2024 11:50
@jamezp (Member) commented Dec 20, 2024:

> @jamezp, when testing a lot of corner cases (with complex scheduling scenarios) on Windows, I came to the conclusion that, in the forked process, we need to wait for the server process to terminate prior to deleting the installation. That is the only way to ensure that nothing is left behind and the installation is actually deleted.

@jfdenise Yes. That is what it's supposed to be doing currently. I guess I should read the JIRAs to see what problem we're trying to solve.

One thing I'm not sure about is why I originally created an Executor in the shutdown hook and launched the deletion in a new thread. That seems odd to me. However, the new process is correct. It's the only way on Windows that file locks will be removed.
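
The simplification hinted at here could be as small as running the cleaner on the hook thread itself, with no intermediate executor. A hedged sketch of that shape; the Runnable body stands in for InstallationCleaner.run():

```java
public class ShutdownHookSketch {

    public static void main(String[] args) {
        // The shutdown hook already runs on its own dedicated thread, so the
        // cleanup task can execute there directly instead of being submitted
        // to a single-thread executor created inside the hook.
        Runnable cleaner = () -> {
            // stand-in for InstallationCleaner.run(); on Windows this is where
            // the forked deletion process would be launched
            System.out.println("cleanup on " + Thread.currentThread().getName());
        };
        Runtime.getRuntime().addShutdownHook(new Thread(cleaner, "installation-cleaner"));
    }
}
```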

@jamezp (Member) commented Dec 20, 2024:

Looking at the PermissionsDeploymentTestCase test, my guess is that testWithConfiguredMaxBootThreads is failing because it runs last on Windows. What is likely happening is that on ServerController.stop() the new process is launched, and the bootable JAR is still being deleted when the next test starts. The new process is deleting files that the second test is starting to extract.

The ServerController does some specific deleting of files for the bootable JAR. I think this is likely a timing issue.

@yersan yersan added 28.x and removed 27.x labels Dec 23, 2024