IMPORTANT: Metrics issue (Abnormal status of task) - master node automatically restarting, each worker running tasks generate abnormal
tasks
#1539
Labels
abnormal
tasks
#1539
Describe the bug
For a long time now we have noticed a problem resulting from refreshing metrics that are collected by the main master-node from the worker-node. We are currently operating on Dockerfile, on the AWS cloud. We have 1 master-node and 8 worker-nodes.
The problem is that the master-node often restarts without any problem. After longer analyses, it turned out that the problem is "metrics", which cannot be turned off in any way, because you have not implemented such a method. It would be very useful in the application.
Sometimes it is possible to "bug" them, restarting the entire infrastructure or adding one more worker-node. But this is not a permanent solution, because by bugging the metrics, the problem is solved for 1-2 days.
The problem is that because of metrics, the worker-node often loses connection with the master-node when the task is started, which is why we get the task status "abnormal", and we have to manually check whether the task has already been completed or is still running. At the moment this is very burdensome for us, as each worker has at least 4-5 tasks running.
We're running master-node and each worker-node on the
crawlab-pro:latest
image.Master-node configuration:
Worker-node configuration:
Expected behavior
Add possibility to disable/enable metrics flag, or fix this issue.
Screenshots
The text was updated successfully, but these errors were encountered: