Hello! I have a question.
During the implementation of ARES, did you ever consider adding a way to evaluate an already trained model, rather than only continuing the training? I don't see this mentioned as a limitation in your paper.
Besides, at the moment I am trying to implement it myself, but I am stuck on what may be a Stable Baselines3 limitation: the action space has to be changed each time you try to predict an action. Like this:
import os
import logging
from stable_baselines3 import SAC

logger = logging.getLogger(__name__)

# Raise the upper bound of the action space to the number of currently available actions
env.action_space.high[0] = env.env.ACTION_SPACE
logger.info("Loading policy...")
model = SAC.load(os.path.splitext(file_path)[0], env)
obs = env.reset()
for i in range(10):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
As you can see, the predict method only receives the observation, not the environment, so modifying the action space beforehand has no effect.
I just wanted to ask whether you, as the developers of this tool, have any idea or clarification you could give me; I would appreciate it. 😉
During the implementation of ARES, did you ever consider adding a way to evaluate an already trained model, rather than only continuing the training? I don't see this mentioned as a limitation in your paper.
I have not evaluated a trained model, but you can try to do it. However, it would be best to wait for more than one hour of testing so that the policy has time to learn the correct action for each observed state.
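For reference, an evaluation-only loop (no further training) could look roughly like the sketch below. It assumes the ARES environment exposes the standard Gym reset/step interface used during training and that the policy was saved with SAC.save(); the file path is only a placeholder.

from stable_baselines3 import SAC

# Minimal evaluation-only sketch: load the trained policy and run it
# without calling learn() again. "trained_policy" is a placeholder path.
model = SAC.load("trained_policy", env)
obs = env.reset()
done = False
episode_reward = 0.0
while not done:
    # deterministic=True uses the mean action instead of sampling from the policy
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
    episode_reward += reward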
Besides, at the moment I am trying to implement it myself, but I am stuck on what may be a Stable Baselines3 limitation: the action space has to be changed each time you try to predict an action.
In env.step() there is an if statement that checks whether the environment can apply the action generated by the neural network to the current state of the application under test. Unfortunately, you cannot modify the output dimension of the network (the action space you are referring to). The only way to always output a "correct" action is to learn an optimal policy, which means training the RL algorithm for hours (or more, since how long this takes is not obvious and success is not guaranteed).
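If it helps, one possible workaround (just a sketch, not something ARES does) is to leave the model and its action space untouched and instead clamp the predicted action into the range that is valid for the current state before calling env.step(). This assumes, as in your snippet, that env.env.ACTION_SPACE holds the number of currently available actions and that the first component of the action selects which action to execute.

import numpy as np

# Hypothetical workaround: keep the network's output dimension fixed and
# clip the first action component to the number of currently valid actions.
obs = env.reset()
for _ in range(10):
    action, _states = model.predict(obs, deterministic=True)
    action[0] = np.clip(action[0], env.action_space.low[0], env.env.ACTION_SPACE)
    obs, rewards, dones, info = env.step(action)

Even with clipping, the point above still stands: unless the policy has been trained long enough, the resulting action may not be meaningful for the current state.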