Tensorboard metrics not loaded properly #6907

Open
timr1101 opened this issue Sep 6, 2024 · 10 comments

@timr1101

timr1101 commented Sep 6, 2024

In TensorBoard, the metrics are not loaded correctly in the HPARAMS tab (the metric column is always empty), although the scalars are saved correctly. I am working with torch.utils.tensorboard.

[Image: tensorboard_metrics]

Relevant code:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir=f'./logs/studies/{study_name}/')

# In the training loop:
writer.add_scalar(tag='validation/min_loss', scalar_value=min_val_loss, global_step=trial.number)

# Add the hyperparameters to the summary writer (args_dict is a dictionary with all hyperparameters).
writer.add_hparams(hparam_dict=args_dict, metric_dict={'validation/min_loss': min_val_loss}, run_name=run_name)
writer.close()

@JamesHollyer
Contributor

Are the metrics showing up in the Time Series or Scalars tabs? Did you try selecting the "show metrics" checkboxes?

@timr1101
Author

The scalars associated with the metrics are loaded correctly in both the TIME SERIES and SCALARS tabs. The only problem is that no metrics are displayed in the HPARAMS tab. When I select the "show metrics" checkboxes, a completely empty chart pops up.

@JamesHollyer
Contributor

Wow, that is strange! I do not see why that would happen, and I cannot seem to reproduce it. Is this happening with other logs or just this one?

@timr1101
Author

timr1101 commented Sep 18, 2024

Yes, it's weird. The problem does not seem to be limited to these specific logs. I've also used other scalars as metrics, but that didn't change the result. It is perhaps also noteworthy that I encountered exactly the same problem with a completely different implementation, namely the code from the official guide to hyperparameter optimization with TensorBoard (a TensorFlow implementation). The scalars were displayed correctly in the TIME SERIES and SCALARS tabs, but the column for the corresponding metric "Accuracy" in the HPARAMS tab remained empty.

[Image: IMG_0258]

Related code (from the official guide):


import tensorflow as tf
from tensorboard.plugins.hparams import api as hp


fashion_mnist = tf.keras.datasets.fashion_mnist

(x_train, y_train),(x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))

METRIC_ACCURACY = 'accuracy'

with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
  hp.hparams_config(
    hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
    metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
  )

def train_test_model(hparams):
  model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
    tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax),
  ])
  model.compile(
      optimizer=hparams[HP_OPTIMIZER],
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy'],
  )

  model.fit(x_train, y_train, epochs=1) # Run with 1 epoch to speed things up for demo purposes
  _, accuracy = model.evaluate(x_test, y_test)
  return accuracy

def run(run_dir, hparams):
  with tf.summary.create_file_writer(run_dir).as_default():
    hp.hparams(hparams)  # record the values used in this trial
    accuracy = train_test_model(hparams)
    tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)


session_num = 0

for num_units in HP_NUM_UNITS.domain.values:
  for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
    for optimizer in HP_OPTIMIZER.domain.values:
      hparams = {
          HP_NUM_UNITS: num_units,
          HP_DROPOUT: dropout_rate,
          HP_OPTIMIZER: optimizer,
      }
      run_name = "run-%d" % session_num
      print('--- Starting trial: %s' % run_name)
      print({h.name: hparams[h] for h in hparams})
      run('logs/hparam_tuning/' + run_name, hparams)
      session_num += 1

@JamesHollyer
Contributor

Is it possible for you to send me your log files?
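
When gathering log files to share, it can help to first confirm which run directories actually contain event files. The following is a minimal sketch using only the Python standard library. It assumes the event files follow the usual `events.out.tfevents.*` naming; the helper name `find_event_files` is hypothetical and not part of TensorBoard.

```python
from pathlib import Path


def find_event_files(logdir):
    """Map each run directory under logdir to the event files it contains."""
    runs = {}
    # Assumption: event files are named 'events.out.tfevents.*'.
    for path in sorted(Path(logdir).rglob("events.out.tfevents.*")):
        if path.is_file():
            # Run name is the directory relative to logdir ('.' for the root).
            run = path.parent.relative_to(logdir).as_posix()
            runs.setdefault(run, []).append(path.name)
    return runs


if __name__ == "__main__":
    # 'logs/hparam_tuning' is the log directory used in the guide code above.
    for run, files in find_event_files("logs/hparam_tuning").items():
        print(f"{run}: {len(files)} event file(s)")
```

A run directory that is missing from the result has no event files at all, which would be a logging problem rather than a display problem.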

@timr1101
Author

Sure. But since I'm currently on vacation, I can't do this until the beginning of next week.

@JamesHollyer
Contributor

Hey Tim, thanks for sending me your logs. Unfortunately, I still cannot reproduce the issue. I ran these commands:

pip install --upgrade pip
pip install tensorboard
tensorboard --logdir ./your/log/dir

[Image: Screenshot 2024-09-25 at 11 47 30 AM]

What version of TensorBoard are you running?

pip freeze | grep tensorboard
tensorboard==2.8.0
tensorboard-data-server==0.6.1

@timr1101
Author

Hey James, the tensorboard versions were indeed the deciding factor.
I had the newer versions

tensorboard 2.17.1
tensorboard-data-server 0.7.2

installed. Downgrading to

tensorboard 2.8.0
tensorboard-data-server 0.6.1

solved the problem and all metrics were displayed correctly. Thank you very much for your help!
One more note: I also installed the version released today,

tensorboard 2.18.0

together with

tensorboard-data-server 0.7.2

However, the problem persists with these versions.

@lebeand

lebeand commented Nov 26, 2024

Thanks! Had the same issue here.
Works for me with tensorboard==2.16.2 (and tensorboard-data-server==0.7.2).

@kevinunger

kevinunger commented Dec 17, 2024

Can this issue be re-opened? The issue still persists with the current version, 2.18.0.
It works for me in 2.16.2, but not in 2.17.0.
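
The version reports in this thread can be collected into a small check against the locally installed package. This is only a sketch: the two sets below contain nothing beyond the versions that commenters here reported as working or broken, any other version is treated as unknown, and the function names are hypothetical.

```python
from importlib import metadata

# Versions as reported by commenters in this thread, nothing more.
REPORTED_WORKING = {"2.8.0", "2.16.2"}
REPORTED_BROKEN = {"2.17.0", "2.17.1", "2.18.0"}


def classify_tensorboard_version(version):
    """Return how this thread reported the given tensorboard version."""
    if version in REPORTED_WORKING:
        return "reported working"
    if version in REPORTED_BROKEN:
        return "reported broken"
    return "unknown"


def installed_tensorboard_version():
    """Return the installed tensorboard version, or None if not installed."""
    try:
        return metadata.version("tensorboard")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_tensorboard_version()
    if version is None:
        print("tensorboard is not installed")
    else:
        print(f"tensorboard {version}: {classify_tensorboard_version(version)}")
```

Taken together, the reports suggest the HPARAMS metric column stopped being populated somewhere between 2.16.2 and 2.17.0, though nobody in the thread has confirmed the cause.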

timr1101 reopened this Dec 18, 2024