Describe the bug
I am trying to fine-tune the mT5 model on a custom dataset on a TPU on GCP. I am carefully following the process described in this repository; however, I get a TensorFlow-related error when launching t5_mesh_transformer:
2022-07-13 22:29:42.556669: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-07-13 22:29:42.556729: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
File "/home/.local/bin/t5_mesh_transformer", line 5, in <module>
from t5.models.mesh_transformer_main import console_entry_point
File "/home/.local/lib/python3.9/site-packages/t5/__init__.py", line 17, in <module>
import t5.data
File "/home/.local/lib/python3.9/site-packages/t5/data/__init__.py", line 17, in <module>
from t5.data.dataset_providers import *
File "/home/.local/lib/python3.9/site-packages/t5/data/dataset_providers.py", line 28, in <module>
import seqio
File "/home/.local/lib/python3.9/site-packages/seqio/__init__.py", line 18, in <module>
from seqio.dataset_providers import *
File "/home/.local/lib/python3.9/site-packages/seqio/dataset_providers.py", line 34, in <module>
from seqio import utils
File "/home/.local/lib/python3.9/site-packages/seqio/utils.py", line 25, in <module>
import tensorflow.compat.v2 as tf
File "/home/.local/lib/python3.9/site-packages/tensorflow/__init__.py", line 37, in <module>
from tensorflow.python.tools import module_util as _module_util
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/__init__.py", line 42, in <module>
from tensorflow.python import data
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/data/__init__.py", line 21, in <module>
from tensorflow.python.data import experimental
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/data/experimental/__init__.py", line 95, in <module>
from tensorflow.python.data.experimental import service
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/data/experimental/service/__init__.py", line 387, in <module>
from tensorflow.python.data.experimental.ops.data_service_ops import distribute
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/data/experimental/ops/data_service_ops.py", line 26, in <module>
from tensorflow.python.data.ops import dataset_ops
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 31, in <module>
from tensorflow.python.data.ops import iterator_ops
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 36, in <module>
from tensorflow.python.training.saver import BaseSaverBuilder
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/training/saver.py", line 51, in <module>
from tensorflow.python.training.saving import saveable_object_util
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/training/saving/saveable_object_util.py", line 20, in <module>
from tensorflow.python.eager import def_function
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 75, in <module>
from tensorflow.python.eager import function as function_lib
File "/home/.local/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 35, in <module>
from tensorflow.python.eager import backprop
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 786, in exec_module
File "<frozen importlib._bootstrap_external>", line 918, in get_code
File "<frozen importlib._bootstrap_external>", line 587, in _compile_bytecode
EOFError: marshal data too short
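The last frames of the traceback are inside importlib's bytecode loading (`_compile_bytecode`), and `EOFError: marshal data too short` is what `marshal` raises when a cached `.pyc` file ends earlier than expected. A possible cause, not confirmed for this setup, is a truncated bytecode cache file left by an interrupted install. The sketch below is one way to check for that; `find_corrupt_pyc` is a hypothetical helper name, and the 16-byte header size assumes CPython 3.7 or later.

```python
import importlib.util
import marshal
from pathlib import Path

# Assumption: CPython 3.7+ .pyc layout, i.e. a 16-byte header followed by
# a marshalled code object.
PYC_HEADER_SIZE = 16


def find_corrupt_pyc(root):
    """Return .pyc files under root whose bytecode body cannot be unmarshalled.

    Only files written by the running interpreter (matching magic number)
    are checked; files from other Python versions are skipped.
    """
    corrupt = []
    for pyc in sorted(Path(root).rglob("*.pyc")):
        data = pyc.read_bytes()
        if data[:4] != importlib.util.MAGIC_NUMBER:
            continue  # not produced by this interpreter version
        try:
            marshal.loads(data[PYC_HEADER_SIZE:])
        except (EOFError, ValueError, TypeError):
            # A truncated or damaged body fails here, as in the traceback.
            corrupt.append(pyc)
    return corrupt
```

For the environment in the traceback, this could be called as `find_corrupt_pyc("/home/.local/lib/python3.9/site-packages/tensorflow")` using the same Python 3.9 interpreter. If it lists any files, deleting them lets Python recompile them from source on the next import; reinstalling the package is an alternative.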
To Reproduce
Steps to reproduce the behavior:
Create a VM
Create a TPU
Create a bucket and upload the .txt corpus on which the model will be trained
Expected behaviour
The training on the TPU should start.
Any help would be appreciated.
Thank you