'numpy.datetime64' object has no attribute 'year' when writing to zarr or netcdf #6318

Closed
rsignell-usgs opened this issue Mar 2, 2022 · 4 comments

rsignell-usgs commented Mar 2, 2022

What happened?

I have a reproducible notebook in which I load a ReferenceFileSystem dataset into xarray. Everything seems fine, with time being understood correctly, but when I try to save a subset to zarr or netcdf, I get:

'numpy.datetime64' object has no attribute 'year'

I don't understand this since it seems time is always a datetime64 object in xarray, and I've never had this problem before.
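
For context, the message itself can be reproduced with NumPy alone: a np.datetime64 scalar exposes no .year attribute, whereas a datetime.datetime does. The snippet below is only a minimal illustration of the error text, not a trace of where the error is raised while writing; the variable names are made up for the example.

```python
import datetime

import numpy as np

# A microsecond-precision scalar, like the values discussed in this issue.
stamp = np.datetime64("2000-01-01", "us")

try:
    stamp.year
except AttributeError as err:
    # NumPy datetime64 scalars have no calendar-field attributes.
    message = str(err)

# For microsecond precision, .item() returns a datetime.datetime,
# which does have a .year attribute.
as_python = stamp.item()
year = as_python.year
```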

What did you expect to happen?

Expected the file to be written as usual without error.

Minimal Complete Verifiable Example

https://nbviewer.org/gist/rsignell-usgs/029b39f0c428b07914f5a6b1129da572

Relevant log output

No response

Anything else we need to know?

I asked the question first over at fsspec/kerchunk#130 and @martindurant thought this looked like an xarray issue, not a kerchunk issue.

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-150.17_5.0.85-cray_ari_c
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.21.0
pandas: 1.4.0
numpy: 1.21.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.1
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.10.0
iris: None
bottleneck: 1.3.2
dask: 2021.12.0
distributed: 2021.12.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: 0.18
sparse: 0.13.0
setuptools: 59.8.0
pip: 22.0.2
conda: 4.11.0
pytest: None
IPython: 8.0.1
sphinx: 4.4.0

rsignell-usgs added the bug and needs triage labels Mar 2, 2022
@rsignell-usgs
Author

rsignell-usgs commented Mar 2, 2022

While I was typing this, @keewis provided a workaround here: fsspec/kerchunk#130 (comment)! Leaving this open until I know whether this is something best left for users to implement or something to be handled in xarray. #6318

@spencerkclark
Member

To be honest I didn't know it was possible to open a Dataset and maintain np.datetime64[us] values. I feel like casting maybe should occur automatically -- we do this already in other contexts, e.g.:

In [3]: da = xr.DataArray([np.datetime64("2000-01-01", "us")], dims=["time"], name="time")

In [4]: da
Out[4]:
<xarray.DataArray 'time' (time: 1)>
array(['2000-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
Dimensions without coordinates: time

I'll try and dig deeper into this in the next few days, but @keewis's workaround should be good in the meantime.
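
As a rough sketch of the kind of explicit cast discussed here (not necessarily the workaround from the linked comment, which is not reproduced in this thread), microsecond values can be converted to nanosecond precision with plain NumPy before they are handed to xarray. The array contents are illustrative.

```python
import numpy as np

# Microsecond-precision datetimes, as they might come out of a zarr store.
times_us = np.array(
    ["2000-01-01T00:00:00", "2000-01-02T12:30:00"], dtype="datetime64[us]"
)

# Explicit cast to nanosecond precision. The represented instants are unchanged
# as long as they fall inside the range datetime64[ns] can represent
# (roughly the years 1678 to 2262).
times_ns = times_us.astype("datetime64[ns]")
```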

@spencerkclark
Member

I did a little more digging. I'm not a backend expert, but I think the issue can be distilled to the following. When passed an ordinary NumPy array as an input, xarray.Variable will automatically cast any np.datetime64 values to nanosecond precision:

In [3]: arr = np.array([np.datetime64("2000-01-01", "us")])

In [4]: xarray.Variable(["time"], arr)
Out[4]:
<xarray.Variable (time: 1)>
array(['2000-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

However, if passed a LazilyIndexedArray, this casting will not occur (note the dtype is still 'datetime64[us]'):

In [5]: lazily_indexed_arr = xarray.core.indexing.LazilyIndexedArray(arr)

In [6]: xarray.Variable(["time"], lazily_indexed_arr)
Out[6]:
<xarray.Variable (time: 1)>
array(['2000-01-01T00:00:00.000000'], dtype='datetime64[us]')

This is based on this code within xarray.backends.zarr.ZarrStore.

The casting does not occur in the Variable constructor, because it requires the type of the array be np.ndarray -- see here. Regardless, even if we relaxed that, _possibly_convert_objects would raise an error, because it is not compatible with LazilyIndexedArray objects.
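
The effect of a strict type check can be sketched without xarray. The names below (LazyWrapper, maybe_cast_to_ns) are hypothetical stand-ins, not xarray's actual classes or functions; the point is only that a check for np.ndarray lets any wrapper object through uncast.

```python
import numpy as np


class LazyWrapper:
    """Hypothetical stand-in for a lazily indexed array wrapper."""

    def __init__(self, array):
        self.array = array

    @property
    def dtype(self):
        return self.array.dtype


def maybe_cast_to_ns(data):
    """Cast datetime64 data to ns, but only for plain NumPy arrays (illustrative)."""
    if isinstance(data, np.ndarray) and data.dtype.kind == "M":
        return data.astype("datetime64[ns]")
    return data  # wrappers fall through with their original dtype


arr = np.array([np.datetime64("2000-01-01", "us")])
cast_plain = maybe_cast_to_ns(arr)
cast_wrapped = maybe_cast_to_ns(LazyWrapper(arr))
```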

Is this something that we maybe need to address within the xarray zarr backend? As I understand it, zarr is a bit unusual compared to other storage formats we deal with in that it can store and open np.datetime64 data directly -- normally datetime data starts as some numeric type and is converted to datetime data after going through our decoding pipeline.
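
For comparison, the usual path from numeric storage to datetimes can be sketched as below. This is a simplified NumPy illustration of decoding offsets against a reference date, assuming whole-day offsets and a made-up reference date; it is not xarray's decoding code.

```python
import numpy as np

# Stored on disk as plain integers, with a units attribute
# such as "days since 2000-01-01" (assumed for this example).
raw = np.array([0, 1, 31], dtype="int64")
reference = np.datetime64("2000-01-01", "ns")

# Decoding: interpret the integers as day offsets and add them to the reference date.
decoded = reference + raw.astype("timedelta64[D]")
```

Because the datetimes are produced by the decoding step, their precision is chosen by the decoder rather than read from the store.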

dcherian removed the needs triage label Mar 16, 2022
@kmuehlbauer
Contributor

Looks like this has been fixed by #9618. At least @spencerkclark's example works:

import numpy as np
import xarray

arr = np.array([np.datetime64("2000-01-01", "us")])
print(xarray.Variable(["time"], arr))
lazily_indexed_arr = xarray.core.indexing.LazilyIndexedArray(arr)
print(xarray.Variable(["time"], lazily_indexed_arr))

Output:

<xarray.Variable (time: 1)> Size: 8B
array(['2000-01-01T00:00:00.000000'], dtype='datetime64[us]')
<xarray.Variable (time: 1)> Size: 8B
array(['2000-01-01T00:00:00.000000'], dtype='datetime64[us]')

Please reopen or open a new issue with updated examples.
