Very simple question but I can't find the answer online. I have a Dataset and I just want to add a named DataArray to it. Something like dataset.add({"new_array": new_data_array}). I know about merge and update and concatenate, but my understanding is that merge is for merging two or more Datasets and concatenate is for concatenating two or more DataArrays to form another DataArray, and I haven't quite fully understood update yet. I've tried dataset.update({"new_array": new_data_array}) but I get the following error.
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
I've also tried dataset["new_array"] = new_data_array and I get the same error.
I've now found out that the problem is that some of my coordinates have duplicate values, which I didn't know about. Coordinates are used as indexes, so Xarray gets confused (understandably) when it tries to align the two objects along the coordinates they share. Below is an example that works.
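import numpy
import xarray

names = ["joaquin", "manolo", "xavier"]
# 1-D array indexed by "name" (dims passed explicitly rather than inferred from the coords dict)
n = xarray.DataArray([23, 98, 23], dims=["name"], coords={"name": names})
print(n)
print("======")
# 3-D array that shares the "name" coordinate
m = numpy.random.randint(0, 256, (3, 4, 4)).astype(numpy.uint8)
mm = xarray.DataArray(m, dims=["name", "row", "column"], coords=[names, range(4), range(4)])
print(mm)
print("======")
# Turn the first array into a Dataset, then add the second one by assignment
n_dataset = n.rename("number").to_dataset()
n_dataset["mm"] = mm
print(n_dataset)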
Output:
<xarray.DataArray (name: 3)>
array([23, 98, 23])
Coordinates:
* name (name) <U7 'joaquin' 'manolo' 'xavier'
======
<xarray.DataArray (name: 3, row: 4, column: 4)>
array([[[ 55, 63, 250, 211],
[204, 151, 164, 237],
[182, 24, 211, 12],
[183, 220, 35, 78]],
[[208, 7, 91, 114],
[195, 30, 108, 130],
[ 61, 224, 105, 125],
[ 65, 1, 132, 137]],
[[ 52, 137, 62, 206],
[188, 160, 156, 126],
[145, 223, 103, 240],
[141, 38, 43, 68]]], dtype=uint8)
Coordinates:
* name (name) <U7 'joaquin' 'manolo' 'xavier'
* row (row) int64 0 1 2 3
* column (column) int64 0 1 2 3
======
<xarray.Dataset>
Dimensions: (column: 4, name: 3, row: 4)
Coordinates:
* name (name) object 'joaquin' 'manolo' 'xavier'
* row (row) int64 0 1 2 3
* column (column) int64 0 1 2 3
Data variables:
number (name) int64 23 98 23
mm (name, row, column) uint8 55 63 250 211 204 151 164 237 182 24 ...
The above code uses names as the index. If I change the code a little bit, so that names has a duplicate, say names = ["joaquin", "manolo", "joaquin"], then I get an InvalidIndexError.
Code:
names = ["joaquin", "manolo", "joaquin"]
n = xarray.DataArray([23, 98, 23], dims=["name"], coords={"name": names})
print(n)
print("======")
m = numpy.random.randint(0, 256, (3, 4, 4)).astype(numpy.uint8)
mm = xarray.DataArray(m, dims=["name", "row", "column"], coords=[names, range(4), range(4)])
print(mm)
print("======")
n_dataset = n.rename("number").to_dataset()
n_dataset["mm"] = mm
print(n_dataset)
Output:
<xarray.DataArray (name: 3)>
array([23, 98, 23])
Coordinates:
* name (name) <U7 'joaquin' 'manolo' 'joaquin'
======
<xarray.DataArray (name: 3, row: 4, column: 4)>
array([[[247, 3, 20, 141],
[ 54, 111, 224, 56],
[144, 117, 131, 192],
[230, 44, 174, 14]],
[[225, 184, 170, 248],
[ 57, 105, 165, 70],
[220, 228, 238, 17],
[ 90, 118, 87, 30]],
[[158, 211, 31, 212],
[ 63, 172, 190, 254],
[165, 163, 184, 22],
[ 49, 224, 196, 244]]], dtype=uint8)
Coordinates:
* name (name) <U7 'joaquin' 'manolo' 'joaquin'
* row (row) int64 0 1 2 3
* column (column) int64 0 1 2 3
======
---------------------------------------------------------------------------
InvalidIndexError Traceback (most recent call last)
<ipython-input-12-50863379cefe> in <module>()
8 print("======")
9 n_dataset = n.rename("number").to_dataset()
---> 10 n_dataset["mm"] = mm
11 print(n_dataset)
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in __setitem__(self, key, value)
536 raise NotImplementedError('cannot yet use a dictionary as a key '
537 'to set Dataset values')
--> 538 self.update({key: value})
539
540 def __delitem__(self, key):
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in update(self, other, inplace)
1434 dataset.
1435 """
-> 1436 variables, coord_names, dims = dataset_update_method(self, other)
1437
1438 return self._replace_vars_and_dims(variables, coord_names, dims,
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/merge.py in dataset_update_method(dataset, other)
492 priority_arg = 1
493 indexes = dataset.indexes
--> 494 return merge_core(objs, priority_arg=priority_arg, indexes=indexes)
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
373 coerced = coerce_pandas_values(objs)
374 aligned = deep_align(coerced, join=join, copy=False, indexes=indexes,
--> 375 skip_single_target=True)
376 expanded = expand_variable_dicts(aligned)
377
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in deep_align(list_of_variable_maps, join, copy, indexes, skip_single_target)
162
163 aligned = partial_align(*targets, join=join, copy=copy, indexes=indexes,
--> 164 skip_single_target=skip_single_target)
165
166 for key, aligned_obj in zip(keys, aligned):
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in partial_align(*objects, **kwargs)
122 valid_indexers = dict((k, v) for k, v in joined_indexes.items()
123 if k in obj.dims)
--> 124 result.append(obj.reindex(copy=copy, **valid_indexers))
125
126 return tuple(result)
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in reindex(self, indexers, method, tolerance, copy, **kw_indexers)
1216
1217 variables = alignment.reindex_variables(
-> 1218 self.variables, self.indexes, indexers, method, tolerance, copy=copy)
1219 return self._replace_vars_and_dims(variables)
1220
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in reindex_variables(variables, indexes, indexers, method, tolerance, copy)
234 target = utils.safe_cast_to_index(indexers[name])
235 indexer = index.get_indexer(target, method=method,
--> 236 **get_indexer_kwargs)
237
238 to_shape[name] = len(target)
/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
2080
2081 if not self.is_unique:
-> 2082 raise InvalidIndexError('Reindexing only valid with uniquely'
2083 ' valued Index objects')
2084
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
So it's not a bug in Xarray as such. Nevertheless, I wasted many hours trying to track this down, and I wish the error message were more informative. I hope the Xarray developers will improve this soon, for example by putting in a uniqueness check on the coordinates before attempting to merge.
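Until then, a check along these lines can be run by hand before adding an array. This is only a minimal sketch: `non_unique_dims` is a hypothetical helper name, not part of Xarray, and it relies on the `indexes` property exposing pandas Index objects.

```python
import numpy as np
import xarray as xr


def non_unique_dims(obj):
    """Return the names of indexed dimensions whose labels repeat.

    Hypothetical helper (not an xarray function); works on a Dataset
    or a DataArray via their `indexes` mapping of pandas Index objects.
    """
    return [name for name, index in obj.indexes.items() if not index.is_unique]


names = ["joaquin", "manolo", "joaquin"]  # "joaquin" appears twice
mm = xr.DataArray(
    np.zeros((3, 4, 4), dtype=np.uint8),
    dims=["name", "row", "column"],
    coords=[names, range(4), range(4)],
)

bad_dims = non_unique_dims(mm)
print(bad_dims)  # ['name']

# Show which labels are repeated along the offending dimension
name_index = mm.indexes["name"]
repeated = name_index[name_index.duplicated()].tolist()
print(repeated)  # ['joaquin']
```

Running it on both the Dataset and the new DataArray before the assignment points straight at the dimension that needs cleaning up.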
In any case, the method provided by my answer below still works.
You need to make sure that the dimensions of your new DataArray are the same as in your dataset. Then the following should work:
dataset['new_array_name'] = new_array
Here is a complete example to try it out:
import numpy as np
import xarray as xr  # the package formerly named xray

# Create some dimensions
x = np.linspace(-10, 10, 10)
y = np.linspace(-20, 20, 20)
(yy, xx) = np.meshgrid(y, x)
# Make two different DataArrays with equal dimensions
var1 = xr.DataArray(np.random.randn(len(x), len(y)), coords=[x, y], dims=['x', 'y'])
var2 = xr.DataArray(-xx**2 + yy**2, coords=[x, y], dims=['x', 'y'])
# Save one DataArray as dataset
ds = var1.to_dataset(name='var1')
# Add second DataArray to existing dataset (ds)
ds['var2'] = var2
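For comparison with the merge and update methods mentioned in the question, here is a short sketch of other ways to get the same result. The variable names (`var3`, `var4`, `new_array`) are made up for illustration, and the coordinates are assumed to be unique and identical on both sides.

```python
import numpy as np
import xarray as xr

x = np.linspace(-10, 10, 10)
ds = xr.Dataset({"var1": ("x", np.random.randn(10))}, coords={"x": x})
new_array = xr.DataArray(x**2, dims=["x"], coords={"x": x})

# Item assignment modifies ds in place
ds["var2"] = new_array

# assign() returns a new Dataset and leaves ds untouched
ds_assigned = ds.assign(var3=new_array)

# merge() needs the DataArray to carry a name, hence rename()
ds_merged = xr.merge([ds, new_array.rename("var4")])

print(list(ds.data_vars))           # ['var1', 'var2']
print(list(ds_assigned.data_vars))  # ['var1', 'var2', 'var3']
print(list(ds_merged.data_vars))    # ['var1', 'var2', 'var4']
```

Item assignment is the shortest form when changing the Dataset in place is acceptable; assign() and merge() are the options to reach for when the original Dataset should stay unchanged.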