I have a 2D numpy array. Is there a way to create a view onto it that would include the first k rows and all columns?
The point is to avoid copying the underlying data (the array is so large that making partial copies is not feasible).
Sure, just index it as you normally would, e.g. y = x[:k, :]
This will return a view into the original array. No data will be copied, and any updates made to y will be reflected in x and vice versa.
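Here is a minimal sketch of that behaviour. The small zero-filled array and the value of k are stand-ins chosen for illustration, not anything from the question:

```python
import numpy as np

# Small stand-in for the (much larger) real array.
x = np.zeros((5, 4), dtype=np.uint8)
k = 2  # illustrative number of rows to keep

y = x[:k, :]  # basic slicing: returns a view, no data is copied

# y reuses x's memory buffer instead of owning its own.
shares = np.shares_memory(x, y)

# Writes through the view show up in the original...
y[0, 0] = 99
# ...and writes to the original show up in the view.
x[1, 1] = 7

print(shares, x[0, 0], y[1, 1])
```

np.shares_memory is a convenient way to check whether an indexing expression gave you a view or a copy before you rely on it with a huge array.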
Edit:
I commonly work with >10GB 3D arrays of uint8s, so I worry about this a lot... NumPy can be very efficient at memory management if you keep a few things in mind. Here are a few tips on avoiding making copies of arrays in memory:
Use +=, -=, *=, etc. to avoid making a copy of the array. E.g. x += 10 will modify the array in place, while x = x + 10 will allocate a new array for the result and rebind x to it. (Also, have a look at numexpr.)
If you do want to make a copy with x = x + 10, be aware that x = x + 10.0 will cause x to automatically be up-cast to a floating point array, if it wasn't already. However, x += 10.0, where x is an integer array, cannot change the dtype of x. Older NumPy versions silently down-cast the result to an int of the same precision as the array; recent versions raise a casting error instead.
Additionally, many numpy functions take an out parameter, so you can do things like np.abs(x, x) (equivalently, np.abs(x, out=x)) to take the absolute value of x in-place.
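As a rough illustration of the three tips above, here is a small sketch. The variable names are made up for the example, and the try/except is there because what x += 10.0 does on an integer array depends on the NumPy version:

```python
import numpy as np

x = np.arange(5)        # integer array
original = x            # second name for the same array object

x += 10                 # in place: still the same object and buffer
in_place = x is original

w = x + 10              # allocates a new array for the result
new_buffer = not np.shares_memory(w, original)

z = x + 10.0            # new array, up-cast to floating point
print(in_place, new_buffer, z.dtype, x.dtype)

# In-place addition of a float cannot change x's dtype. Recent NumPy
# versions raise an error here; older ones down-cast silently.
try:
    x += 10.0
    outcome = "cast silently"
except TypeError:
    outcome = "raised a casting error"
print(outcome, x.dtype)

# The out parameter writes the result into an existing array.
v = np.array([-1, 2, -3])
np.abs(v, out=v)
print(v)
```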
As a second edit, here are a few more tips on views vs. copies with numpy arrays:
Unlike python lists, y = x[:] does not return a copy, it returns a view. If you do want a copy (which will, of course, double the amount of memory you're using), use y = x.copy()
You'll often hear about "fancy indexing" of numpy arrays. Using a list (or integer array) as an index is "fancy indexing". It can be very useful, but copies the data.
As an example of this: y = x[[0, 1, 2], :] returns a copy, while y = x[:3, :] would return a view.
Even really crazy indexing like x[4:100:5, :-10:-1, None] is "normal" indexing and will return a view, though, so don't be afraid to use all kinds of slicing tricks on large arrays.
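A quick way to check which of these indexing forms give views is np.shares_memory. The sketch below uses a tiny made-up array, so the slice bounds are smaller than in the text:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

a = x[:]                  # full slice: a view, unlike a python list
b = x.copy()              # explicit copy
c = x[[0, 1, 2], :]       # fancy indexing with a list: a copy
d = x[:3, :]              # plain slicing: a view
e = x[::2, ::-1, None]    # steps, reversal and a new axis: still a view

results = {
    "x[:]": np.shares_memory(x, a),
    "x.copy()": np.shares_memory(x, b),
    "x[[0, 1, 2], :]": np.shares_memory(x, c),
    "x[:3, :]": np.shares_memory(x, d),
    "x[::2, ::-1, None]": np.shares_memory(x, e),
}
for expr, is_view in results.items():
    print(expr, "->", "view" if is_view else "copy")
```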
x.astype(<dtype>) will return a copy of the data as the new type, while x.view(<dtype>) will return a view.
Be careful with this, however... It's extremely powerful and useful, but you need to understand how the underlying data is stored in memory. If you have an array of floats and view them as ints (or vice versa), numpy will reinterpret the underlying bits of the array as the new type rather than converting the values.
For example, this means that 1.0 as a 64bit float on a little-endian system will be 4607182418800017408 when viewed as a 64bit int, and an array of [ 0, 0, 0, 0, 0, 0, 240, 63] if viewed as a uint8. This is really nice when you need to do bit-twiddling of some sort on large arrays, though... You have low level control over how the memory buffer is interpreted.
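The numbers above can be reproduced with a short sketch. It spells out the little-endian byte order in the dtype strings ("<f8", "<i8"), which is a choice made here so the result does not depend on the machine:

```python
import numpy as np

x = np.array([1.0], dtype="<f8")   # little-endian 64bit float

converted = x.astype("<i8")        # copy: the value 1.0 becomes the int 1
as_int = x.view("<i8")             # view: same 8 bytes read as one int
as_bytes = x.view(np.uint8)        # view: same 8 bytes read one at a time

print(converted)                   # [1]
print(as_int)                      # [4607182418800017408]
print(as_bytes)                    # [  0   0   0   0   0   0 240  63]

# The views share x's buffer, so clearing the bits through the int view
# also changes the float.
as_int[0] = 0
print(x)                           # [0.]
```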