How to write a multidimensional array to a text file?

Ivo Flipse picture Ivo Flipse · Sep 10, 2010 · Viewed 275.7k times · Source

In another question, other users offered some help if I could supply the array I was having trouble with. However, I even fail at a basic I/O task, such as writing an array to a file.

Can anyone explain what kind of loop I would need to write a 4x11x14 numpy array to file?

This array consist of four 11 x 14 arrays, so I should format it with a nice newline, to make the reading of the file easier on others.

Edit: So I've tried the numpy.savetxt function. Strangely, it gives the following error:

TypeError: float argument required, not numpy.ndarray

I assume that this is because the function doesn't work with multidimensional arrays? Any solutions as I would like them within one file?

Answer

Joe Kington picture Joe Kington · Sep 10, 2010

If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine, as well, but it's less efficient for large arrays (which yours isn't, so either is perfectly fine).

If you want it to be human readable, look into numpy.savetxt.

Edit: So, it seems like savetxt isn't quite as great an option for arrays with >2 dimensions... But just to draw everything out to it's full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there's no inherently defined way to indicate additional dimensions in a text file.

E.g. This (a 2D array) works fine

import numpy as np
x = np.arange(20).reshape((4,5))
np.savetxt('test.txt', x)

While the same thing would fail (with a rather uninformative error: TypeError: float argument required, not numpy.ndarray) for a 3D array:

import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt('test.txt', x)

One workaround is just to break the 3D (or greater) array into 2D slices. E.g.

x = np.arange(200).reshape((4,5,10))
with open('test.txt', 'w') as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)

However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)

import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open('test.txt', 'w') as outfile:
    # I'm writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write('# Array shape: {0}\n'.format(data.shape))
    
    # Iterating through a ndimensional array produces slices along
    # the last axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I'm writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.  
        np.savetxt(outfile, data_slice, fmt='%-7.2f')

        # Writing out a break to indicate different slices...
        outfile.write('# New slice\n')

This yields:

# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00   
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00  
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00  
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00  
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00  
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00  
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00  
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00  
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00  
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00  
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00 
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00 
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00 
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00 
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00 
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00 
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00 
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00 
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00 
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00 
# New slice

Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (You can do this in one line, I'm just being verbose to clarify things):

# Read the array from disk
new_data = np.loadtxt('test.txt')

# Note that this returned a 2D array!
print new_data.shape

# However, going back to 3D is easy if we know the 
# original shape of the array
new_data = new_data.reshape((4,5,10))
    
# Just to check that they're the same...
assert np.all(new_data == data)