I want to save a 2D array to a CSV file with row and column "header" information (like a table). I know that I could use the header argument to numpy.savetxt to save the column names, but is there any easy way to also include some other array (or list) as the first column of data (like row titles)?
Below is an example of how I currently do it. Is there a better way to include those row titles, perhaps some trick with savetxt I'm unaware of?
import csv
import numpy as np
data = np.arange(12).reshape(3,4)
# Add a '' for the first column because the row titles go there...
cols = ['', 'col1', 'col2', 'col3', 'col4']
rows = ['row1', 'row2', 'row3']
# On Python 3 the csv module wants text mode with newline='' (Python 2 used 'wb')
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(cols)
    for row_title, data_row in zip(rows, data):
        writer.writerow([row_title] + data_row.tolist())
Maybe you'd prefer to do something like this:
# Column of row titles
rows = np.array(['row1', 'row2', 'row3'], dtype='|S20')[:, np.newaxis]
with open('test.csv', 'w') as f:
    np.savetxt(f, np.hstack((rows, data)), delimiter=', ', fmt='%s')
This implicitly converts data to an array of strings, and takes about 200 ms per million items on my computer.
The dtype '|S20' means byte strings of at most twenty characters (on Python 3, 'U20' is the text equivalent). If the width is too small, your numbers will get chopped:
>>> np.asarray([123], dtype='|S2')
array(['12'],
      dtype='|S2')
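If picking a width by hand feels fragile, one possible workaround (a sketch on my part, not something savetxt requires) is to convert the numbers with astype(str) and let NumPy choose a width that fits the integer dtype. The variable names str_data and table are only illustrative.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
rows = np.array(['row1', 'row2', 'row3'])[:, np.newaxis]

# astype(str) picks a string width wide enough for the integer dtype,
# so no digits are cut off.
str_data = data.astype(str)

# Both inputs are already string arrays, so hstack keeps the wider width.
table = np.hstack((rows, str_data))
```

The resulting table can be passed to np.savetxt with fmt='%s' in the same way as before.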
Another option, which in my limited testing is slower but gives you a lot more control and doesn't have the chopping issue, is np.char.mod:
# Column of row titles
rows = np.array(['row1', 'row2', 'row3'])[:, np.newaxis]
str_data = np.char.mod("%10.6f", data)
with open('test.csv', 'w') as f:
    np.savetxt(f, np.hstack((rows, str_data)), delimiter=', ', fmt='%s')
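To also get the column titles from the question into the file, the same idea can be combined with savetxt's header argument. This is a rough Python 3 sketch: the '%d' format and the io.StringIO buffer (standing in for a real file) are my own choices for illustration, and comments='' is there so savetxt does not prefix the header line with '# '.

```python
import io
import numpy as np

data = np.arange(12).reshape(3, 4)
col_titles = ['', 'col1', 'col2', 'col3', 'col4']
row_titles = np.array(['row1', 'row2', 'row3'])[:, np.newaxis]

# Format every number as text first, so hstack joins two string arrays.
str_data = np.char.mod('%d', data)
table = np.hstack((row_titles, str_data))

buf = io.StringIO()  # stand-in for open('test.csv', 'w')
np.savetxt(buf, table, delimiter=',', fmt='%s',
           header=','.join(col_titles), comments='')
csv_text = buf.getvalue()
```

Swapping buf for an open file handle writes the same text to disk; the leading empty entry in col_titles keeps the header aligned with the row-title column.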