MATLAB: Using textscan and converting cell array in matrix

sasstudent picture sasstudent · Jul 15, 2013 · Viewed 21.3k times · Source

I have a large csv file (should be around 1 million lines) with option data with the following structure (content is altered):

secid, date, days, delta, impl_volatility, impl_strike, impl_premium, dispersion, cp_flag, ticker, index_flag, industry_group
100000, 02/05/1986, 60, -80, 0.270556, 74.2511, 5.2415, 0.021514, C, ASC, 0, 481
100000, 03/05/1986, 30, -40, 0.251556, 74.2571, 6.2415, 0.025524, P, ASC, 0, 481

I have successfully imported a test file using the following:

ftest = fopen('test.csv');
C = textscan(ftest,'%f %s %f %f %f %f %f %f %s %s %f %f','Headerlines',1,'Delimiter',',');
fclose(ftest);

However, C is a cell array and this makes it harder to handle the contents of the file in matlab. It would be easier to have it as a "regular" array (pardon me for not knowing the correct nomenclature, I just started working with matlab).

If I output C, I get:

Columns 1 through 6
[2x1 double]    {2x1 cell}    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]
Columns 7 through 12
[2x1 double]    [2x1 double]    {2x1 cell}    {2x1 cell}    [2x1 double]    [2x1 double]

So inside the cell array which is C, there are arrays and cell arrays - arrays for numbers and cell arrays for strings. If I try to check element (1,2), I have to use C{1}(2) but if I want to check element (2,2) I have to use C{2}{2}. Ideally, I would like to access both as C(1,2) and C(2,2). The question is, how do I do this?

I have searched for solutions and found cells2mat but it only works if all content is numeric (I think). I found this solution: Convert cell array of cell arrays to matrix of matrices but horzcat retrieves an error, which I believe may occurr due to the same problem.

Thank you in advance for your time.

Answer

Rody Oldenhuis picture Rody Oldenhuis · Jul 15, 2013

As you have an array containing both numeric and character data, what you want is impossible (and trust me, it would be impractical, too).

Referencing individual numbers in a numeric array is different from referencing whole strings. There is simply no escaping that, nor should there be: you treat flowers differently from how you treat people (I sure hope so anyway).

In MATLAB, a string is an ordinary array, with the difference that each entry of the array does not represent a number, but a character. Referencing individual characters goes the same as referencing numbers in arrays:

>> a = 'my string'
>> a(4)
ans = 
s
>> a+0  % cast to double to show the "true character" of strings
ans =
    109   121    32   115   116   114   105   110   103

However, textscan assumes (rightfully so) that you don't want to do that, but rather you want to extract whole strings from the file. And whole strings should be referenced differently, to indicate that you mean whole strings and not individual characters.

I think you'll find it all a bit more intuitive if you split the results from textscan up into an ordinary numeric array and a cell array of strings, like so:

% load the data
ftest = fopen('test.csv');
C = textscan(ftest,...
    '%f %s %f %f %f %f %f %f %s %s %f %f',...     
    'collectoutput', true,...
    'Delimiter',',\n');
fclose(ftest);

% split into numeric and char arrays
numeric = [C{[1 3 5]}]
alpha   = [C{[2 4]}]

Referencing data in numeric then follows the same rules as any ordinary array, and referencing the strings in alpha then follows the normal cell-referencing rules (as in alpha{2,1} to get '03/05/1986')

EDIT Based no your comments, you want to do the conversion like this:

% Read the data
fid = fopen('test.csv', 'r');
C = textscan(fid,...
    '%f %s %f %f %f %f %f %f %s %s %f %f',...
    'Headerlines', 1,...
    'Delimiter',',');
fclose(fid);

% Delete 10th element ('ASC')
C(10) = [];

% Mass-convert dates to datenums
C{2} = datenum(C{2}, 'dd/mm/yyyy');

% Map 'P' to 1 and 'C' to 2
map('PC') = [1 2];
C{9} = map([C{9}{:}]).';

% Convert whole array to numeric array
C = [C{:}];