import text to pandas with multiple delimiters

CastleH picture CastleH · Oct 24, 2014 · Viewed 43.3k times · Source

I have some data that looks like this:

c stuff
c more header
c begin data         
 1 1:.5
 1 2:6.5
 1 3:5.3

I want to import it into a 3 column data frame, with columns e.g.

a , b, c
1,  1, 0.5
etc

I have been trying to read in the data as 2 columns split on ':', and then to split the first column on ' '. However I'm finding it irksome. Is there a better way to sort it out on import directly?

currently:

data1 = pd.read_csv(file_loc, skiprows = 3, delimiter = ':', names = ['AB', 'C'])
data2 = pd.DataFrame(data1.AB.str.split(' ',1).tolist(), names = ['A','B'])

However this is further complicated by the fact my data has a leading space...

I feel like this should be a simple task, but currently I'm thinking of reading it line by line and using some find replace to sanitise the data before importing.

Answer

DSM picture DSM · Oct 24, 2014

One way might be to use the regex separators permitted by the python engine. For example:

>>> !cat castle.dat
c stuff
c more header
c begin data         
 1 1:.5
 1 2:6.5
 1 3:5.3
>>> df = pd.read_csv('castle.dat', skiprows=3, names=['a', 'b', 'c'], 
                     sep=' |:', engine='python')
>>> df
   a  b    c
0  1  1  0.5
1  1  2  6.5
2  1  3  5.3