I have some data that looks like this:
c stuff
c more header
c begin data
1 1:.5
1 2:6.5
1 3:5.3
I want to import it into a 3 column data frame, with columns e.g.
a , b, c
1, 1, 0.5
etc
I have been trying to read in the data as 2 columns split on ':', and then to split the first column on ' '. However I'm finding it irksome. Is there a better way to sort it out on import directly?
currently:
data1 = pd.read_csv(file_loc, skiprows = 3, delimiter = ':', names = ['AB', 'C'])
data2 = pd.DataFrame(data1.AB.str.split(' ',1).tolist(), names = ['A','B'])
However this is further complicated by the fact my data has a leading space...
I feel like this should be a simple task, but currently I'm thinking of reading it line by line and using some find replace to sanitise the data before importing.
One way might be to use the regex separators permitted by the python engine. For example:
>>> !cat castle.dat
c stuff
c more header
c begin data
1 1:.5
1 2:6.5
1 3:5.3
>>> df = pd.read_csv('castle.dat', skiprows=3, names=['a', 'b', 'c'],
sep=' |:', engine='python')
>>> df
a b c
0 1 1 0.5
1 1 2 6.5
2 1 3 5.3