import text to pandas with multiple delimiters

Question 1

import text to pandas with multiple delimiters

python import pandas delimited-text

CastleH · Oct 24, 2014 · Viewed 43.3k times · Source

Answer

Answer

One way might be to use the regex separators permitted by the python engine. For example:

>>> !cat castle.dat
c stuff
c more header
c begin data         
 1 1:.5
 1 2:6.5
 1 3:5.3
>>> df = pd.read_csv('castle.dat', skiprows=3, names=['a', 'b', 'c'], 
                     sep=' |:', engine='python')
>>> df
   a  b    c
0  1  1  0.5
1  1  2  6.5
2  1  3  5.3

Question 2

I have some data that looks like this:

c stuff
c more header
c begin data         
 1 1:.5
 1 2:6.5
 1 3:5.3

I want to import it into a 3 column data frame, with columns e.g.

a , b, c
1,  1, 0.5
etc

I have been trying to read in the data as 2 columns split on ':', and then to split the first column on ' '. However I'm finding it irksome. Is there a better way to sort it out on import directly?

currently:

data1 = pd.read_csv(file_loc, skiprows = 3, delimiter = ':', names = ['AB', 'C'])
data2 = pd.DataFrame(data1.AB.str.split(' ',1).tolist(), names = ['A','B'])

However this is further complicated by the fact my data has a leading space...

I feel like this should be a simple task, but currently I'm thinking of reading it line by line and using some find replace to sanitise the data before importing.

import text to pandas with multiple delimiters

Answer

Related questions