Customizing the separator in pandas read_csv

Peaceful · Dec 20, 2016 · Viewed 39.5k times

I am reading many different data files into pandas DataFrames. The columns in these files are separated by spaces, but the number of spaces differs from file to file (some use one space, others two, and so on). So every time I import a file, I have to open it, count the spaces, and pass exactly that many spaces as sep:

import pandas as pd
df = pd.read_csv('myfile.dat', sep = '    ')

Is there any way I can tell pandas to assume "any number of spaces" as the separator? Also, is there any way I can tell pandas to use either tab (\t) or spaces as the separator?

Answer

Ted Petrou · Dec 20, 2016

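Yes, you can use a simple regular expression like sep='\s+', which matches one or more whitespace characters. Because \s covers tabs as well as spaces, the same separator handles both of your cases: any number of spaces, and a mix of tabs and spaces.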
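As a minimal sketch of how this behaves, the snippet below parses a small in-memory sample instead of a real file. The sample text and its column names (a, b, c) are made up for illustration; only read_csv and its sep parameter come from the question.

```python
import io

import pandas as pd

# Hypothetical sample data: columns are separated by an inconsistent
# mix of single spaces, multiple spaces, and tabs.
raw = "a  b\tc\n1   2 \t 3\n4 5\t\t6\n"

# r'\s+' treats any run of whitespace (spaces and/or tabs) as one separator.
# The raw-string prefix keeps Python from interpreting the backslash itself.
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')

print(df)
```

For a file on disk, the same call should work with the path in place of the StringIO object, for example pd.read_csv('myfile.dat', sep=r'\s+'). Writing the pattern as a raw string is optional here but avoids the invalid-escape warning that recent Python versions emit for '\s' in a normal string.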