I have several csv files in a single folder and I want to open them all in one dataframe and insert a new column with the associated filename. So far I've coded the following:
import pandas as pd
import glob, os
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('path/*.csv'))))
df['filename']= os.path.basename(csv)
df
This gives me the dataframe I want but in the new column 'filename' it's only listing the last filename in the folder for every row. I'm looking for each row to be populated with it's associated csv file. Not just the last file in the folder.
Any assistance for this newbie is much appreciated.
I think you need assign
for add new column in loop
, also parameter ignore_index=True
was added to concat
for remove duplicates in index
:
Files for test are a.csv, b.csv, c.csv.
import pandas as pd
import glob, os
files = glob.glob('files/*.csv')
print (files)
['files\\a.csv', 'files\\b.csv', 'files\\c.csv']
files = glob.glob('files/*.csv')
print (files)
['files\\a.csv', 'files\\b.csv', 'files\\c.csv']
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp)) for fp in files])
print (df)
a b c d New
0 0 1 2 5 a.csv
1 1 5 8 3 a.csv
2 0 9 6 5 b.csv
3 1 6 4 2 b.csv
4 0 7 1 7 c.csv
5 1 3 2 6 c.csv
files = glob.glob('files/*.csv')
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp).split('.')[0]) for fp in files])
print (df)
a b c d New
0 0 1 2 5 a
1 1 5 8 3 a
2 0 9 6 5 b
3 1 6 4 2 b
4 0 7 1 7 c
5 1 3 2 6 c