Python Pandas add Filename Column CSV

specmer · Jan 25, 2017 · Viewed 17.6k times

My Python code in the example below works correctly: it combines a directory of CSV files and matches the headers. However, I want to take it a step further. How do I add a column that records the filename of the CSV each row came from?

import pandas as pd
import glob

globbed_files = glob.glob("*.csv") #creates a list of all csv files

data = [] # pd.concat takes a list of dataframes as an argument
for csv in globbed_files:
    frame = pd.read_csv(csv)
    data.append(frame)

bigframe = pd.concat(data, ignore_index=True) # don't want pandas to try to align row indexes
bigframe.to_csv("Pandas_output2.csv")

Answer

Mike Müller · Jan 25, 2017

This should work:

import os

for csv in globbed_files:
    frame = pd.read_csv(csv)
    frame['filename'] = os.path.basename(csv)
    data.append(frame)

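Assigning to frame['filename'] creates a new column named filename, holding the same value in every row of that frame, and os.path.basename() turns a path like /a/d/c.txt into the filename c.txt. Because the column is added before the frame is appended to data, pd.concat carries it through to the combined result.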
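For reference, the question's loop and the answer's change can be put together into one self-contained sketch. The function name combine_csvs and its pattern parameter are made up for illustration and do not appear in the original post. Sorting the glob results is an added assumption to make the row order reproducible.

```python
import glob
import os

import pandas as pd


def combine_csvs(pattern):
    """Concatenate all CSVs matching `pattern`, tagging each row with its source file.

    Hypothetical helper for illustration; not part of the original post.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):  # sorted only for a reproducible row order
        frame = pd.read_csv(path)
        # A scalar assignment is broadcast to every row of this frame.
        frame["filename"] = os.path.basename(path)
        frames.append(frame)
    # ignore_index=True renumbers the rows instead of keeping each file's own index.
    return pd.concat(frames, ignore_index=True)
```

Calling combine_csvs("*.csv") and then to_csv("Pandas_output2.csv") on the result mirrors the original script. Two things to keep in mind: pd.concat raises a ValueError if the pattern matches no files, and an output file written into the same directory will itself match *.csv on the next run.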