I have a dataframe with column names, and I want to find the one that contains a certain string, but does not exactly match it. I'm searching for 'spike' in column names like 'spike-2', 'hey spike', 'spiked-in' (the 'spike' part is always continuous).
I want the column name to be returned as a string or a variable, so I can access the column later with df['name'] or df[name] as normal. I've tried to find ways to do this, to no avail. Any tips?
Just iterate over DataFrame.columns. Here is an example that ends up with a list of the column names that match:
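import pandas as pd

# 'hey spke' does not contain 'spike', so it should not be matched
data = {'spike-2': [1,2,3], 'hey spke': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

# Keep every column name that contains the substring 'spike'
spike_cols = [col for col in df.columns if 'spike' in col]
print(list(df.columns))
print(spike_cols)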
Output:
['spike-2', 'hey spke', 'spiked-in', 'no']
['spike-2', 'spiked-in']
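Since the question asks for the name as a single string, here is a minimal sketch that takes only the first match. The names first_spike_col and spike_values are made up for illustration, and the None default is an assumption about how you want to handle the no-match case:

```python
import pandas as pd

df = pd.DataFrame({'spike-2': [1, 2, 3], 'hey spke': [4, 5, 6],
                   'spiked-in': [7, 8, 9], 'no': [10, 11, 12]})

# First column name containing 'spike', or None if nothing matches
# (the None fallback is an illustrative choice, not a requirement).
first_spike_col = next((col for col in df.columns if 'spike' in col), None)

if first_spike_col is not None:
    # The name is a plain string, so the column is accessed as usual.
    spike_values = df[first_spike_col].tolist()
```

If several columns can match, the list version is safer, because next() silently ignores every match after the first.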
Explanation:
df.columns returns the column labels (a pandas Index, which you can iterate over like a list).
[col for col in df.columns if 'spike' in col] iterates over df.columns with the variable col and adds col to the resulting list if it contains 'spike'. This syntax is called a list comprehension.
If you only want the resulting data set with the columns that match, you can do this:
df2 = df.filter(regex='spike')
print(df2)
Output:
   spike-2  spiked-in
0        1          7
1        2          8
2        3          9
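Note that filter(regex=...) treats its argument as a regular expression, so a search string with special characters (such as '.' or '+') would need escaping. As a rough sketch of two alternatives that match a literal substring instead, assuming the same example data as above (the names mask and df_like are made up here):

```python
import pandas as pd

df = pd.DataFrame({'spike-2': [1, 2, 3], 'hey spke': [4, 5, 6],
                   'spiked-in': [7, 8, 9], 'no': [10, 11, 12]})

# Boolean mask over the column labels; regex=False means 'spike'
# is matched as a literal substring rather than as a pattern.
mask = df.columns.str.contains('spike', regex=False)
spike_cols = df.columns[mask].tolist()

# filter(like=...) keeps the columns whose label contains the substring.
df_like = df.filter(like='spike')
```

Both give the same columns as the list comprehension for this data; which one reads best is mostly a matter of taste.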