I am working with excel for comparing three columns: my idea is to compare two columns of data with a third column as a array like each value in the 3rd column should be compared with every row of the first and second column and want to extract only those rows where the first and second column's data is present in the third column I used this python command
if([x in x,y for datafile] == [x in x for file) and [y in x,y for datafile] == [x in x for file]):
print x,y
else:
print none
this gave me an error as syntax error
I have converted my first two columns into a tuple using the zip
function the x,y corresponds to the values in the tuple
Col_1 || Col_2 || file
Abc | Abk | cnl
Nck | Nck | Abk
xkl | cnl | Abc
mzn | mzn |
this I have combined as datafile ((Abc,Abk),(Nck,Nck),(xkl,cnl),(mzn,mzn))
Note: my column 3 has smaller values than col 1&2. I have over 100k values to compare
I want a working python program for this query
if [x for x,y in mydata if x == genelist and
y for x,y in mydata if y == genelist]:
print (x,y)
else:
can someone correct the syntax error in the above code here
mydata('gene1,genea','gene2,geneb''gene3,genec') and genelist ('genea','geneb','genec')
when I use the code without if statement it prints me "[]" I don't know what's wrong here
You could use pandas.Series.isin to filter it:
For your excel data (eg:comparison.xlsx
) :
Use:
import pandas as pd
df = pd.read_excel('comparison.xlsx')
result = df[df['finaldata1'].isin(list(df['check'])) & df['finaldata2'].isin(list(df['check']))]
result
it will give you:
finaldata1 finaldata2 check
0 Abc Abk cnl
as Abc
and Abk
is in column file
.
Update: Write result to excel file:
from pandas import ExcelWriter
writer = ExcelWriter('PythonExport.xlsx')
result.to_excel(writer,'Sheet1',index=False)
writer.save()
The result will be write into excel file PythonExport.xlsx
: