Excel columns comparison using python code

Lakshmi KrishnaKumaar picture Lakshmi KrishnaKumaar · May 15, 2017 · Viewed 15.8k times · Source

I am working with excel for comparing three columns: my idea is to compare two columns of data with a third column as a array like each value in the 3rd column should be compared with every row of the first and second column and want to extract only those rows where the first and second column's data is present in the third column I used this python command

if([x in x,y for datafile] == [x in x for file) and [y in x,y for datafile] == [x in x for file]): 
    print x,y
else:
    print none        

this gave me an error as syntax error

I have converted my first two columns into a tuple using the zip function the x,y corresponds to the values in the tuple

Col_1 ||  Col_2    ||   file
Abc   |    Abk     |    cnl
Nck   |    Nck     |    Abk
xkl   |    cnl     |    Abc  
mzn   |    mzn     |  

this I have combined as datafile ((Abc,Abk),(Nck,Nck),(xkl,cnl),(mzn,mzn))

Note: my column 3 has smaller values than col 1&2. I have over 100k values to compare

I want a working python program for this query

if [x for x,y in mydata if x == genelist and
y for x,y in mydata if y == genelist]:
    print (x,y)
else: 

can someone correct the syntax error in the above code here

mydata('gene1,genea','gene2,geneb''gene3,genec') and genelist ('genea','geneb','genec') 

when I use the code without if statement it prints me "[]" I don't know what's wrong here

Answer

Tiny.D picture Tiny.D · May 15, 2017

You could use pandas.Series.isin to filter it:

For your excel data (eg:comparison.xlsx) :

enter image description here

Use:

import pandas as pd
df = pd.read_excel('comparison.xlsx')
result = df[df['finaldata1'].isin(list(df['check'])) & df['finaldata2'].isin(list(df['check']))]
result

it will give you:

    finaldata1  finaldata2  check
0   Abc         Abk         cnl

as Abc and Abk is in column file.

Update: Write result to excel file:

from pandas import ExcelWriter

writer = ExcelWriter('PythonExport.xlsx')
result.to_excel(writer,'Sheet1',index=False)
writer.save()

The result will be write into excel file PythonExport.xlsx:

enter image description here