I have a large CSV file with many columns, something like this:
id, col1, col2, col3, col4, col5
1, a, b, 2, d, e
2, b, c, 4, e, f
3, c, d, 6, f, g
I want to be able to create a dictionary in which only certain columns are used. For example, the dictionary would have the id number, col2, and col3. Additionally, it should only store the rows that have the highest 10 numbers in col2. This is the code I have:
import csv
reader = csv.DictReader(open('SNPs.csv', newline=''), delimiter=',', quotechar='"')
But I do not know how to tell it to ignore certain columns, and I don't think that I can use max() to return multiple values.
max(2, 4) returns 4.
EDIT I tried using Daniel's Code, but for some reason the sort function isn't working correctly. (I also need to use reverse sort instead of sort). It only outputs four different keys, and additionally, they aren't actually sorted in descending numerical order. It also returns the headers as one of the values.
import csv
f = open('SNPs.csv', "rU")
reader = csv.reader(f)
output = [row for row in reader]
output.sort(key=lambda x: x[32], reverse=True)
print dict((row[10], (row[11], row[8], row[32])) for row in output[:10])
col2
doesn't have any numbers. I'll assume you meant col3
.
You can't tell which are the ten highest numbers in col3 until you've read them all. So since you're going to be doing that anyway, you might as well read everything, then extract the top ten afterwards. So you can do something like this:
output = []
for row in reader:
output.append(dict(k, v) for k, v in row if k in ('id', 'col2', 'col3'))
output.sort(key=lambda x: x['col3'])
return output[:10]
Edit Now I see your desired output, you want something completely different to what I imagined. In fact DictReader is completely pointless here, so I'll rewrite with the normal Reader.
f = open('SNPs.csv')
reader = csv.Reader(f, delimiter=',', quotechar='"')
output = [row for row in reader]
output.sort(key=lambda x: x[3])
return dict((row[0], (row[3], row[4])) for row in output[:10])