Importing large tab-delimited .txt file into Python

user2464402 · Jun 7, 2013 · Viewed 69.7k times

I have a tab-delimited .txt file that I'm trying to import into a matrix (array) in Python with the same layout as the text file, shown below:

123088 266 248 244 266 244 277
123425 275 244 241 289 248 231
123540 156 654 189 354 156 987

There are many more rows like these (roughly 200), and I want to load all of them into Python while keeping the same row and column layout in the resulting matrix.

My current code is:

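import csv

d = {}
with open('file name', newline='') as csv_file:  # Python 3: text mode instead of 'rb'
    csv_reader = csv.reader(csv_file, delimiter='\t')
    for row in csv_reader:
        d[row[0]] = row[1:]  # key: first column as a string; value: the remaining columns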

It partly does what I need, but it isn't quite what I'm after. I'd like to be able to type print(d[0,3]) and get back 248.

Answer

Jeff Tratner · Jun 8, 2013

First, you are loading the data into a dictionary keyed by the first column (strings such as '123088'), which won't give you the list of lists you want.
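To see why, here is a small sketch that runs the question's dictionary-building loop on the three sample rows. It reads them from an in-memory string instead of a file, which is an assumption made only so the example runs on its own:

```python
import csv
import io

# The three sample rows from the question, tab-separated, standing in for the file.
sample = (
    "123088\t266\t248\t244\t266\t244\t277\n"
    "123425\t275\t244\t241\t289\t248\t231\n"
    "123540\t156\t654\t189\t354\t156\t987\n"
)

d = {}
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    d[row[0]] = row[1:]  # key = first column (a string), value = remaining columns

print(d["123088"])     # ['266', '248', '244', '266', '244', '277']
print(d["123088"][1])  # 248 (still a string)
# d[0, 3] looks up the tuple key (0, 3), which does not exist, so it raises KeyError.
print((0, 3) in d)     # False
```

Rows are reachable only by their ID string, not by position, which is why d[0,3] cannot work on this structure.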

It's dead simple to use the csv module to build a list of lists like this:

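import csv

path = 'file name'  # placeholder from the question; use your file's path
with open(path, newline='') as f:  # Python 3: text mode, newline='' as the csv docs recommend
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)
print(d[0][2])  # 248 (the string '248')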

This gives you a list of lists of strings, so if you want numbers you have to convert each value to int.
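A minimal sketch of that conversion. It uses an in-memory string as a stand-in for the file, which is an assumption so the example is self-contained. With a real file you would keep the with open(...) block and change only how d is built:

```python
import csv
import io

# In-memory stand-in for the tab-delimited file (assumption for illustration).
sample = (
    "123088\t266\t248\t244\t266\t244\t277\n"
    "123425\t275\t244\t241\t289\t248\t231\n"
)

reader = csv.reader(io.StringIO(sample), delimiter="\t")
# Convert every field of every row from str to int; "if row" skips blank lines,
# which csv.reader would otherwise return as empty lists.
d = [[int(value) for value in row] for row in reader if row]

print(d[0][2])  # 248 (now an int)
```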

That said, if you have a large array (or are doing any kind of numeric calculation), you should consider using something like NumPy or pandas. With NumPy, you could do:

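import numpy as np

path = 'file name'  # placeholder from the question; use your file's path
d = np.loadtxt(path, delimiter="\t", dtype=int)  # default dtype is float, which would print 248.0
print(d[0, 2])  # 248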

As a bonus, NumPy arrays let you do fast vector and matrix operations. (Also note that d[0][2] works on a NumPy array too.)
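Here is a short sketch of what that buys you. It again reads the sample rows from an in-memory string, which is an assumption; in practice you would pass the file path to np.loadtxt. It also shows one way to look up a row by the ID in the first column, which is what the question's dictionary was doing:

```python
import io

import numpy as np

# In-memory stand-in for the file; with a real file, pass its path instead.
sample = io.StringIO(
    "123088\t266\t248\t244\t266\t244\t277\n"
    "123425\t275\t244\t241\t289\t248\t231\n"
    "123540\t156\t654\t189\t354\t156\t987\n"
)
d = np.loadtxt(sample, delimiter="\t", dtype=int)

print(d.shape)     # (3, 7)
print(d[0, 2])     # 248
values = d[:, 1:]  # every row, dropping the ID column
print(values.sum(axis=1))  # per-row totals, computed without a Python loop
row = d[d[:, 0] == 123425]  # boolean mask: rows whose first column equals this ID
print(row[0, 1])   # 275
```

The boolean mask d[:, 0] == 123425 returns every matching row, so an empty result means the ID is not in the file.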