I have a text file of 25GB. so i compressed it to tar.gz and it became 450 MB. now i want to read that file from python and process the text data.for this i referred question . but in my case code doesn't work. the code is as follows :
import tarfile
import numpy as np
tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
f=tar.extractfile(member)
content = f.read()
Data = np.loadtxt(content)
the error is as follows :
Traceback (most recent call last):
File "dataExtPlot.py", line 21, in <module>
content = f.read()
AttributeError: 'NoneType' object has no attribute 'read'
also, Is there any other method to do this task ?
The docs tell us that None is returned by extractfile() if the member is a not a regular file or link.
One possible solution is to skip over the None results:
tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
f = tar.extractfile(member)
if f is not None:
content = f.read()