pickle data was truncated

omkar patil · May 10, 2020 · Viewed 9.3k times

I created a corpus and stored it in a pickle file. My messages variable is a DataFrame holding a collection of news articles.

import pickle
import re

from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()
# Build the stopword set once; calling stopwords.words() for every word is slow
stop_words = set(stopwords.words('english'))

corpus = []
for i in range(len(messages)):
    # Keep letters only, lowercase the text, and split it into words
    review = re.sub('[^a-zA-Z]', ' ', messages['text'][i])
    review = review.lower().split()
    # Drop stopwords and stem the remaining words
    review = [ps.stem(word) for word in review if word not in stop_words]
    corpus.append(' '.join(review))

# Save the cleaned corpus to disk
with open('corpus.pkl', 'wb') as f:
    pickle.dump(corpus, f)

I ran the same code on my laptop (Jupyter notebook) and on Google Colab.

corpus.pkl => created on Google Colab and downloaded with the following code:

from google.colab import files
files.download('corpus.pkl')

corpus1.pkl => saved from the Jupyter notebook on my laptop.

Now, when I run this code:

import pickle
with open('corpus.pkl', 'rb') as f:   # google colab
    corpus = pickle.load(f)

I get the following error:

UnpicklingError: pickle data was truncated

But this works fine:

import pickle
with open('corpus1.pkl', 'rb') as f:  # jupyter notebook saved
    corpus = pickle.load(f)

The only difference between the two is that corpus1.pkl was created and saved by the Jupyter notebook (locally), while corpus.pkl was saved on Google Colab and then downloaded.

Could anybody tell me why this is happening?

For reference:

corpus.pkl  => 36 MB
corpus1.pkl => 50.5 MB
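
The size gap hints that the downloaded corpus.pkl may be incomplete, although the two files were produced in different environments, so size alone is not proof. A minimal sketch for checking this, assuming the file can be read in both places: run the hypothetical helper file_fingerprint (not part of the original code) on Colab before calling files.download, run it again on the downloaded copy, and compare the two results.

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at `path`.

    Hypothetical helper for illustration. It reads the file in chunks
    so a large pickle does not have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


# Usage (run on Colab and on the laptop, then compare the output):
# print(file_fingerprint('corpus.pkl'))
```

If the size or hash of the downloaded copy differs from what Colab reports, the file changed or was cut short in transit.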

Answer

omkar patil · Sep 7, 2020

I ended up using only the pickle file created on my local machine; that one loads properly.
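
For context on the error itself: pickle raises it when the byte stream ends before the object is complete, which is what an interrupted or partial download would produce. A minimal sketch that reproduces this in memory, assuming CPython. The helper name can_unpickle and the sample data are made up for illustration, and depending on where the stream is cut pickle may raise EOFError instead of UnpicklingError, so the sketch catches both.

```python
import pickle


def can_unpickle(data):
    """Return True if `data` (bytes) unpickles cleanly, False if it is cut short.

    Hypothetical helper for illustration; only use it on data you trust,
    since unpickling can execute arbitrary code.
    """
    try:
        pickle.loads(data)
        return True
    except (pickle.UnpicklingError, EOFError):
        return False


# Made-up stand-in for the real corpus
sample = pickle.dumps(['first stemmed review', 'second stemmed review'])
# Simulate a transfer that stopped halfway through
truncated = sample[:len(sample) // 2]

print(can_unpickle(sample))     # True: complete stream
print(can_unpickle(truncated))  # False: stream ends mid-object
```

If the Colab copy fails a check like this while the local one passes, downloading it again is worth trying before regenerating the corpus.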