I created a corpus and stored it in a pickle file. My messages DataFrame is a collection of different news articles.
import pickle
import re

from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

ps = PorterStemmer()
# Build the stop-word set once instead of reloading the list for every word.
stop_words = set(stopwords.words('english'))

corpus = []
for i in range(len(messages)):
    # Keep letters only, lowercase, and split into tokens.
    review = re.sub('[^a-zA-Z]', ' ', messages['text'][i])
    review = review.lower().split()
    # Drop stop words and stem what remains.
    review = [ps.stem(word) for word in review if word not in stop_words]
    corpus.append(' '.join(review))

# Save the processed corpus.
with open('corpus.pkl', 'wb') as f:
    pickle.dump(corpus, f)
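To rule out a problem at write time, the file can be read back immediately after it is saved. The following is a minimal sketch, not part of my original code: `dump_and_verify` and `sample_corpus` are hypothetical names, and the short list stands in for the real corpus.

```python
import os
import pickle
import tempfile


def dump_and_verify(obj, path):
    """Pickle obj to path, reload it, and confirm the round trip."""
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
    # The `with` block has closed the file, so all bytes are flushed.
    with open(path, 'rb') as f:
        restored = pickle.load(f)
    if restored != obj:
        raise ValueError(f'round trip through {path} changed the data')
    return os.path.getsize(path)


# Stand-in data (assumption); the real list comes from the loop above.
sample_corpus = ['first stem articl', 'second stem articl']
with tempfile.TemporaryDirectory() as tmp:
    size_bytes = dump_and_verify(sample_corpus, os.path.join(tmp, 'corpus.pkl'))
    print(size_bytes, 'bytes written')
```

If this check passes on Colab, the file was complete when it was written, which would point to the download step rather than to `pickle.dump`.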
I ran the same code on my laptop (Jupyter Notebook) and on Google Colab.
corpus.pkl => saved on Google Colab and downloaded with the following code:
from google.colab import files
files.download('corpus.pkl')
corpus1.pkl => saved by the Jupyter Notebook code.
Now, when I run this code:
import pickle
with open('corpus.pkl', 'rb') as f:  # Google Colab
    corpus = pickle.load(f)
I get the following error:
UnpicklingError: pickle data was truncated
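This kind of error can be reproduced by cutting a valid pickle short, which is what an incomplete file would look like to `pickle.load`. A minimal sketch, assuming a stand-in list (`sample_corpus` is hypothetical and is not my real data):

```python
import pickle

# Stand-in data (assumption): a list of short strings like the real corpus.
sample_corpus = [f'stem articl {i}' for i in range(1000)]

data = pickle.dumps(sample_corpus)
truncated = data[:len(data) // 2]  # simulate a file that is missing its tail

try:
    pickle.loads(truncated)
    outcome = 'loaded'
except (pickle.UnpicklingError, EOFError) as exc:
    # Depending on where the cut falls, Python raises one or the other.
    outcome = type(exc).__name__
    print(outcome, exc)
```

The full byte string loads normally, while the shortened one fails, so an error like the one above is consistent with corpus.pkl being incomplete on disk.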
But this works fine:
import pickle
with open('corpus1.pkl', 'rb') as f:  # saved from Jupyter Notebook
    corpus = pickle.load(f)
The only difference between the two is that corpus1.pkl was created and saved through Jupyter Notebook (locally), while corpus.pkl was saved on Google Colab and then downloaded.
Could anybody tell me why this is happening?
For reference:
corpus.pkl => 36 MB
corpus1.pkl => 50.5 MB
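The size gap suggests, but does not prove, that the downloaded file is incomplete. One way to check is to compute the size and a checksum of corpus.pkl on Colab before downloading and again on the laptop afterwards. A minimal sketch, where `file_fingerprint` is a hypothetical helper name:

```python
import hashlib
import os


def file_fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for the file at path."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read in chunks so a large pickle is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


# Run this on Colab and on the laptop, then compare the two results.
if os.path.exists('corpus.pkl'):
    print(file_fingerprint('corpus.pkl'))
```

The comparison is only meaningful for the same file before and after download. corpus.pkl and corpus1.pkl were written in different environments, so their checksums may differ even if both are intact.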
I would use only the pickle file created on my local machine, which works properly.