I suspect this is basic for anyone who knows pickle well, but I still can't get it right after a few hours of trying. I have the following code:
The first file:
import pandas as pd
names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]
records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()
def name_score_function(record):
    if record in names:
        return(means.loc[record, 'score'])
import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(means, file)
The second file:
I would like to load what I have in the first file and make the score of a person (i.e. John, Mary, Suzanne) callable via a function name_model(record):
import dill as pickle
B = pickle.load('name_model.pkl')
def name_model(record):
    if record in names:
        return(means.loc[record, 'score'])
Here it shows the error:
  File "names.py", line 21, in <module>
    B = pickle.load('name_model.pkl')
  File "/opt/conda/lib/python2.7/site-packages/dill/dill.py", line 197, in load
    pik = Unpickler(file)
  File "/opt/conda/lib/python2.7/site-packages/dill/dill.py", line 356, in __init__
    StockUnpickler.__init__(self, *args, **kwds)
  File "/opt/conda/lib/python2.7/pickle.py", line 847, in __init__
    self.readline = file.readline
AttributeError: 'str' object has no attribute 'readline'
I know the error comes from my lack of understanding of pickle. I would humbly accept your opinions to improve this code. Thank you!!
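For reference, the traceback shows `load` trying to call `readline` on the string `'name_model.pkl'`, which suggests it expects an open file object rather than a path. Below is a minimal sketch of that difference. It uses the standard-library `pickle` (whose `load` takes a file object in the same way) and a plain dict, `mean_scores`, as a hypothetical stand-in for the `means` DataFrame; the temporary path is also just for illustration.

```python
import os
import pickle  # standard library; the question imports dill under this name
import tempfile

# Hypothetical stand-in for the `means` DataFrame: name -> mean score.
mean_scores = {"John": 87.5, "Mary": 90.0, "Suzanne": 96.0}

# Illustrative location for the pickle file.
path = os.path.join(tempfile.mkdtemp(), "name_model.pkl")

with open(path, "wb") as file:
    pickle.dump(mean_scores, file)

# Passing the path string fails: load() needs an object it can read from.
load_error = None
try:
    pickle.load(path)
except (TypeError, AttributeError) as exc:
    load_error = exc

# Opening the file in binary read mode and passing the handle works.
with open(path, "rb") as file:
    loaded_scores = pickle.load(file)

print(loaded_scores["John"])  # 87.5
```

The same pattern, `with open('name_model.pkl', 'rb') as file:` followed by `pickle.load(file)`, should apply to the second file in the question.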
UPDATE: More specifically, here is what I would like to achieve:
I would like to be able to use the function that I write in the first file and dump it, and then read it in the second file and be able to use this function to query the mean score of any person in the records.
Here is what I have:
import pandas as pd
names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]
records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()
def name_score_function(record):
    if record in names:
        return(means.loc[record, 'score'])
B = name_score_function(record)
import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(B, file)
with open('name_model.pkl', 'rb') as file:
    B = pickle.load(file)
def name_model(record):
    return B(record)
print(name_model("John"))
When I execute this code, I get this error:
  File "test.py", line 13, in <module>
    B = name_score_function(record)
NameError: name 'record' is not defined
I highly appreciate your assistance and patience.
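The `NameError` appears to come from calling the function at module level, where no variable named `record` exists. A minimal sketch of the distinction between calling a function and referring to the function object follows; `mean_scores` is again a hypothetical dict standing in for `means`, and no pickling is done here.

```python
# Hypothetical stand-in for the `means` DataFrame: name -> mean score.
mean_scores = {"John": 87.5, "Mary": 90.0, "Suzanne": 96.0}

def name_score_function(record):
    # Return the mean score for a known name, or None otherwise.
    return mean_scores.get(record)

# Calling the function needs a real argument; `record` is only a
# parameter name inside the function and is not defined out here.
call_error = None
try:
    B = name_score_function(record)
except NameError as exc:
    call_error = exc

# Without parentheses, the name refers to the function object itself,
# which can be stored in a variable and called later.
B = name_score_function
print(B("John"))  # 87.5
```

If the goal is to dump the function, `B = name_score_function` is the object to pass to `pickle.dump`. Whether the data the function refers to travels with it into a second file is worth checking against the dill documentation.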
Thank you. It looks like the following can solve the problem.
import pandas as pd
names = ["John", "Mary", "Mary", "Suzanne", "John", "Suzanne"]
scores = [80, 90, 90, 92, 95, 100]
records = pd.DataFrame({"name": names, "score": scores})
means = records.groupby('name').mean()
import dill as pickle
with open('name_model.pkl', 'wb') as file:
    pickle.dump(means, file)
# pickle.load needs an open file object, not the file name
with open('name_model.pkl', 'rb') as file:
    B = pickle.load(file)
def name_score_function(record):
    # look the name up in the loaded table B
    if record in B.index:
        return B.loc[record, 'score']
print(name_score_function("John"))
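To make the second file depend only on the pickle file, one possible arrangement is a small factory that loads the data and returns the lookup function. This is a sketch under assumptions: `make_name_model` is a hypothetical helper, the standard-library `pickle` is used, and a plain dict replaces the `means` DataFrame.

```python
import os
import pickle
import tempfile

def make_name_model(path):
    """Load a name -> mean score mapping and return a lookup function."""
    with open(path, "rb") as file:
        mean_scores = pickle.load(file)

    def name_model(record):
        # Returns None for names that were not in the pickled data.
        return mean_scores.get(record)

    return name_model

# "First file" step: write the data (illustrative temporary path).
path = os.path.join(tempfile.mkdtemp(), "name_model.pkl")
with open(path, "wb") as file:
    pickle.dump({"John": 87.5, "Mary": 90.0, "Suzanne": 96.0}, file)

# "Second file" step: rebuild the callable from the file alone.
name_model = make_name_model(path)
print(name_model("John"))  # 87.5
```

Only data is pickled here, so the loading side needs nothing from the first file except the `.pkl` itself.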