I would like to use TensorFlow to generate text and have been modifying the LSTM tutorial (https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) code to do this. However, my initial solution generates nonsense, and it does not improve even after training for a long time. I fail to see why. The idea is to start with a zero matrix and then generate one word at a time.
I added the two functions below to this code: https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py
The generator looks as follows:
def generate_text(session,m,eval_op):
    state = m.initial_state.eval()
    x = np.zeros((m.batch_size,m.num_steps), dtype=np.int32)
    output = str()
    for i in xrange(m.batch_size):
        for step in xrange(m.num_steps):
            try:
                # Run the batch
                # targets have to be set but m is the validation model, thus it should not train the neural network
                cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                            {m.input_data: x, m.targets: x, m.initial_state: state})
                # Sample a word-id and add it to the matrix and output
                word_id = sample(probabilities[0,:])
                output = output + " " + reader.word_from_id(word_id)
                x[i][step] = word_id
            except ValueError as e:
                print("ValueError")
    print(output)
I have added the variable "probabilities" to the ptb_model and it is simply a softmax over the logits.
self._probabilities = tf.nn.softmax(logits)
And the sampling:
def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))
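To make the temperature step in sample() concrete, here is a minimal sketch of what the log / divide / exp / normalize sequence does to a distribution. The helper name apply_temperature and the example probabilities are made up for illustration; the max subtraction is an extra stability step that is not in the function above.

```python
import numpy as np

def apply_temperature(probs, temperature=1.0):
    """Rescale a probability vector; equivalent to p**(1/T), renormalized."""
    logits = np.log(np.asarray(probs, dtype=np.float64)) / temperature
    logits -= logits.max()          # stability only; does not change the result
    scaled = np.exp(logits)
    return scaled / scaled.sum()

p = [0.7, 0.2, 0.1]                 # made-up distribution
sharp = apply_temperature(p, 0.5)   # T < 1 concentrates mass on the top entry
flat = apply_temperature(p, 2.0)    # T > 1 moves the distribution toward uniform
```

With temperature=1.0 the input comes back unchanged, so the temperature only matters when it differs from 1.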
I have been working toward the exact same goal, and just got it to work. You have many of the right modifications here, but I think you've missed a few steps.
First, for generating text you need to create a different version of the model which represents only a single timestep. The reason is that we need to sample each output y before we can feed it into the next step of the model. I did this by making a new config which sets num_steps and batch_size both equal to 1.
class SmallGenConfig(object):
    """Small config. for generation"""
    init_scale = 0.1
    learning_rate = 1.0
    max_grad_norm = 5
    num_layers = 2
    num_steps = 1 # this is the main difference
    hidden_size = 200
    max_epoch = 4
    max_max_epoch = 13
    keep_prob = 1.0
    lr_decay = 0.5
    batch_size = 1
    vocab_size = 10000
I also added an output probabilities tensor to the model with these lines:
self._output_probs = tf.nn.softmax(logits)
and
@property
def output_probs(self):
    return self._output_probs
Then, there are a few differences in my generate_text() function. The first one is that I load saved model parameters from disk using the tf.train.Saver() object. Note that we do this after instantiating the PTBModel with the new config from above.
def generate_text(train_path, model_path, num_sentences):
    gen_config = SmallGenConfig()
    with tf.Graph().as_default(), tf.Session() as session:
        initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                    gen_config.init_scale)
        with tf.variable_scope("model", reuse=None, initializer=initializer):
            m = PTBModel(is_training=False, config=gen_config)
        # Restore variables from disk.
        saver = tf.train.Saver()
        saver.restore(session, model_path)
        print("Model restored from file " + model_path)
The second difference is that I get the lookup table from ids to word strings (I had to write this function, see the code below).
        words = reader.get_vocab(train_path)
I set up the initial state the same way you do, but then I set up the initial token in a different manner. I want to use the "end of sentence" token so that I'll start my sentence with the right types of words. I looked through the word index and found that <eos> happens to have index 2 (the index is deterministic for a given training set), so I just hard-coded that in. Finally, I wrap it in a 1x1 NumPy matrix so that it has the right shape for the model inputs.
        state = m.initial_state.eval()
        x = 2 # the id for '<eos>' from the training set
        input = np.matrix([[x]]) # a 2D numpy matrix
Finally, here's the part where we generate sentences. Note that we tell session.run() to compute the output_probs and the final_state. And we give it the input and the state. In the first iteration the input is <eos> and the state is the initial_state, but on subsequent iterations we give as input our last sampled output, and we pass the state along from the last iteration. Note also that we use the words list to look up the word string from the output index.
        text = ""
        count = 0
        while count < num_sentences:
            output_probs, state = session.run([m.output_probs, m.final_state],
                                              {m.input_data: input,
                                               m.initial_state: state})
            x = sample(output_probs[0], 0.9)
            if words[x]=="<eos>":
                text += ".\n\n"
                count += 1
            else:
                text += " " + words[x]
            # now feed this new word as input into the next iteration
            input = np.matrix([[x]])
Then all we have to do is print out the text we accumulated.
        print(text)
        return
That's it for the generate_text() function.
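If it helps to see the feedback loop without TensorFlow, here is a minimal sketch of the same pattern. Everything in it is made up for illustration: toy_step is a hypothetical stand-in for the session.run() call (it is not a trained model), VOCAB is a four-word toy vocabulary, and max_tokens is a safety limit that is not in the function above.

```python
import numpy as np

VOCAB = ["<eos>", "the", "cat", "sat"]   # made-up vocabulary
rng = np.random.RandomState(0)

def toy_step(token_id, state):
    """Hypothetical stand-in for session.run([m.output_probs, m.final_state], ...)."""
    new_state = 0.5 * state + np.eye(len(VOCAB))[token_id]
    logits = -new_state                  # discourage recently emitted tokens
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), new_state

def generate(num_sentences, max_tokens=200):
    state = np.zeros(len(VOCAB))         # plays the role of m.initial_state
    token = VOCAB.index("<eos>")         # start from the end-of-sentence token
    text, count = "", 0
    for _ in range(max_tokens):          # guard against never sampling <eos>
        if count >= num_sentences:
            break
        probs, state = toy_step(token, state)     # feed last token and state
        token = rng.choice(len(VOCAB), p=probs)   # sample the next token
        if VOCAB[token] == "<eos>":
            text += ".\n\n"
            count += 1
        else:
            text += " " + VOCAB[token]
    return text

out = generate(2)
```

The point is only the data flow: each step takes the previous sampled token and the previous state, and returns a distribution plus the next state.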
Finally, let me show you the function definition for get_vocab(), which I put in reader.py.
def get_vocab(filename):
    data = _read_words(filename)
    counter = collections.Counter(data)
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = list(zip(*count_pairs))
    return words
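Here is a small sketch of why the resulting ids are deterministic: the sort orders words by descending count and breaks ties alphabetically, so the same training file always gives the same order. The helper name build_vocab and the toy token list are made up for illustration, and this assumes reader.py assigns word ids using the same ordering, which is what get_vocab() relies on.

```python
import collections

def build_vocab(tokens):
    """Same ordering as get_vocab(), but from an in-memory token list."""
    counter = collections.Counter(tokens)
    # Most frequent first; ties broken alphabetically.
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = zip(*count_pairs)
    return words

tokens = "the cat sat <eos> the dog sat <eos> the end <eos>".split()
words = build_vocab(tokens)                       # position in the tuple is the id
word_to_id = {w: i for i, w in enumerate(words)}  # reverse lookup
```

In this toy corpus "<eos>" and "the" both occur three times, and "<eos>" sorts first because "<" comes before "t".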
The last thing you need is to save the model after training it (using a tf.train.Saver() object), which looks like
save_path = saver.save(session, "/tmp/model.ckpt")
And that's the model that you'll load from disk later when generating text.
There was one more problem: I found that sometimes the probability distribution produced by the TensorFlow softmax function didn't sum exactly to 1.0. When the sum was larger than 1.0, np.random.multinomial() threw an error. So I had to write my own sampling function, which looks like this:
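import random
import numpy as np

def sample(a, temperature=1.0):
    # apply the temperature and renormalize
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    r = random.random() # range: [0,1)
    total = 0.0
    # walk the cumulative distribution until it passes r
    for i in range(len(a)):
        total += a[i]
        if total>r:
            return i
    # fall back to the last index if rounding kept total below r
    return len(a)-1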
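The same idea can be written without the Python loop. This is only a sketch, not what I actually ran: the name sample_cdf and the rng parameter are made up, and it scales the random number by the total of the cumulative sum instead of dividing, so it does not matter whether the float32 probabilities add up to exactly 1.0.

```python
import numpy as np

def sample_cdf(probs, temperature=1.0, rng=np.random):
    """Draw an index by inverting the cumulative distribution."""
    p = np.log(np.asarray(probs, dtype=np.float64)) / temperature
    p = np.exp(p - p.max())              # unnormalized weights, stable exp
    cdf = np.cumsum(p)
    r = rng.random_sample() * cdf[-1]    # scale by the total instead of dividing
    # first index whose cumulative weight exceeds r, clamped to the last index
    return min(int(np.searchsorted(cdf, r, side="right")), len(cdf) - 1)

rng = np.random.RandomState(1)
probs = np.array([0.1, 0.6, 0.3], dtype=np.float32)   # made-up distribution
draws = [sample_cdf(probs, 0.9, rng) for _ in range(2000)]
```

Passing a seeded np.random.RandomState as rng makes the draws reproducible, which is handy when comparing generated text across runs.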
When you put all this together, the small model was able to generate me some cool sentences. Good luck.