TensorFlow using LSTMs for generating text

seberik · Apr 13, 2016 · Viewed 8.2k times

I would like to use TensorFlow to generate text and have been modifying the LSTM tutorial (https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html#recurrent-neural-networks) code to do this. However, my initial solution seems to generate nonsense, and it does not improve even after training for a long time. I fail to see why. The idea is to start with a zero matrix and then generate one word at a time.

This is the code, to which I've added the two functions below: https://tensorflow.googlesource.com/tensorflow/+/master/tensorflow/models/rnn/ptb/ptb_word_lm.py

The generator looks as follows:

def generate_text(session,m,eval_op):

    state = m.initial_state.eval()

    x = np.zeros((m.batch_size,m.num_steps), dtype=np.int32)

    output = str()
    for i in xrange(m.batch_size):
        for step in xrange(m.num_steps):
            try:
                # Run the batch
                # targets have to be set, but m is the validation model, so it should not train the neural network
                cost, state, _, probabilities = session.run([m.cost, m.final_state, eval_op, m.probabilities],
                                                            {m.input_data: x, m.targets: x, m.initial_state: state})

                # Sample a word-id and add it to the matrix and output
                word_id = sample(probabilities[0,:])
                output = output + " " + reader.word_from_id(word_id)
                x[i][step] = word_id

            except ValueError as e:
                print("ValueError")

    print(output)

I have added the variable "probabilities" to the ptb_model and it is simply a softmax over the logits.

self._probabilities = tf.nn.softmax(logits)

And the sampling:

def sample(a, temperature=1.0):
    # helper function to sample an index from a probability array
    a = np.log(a) / temperature
    a = np.exp(a) / np.sum(np.exp(a))
    return np.argmax(np.random.multinomial(1, a, 1))

Answer

Teg Grenager · Aug 2, 2016

I have been working toward the exact same goal, and just got it to work. You have many of the right modifications here, but I think you've missed a few steps.

First, for generating text you need to create a different version of the model which represents only a single timestep. The reason is that we need to sample each output y before we can feed it into the next step of the model. I did this by making a new config which sets num_steps and batch_size both equal to 1.

class SmallGenConfig(object):
  """Small config. for generation"""
  init_scale = 0.1
  learning_rate = 1.0
  max_grad_norm = 5
  num_layers = 2
  num_steps = 1 # this is the main difference
  hidden_size = 200
  max_epoch = 4
  max_max_epoch = 13
  keep_prob = 1.0
  lr_decay = 0.5
  batch_size = 1
  vocab_size = 10000

I also added an output-probabilities tensor to the model with these lines:

self._output_probs = tf.nn.softmax(logits)

and

@property
def output_probs(self):
  return self._output_probs
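
For reference, here is a minimal sketch of what a softmax over the logits computes, written in plain Python rather than TensorFlow. The function name and the example scores are made up for illustration; tf.nn.softmax does the equivalent for each row of the logits tensor.

```python
import math

def softmax(logits):
    """Turn a list of unnormalized scores into probabilities."""
    # Subtracting the max keeps exp() from overflowing and does not
    # change the result, because the factor cancels in the division.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for a three-word vocabulary.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The highest score gets the highest probability, and the values sum to 1 up to floating-point rounding.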

Then, there are a few differences in my generate_text() function. The first one is that I load saved model parameters from disk using the tf.train.Saver() object. Note that we do this after instantiating the PTBModel with the new config from above.

def generate_text(train_path, model_path, num_sentences):
  gen_config = SmallGenConfig()

  with tf.Graph().as_default(), tf.Session() as session:
    initializer = tf.random_uniform_initializer(-gen_config.init_scale,
                                                gen_config.init_scale)    
    with tf.variable_scope("model", reuse=None, initializer=initializer):
      m = PTBModel(is_training=False, config=gen_config)

    # Restore variables from disk.
    saver = tf.train.Saver() 
    saver.restore(session, model_path)
    print("Model restored from file " + model_path)

The second difference is that I get the lookup table from ids to word strings (I had to write this function, see the code below).

    words = reader.get_vocab(train_path)

I set up the initial state the same way you do, but then I set up the initial token in a different manner. I want to use the "end of sentence" token so that I'll start my sentence with the right types of words. I looked through the word index and found that <eos> has index 2 (the vocabulary ordering is deterministic), so I just hard-coded that in. Finally, I wrap it in a 1x1 NumPy matrix so that it is the right type for the model inputs.

    state = m.initial_state.eval()
    x = 2 # the id for '<eos>' from the training set
    input = np.matrix([[x]])  # a 2D numpy matrix 

Finally, here's the part where we generate sentences. Note that we tell session.run() to compute the output_probs and the final_state. And we give it the input and the state. In the first iteration the input is <eos> and the state is the initial_state, but on subsequent iterations we give as input our last sampled output, and we pass the state along from the last iteration. Note also that we use the words list to look up the word string from the output index.

    text = ""
    count = 0
    while count < num_sentences:
      output_probs, state = session.run([m.output_probs, m.final_state],
                                   {m.input_data: input,
                                    m.initial_state: state})
      x = sample(output_probs[0], 0.9)
      if words[x]=="<eos>":
        text += ".\n\n"
        count += 1
      else:
        text += " " + words[x]
      # now feed this new word as input into the next iteration
      input = np.matrix([[x]]) 

Then all we have to do is print out the text we accumulated.

    print(text)
  return

That's it for the generate_text() function.
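
The feedback loop above can be tried without a trained model. The following is a toy sketch in which a hand-written lookup table (next_word_probs) and toy_step() stand in for the LSTM and session.run(). Both are made up for illustration, the four-word vocabulary is invented, and the "state" is just a step counter. It assumes Python 3.6+ for random.choices.

```python
import random

# Hypothetical vocabulary and next-word table standing in for the trained model.
words = ("the", "cat", "<eos>", "sat")
EOS_ID = words.index("<eos>")
next_word_probs = {
    0: [0.0, 0.7, 0.0, 0.3],  # after "the"
    1: [0.0, 0.0, 0.2, 0.8],  # after "cat"
    2: [1.0, 0.0, 0.0, 0.0],  # after "<eos>"
    3: [0.1, 0.0, 0.9, 0.0],  # after "sat"
}

def toy_step(word_id, state):
    """Stand-in for session.run(): returns (probabilities, new state)."""
    return next_word_probs[word_id], state + 1

def generate(num_sentences, rng):
    text, count = "", 0
    x, state = EOS_ID, 0  # start from <eos> and an initial state
    while count < num_sentences:
        probs, state = toy_step(x, state)
        # Sample the next word id, then feed it back in on the next pass.
        x = rng.choices(range(len(probs)), weights=probs)[0]
        if words[x] == "<eos>":
            text += ".\n\n"
            count += 1
        else:
            text += " " + words[x]
    return text

text = generate(3, random.Random(0))
print(text)
```

The point of the sketch is the data flow: each iteration consumes exactly one word id plus the previous state, which is why the generation model uses num_steps = 1 and batch_size = 1.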

Finally, let me show you the function definition for get_vocab(), which I put in reader.py.

def get_vocab(filename):
  data = _read_words(filename)

  counter = collections.Counter(data)
  count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))

  words, _ = list(zip(*count_pairs))

  return words
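
As an alternative to hard-coding the <eos> id, the index can be looked up in the returned tuple. This is a sketch on a made-up token list, and it assumes that the word ids used in training are the positions in this same sorted order (which is how I understand the tutorial's reader to build its vocabulary). build_vocab is a hypothetical in-memory variant of get_vocab.

```python
import collections

def build_vocab(tokens):
    """Same ordering as get_vocab(), but from an in-memory token list."""
    counter = collections.Counter(tokens)
    # Most frequent first; ties broken alphabetically, so the order is deterministic.
    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))
    words, _ = zip(*count_pairs)
    return words

# Toy corpus, invented for illustration.
tokens = "the cat sat <eos> the dog sat <eos> the end <eos>".split()
words = build_vocab(tokens)
eos_id = words.index("<eos>")
print(words, eos_id)
```

On a real corpus the id of <eos> depends on the word counts, so looking it up avoids a silent mismatch if the training data changes.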

The last thing you need is a way to save the model after training it, which looks like:

save_path = saver.save(session, "/tmp/model.ckpt")

And that's the model that you'll load from disk later when generating text.

There was one more problem: I found that sometimes the probability distribution produced by the TensorFlow softmax function didn't sum exactly to 1.0. When the sum was larger than 1.0, np.random.multinomial() threw an error. So I had to write my own sampling function, which looks like this:

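import random
import numpy as np

def sample(a, temperature=1.0):
  # rescale the distribution by the temperature, then renormalize
  a = np.log(a) / temperature
  a = np.exp(a) / np.sum(np.exp(a))
  r = random.random() # range: [0,1)
  total = 0.0
  # walk the cumulative sum until it exceeds r
  for i in range(len(a)):
    total += a[i]
    if total>r:
      return i
  # fall back to the last index if rounding keeps total <= r
  return len(a)-1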
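
To see what the temperature argument does, here is a small standard-library sketch of just the rescaling step. The function name apply_temperature and the example distribution are made up for illustration. Dividing the log-probabilities by the temperature is the same as raising each probability to the power 1/temperature before renormalizing.

```python
import math

def apply_temperature(probs, temperature=1.0):
    """Rescale a probability list: temperature < 1 sharpens it, > 1 flattens it."""
    # Entries with zero probability stay at zero (math.log(0) would raise).
    scaled = [math.exp(math.log(p) / temperature) if p > 0 else 0.0
              for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

probs = [0.5, 0.3, 0.2]                # made-up distribution
sharp = apply_temperature(probs, 0.5)  # favors the most likely word more
flat = apply_temperature(probs, 2.0)   # spreads probability more evenly
print(sharp)
print(flat)
```

With a temperature slightly below 1, such as the 0.9 used in the generation loop, sampling leans a little toward the more likely words without becoming fully greedy.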

When you put all this together, the small model was able to generate me some cool sentences. Good luck.