What is the difference between Luong attention and Bahdanau attention?

Shamane Siriwardhana · May 29, 2017 · Viewed 26.1k times

These two attention mechanisms are used in seq2seq models. They are introduced as multiplicative and additive attention in this TensorFlow documentation. What is the difference?

Answer

Shamane Siriwardhana · Jun 9, 2017

I went through Effective Approaches to Attention-based Neural Machine Translation. In section 3.1 they describe the differences between the two attention mechanisms as follows:

  1. Luong attention uses the top hidden layer states of both the encoder and the decoder. Bahdanau attention instead uses the concatenation of the forward and backward source hidden states (from the top hidden layer).

  2. In Luong attention, the decoder hidden state at time t is computed first. The attention scores are then calculated from it, giving the context vector, which is concatenated with the decoder hidden state and used to predict.

    But in Bahdanau attention, at time t we use the decoder hidden state from time t-1. We calculate the alignment and the context vector as above, then concatenate this context with the decoder hidden state at t-1. So before the softmax, this concatenated vector goes inside a GRU (see the sketch after this list).

  3. Luong attention has different types of alignment score functions; Bahdanau attention has only the concat (additive) alignment score model. The score functions are sketched below.
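Here is a minimal NumPy sketch of that ordering difference for a single decoder step. The variable names, shapes, and random weights are my own illustrative assumptions, not code from either paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

hidden, src_len = 4, 5
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(src_len, hidden))     # one source hidden state per position

# --- Luong-style step: uses the CURRENT decoder state h_t ---
h_t = rng.normal(size=hidden)                            # decoder top-layer state at time t
scores = encoder_states @ h_t                            # dot-product alignment (one Luong variant)
context = softmax(scores) @ encoder_states               # weighted sum of encoder states
W_c = rng.normal(size=(hidden, 2 * hidden))
attentional = np.tanh(W_c @ np.concatenate([context, h_t]))  # [c_t; h_t] -> attentional vector, then predict

# --- Bahdanau-style step: uses the PREVIOUS decoder state s_{t-1} ---
s_prev = rng.normal(size=hidden)                         # decoder state at time t-1
W_a = rng.normal(size=(hidden, 2 * hidden))
v_a = rng.normal(size=hidden)
scores = np.array([v_a @ np.tanh(W_a @ np.concatenate([s_prev, h_s]))
                   for h_s in encoder_states])           # additive (concat) alignment
context = softmax(scores) @ encoder_states
gru_input = np.concatenate([context, s_prev])            # fed into the GRU to get s_t, then softmax
```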

Alignment methods (score function figure from Luong et al., 2015)
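And a rough sketch of the score functions themselves, again with made-up names and shapes rather than either paper's actual code:

```python
import numpy as np

hidden = 4
rng = np.random.default_rng(1)
h_t = rng.normal(size=hidden)               # decoder (target) hidden state
h_s = rng.normal(size=hidden)               # encoder (source) hidden state
W_a = rng.normal(size=(hidden, hidden))
W_concat = rng.normal(size=(hidden, 2 * hidden))
v_a = rng.normal(size=hidden)

# Luong (multiplicative) variants
score_dot = h_t @ h_s                                                  # dot:     h_t^T h_s
score_general = h_t @ W_a @ h_s                                        # general: h_t^T W_a h_s
score_concat = v_a @ np.tanh(W_concat @ np.concatenate([h_t, h_s]))    # concat

# Bahdanau (additive) attention uses only the concat/additive form,
# but with the previous decoder state s_{t-1} in place of h_t.
```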