Embedding layer in neural machine translation with attention

According to the PyTorch docs:

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.

In short, nn.Embedding maps a sequence of vocabulary indices to vectors in an embedding space. You can indeed roughly understand this as a word2vec-style mechanism, except that here the vectors are learned jointly with the rest of the model rather than pretrained.

As a dummy example, let’s create an embedding layer for a vocabulary of 10 tokens (i.e. the input data contains only 10 unique tokens) that returns embedded word vectors living in 5-dimensional space. In other words, each word is represented as a 5-dimensional vector. The dummy input is a sequence of 3 tokens with indices 1, 2, and 3, in that order.

>>> import torch
>>> import torch.nn as nn
>>> embedding = nn.Embedding(10, 5)  # vocabulary of 10 tokens, 5-dimensional embeddings
>>> embedding(torch.tensor([1, 2, 3]))
tensor([[-0.7077, -1.0708, -0.9729,  0.5726,  1.0309],
        [ 0.2056, -1.3278,  0.6368, -1.9261,  1.0972],
        [ 0.8409, -0.5524, -0.1357,  0.6838,  3.0991]],
       grad_fn=<EmbeddingBackward>)

You can see that each of the three words is now represented as a 5-dimensional vector. We also see that the output carries a grad_fn, which means the weights of this layer will be adjusted through backpropagation. This answers your question of whether embedding layers are trainable: the answer is yes. And indeed this is the whole point of embedding: we expect the embedding layer to learn meaningful representations, the famous analogy king - man + woman ≈ queen being the classic example of what such embeddings can capture.
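To make the trainability concrete, here is a minimal sketch of my own (not from the original example), continuing the same session: we backpropagate a dummy scalar "loss" through the layer and inspect the gradient on its weights. Only the rows that were actually looked up receive a nonzero gradient.

>>> out = embedding(torch.tensor([1, 2, 3]))
>>> out.sum().backward()                    # dummy scalar loss
>>> embedding.weight.grad.abs().sum(dim=1)  # only rows 1, 2, 3 are nonzero
tensor([0., 5., 5., 5., 0., 0., 0., 0., 0., 0.])

In a real model an optimizer step would then update exactly those rows, which is how the embeddings get learned during training.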


Edit

The embedding layer is, as the documentation states, a simple lookup table: calling it just indexes into a weight matrix. You can see this by inspecting its weights:

>>> embedding.weight
Parameter containing:
tensor([[-1.1728, -0.1023,  0.2489, -1.6098,  1.0426],
        [-0.7077, -1.0708, -0.9729,  0.5726,  1.0309],
        [ 0.2056, -1.3278,  0.6368, -1.9261,  1.0972],
        [ 0.8409, -0.5524, -0.1357,  0.6838,  3.0991],
        [-0.4569, -1.9014, -0.0758, -0.6069, -1.2985],
        [ 0.4545,  0.3246, -0.7277,  0.7236, -0.8096],
        [ 1.2569,  1.2437, -1.0229, -0.2101, -0.2963],
        [-0.3394, -0.8099,  1.4016, -0.8018,  0.0156],
        [ 0.3253, -0.1863,  0.5746, -0.0672,  0.7865],
        [ 0.0176,  0.7090, -0.7630, -0.6564,  1.5690]], requires_grad=True)

You will see that the rows at indices 1, 2, and 3 of this matrix (i.e. the second, third, and fourth rows, since indexing is zero-based) correspond to the result returned in the example above. In other words, for a token whose index is n, the embedding layer simply looks up the nth row of its weight matrix and returns that row vector; hence the lookup table.
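As a quick sanity check (my own sketch, continuing the same session), indexing the weight matrix directly returns exactly the same vectors as calling the layer:

>>> torch.equal(embedding(torch.tensor([1, 2, 3])), embedding.weight[[1, 2, 3]])
True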
