Neural Machine Translation (NMT) is very important in today's world. It allows people who speak different languages to communicate effectively with each other. A good NMT model can efficiently and accurately translate a sentence from one language to another. However, NMT models are very hard to train:

- Parallel corpora datasets are costly to build. Building them requires a lot of manpower and specialised expertise.
- Parallel corpora are simply not available for many low-resource languages.

This post introduces the unsupervised machine translation model developed by Facebook. The goal is to train a general machine translation system without supervision, using only a monolingual corpus for each language. The key ideas:

- Build a common latent space between the two languages/domains (e.g. English and French) and learn to translate by reconstructing in both domains.
- The model has to be able to work with noisy translations (from the source to the target language and vice versa).
- The source and target sentence latent representations are constrained to have the same distribution using an adversarial regularisation term: the model tries to fool a discriminator which is simultaneously trained to identify the language of a given latent representation. This is pretty similar to the working mechanism of a GAN. A sketch of this adversarial term follows the list.
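To make the adversarial regularisation concrete, here is a minimal PyTorch sketch. The discriminator architecture and the 300-dimensional latent size are my assumptions for illustration, not details from the post:

```python
import torch
import torch.nn as nn

LATENT_DIM = 300  # assumed latent size

# Discriminator: guesses which language (0 = l1, 1 = l2) a latent vector came from.
discriminator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256),
    nn.LeakyReLU(),
    nn.Linear(256, 1),  # logit for "this came from l2"
)
bce = nn.BCEWithLogitsLoss()

def discriminator_loss(z_l1, z_l2):
    """Train the discriminator to identify the language of each latent state."""
    logits = torch.cat([discriminator(z_l1.detach()), discriminator(z_l2.detach())])
    labels = torch.cat([torch.zeros(z_l1.size(0), 1), torch.ones(z_l2.size(0), 1)])
    return bce(logits, labels)

def adversarial_loss(z_l1, z_l2):
    """Train the encoder to fool the discriminator: same game, flipped labels."""
    logits = torch.cat([discriminator(z_l1), discriminator(z_l2)])
    labels = torch.cat([torch.ones(z_l1.size(0), 1), torch.zeros(z_l2.size(0), 1)])
    return bce(logits, labels)
```

The two losses are optimised in alternation, as in a GAN: the discriminator gets better at telling the languages apart, and the encoder gets better at producing language-agnostic latent states.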
At a high level, the model is a single encoder and a single decoder shared by both languages:

- Encoder -> encodes source and target sentences into the latent space. The encoder takes in W and generates Z. (There is only one encoder for both languages.)
- Decoder -> decodes from the latent space back into source and target sentences. The decoder takes in Z and a language l to generate words in language l. The decoder is language independent.

A minimal sketch of this shared interface is given below.
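Here is one way the shared encoder and language-conditioned decoder could look in PyTorch. This is a sketch of the interface only; the class names, sizes, and the language-embedding trick are my assumptions, and attention is omitted here (it is sketched after the figure below):

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """One encoder for both languages: word sequence W -> latent states Z."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bidirectional LSTM, as described in the next paragraph.
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, tokens):                  # tokens: (batch, seq)
        w = self.embed(tokens)                  # W: (batch, seq, emb_dim)
        z, _ = self.lstm(w)                     # Z: (batch, seq, 2 * hidden_dim)
        return z

class SharedDecoder(nn.Module):
    """One decoder for both languages; the target language is just an input."""
    def __init__(self, vocab_size, num_languages=2, emb_dim=300, hidden_dim=600):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lang_embed = nn.Embedding(num_languages, emb_dim)  # id of l1 or l2
        self.lstm = nn.LSTM(emb_dim * 2, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_tokens, lang_id):    # lang_id: (batch,)
        e = self.embed(prev_tokens)
        l = self.lang_embed(lang_id).unsqueeze(1).expand_as(e)
        h, _ = self.lstm(torch.cat([e, l], dim=-1))
        return self.out(h)                      # token logits in language lang_id
```

Because the language identifier is an input rather than baked into the weights, the decoder itself stays language independent, which is the property the post highlights.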
The encoder is a bidirectional LSTM which returns a sequence of hidden states. The decoder is an LSTM which takes in the previous hidden state, the current word, and a context vector given by a weighted sum over the encoder states. Input feeding is an approach that feeds the attentional vectors "as inputs to the next time steps to inform the model about past alignment decisions" (Source). As mentioned earlier, both the source and target languages share the same encoder, and the same goes for the decoder; the attention weights are shared as well.
*Figure: sequence-to-sequence model with attention, without input feeding.*
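Input feeding is the part that is easiest to get wrong, so here is a minimal sketch of a single decoder step with Luong-style attention and input feeding. The dimensions and the scoring function are assumptions for illustration, not the authors' implementation:

```python
import torch
import torch.nn as nn

class AttnDecoderStep(nn.Module):
    """One LSTM decoder step with attention and input feeding (sketch)."""
    def __init__(self, emb_dim=300, hidden_dim=600, enc_dim=600):
        super().__init__()
        # Input feeding: the previous attentional vector enters with the word.
        self.cell = nn.LSTMCell(emb_dim + hidden_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, enc_dim)               # attention scores
        self.attn_out = nn.Linear(hidden_dim + enc_dim, hidden_dim)

    def forward(self, word_emb, prev_attn, state, enc_states):
        # word_emb: (batch, emb_dim); prev_attn: (batch, hidden_dim)
        # state: (h, c), each (batch, hidden_dim); enc_states: (batch, src, enc_dim)
        h, c = self.cell(torch.cat([word_emb, prev_attn], dim=-1), state)
        # Attention weights over every encoder state.
        scores = torch.bmm(enc_states, self.score(h).unsqueeze(-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)                   # (batch, src)
        # Context vector: the weighted sum over the encoder states.
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        # Attentional vector: used for the output logits AND fed to the next step.
        attn = torch.tanh(self.attn_out(torch.cat([h, context], dim=-1)))
        return attn, (h, c)
```

The attentional vector returned here plays two roles: it is projected to vocabulary logits, and it comes back in as prev_attn at the next time step, which is precisely the "past alignment decisions" signal the quote describes.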
From equation 2, we can see that the loss function calculates the sum of token-level cross-entropy losses between x and x_hat. Take note of the difference between l1 and l2 in equation 2: x_hat is produced by feeding the corrupted y back through the model, x_hat ~ d(e(C(M(x)), l2), l1). In plain terms: sample a proper English sentence x, encode it (l1 encode) and feed it into the model M to generate Spanish. Take the generated Spanish, corrupt it, then encode it (l2 encode) and feed it into M again to generate x_hat. The loss function will try to reduce the difference between x and x_hat.
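To close the loop, here is a sketch of how equation 2 could be computed. The callables e, d, C, and M come from the notation above, but their exact signatures (in particular a decoder that accepts the gold tokens for teacher forcing) are my assumptions:

```python
import torch
import torch.nn.functional as F

def cross_domain_loss(x_tokens, M, e, d, C, l1_id):
    """Equation 2 (sketch): x_hat ~ d(e(C(M(x)), l2), l1).

    x_tokens: gold sentence in language l1, shape (batch, seq).
    M: the translation model from the previous iteration, l1 -> l2.
    e / d: the shared encoder / decoder; C: the corruption (noise) function.
    """
    with torch.no_grad():
        y = M(x_tokens)              # translate x into language l2 (no gradients)
    y_noisy = C(y)                   # corrupt the generated translation
    z = e(y_noisy)                   # l2 encode the corrupted sentence
    logits = d(z, l1_id, x_tokens)   # l1 decode, teacher-forced on the gold x
    # Sum of token-level cross-entropy losses between x and x_hat.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        x_tokens.reshape(-1),
        reduction="sum",
    )
```

Note that M is treated as fixed within the step; only the encoder and decoder receive gradients from this loss, so reconstructing x is what improves the translation path.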