NEMATREH - Neural Machine Translation for English to Hindi in Tourism Domain

A Dissertation for the completion of an MSc in Computer Science under the supervision of Dr. Kevin Koidl

Abstract from the Dissertation:

Hindi is one of the major languages of the world, spoken by over 700 million people. In India, more than 550 million people speak Hindi, whereas English speakers are limited to around 12% of the population. In this modern digital age, most of the digital content produced in India and the rest of the world is predominantly in English. This leaves a considerable portion of the Indian population digitally neglected, as there is limited digital content in their native language or in Hindi, the most widely used language. This creates immense potential for English to Hindi neural machine translation, which could enable automatic translation of millions of existing English digital articles. Recent developments in deep neural networks have enabled machine translation to produce translations that are close to human-generated translations. There is therefore an immense opportunity to explore English to Hindi neural machine translation and the domain adaptation of neural translation models by generating domain-specific translations. In this research, the tourism domain was chosen because of the considerable lack of translated content in this domain. To overcome this shortcoming, this work creates an English-Hindi neural machine translation model which generates above-average translations. Furthermore, domain-specific fine-tuning of the pre-trained model is conducted in order to extend the domain adaptability of the model. All models are tested and evaluated on domain-specific data using BLEU scores. This dissertation report presents the fundamentals of neural machine translation and draws conclusions about the domain-specific adaptability of these models.