Research on Machine Translation at NEUNLPLab
Our research on machine translation began in 1980, when our lab was founded. In the first stage of our research, we focused on rule-based, example-based, pattern-based and multilingual machine translation. In 2007, we started working on statistical machine translation (SMT). Our current MT work focuses mainly on syntax-based MT, document-level MT and MT platform development. Current projects include:
- String-to-tree/Tree-to-string MT
- Chinese parsing for Chinese-to-English MT
- Learning Better Target-language Syntax Trees for String-to-tree MT
- Consensus-based Decoding for Document-level Translation
- Construction of SMT platform - NEUTrans
We have developed a phrase-based SMT system as part of the NEUTrans project. This system placed second in the CWMT09 Chinese-to-English single-system MT track (18 participants).
MT Platform Development
Our long-term goal is to build the following two MT platforms.

Better Understanding for Machine Translation
In this project we aim to learn better MT models from a deeper understanding or analysis of languages, for example by integrating fine-grained target-side syntax trees into an end-to-end MT system. We believe that MT systems can benefit from a better understanding of languages, such as better syntactic parsing. We currently focus on the following two research topics.
- Better Parsing for Syntax-based MT. Most current syntax-based MT systems depend on parsers designed for the monolingual parsing task, such as English parsers built from PTB-style parse trees. However, it remains an open question what kind of parse trees are most useful for the MT task. To address this problem, we study how to learn tree forms that are better suited to MT rather than to monolingual parsing.
- Learning Better Target-language Parse Trees for String-to-tree MT. In string-to-tree MT, translations are derived from the sequence of terminals in the target-side parse trees generated during decoding. From a parsing point of view, it is reasonable to believe that high-quality target-side parse trees lead to relatively better translations. Motivated by this idea, we investigate methods for learning better target-side parse trees for string-to-tree MT decoding.
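The link between a target-side parse tree and the translation it produces can be illustrated with a small sketch: the translation is simply the left-to-right sequence of the tree's terminals. The tuple-based tree encoding, the function name `tree_yield` and the example tree are assumptions made for illustration only; they are not taken from NEUTrans or any other system.

```python
from typing import List, Union

# Hypothetical encoding: a tree is either a terminal word (str)
# or a tuple of the form (label, child, child, ...).
Tree = Union[str, tuple]


def tree_yield(tree: Tree) -> List[str]:
    """Return the terminals of a parse tree in left-to-right order."""
    if isinstance(tree, str):
        # A bare string is a terminal, i.e. a target-language word.
        return [tree]
    # The first element is the non-terminal label; the rest are children.
    _label, *children = tree
    words: List[str] = []
    for child in children:
        words.extend(tree_yield(child))
    return words


# An invented target-side tree such as a decoder might build.
example_tree = (
    "S",
    ("NP", ("PRP", "he")),
    ("VP", ("VBD", "read"), ("NP", ("DT", "the"), ("NN", "book"))),
)

# The translation is the yield of the tree.
translation = " ".join(tree_yield(example_tree))
print(translation)
```

Because the output string is fully determined by the tree, any change in the tree the decoder builds can change the translation, which is why the quality of target-side trees matters here.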
Document-level Translation
In real-world applications, there is great demand for automatic translation of whole documents. However, some problems, such as translating content omitted on the source side, are very difficult to solve in the current sentence-level MT framework because it lacks document context. This project addresses that problem. As an initial attempt, we study three issues first.
- Translation of omissions. Omission is common in some languages, such as Chinese, but rare in English. In this study, we investigate various methods for recovering and translating the omitted parts of the source-language text in Chinese-English translation.
- Translation of abbreviations. In some cases, abbreviations substitute for the original full names after their first use in a document. To translate an abbreviation, a key problem is to find the mapping between it and the corresponding full name. We are currently investigating methods for incorporating document information into models of this problem.
- Translation consistency. Translations of the same term should be consistent throughout a document. This requirement is even stronger in domain-specific translation tasks, such as patent translation. We are studying document-level consensus-based decoding to address this problem.
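The consistency requirement can be made concrete with a minimal sketch. It assumes that a sentence-level system has already proposed a translation for each occurrence of a term, and it picks one translation per term by simple majority vote over the document. This majority vote is only an illustrative stand-in, not the consensus-based decoding method under study; the function name `consensus_term_translations` and the example data are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consensus_term_translations(
    term_choices: List[Tuple[str, str]],
) -> Dict[str, str]:
    """Pick one translation per source term by majority vote.

    term_choices holds one (source_term, proposed_translation) pair
    for every occurrence of a term in the document.
    """
    counts: Dict[str, Counter] = {}
    for source_term, target in term_choices:
        counts.setdefault(source_term, Counter())[target] += 1
    # Keep the most frequent translation of each term.
    return {term: c.most_common(1)[0][0] for term, c in counts.items()}


# Invented per-occurrence choices from a patent-like document.
choices = [
    ("权利要求", "claim"),
    ("权利要求", "right requirement"),
    ("权利要求", "claim"),
    ("实施例", "embodiment"),
]

consistent = consensus_term_translations(choices)
print(consistent)
```

A real document-level decoder would presumably weigh model scores and context rather than raw counts; the sketch only shows that the decision is made once per document instead of once per sentence.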
Selected Publications (here is the full list)
[1] Tong Xiao, Jingbo Zhu, Hao Zhang and Muhua Zhu. 2010. An Empirical Study of Translation Rule Extraction with Multiple Parsers. To appear in Proc. of COLING 2010.
[2] Tong Xiao, Jingbo Zhu, Muhua Zhu and Huizhen Wang. 2010. Boosting-based System Combination for Machine Translation. In Proc. of ACL 2010, Uppsala, Sweden.
[3] Tong Xiao, Mu Li, Dongdong Zhang, Jingbo Zhu and Ming Zhou. 2009. Better Synchronous Binarization for Machine Translation. In Proc. of Empirical Methods in Natural Language Processing (EMNLP) 2009, Singapore.