The Twenty-One Project University of Twente

Automatic construction of a bilingual lexicon

Within the Twenty-One project, a new statistics-based method for the automatic extraction of bilingual lexicons was developed. The method takes a bilingual corpus (that is, a text and its translation in another language) as input and outputs a bilingual lexicon. It has advantages over previously published algorithms because it is based on a symmetric translation model. The resulting bilingual lexicon can be used to translate in both directions between a language pair.
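
To make the idea of a lexicon that works in both directions concrete, here is a minimal sketch in Python. It is not the Twenty-One method: it swaps in a much simpler technique, the Dice coefficient over co-occurrence counts in sentence-aligned text, which is also symmetric in the two languages. The function names build_lexicon and best_translation, the min_score parameter and the three-sentence Dutch-English toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def build_lexicon(sentence_pairs, min_score=0.5):
    """Score word pairs with the Dice coefficient over aligned sentences.

    Stand-in technique for illustration only, not the Twenty-One method.
    """
    source_counts = Counter()  # number of sentences containing each source word
    target_counts = Counter()  # number of sentences containing each target word
    pair_counts = Counter()    # number of aligned pairs containing both words

    for source, target in sentence_pairs:
        source_words = set(source.lower().split())
        target_words = set(target.lower().split())
        source_counts.update(source_words)
        target_counts.update(target_words)
        pair_counts.update(product(source_words, target_words))

    lexicon = {}
    for (s, t), joint in pair_counts.items():
        # Dice is symmetric: swapping the two languages gives the same score.
        score = 2 * joint / (source_counts[s] + target_counts[t])
        if score >= min_score:
            lexicon[(s, t)] = score
    return lexicon


def best_translation(lexicon, word, reverse=False):
    """Return the highest-scoring partner of `word`, in either direction."""
    index = 1 if reverse else 0
    candidates = [
        (score, pair[1 - index])
        for pair, score in lexicon.items()
        if pair[index] == word
    ]
    return max(candidates)[1] if candidates else None


# Hypothetical toy corpus (invented for this sketch): Dutch text, English translation.
toy_corpus = [
    ("het huis", "the house"),
    ("het boek", "the book"),
    ("een boek", "a book"),
]

toy_lexicon = build_lexicon(toy_corpus)
print(best_translation(toy_lexicon, "boek"))                  # book
print(best_translation(toy_lexicon, "house", reverse=True))   # huis
```

Because one score table serves both lookups, no separate Dutch-to-English and English-to-Dutch lexicons are needed. A real system would need a far larger corpus and a proper translation model to separate true translations from words that merely co-occur.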


Example results

For the purpose of Cross-Language Information Retrieval, a domain-specific bilingual lexicon was automatically derived from the aligned Dutch, English, French and German versions of Agenda 21. Below are some preliminary results.


Agenda 21

The name of the project Twenty-One refers to the United Nations conference on ecology and sustainable development in Rio de Janeiro in 1992. An important result of this conference is a document titled Agenda 21. Agenda 21 captures the application domain of the Twenty-One system and is available in all the major European languages.


Word alignment software

Twente word alignment software version 0.9b can be downloaded from this web site.



Last modified January 1999 by Djoerd Hiemstra