Algorithms for Arabic name transliteration. IBM Journal of Research and Development, 38(2):183-193.
Arabic Proper Names Dictionary from NMSU.
Machine translation of names in Arabic text. Proceedings of the ACL conference workshop on computational approaches to Semitic languages, 2002.

Out of vocabulary (OOV) words are problematic for cross language information retrieval. One way to deal with OOV words when the two languages have different alphabets is to transliterate the unknown words, that is, to render them in the orthography of the second language. In the present study, we present a simple statistical technique to train an English to Arabic transliteration model from pairs of names. We call this a selected n-gram model because a two-stage training procedure first learns which n-gram segments should be added to the unigram inventory for the source language, and then a second stage learns the translation model over this inventory. This technique requires no heuristics or linguistic knowledge of either language. We evaluate the statistically-trained model and a simpler hand-crafted model on a test set of named entities from the Arabic AFP corpus and demonstrate that they perform better than two online translation sources. We also explore the effectiveness of these systems on the TREC 2002 cross language IR task. We find that transliteration either of OOV named entities or of all OOV words is an effective approach for cross language IR.
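
The two-stage idea behind the selected n-gram model can be illustrated with a minimal Python sketch. It is not the authors' implementation: the text does not say how character alignments are obtained, so the sketch assumes they are already given as input. The function names (`select_ngrams`, `segment`, `train_translation_model`, `transliterate`), the `min_count` threshold, the greedy longest-match segmentation, and the toy name pairs are all hypothetical. Target strings use ASCII stand-ins for Arabic letters in the style of the Buckwalter romanization (for example `$` for sheen), purely to keep the example easy to read.

```python
from collections import Counter, defaultdict

# Toy training data (hypothetical). Each entry is
# (source name, target string, links), where links[i] is the index of the
# target symbol that source character i is aligned to. The alignments are
# assumed to come from some external character aligner.
EXAMPLES = [
    ("shadi", "$Ady", [0, 0, 1, 2, 3]),
    ("shula", "$wlA", [0, 0, 1, 2, 3]),
    ("philip", "fylyb", [0, 0, 1, 2, 3, 4]),
    ("phil", "fyl", [0, 0, 1, 2]),
]


def select_ngrams(examples, min_count=2):
    """Stage 1: pick source n-grams whose characters share one target symbol."""
    counts = Counter()
    for source, _target, links in examples:
        start = 0
        for i in range(1, len(source) + 1):
            # Close the current run at the end of the word or when the link changes.
            if i == len(source) or links[i] != links[start]:
                if i - start > 1:
                    counts[source[start:i]] += 1
                start = i
    return {ngram for ngram, c in counts.items() if c >= min_count}


def segment(word, inventory):
    """Split a word greedily, preferring the longest n-gram in the inventory."""
    max_len = max((len(ngram) for ngram in inventory), default=1)
    segments, i = [], 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in inventory:  # single characters always allowed
                segments.append(piece)
                i += n
                break
    return segments


def train_translation_model(examples, inventory):
    """Stage 2: relative-frequency estimate of P(target | source segment)."""
    counts = defaultdict(Counter)
    for source, target, links in examples:
        i = 0
        for seg in segment(source, inventory):
            positions = sorted(set(links[i:i + len(seg)]))
            counts[seg]["".join(target[p] for p in positions)] += 1
            i += len(seg)
    return {
        seg: {t: c / sum(tc.values()) for t, c in tc.items()}
        for seg, tc in counts.items()
    }


def transliterate(word, inventory, model):
    """Emit the most probable target for each segment; skip unseen segments."""
    output = []
    for seg in segment(word, inventory):
        options = model.get(seg)
        if options:
            output.append(max(options, key=options.get))
    return "".join(output)


inventory = select_ngrams(EXAMPLES)
model = train_translation_model(EXAMPLES, inventory)
print(sorted(inventory))
print(transliterate("shilpa", inventory, model))
```

On this toy data, stage 1 adds `sh` and `ph` to the single-character inventory because each is aligned to one target symbol in two names, and stage 2 then learns one mapping per segment. A real system would learn the alignments from the name pairs themselves and would score whole candidate transliterations rather than choosing each segment independently.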