
Research Papers

Selection of Korean Proper Translation Words Using Bi-Gram-Based Histograms

Authors:

Hanmin Jung,

Information System Division, KISTI, Korea

Hee-Kwan Koo,

Practical Information Science, UST, Korea

Won-Kyung Sung,

Information System Division, KISTI, Korea

Dong-In Park

Information System Division, KISTI, Korea

Abstract

This paper describes an algorithm for selecting proper translation words and clustering translations, applied to Korean translations of words automatically extracted from newspapers. Because about 80% of the English words in Korean newspapers appear in abbreviated form, the translation words must be clustered so that bilingual knowledge bases, such as dictionaries and translation patterns, can be constructed easily. As a seed for acquiring a translation cluster, we select a proper translation word from a given translation set using bi-gram-based histograms. Translation words that share bi-grams with the chosen proper translation word are assigned to that word's cluster. The words in the cluster are then removed from the translation set, and the process repeats until the translation set is empty. Experimental results show that our algorithms outperform approaches based on bi-gram binary vectors, including the Dice coefficient and the Jaccard coefficient, in selecting the proper translation word for each translation cluster.
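The select-then-cluster loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how the histogram scores a candidate, so the sketch assumes that each word is scored by summing, over its distinct character bi-grams, the number of words in the current set that contain that bi-gram. The function names (`bigrams`, `select_proper`, `cluster_translations`) are hypothetical.

```python
from collections import Counter


def bigrams(word):
    """Return the set of character bi-grams in a word, ignoring spaces."""
    s = word.replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def select_proper(words):
    """Pick the seed ("proper") word from the current translation set.

    Assumed scoring (not taken from the paper): build a histogram that
    counts, for each bi-gram, how many words contain it, then score a
    word by the sum of the counts of its own bi-grams. Ties go to the
    word that appears first.
    """
    hist = Counter(bg for w in words for bg in bigrams(w))
    return max(words, key=lambda w: sum(hist[bg] for bg in bigrams(w)))


def cluster_translations(translation_set):
    """Repeat seed selection and clustering until the set is empty."""
    remaining = list(translation_set)
    clusters = []
    while remaining:
        seed = select_proper(remaining)
        seed_bgs = bigrams(seed)
        # A word joins the cluster if it is the seed itself or shares
        # at least one bi-gram with the seed.
        members = [w for w in remaining if w == seed or bigrams(w) & seed_bgs]
        clusters.append((seed, members))
        # Remove the clustered words from the translation set.
        remaining = [w for w in remaining if w not in members]
    return clusters


if __name__ == "__main__":
    # Hypothetical translation candidates, for illustration only.
    for seed, members in cluster_translations(
        ["세계무역기구", "무역기구", "국제연합", "유엔"]
    ):
        print(seed, members)
```

Under the assumed scoring, longer words that contain widely shared bi-grams tend to be chosen as seeds; a different histogram normalization would change which word is selected, but not the overall loop.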
DOI: http://doi.org/10.2481/dsj.6.S125
How to Cite: Jung, H. et al., (2007). Selection of Korean Proper Translation Words Using Bi-Gram-Based Histograms. Data Science Journal. 6, pp.S125–S131. DOI: http://doi.org/10.2481/dsj.6.S125
Published on 28 Mar 2007.
Peer Reviewed
