Word segmentation for the Myanmar language
Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore 637718
Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore 637718, tjcna{at}ntu.edu.sg
Myanmar NLP Research Center, Hlaing, Yangon, Myanmar
This study reports the development of a Myanmar word segmentation method using Unicode standard encoding. Word segmentation is an essential step prior to natural language processing in the Myanmar language, because a Myanmar text is a string of characters without explicit word boundary delimiters. The proposed method has two phases: syllable segmentation and syllable merging. A rule-based heuristic approach was adopted for syllable segmentation, and a dictionary-based statistical approach for syllable merging. Evaluation of test results showed that the method is very effective for the Myanmar language.
Key Words: Myanmar language; word segmentation; natural language processing; syllable segmentation; syllable merging; collocation strength; mutual information
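The abstract names the two ingredients of the syllable-merging phase (a dictionary and a statistical measure of collocation strength) without giving details. The following is only a minimal sketch of how such a phase might look, not the authors' method: it assumes the text has already been split into syllables, uses greedy longest-match lookup against a word list, and scores adjacent syllable pairs with pointwise mutual information. The function names, the `max_len` parameter, and the placeholder Latin syllables are all hypothetical.

```python
import math
from collections import Counter


def merge_syllables(syllables, dictionary, max_len=4):
    """Greedy longest-match merge of syllables into dictionary words.

    Hypothetical helper: `dictionary` is a set of known words and
    `max_len` caps the number of syllables tried per word.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable always succeeds.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


def pmi(pair, unigram_counts, bigram_counts):
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))).

    Counts are Counter objects built from a syllable-segmented corpus.
    Returns -inf when the pair (or either syllable) was never observed.
    """
    x, y = pair
    total_uni = sum(unigram_counts.values())
    total_bi = sum(bigram_counts.values())
    if not bigram_counts[pair] or not unigram_counts[x] or not unigram_counts[y]:
        return float("-inf")
    p_xy = bigram_counts[pair] / total_bi
    p_x = unigram_counts[x] / total_uni
    p_y = unigram_counts[y] / total_uni
    return math.log2(p_xy / (p_x * p_y))


# Placeholder syllables (not real Myanmar data), for illustration only.
example_words = merge_syllables(["ka", "la", "ma", "na"], {"kala"})
```

In a sketch like this, a high mutual information score for an adjacent pair that is not in the dictionary could be taken as evidence that the two syllables belong to the same word; how the published method combines the two sources is not stated in the abstract.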
This version was published on October 1, 2008
Journal of Information Science, Vol. 34, No. 5, 688-704 (2008)