Journal of Information Science 2008;34:688. A more recent version of this article appeared on October 1, 2008
Word segmentation for the Myanmar language
1 Division of Information Studies, School of Communication and Information, Nanyang Technological University, Singapore 637718
* To whom correspondence should be addressed.
This study reports the development of a Myanmar word segmentation method using Unicode standard encoding. Word segmentation is an essential step prior to natural language processing in the Myanmar language, because a Myanmar text is a string of characters without explicit word boundary delimiters. The proposed method has two phases: syllable segmentation and syllable merging. A rule-based heuristic approach was adopted for syllable segmentation, and a dictionary-based statistical approach for syllable merging. Evaluation of test results showed that the method is very effective for the Myanmar language.

Key Words: Myanmar language; word segmentation; natural language processing; syllable segmentation; syllable merging; collocation strength; mutual information
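The two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give their rules, dictionary, or statistics. The sketch assumes a single simplified rule (a syllable starts at a consonant in U+1000 to U+1021 that is not followed by asat U+103A or virama U+1039 and not preceded by virama), a greedy longest-match lookup against a toy dictionary, and pointwise mutual information as one possible measure of collocation strength. All function names, the `max_len` parameter, and the sample dictionary are hypothetical.

```python
import math
import re

# Assumed simplified rule: break before a consonant (U+1000..U+1021) unless it
# is followed by asat (U+103A) or virama (U+1039), or preceded by virama.
BREAK_BEFORE = re.compile("(?<!\u1039)[\u1000-\u1021](?![\u103A\u1039])")


def segment_syllables(text):
    """Phase 1 (sketch): split a Myanmar string into syllables."""
    starts = [m.start() for m in BREAK_BEFORE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any leading characters as their own chunk
    bounds = starts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def merge_syllables(syllables, dictionary, max_len=4):
    """Phase 2 (sketch): greedy longest match of syllable runs in a dictionary.

    A syllable that starts no dictionary entry is emitted on its own.
    """
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


def pmi(pair_count, left_count, right_count, n_pairs, n_syllables):
    """Pointwise mutual information (base 2) of an adjacent syllable pair.

    One possible collocation-strength score for pairs missing from the
    dictionary; the counts are assumed to come from a syllable-segmented corpus.
    """
    p_xy = pair_count / n_pairs
    p_x = left_count / n_syllables
    p_y = right_count / n_syllables
    return math.log2(p_xy / (p_x * p_y))


# Toy input: "Myanmar" followed by "language/speech", written without spaces.
MYANMAR = "\u1019\u103C\u1014\u103A\u1019\u102C"
SPEECH = "\u1005\u1000\u102C\u1038"
sample_syllables = segment_syllables(MYANMAR + SPEECH)
sample_words = merge_syllables(sample_syllables, {MYANMAR, SPEECH})
```

With the toy dictionary, `sample_syllables` holds four syllables and `sample_words` holds the two dictionary words. A fuller system would need additional rules for stacked consonants, kinzi, independent vowels, digits, and punctuation, and would use a score such as `pmi` to decide whether adjacent syllables not covered by the dictionary should be merged.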