Golang : Example of how to detect which type of script a word belongs to
In one of my projects, I needed to detect whether a word that could not be translated was written in Arabic or Latin script. To detect a word's script, I use the unicode.In() function to match every character in the word against a dictionary of code point ranges. Golang's unicode package provides the dictionary I needed (range tables such as unicode.Latin and unicode.Arabic), and the check is fairly easy to write with a for loop.
Here you go!
package main

import (
	"fmt"
	"unicode"

	"github.com/chrisport/go-lang-detector/langdet/langdetdef"
)

func main() {
	// "love" translated to different languages
	input := []string{"爱", "好き", "ආදරය", "cinta", "حب", "مينه", "ฮัก", "माया"}

	for k, v := range input {
		if checkifLatin(v) {
			fmt.Println(k, "["+v+"] is a Latin or a derivative of Latin ? ", checkifLatin(v))
		} else if checkifArabic(v) {
			fmt.Println(k, "["+v+"] is an Arabic or a derivative of Arabic ? ", checkifArabic(v))
		} else if checkifThai(v) {
			fmt.Println(k, "["+v+"] is a Thai or a derivative of Thai ? ", checkifThai(v))
		} else if checkifHan(v) {
			fmt.Println(k, "["+v+"] is a Han(Chinese) or a derivative of Han(Chinese) ? ", checkifHan(v))
		} else if checkifDevanagari(v) {
			fmt.Println(k, "["+v+"] is a Devanagari or a derivative of Devanagari ? ", checkifDevanagari(v))
		} else {
			// The Japanese and Sinhala words are in the input on purpose, to exercise
			// this fallback branch. To detect them, add checks that use
			// unicode.Hiragana and unicode.Sinhala.
			fmt.Println(k, "["+v+"] unable to detect script type. Need new function to detect this script type. ")
		}
	}

	fmt.Println("----------------------------------------------------------------------------")

	// Alternative method: language detection with a third-party package.
	// It guesses the language rather than the script, so it is not what I want here.
	detector := langdetdef.NewWithDefaultLanguages()
	result := detector.GetClosestLanguage(input[4])
	fmt.Println("GetClosestLanguage returns : ", result)
}
// checkifLatin reports whether every character in input is Latin script.
// An empty string returns false.
func checkifLatin(input string) bool {
	if input == "" {
		return false
	}
	for _, v := range input {
		// Stop at the first character outside the Latin range table.
		if !unicode.In(v, unicode.Latin) {
			return false
		}
	}
	return true
}
// checkifThai reports whether every character in input is Thai script.
// An empty string returns false.
func checkifThai(input string) bool {
	if input == "" {
		return false
	}
	for _, v := range input {
		// Stop at the first character outside the Thai range table.
		if !unicode.In(v, unicode.Thai) {
			return false
		}
	}
	return true
}
// checkifArabic reports whether every character in input is Arabic script.
// An empty string returns false.
func checkifArabic(input string) bool {
	if input == "" {
		return false
	}
	for _, v := range input {
		// Stop at the first character outside the Arabic range table.
		if !unicode.In(v, unicode.Arabic) {
			return false
		}
	}
	return true
}
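// checkifHan reports whether every character in input is Han (Chinese) script.
// An empty string returns false.
func checkifHan(input string) bool {
	if input == "" {
		return false
	}
	for _, v := range input {
		// Stop at the first character outside the Han range table.
		if !unicode.In(v, unicode.Han) {
			return false
		}
	}
	return true
}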
// checkifDevanagari reports whether every character in input is Devanagari script.
// An empty string returns false.
func checkifDevanagari(input string) bool {
	if input == "" {
		return false
	}
	for _, v := range input {
		// Stop at the first character outside the Devanagari range table.
		if !unicode.In(v, unicode.Devanagari) {
			return false
		}
	}
	return true
}
Output:
0 [爱] is a Han(Chinese) or a derivative of Han(Chinese) ? true
1 [好き] unable to detect script type. Need new function to detect this script type.
2 [ආදරය] unable to detect script type. Need new function to detect this script type.
3 [cinta] is a Latin or a derivative of Latin ? true
4 [حب] is an Arabic or a derivative of Arabic ? true
5 [مينه] is an Arabic or a derivative of Arabic ? true
6 [ฮัก] is a Thai or a derivative of Thai ? true
7 [माया] is a Devanagari or a derivative of Devanagari ? true
----------------------------------------------------------------------------
GetClosestLanguage returns : undefined
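The per-script functions above all share the same loop, so one possible variation is to drive a single function from a list of range tables. The following is only a sketch under my own assumptions: the names namedScript, candidates and detectScript are hypothetical, and the list of scripts (including unicode.Hiragana and unicode.Sinhala for the two words that went undetected) is just an illustrative selection.

```go
package main

import (
	"fmt"
	"unicode"
)

// namedScript pairs a display name with a range table from the unicode package.
// The type and its name are illustrative only.
type namedScript struct {
	name  string
	table *unicode.RangeTable
}

// candidates lists the scripts to try, in order. Extend it as needed.
var candidates = []namedScript{
	{"Latin", unicode.Latin},
	{"Arabic", unicode.Arabic},
	{"Thai", unicode.Thai},
	{"Han", unicode.Han},
	{"Devanagari", unicode.Devanagari},
	{"Hiragana", unicode.Hiragana},
	{"Sinhala", unicode.Sinhala},
}

// detectScript returns the name of the first candidate script that every
// rune in word belongs to. It returns "" for an empty word, for a word that
// mixes scripts, and for a script that is not in candidates.
func detectScript(word string) string {
	if word == "" {
		return ""
	}
	for _, c := range candidates {
		match := true
		for _, r := range word {
			if !unicode.Is(c.table, r) {
				match = false
				break
			}
		}
		if match {
			return c.name
		}
	}
	return ""
}

func main() {
	for _, w := range []string{"爱", "好き", "ආදරය", "cinta", "حب"} {
		fmt.Printf("[%s] -> %q\n", w, detectScript(w))
	}
}
```

With this selection, ආදරය is reported as Sinhala, while 好き still comes back empty because it mixes Han and Hiragana and no single table covers both characters.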
See also : Golang : Converting individual Jawi alphabet to Rumi(Romanized) alphabet example
By Adam Ng(黃武俊)