Golang : Simple word filter or censor example




A reader wanted to know how to build a simple bad/curse word filter or censor function without relying on Redis. He found an example that used Redis, but its bad-word dataset was missing, and he preferred not to use a solution that requires connecting to another server just to check an input string for bad words.

Below is my own solution, which checks each word of the input string against a slice of bad words. It should be good enough to filter out most bad words and it runs fairly fast too.

 package main

 import (
     "fmt"
     "strings"
 )

 // CensorWord replaces every whitespace-separated word in str that contains
 // one of the censored entries with a run of # characters.
 // The entries in censored are expected to be lowercase.
 func CensorWord(str string, censored []string) string {

     // nothing to check against, return the input untouched
     if len(censored) == 0 {
         return str
     }

     // convert str into a slice of words, split on whitespace
     strSlice := strings.Fields(str)

     // check each word in strSlice against the censored slice
     for position, word := range strSlice {
         lowerWord := strings.ToLower(word)

         for _, forbiddenWord := range censored {

             // NOTE : change between Contains and EqualFold to see the different result
             // Contains matches substrings, EqualFold only matches the whole word

             if strings.Contains(lowerWord, forbiddenWord) {
                 //if strings.EqualFold(word, forbiddenWord) {

                 // one # for each byte of the original word
                 strSlice[position] = strings.Repeat("#", len(word))
                 break // already censored, move on to the next word
             }
         }
     }

     // convert []string slice back to string
     // joining strSlice itself means clean input is returned instead of an empty string
     return strings.Join(strSlice, " ")
 }

 func main() {
     // unable to catch a*wesome because it is not in the notAllowed dataset

     notAllowed := []string{"duck", "awesome", "shit"}
     inputStr := "Ducking duck! POOP is just another a*wesome or awesome word....for shit."
     result := CensorWord(inputStr, notAllowed)

     fmt.Println("[Original : ]", inputStr)
     fmt.Println("[Censored : ]", result)

     // to catch a*wesome, we need to include the word wesome in the dataset

     notAllowed1 := []string{"wesome", "duck", "awesome", "shit"}
     inputStr1 := "Sh!t duck! POOP is just another a*wesome or awesome word....for shit."
     result1 := CensorWord(inputStr1, notAllowed1)

     fmt.Println("[Original : ]", inputStr1)
     fmt.Println("[Censored : ]", result1)

     fmt.Println("Notice that the word [Sh!t] is not censored")
     fmt.Println("This code example here is not a bullet proof solution!")
     fmt.Println("Censoring is as good as your given dataset of words!")
 }

Output:

[Original : ] Ducking duck! POOP is just another a*wesome or awesome word....for shit.
[Censored : ] ####### ##### POOP is just another a*wesome or ####### word....for #####
[Original : ] Sh!t duck! POOP is just another a*wesome or awesome word....for shit.
[Censored : ] Sh!t ##### POOP is just another ######## or ####### word....for #####
Notice that the word [Sh!t] is not censored
This code example here is not a bullet proof solution!
Censoring is as good as your given dataset of words!
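
The [Sh!t] miss comes from a symbol standing in for a letter. One possible way to narrow that gap is to normalise common look-alike characters before the comparison. The sketch below is not part of the solution above: the names normalize, substitutions and CensorNormalized are hypothetical, and the substitution table is an assumption that is far from complete.

```go
package main

import (
	"fmt"
	"strings"
)

// substitutions is a hypothetical, deliberately small table that maps
// look-alike characters to the letters they often stand in for.
var substitutions = strings.NewReplacer(
	"!", "i",
	"1", "i",
	"0", "o",
	"5", "s",
	"@", "a",
	"$", "s",
)

// normalize lowercases a word and applies the substitution table.
func normalize(word string) string {
	return substitutions.Replace(strings.ToLower(word))
}

// CensorNormalized works like CensorWord, but compares the normalized
// form of each word against the censored slice.
func CensorNormalized(str string, censored []string) string {
	words := strings.Fields(str)
	for i, word := range words {
		cleaned := normalize(word)
		for _, forbidden := range censored {
			if strings.Contains(cleaned, forbidden) {
				// keep the replacement as long as the original word
				words[i] = strings.Repeat("#", len(word))
				break
			}
		}
	}
	return strings.Join(words, " ")
}

func main() {
	notAllowed := []string{"duck", "shit"}
	fmt.Println(CensorNormalized("Sh!t duck! 5h1t happens", notAllowed))
	// prints: #### ##### #### happens
}
```

The trade-off is more false positives, because a trailing "!" is also turned into an "i" before matching. A wildcard such as the * in a*wesome is still not handled, since it can stand for any letter.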

The reported time is about 70 to 90 µs (microseconds) on my own Mac. See the example below, which runs against a larger dataset and a longer input string. Note that the timer in that example wraps the two fmt.Println calls rather than the CensorWord call itself.

WARNING : The example below contains profanities that you may find offensive
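
Before moving to the larger dataset, it is worth noting that the nested loop compares every input word against every entry in the slice. If whole-word matching is acceptable, a map lookup removes the inner loop and also avoids censoring harmless words that merely contain a listed entry. This is a minimal sketch under that assumption, not the solution used in this tutorial; buildSet and CensorExact are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// buildSet turns the word list into a set so that each lookup is a
// single map access instead of a scan over the whole slice.
func buildSet(censored []string) map[string]struct{} {
	set := make(map[string]struct{}, len(censored))
	for _, w := range censored {
		set[strings.ToLower(w)] = struct{}{}
	}
	return set
}

// CensorExact censors a word only when, after lowercasing and trimming
// leading and trailing punctuation, it equals an entry in the set.
func CensorExact(str string, set map[string]struct{}) string {
	words := strings.Fields(str)
	for i, word := range words {
		core := strings.TrimFunc(strings.ToLower(word), unicode.IsPunct)
		if _, found := set[core]; found {
			words[i] = strings.Repeat("#", len(word))
		}
	}
	return strings.Join(words, " ")
}

func main() {
	set := buildSet([]string{"duck", "awesome", "shit"})
	fmt.Println(CensorExact("Ducking duck! An awesome word....for shit.", set))
	// prints: Ducking ##### An ####### word....for #####
}
```

Unlike the substring version, this leaves Ducking alone, so variants such as plurals have to be listed explicitly in the dataset.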

 package main

 import (
     "fmt"
     "strings"
     "time"
 )

 // CensorWord replaces every whitespace-separated word in str that contains
 // one of the censored entries with a run of # characters.
 // The entries in censored are expected to be lowercase.
 func CensorWord(str string, censored []string) string {

     // nothing to check against, return the input untouched
     if len(censored) == 0 {
         return str
     }

     // convert str into a slice of words, split on whitespace
     strSlice := strings.Fields(str)

     // check each word in strSlice against the censored slice
     for position, word := range strSlice {
         lowerWord := strings.ToLower(word)

         for _, forbiddenWord := range censored {

             // NOTE : change between Contains and EqualFold to see the different result
             // Contains matches substrings, EqualFold only matches the whole word

             if strings.Contains(lowerWord, forbiddenWord) {
                 //if strings.EqualFold(word, forbiddenWord) {

                 // one # for each byte of the original word
                 strSlice[position] = strings.Repeat("#", len(word))
                 break // already censored, move on to the next word
             }
         }
     }

     // convert []string slice back to string
     // joining strSlice itself means clean input is returned instead of an empty string
     return strings.Join(strSlice, " ")
 }

 func main() {
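     // now, let's run against a big list of bad words
     // taken from https://gist.github.com/jamiew/1112488
     // NOTE : entries that contain a space can never match, because
     // CensorWord splits the input on whitespace before comparing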

 var badwords = []string{"4r5e",
 "5h1t",
 "5hit",
 "a55",
 "anal",
 "anus",
 "ar5e",
 "arrse",
 "arse",
 "ass",
 "ass-fucker",
 "asses",
 "assfucker",
 "assfukka",
 "asshole",
 "assholes",
 "asswhole",
 "a_s_s",
 "b!tch",
 "b00bs",
 "b17ch",
 "b1tch",
 "ballbag",
 "balls",
 "ballsack",
 "bastard",
 "beastial",
 "beastiality",
 "bellend",
 "bestial",
 "bestiality",
 "bi+ch",
 "biatch",
 "bitch",
 "bitcher",
 "bitchers",
 "bitches",
 "bitchin",
 "bitching",
 "bloody",
 "blow job",
 "blowjob",
 "blowjobs",
 "boiolas",
 "bollock",
 "bollok",
 "boner",
 "boob",
 "boobs",
 "booobs",
 "boooobs",
 "booooobs",
 "booooooobs",
 "breasts",
 "buceta",
 "bugger",
 "bum",
 "bunny fucker",
 "butt",
 "butthole",
 "buttmuch",
 "buttplug",
 "c0ck",
 "c0cksucker",
 "carpet muncher",
 "cawk",
 "chink",
 "cipa",
 "cl1t",
 "clit",
 "clitoris",
 "clits",
 "cnut",
 "cock",
 "cock-sucker",
 "cockface",
 "cockhead",
 "cockmunch",
 "cockmuncher",
 "cocks",
 "cocksuck",
 "cocksucked",
 "cocksucker",
 "cocksucking",
 "cocksucks",
 "cocksuka",
 "cocksukka",
 "cok",
 "cokmuncher",
 "coksucka",
 "coon",
 "cox",
 "crap",
 "cum",
 "cummer",
 "cumming",
 "cums",
 "cumshot",
 "cunilingus",
 "cunillingus",
 "cunnilingus",
 "cunt",
 "cuntlicking",
 "cunts",
 "cyalis",
 "cyberfuc",
 "cyberfuck",
 "cyberfucked",
 "cyberfucker",
 "cyberfuckers",
 "cyberfucking",
 "dildo",
 "dildos",
 "dink",
 "dinks",
 "dirsa",
 "dlck",
 "dog-fucker",
 "doggin",
 "dogging",
 "donkeyribber",
 "doosh",
 "duche",
 "dyke",
 "ejaculate",
 "ejaculated",
 "ejaculates",
 "ejaculatings",
 "ejaculation",
 "ejakulate",
 "f u c k",
 "f u c k e r",
 "f4nny",
 "fag",
 "fagging",
 "faggitt",
 "faggot",
 "faggs",
 "fagot",
 "fagots",
 "fags",
 "fanny",
 "fannyflaps",
 "fannyfucker",
 "fanyy",
 "fatass",
 "fcuk",
 "fcuker",
 "fcuking",
 "feck",
 "fecker",
 "felching",
 "fellate",
 "fellatio",
 "fingerfuck",
 "fingerfucked",
 "fingerfucker",
 "fingerfuckers",
 "fingerfucking",
 "fingerfucks",
 "fistfuck",
 "fistfucked",
 "fistfucker",
 "fistfuckers",
 "fistfucking",
 "fistfuckings",
 "fistfucks",
 "flange",
 "fook",
 "fooker",
 "fuck",
 "fucka",
 "fucked",
 "fucker",
 "fuckers",
 "fuckhead",
 "fuckheads",
 "fuckin",
 "fucking",
 "fucking",
 "fuckingshitmotherfucker",
 "fuckme",
 "fucks",
 "fuckwhit",
 "fuckwit",
 "fudge packer",
 "fudgepacker",
 "fuk",
 "fuker",
 "fukker",
 "fukkin",
 "fuks",
 "fukwhit",
 "fukwit",
 "fux",
 "fux0r",
 "f_u_c_k",
 "gangbanged",
 "gangbangs",
 "gaylord",
 "gaysex",
 "goatse",
 "god-dam",
 "god-damned",
 "goddamn",
 "goddamned",
 "hardcoresex",
 "hell",
 "heshe",
 "hoar",
 "hoare",
 "hoer",
 "homo",
 "hore",
 "horniest",
 "horny",
 "hotsex",
 "jack-off",
 "jackoff",
 "jap",
 "jerk-off",
 "jism",
 "jiz",
 "jizm",
 "jizz",
 "kawk",
 "knob",
 "knobead",
 "knobed",
 "knobend",
 "knobhead",
 "knobjocky",
 "knobjokey",
 "kock",
 "kondum",
 "kondums",
 "kum",
 "kummer",
 "kumming",
 "kums",
 "kunilingus",
 "l3i+ch",
 "l3itch",
 "labia",
 "lmfao",
 "lust",
 "lusting",
 "m0f0",
 "m0f",
 "m45terbate",
 "ma5terb8",
 "ma5terbate",
 "masochist",
 "master-bate",
 "masterb8",
 "masterbat*",
 "masterbat3",
 "masterbate",
 "masterbation",
 "masterbations",
 "masturbate",
 "mo-fo",
 "mof0",
 "mofo",
 "mothafuck",
 "mothafucka",
 "mothafuckas",
 "mothafuckaz",
 "mothafucked",
 "mothafucker",
 "mothafuckers",
 "mothafuckin",
 "mothafucking",
 "mothafuckings",
 "mothafucks",
 "mother fucker",
 "motherfuck",
 "motherfucked",
 "motherfucker",
 "motherfuckers",
 "motherfuckin",
 "motherfucking",
 "motherfuckings",
 "motherfuckka",
 "motherfucks",
 "muff",
 "muth",
 "muthafecker",
 "muthafuckker",
 "muther",
 "mutherfucker",
 "n1gga",
 "n1gger",
 "nazi",
 "nigg3r",
 "nigg4h",
 "nigga",
 "niggah",
 "niggas",
 "niggaz",
 "nigger",
 "niggers",
 "nob",
 "nob jokey",
 "nobhead",
 "nobjocky",
 "nobjokey",
 "numbnuts",
 "nutsack",
 "orgasim",
 "orgasims",
 "orgasm",
 "orgasms",
 "p0rn",
 "pawn",
 "pecker",
 "penis",
 "penisfucker",
 "phonesex",
 "phuck",
 "phuk",
 "phuked",
 "phuking",
 "phukked",
 "phukking",
 "phuks",
 "phuq",
 "pigfucker",
 "pimpis",
 "piss",
 "pissed",
 "pisser",
 "pissers",
 "pisses",
 "pissflaps",
 "pissin",
 "pissing",
 "pissoff",
 "poop",
 "porn",
 "porno",
 "pornography",
 "pornos",
 "prick",
 "pricks",
 "pron",
 "pube",
 "pusse",
 "pussi",
 "pussies",
 "pussy",
 "pussys",
 "rectum",
 "retard",
 "rimjaw",
 "rimming",
 "s hit",
 "s.o.b.",
 "sadist",
 "schlong",
 "screwing",
 "scroat",
 "scrote",
 "scrotum",
 "semen",
 "sex",
 "sh!+",
 "sh!t",
 "sh1t",
 "shag",
 "shagger",
 "shaggin",
 "shagging",
 "shemale",
 "shi+",
 "shit",
 "shitdick",
 "shite",
 "shited",
 "shitey",
 "shitfuck",
 "shitfull",
 "shithead",
 "shiting",
 "shitings",
 "shits",
 "shitted",
 "shitter",
 "shitters",
 "shitting",
 "shittings",
 "shitty",
 "skank",
 "slut",
 "sluts",
 "smegma",
 "smut",
 "snatch",
 "son-of-a-bitch",
 "spac",
 "spunk",
 "s_h_i_t",
 "t1tt1e5",
 "t1ttie",
 "teets",
 "teez",
 "testical",
 "testicle",
 "tit",
 "titfuck",
 "tits",
 "titt",
 "tittie5",
 "tittiefucker",
 "titties",
 "tittyfuck",
 "tittywank",
 "titwank",
 "tosser",
 "turd",
 "tw4t",
 "twat",
 "twathead",
 "twatty",
 "twunt",
 "twunter",
 "v14gra",
 "v1gra",
 "vagina",
 "viagra",
 "vulva",
 "w00se",
 "wang",
 "wank",
 "wanker",
 "wanky",
 "whoar",
 "whore",
 "willies",
 "willy",
 "xrated",
 "xxx"}

 //fmt.Println(badwords)

 var inputString = " A bitch rare breed these days, a titwanker language purist is a prick software developer that choose to write their program in a single programming language and stick to traditional rules of programming. They prefer to write their own routines and refuse to use whateverfuck frameworks or third-party libraries contaminated by other programming languages. Bunch of mothafuckas! Unlike the old days, where a developer can write a complete DOS or Unix application with Pascal or C language without involving SQL, CSS or HTML. As users move toward web and mobile apps, it is hard to develop something useful in today's programming environment with a single language. Most of these fucking language purists are known to develop program for their own use as a hobby. f*cking s.o.b."

     resultString := CensorWord(inputString, badwords)

     // NOTE : this timer wraps only the Println calls below,
     // the CensorWord call above has already returned by this point
     startTime := time.Now()
     fmt.Println("[Original : ]", inputString)
     fmt.Println("-------------------------------------------")
     fmt.Println("[Censored : ]", resultString)
     endTime := time.Now()

     fmt.Printf("\n\n\nTime taken is about ------->>> %v \n", endTime.Sub(startTime))
 }

Output:

[Original : ] A bitch rare breed these days, a titwanker language purist is a prick software developer that choose to write their program in a single programming language and stick to traditional rules of programming. They prefer to write their own routines and refuse to use whateverfuck frameworks or third-party libraries contaminated by other programming languages. Bunch of mothafuckas! Unlike the old days, where a developer can write a complete DOS or Unix application with Pascal or C language without involving SQL, CSS or HTML. As users move toward web and mobile apps, it is hard to develop something useful in today's programming environment with a single language. Most of these fucking language purists are known to develop program for their own use as a hobby. f*cking s.o.b.


[Censored : ] A ##### rare breed these days, a ######### language purist is a ##### software developer that choose to write their program in a single programming language and stick to traditional rules of programming. They prefer to write their own routines and refuse to use ############ frameworks or third-party libraries contaminated by other programming languages. Bunch of ############ Unlike the old days, where a developer can write a complete DOS or Unix application with Pascal or C language without involving SQL, CSS or HTML. As users move toward web and mobile apps, it is hard to develop something useful in today's programming environment with a single language. Most of these ####### language purists are known to develop program for their own use as a hobby. f*cking ######


Time taken is about ------->>> 81.375µs
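
The timer in the listing above starts after CensorWord has returned, so the reported figure mostly reflects the Println calls. To get a rough figure for the filter alone, one option is to time the call itself and average over many runs. The sketch below assumes a shortened word list and input to stay compact; averageTime and censor are hypothetical names, and the numbers it prints will differ from machine to machine.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// censor is a compact stand-in for CensorWord from the example above.
func censor(str string, censored []string) string {
	words := strings.Fields(str)
	for i, word := range words {
		lower := strings.ToLower(word)
		for _, forbidden := range censored {
			if strings.Contains(lower, forbidden) {
				words[i] = strings.Repeat("#", len(word))
				break
			}
		}
	}
	return strings.Join(words, " ")
}

// averageTime calls fn the given number of times and returns the mean
// duration of a single call.
func averageTime(runs int, fn func()) time.Duration {
	start := time.Now()
	for i := 0; i < runs; i++ {
		fn()
	}
	return time.Since(start) / time.Duration(runs)
}

func main() {
	badwords := []string{"duck", "awesome"}
	input := strings.Repeat("Ducking duck! POOP is just another awesome word. ", 20)

	var result string
	avg := averageTime(1000, func() {
		result = censor(input, badwords)
	})

	// printing happens outside the timed section
	fmt.Println("[Censored : ]", result)
	fmt.Println("Average time per call :", avg)
}
```

For more careful measurements, the benchmark support in Go's testing package is the usual tool.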

Hope this helps!

References:

https://golang.org/pkg/strings/#Repeat

https://socketloop.com/tutorials/golang-check-if-a-string-contains-multiple-sub-strings-in-string

https://www.socketloop.com/tutorials/golang-convert-string-to-array-slice

https://www.socketloop.com/tutorials/golang-removes-punctuation-or-defined-delimiter-from-the-user-s-input






By Adam Ng
