Golang : Customize scanner.Scanner to treat dash as part of identifier

Tags : golang scanner is-ident-rune unicode parser

Putting this down here for my own future reference. The problem I was solving today involved using the text/scanner package to parse input containing hyphenated words such as beli-belah, buah-buahan and jalan-jalan.

The initial problem is that, by default, scanner.Scanner breaks buah-buahan into three tokens: buah, - and buahan.

So, how do we customize the scanner to treat the dash (-) as part of an identifier?

Simple: set the scanner's IsIdentRune field to a function that overrides the default rule for which characters may appear in an identifier.

For example,

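  // needs imports: strings, text/scanner, unicode
  var scn scanner.Scanner
  scn.Init(strings.NewReader(src)) // src (assumed name) holds the input; Init resets Whitespace, so call it first
  scn.Whitespace ^= 1<<'\t' | 1<<'\n' | 1<<'\r' | 1<<' ' // don't skip tabs, new lines, carriage returns or spaces

  // treat '-' as part of an identifier ... for words such as buah-buahan, biri-biri
  // (unicode.IsPunct already accepts '-' at any position, along with all other punctuation)
  scn.IsIdentRune = func(ch rune, i int) bool {
    return ch == '-' && i == 0 || unicode.IsLetter(ch) || unicode.IsDigit(ch) && i > 0 || unicode.IsPunct(ch)
  }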

If you encounter the same problem as I did, I hope this helps!


  See also : Golang : How to tokenize source code with text/scanner package?


By Adam Ng(黃武俊)
