Golang : Check if user agent is a robot or crawler example


Tags : golang robot crawler user-agent

Problem:

You need to determine whether the user agent visiting your web server is a bot, robot, or crawler. You have tried the hash map solution, but found that it breaks easily when the robot's version string changes. How do you create a generic function that can detect whether a user agent is a robot?
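
To make the weakness of the hash map approach concrete, here is a minimal sketch. The `knownBots` table, the two helper functions, and the "2.2" version bump are made up for illustration; they are not taken from any real bot list.

```go
package main

import (
	"fmt"
	"strings"
)

// knownBots is a hypothetical exact-match table keyed by the full
// user agent string.
var knownBots = map[string]bool{
	"Googlebot/2.1 (+http://www.google.com/bot.html)": true,
}

// exactMatch reports whether ua is stored verbatim in knownBots.
func exactMatch(ua string) bool {
	return knownBots[ua]
}

// substringMatch reports whether ua contains the token "Googlebot".
func substringMatch(ua string) bool {
	return strings.Contains(ua, "Googlebot")
}

func main() {
	oldUA := "Googlebot/2.1 (+http://www.google.com/bot.html)"
	newUA := "Googlebot/2.2 (+http://www.google.com/bot.html)" // made-up version bump

	fmt.Println(exactMatch(oldUA), exactMatch(newUA))         // true false
	fmt.Println(substringMatch(oldUA), substringMatch(newUA)) // true true
}
```

The exact-match lookup stops recognizing the bot as soon as one character of the string changes, while the substring check keeps working.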

Solution:

Ported this solution from CodeIgniter for my own use. Feel free to adapt it for your own use.

Here you go!

  package main

  import (
      "fmt"
      "log"
      "net/http"
      "strings"
  )

  // isRobot reports whether useragent contains the name of a known robot.
  func isRobot(useragent string) bool {
      // There are hundreds of bots, but these are the most common.
      // You can see other bot lists at
      // http://www.botsvsbrowsers.com/category/1/index.html
      //
      // The list below is taken from
      // https://github.com/bcit-ci/CodeIgniter/blob/develop/system/libraries/User_agent.php
      //
      // The hash map/table method requires an exact match of the whole user
      // agent string and breaks as soon as the version number changes.
      // It is therefore better to check whether the user agent contains
      // any of the names in a slice.
      robots := []string{"Googlebot", "Google Page Speed Insights", "MSNBot", "Baiduspider", "Bing", "DuckDuckBot", "Inktomi Slurp", "Yahoo", "Ask Jeeves", "FastCrawler", "YandexBot", "MediaPartners Google", "Crazy Webcrawler", "AdsBot Google", "Feedfetcher Google", "Curious George", "facebookexternalhit"}

      for _, bot := range robots {
          // Note: strings.Contains is case-sensitive.
          if strings.Contains(useragent, bot) {
              return true
          }
      }
      return false
  }

  func checkIfUserAgentIsRobot(w http.ResponseWriter, r *http.Request) {
      ua := r.Header.Get("User-Agent")
      robot := isRobot(ua) // check once and reuse the result

      // Log to the server console.
      fmt.Printf("user agent is: %s \n", ua)
      fmt.Printf("user agent is a robot: %v \n", robot)

      // Reply to the client.
      result := "no"
      if robot {
          result = "yes"
      }
      fmt.Fprintf(w, "user agent is %s\n", ua)
      fmt.Fprintf(w, "user agent is a robot: %s\n", result)
  }

  func main() {
      http.HandleFunc("/", checkIfUserAgentIsRobot)
      // ListenAndServe only returns on failure, so report the error.
      log.Fatal(http.ListenAndServe(":8080", nil))
  }
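
One caveat with the check above: the match is case-sensitive, so a name such as "Bing" will not match a user agent that spells it in lowercase, as in "bingbot". The sketch below is one possible variant that lowercases the user agent before comparing. The function name `isRobotFold` and the `botTokens` list are assumptions made for this example; adjust the tokens to the bots you actually see in your logs.

```go
package main

import (
	"fmt"
	"strings"
)

// botTokens holds lowercase fragments to look for.
// This list is illustrative only, not an authoritative bot list.
var botTokens = []string{
	"googlebot", "google page speed insights", "bingbot", "msnbot",
	"baiduspider", "duckduckbot", "slurp", "yandex",
	"mediapartners-google", "adsbot-google", "feedfetcher-google",
	"facebookexternalhit",
}

// isRobotFold lowercases the user agent once, then checks whether it
// contains any of the lowercase tokens.
func isRobotFold(useragent string) bool {
	ua := strings.ToLower(useragent)
	for _, token := range botTokens {
		if strings.Contains(ua, token) {
			return true
		}
	}
	return false
}

func main() {
	bing := "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
	chrome := "Mozilla/5.0 (Macintosh; Intel Mac OS X 1085) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"

	fmt.Println(isRobotFold(bing))   // true
	fmt.Println(isRobotFold(chrome)) // false
}
```

Lowercasing once per request keeps the loop simple. The trade-off is that short tokens become more likely to match an ordinary browser string by accident, so prefer specific tokens.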

Output (on the server console):

Browse page with Chrome browser

user agent is: Mozilla/5.0 (Macintosh; Intel Mac OS X 1085) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36

user agent is a robot: false

user agent is: Mozilla/5.0 (Macintosh; Intel Mac OS X 1085) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36

user agent is a robot: false

Browse page with Google Page Speed Insights bot

user agent is: Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Version/8.0 Mobile/12F70 Safari/600.1.4

user agent is a robot: true

user agent is: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/27.0.1453 Safari/537.36

user agent is a robot: true
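
If you do not have a real bot at hand, you can reproduce results like these without a browser by sending in-memory requests through the standard `net/http/httptest` package. This is only a sketch: `isRobot` here is a trimmed copy of the check with two names from the list, and `handler` and `probe` are helper names invented for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// isRobot is a trimmed copy of the check, using two names from the list.
func isRobot(useragent string) bool {
	for _, bot := range []string{"Googlebot", "Google Page Speed Insights"} {
		if strings.Contains(useragent, bot) {
			return true
		}
	}
	return false
}

// handler writes the verdict for the request's User-Agent header.
func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "user agent is a robot: %v\n", isRobot(r.Header.Get("User-Agent")))
}

// probe sends one in-memory request with the given user agent
// and returns the response body.
func probe(useragent string) string {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("User-Agent", useragent)
	rec := httptest.NewRecorder()
	handler(rec, req)
	return rec.Body.String()
}

func main() {
	// Prints "user agent is a robot: true"
	fmt.Print(probe("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/27.0.1453 Safari/537.36"))
	// Prints "user agent is a robot: false"
	fmt.Print(probe("Mozilla/5.0 (Macintosh; Intel Mac OS X 1085) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.116 Safari/537.36"))
}
```

The same `probe` helper fits naturally into a table-driven test if you want to keep a list of sample user agents alongside the code.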

References:

https://www.socketloop.com/tutorials/golang-check-if-a-string-contains-multiple-sub-strings-in-string

https://www.socketloop.com/tutorials/golang-how-to-determine-if-request-or-crawl-is-from-google-robots

https://www.socketloop.com/tutorials/golang-check-if-item-is-in-slice-array

  See also : Golang : How to determine if request or crawl is from Google robots




By Adam Ng
