Golang : read gzipped http response
There are times when we want to grab a website's content for parsing (crawling) and find out that the content is gzipped.
Normally, to deal with a gzipped HTML reply, you can use the os/exec package to execute curl
from the command line and pipe the gzipped content to gunzip, like this:
curl -H "Accept-Encoding: gzip" http://www.thestar.com.my | gunzip
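For reference, the curl-and-gunzip pipeline above could be driven from Go roughly as follows. This is only a sketch: pipeCommands is a hypothetical helper name (it is not part of os/exec), the URL is taken from the command line, and it assumes curl and gunzip are installed and on the PATH.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// pipeCommands is a hypothetical helper (not part of os/exec). It runs two
// commands, each given as an argv slice, with the standard output of the
// first connected to the standard input of the second, like "first | second"
// in a shell. It returns whatever the second command wrote to standard output.
func pipeCommands(first, second []string) ([]byte, error) {
	producer := exec.Command(first[0], first[1:]...)
	consumer := exec.Command(second[0], second[1:]...)

	// An OS-level pipe joins the two processes directly.
	pr, pw, err := os.Pipe()
	if err != nil {
		return nil, err
	}
	producer.Stdout = pw
	consumer.Stdin = pr

	var out bytes.Buffer
	consumer.Stdout = &out

	if err := producer.Start(); err != nil {
		pr.Close()
		pw.Close()
		return nil, err
	}
	consumerStartErr := consumer.Start()

	// Any child that started holds its own copy of the pipe, so the parent
	// closes its ends; otherwise the consumer would never see end-of-file.
	pr.Close()
	pw.Close()

	if consumerStartErr != nil {
		producer.Wait()
		return nil, consumerStartErr
	}

	producerErr := producer.Wait()
	consumerErr := consumer.Wait()
	if producerErr != nil {
		return nil, producerErr
	}
	if consumerErr != nil {
		return nil, consumerErr
	}
	return out.Bytes(), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the URL to fetch as the first argument")
		return
	}

	// Same as: curl -s -H "Accept-Encoding: gzip" <url> | gunzip
	curl := []string{"curl", "-s", "-H", "Accept-Encoding: gzip", os.Args[1]}
	body, err := pipeCommands(curl, []string{"gunzip"})
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	os.Stdout.Write(body)
}
```

Note that, just like the shell pipeline, this fails when the server replies without compression, because gunzip rejects input that is not gzipped.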
A gzipped HTTP response can also be processed directly in Golang, without calling external commands. The following code demonstrates how to get the same result as the curl
command using only the standard library.
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	client := new(http.Client)

	request, err := http.NewRequest("GET", "http://www.thestar.com.my", nil)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	// Setting Accept-Encoding ourselves turns off the transport's automatic
	// gzip decoding, so the body has to be decompressed manually below.
	request.Header.Add("Accept-Encoding", "gzip")

	response, err := client.Do(request)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer response.Body.Close()

	// Check that the server actually sent compressed data
	var reader io.ReadCloser
	switch response.Header.Get("Content-Encoding") {
	case "gzip":
		reader, err = gzip.NewReader(response.Body)
		if err != nil {
			fmt.Println(err)
			os.Exit(1)
		}
		defer reader.Close()
	default:
		// not compressed, read the body as-is
		reader = response.Body
	}

	// copy the decompressed content to standard output
	// see https://www.socketloop.com/tutorials/golang-saving-and-reading-file-with-gob
	// on how to save to file
	_, err = io.Copy(os.Stdout, reader)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}
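If you need the content as bytes for parsing rather than on standard output, the same steps can be wrapped in a function. The sketch below is one possible arrangement, not the only one: fetchBody and fetchFromTestServer are hypothetical helper names, and a local net/http/httptest server that always replies with gzipped data stands in for a real website so the example runs without network access.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
)

// fetchBody (hypothetical helper) requests url with "Accept-Encoding: gzip"
// and returns the body, decompressed if the server replied with gzip.
func fetchBody(url string) ([]byte, error) {
	request, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	request.Header.Add("Accept-Encoding", "gzip")

	response, err := http.DefaultClient.Do(request)
	if err != nil {
		return nil, err
	}
	defer response.Body.Close()

	// Only wrap the body in a gzip reader when the server says it is gzipped.
	var reader io.Reader = response.Body
	if response.Header.Get("Content-Encoding") == "gzip" {
		gz, err := gzip.NewReader(response.Body)
		if err != nil {
			return nil, err
		}
		defer gz.Close()
		reader = gz
	}
	return io.ReadAll(reader)
}

// fetchFromTestServer (hypothetical helper) starts a local server that always
// replies with a gzipped copy of text, fetches it with fetchBody and returns
// the decompressed result.
func fetchFromTestServer(text string) (string, error) {
	handler := func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Encoding", "gzip")
		gz := gzip.NewWriter(w)
		defer gz.Close()
		io.WriteString(gz, text)
	}
	server := httptest.NewServer(http.HandlerFunc(handler))
	defer server.Close()

	body, err := fetchBody(server.URL)
	return string(body), err
}

func main() {
	body, err := fetchFromTestServer("hello, gzip")
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Println(body)
}
```

To use it against a real site, call fetchBody with that site's URL instead of the test server's address.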
References :
http://golang.org/pkg/os/exec/
http://golang.org/pkg/net/http/#Get
https://www.socketloop.com/tutorials/how-to-check-with-curl-if-my-website-or-the-asset-is-gzipped
By Adam Ng