Golang : read gzipped http response
There are times when we want to grab a website's content for parsing (crawling), only to find out that the content is gzipped.
Normally, to deal with a gzipped HTML reply, you can use the os/exec package to execute curl
from the command line and pipe the gzipped content to gunzip, like this:
curl -H "Accept-Encoding: gzip" http://www.thestar.com.my | gunzip
A gzipped HTTP response can also be processed directly in Golang, without shelling out. The following code demonstrates how to get the same result as the curl
command.
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	client := new(http.Client)

	// The method must be upper case, and the URL must not start with a space
	request, err := http.NewRequest(http.MethodGet, "http://www.thestar.com.my", nil)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	// Ask the server for a gzipped reply
	request.Header.Add("Accept-Encoding", "gzip")

	response, err := client.Do(request)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	defer response.Body.Close()

	// Check that the server actually sent compressed data
	var reader io.ReadCloser
	switch response.Header.Get("Content-Encoding") {
	case "gzip":
		reader, err = gzip.NewReader(response.Body)
		if err != nil {
			fmt.Println(err)
			os.Exit(1)
		}
		defer reader.Close()
	default:
		// Not compressed, read the body as-is
		reader = response.Body
	}

	// Copy the decoded content to standard output
	_, err = io.Copy(os.Stdout, reader)
	// see https://www.socketloop.com/tutorials/golang-saving-and-reading-file-with-gob
	// on how to save to file
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}
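To try the same decoding logic without depending on a live website, the sketch below wraps it in a function and points it at a local test server from net/http/httptest. The names fetchBody and newGzipServer are hypothetical helpers invented for this illustration, and io.ReadAll assumes Go 1.16 or newer.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// fetchBody (hypothetical helper) requests url with an explicit
// "Accept-Encoding: gzip" header and returns the decoded body.
func fetchBody(client *http.Client, url string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	// Setting this header by hand turns off the transport's automatic decompression.
	req.Header.Set("Accept-Encoding", "gzip")

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var reader io.Reader = resp.Body
	if resp.Header.Get("Content-Encoding") == "gzip" {
		zr, err := gzip.NewReader(resp.Body)
		if err != nil {
			return "", err
		}
		defer zr.Close()
		reader = zr
	}
	data, err := io.ReadAll(reader)
	return string(data), err
}

// newGzipServer (hypothetical helper) starts a local test server that
// gzips its reply only when the client asks for it.
func newGzipServer(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
			io.WriteString(w, body)
			return
		}
		w.Header().Set("Content-Encoding", "gzip")
		zw := gzip.NewWriter(w)
		defer zw.Close()
		io.WriteString(zw, body)
	}))
}

func main() {
	srv := newGzipServer("hello, gzip")
	defer srv.Close()

	body, err := fetchBody(srv.Client(), srv.URL)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(body) // prints "hello, gzip"
}
```

Worth knowing: when a request carries no Accept-Encoding header of its own, Go's default HTTP transport asks for gzip and decodes the reply transparently. The manual gzip.NewReader step is only needed because the header is set explicitly here.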
References :
http://golang.org/pkg/os/exec/
http://golang.org/pkg/net/http/#Get
https://www.socketloop.com/tutorials/how-to-check-with-curl-if-my-website-or-the-asset-is-gzipped
By Adam Ng