Golang : How to extract video or image files from HTML source code
There are times when I want to download a video that I watched in the browser, and to do that I first need to find the video's source file. The following simple example demonstrates how to extract video and image file URLs from HTML source code. It fetches the entire HTML page and picks out the file references with regular expressions.
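To look at the regular-expression step on its own, here is a minimal sketch that runs the image pattern against a hard-coded HTML string instead of a downloaded page. The helper name `extractImageSources` and the sample markup are made up for illustration; only the pattern itself is taken from the full program.

```go
package main

import (
	"fmt"
	"regexp"
)

// imageRegExp captures the value of the src attribute of an <img> tag.
var imageRegExp = regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)

// extractImageSources is a hypothetical helper: it returns the src value
// of every <img> tag that the pattern finds in html.
func extractImageSources(html string) []string {
	var sources []string
	for _, match := range imageRegExp.FindAllStringSubmatch(html, -1) {
		// match[0] is the whole matched text, match[1] is the first capture group.
		sources = append(sources, match[1])
	}
	return sources
}

func main() {
	// Made-up markup, for illustration only.
	html := `<p><img class="logo" src="/images/logo.png"> <img alt='x' src='https://example.com/a.gif'></p>`
	for _, src := range extractImageSources(html) {
		fmt.Println("Image found :", src)
	}
}
```

Keep in mind that a regular expression only approximates HTML parsing; it works for simple pages like the ones tested here, but unusual markup may slip through.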
Here you go!
package main

import (
	"io"
	"log"
	"net/http"
	"regexp"
)

// SearchForVideoLinks downloads the page at url and logs every <video>
// and <source> tag found in its HTML.
func SearchForVideoLinks(url string) {
	log.Println("Parsing : ", url)

	// Request the HTML page.
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		// resp.Status already includes the numeric code, e.g. "404 Not Found".
		log.Fatalf("Unable to get URL with status code error: %s", resp.Status)
	}

	// Read the whole response body into memory.
	htmlData, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	html := string(htmlData)

	// These patterns have no capture groups, so each match is a
	// one-element slice holding the whole tag (minus the closing ">").
	videoRegExp := regexp.MustCompile(`<video[^>]+`)
	sourceRegExp := regexp.MustCompile(`<source[^>]+`)

	videoMatchSlice := videoRegExp.FindAllStringSubmatch(html, -1)
	sourceMatchSlice := sourceRegExp.FindAllStringSubmatch(html, -1)

	// Note: the two patterns are matched independently, so every <source>
	// tag on the page is listed under each <video> tag.
	for _, item := range videoMatchSlice {
		log.Println("Video found : ", item)
		for _, sourceItem := range sourceMatchSlice {
			log.Println("Source found : ", sourceItem)
		}
	}
}

// SearchForImageLinks downloads the page at url and logs the src value
// of every <img> tag found in its HTML.
func SearchForImageLinks(url string) {
	log.Println("Parsing : ", url)

	// Request the HTML page.
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		log.Fatalf("Unable to get URL with status code error: %s", resp.Status)
	}

	htmlData, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// The capture group holds the value of the src attribute.
	imageRegExp := regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)
	subMatchSlice := imageRegExp.FindAllStringSubmatch(string(htmlData), -1)

	for _, item := range subMatchSlice {
		// item[0] is the whole match, item[1] is the captured src value.
		log.Println("Image found : ", item[1])
	}
}

func main() {
	SearchForImageLinks("https://socketloop.com")
	SearchForImageLinks("https://golang.org")
	SearchForVideoLinks("https://cdpn.io/caraya/fullpage/FckCd")
}
Sample output :
2020/02/27 09:49:08 Parsing : https://socketloop.com
2020/02/27 09:49:09 Image found : https://d1ohg4ss876yi2.cloudfront.net/socketloop-logo1.png
2020/02/27 09:49:09 Image found : //pixel.quantserve.com/pixel/p-31iz6hfFutd16.gif?labels=Domain.socketloop_com,DomainId.22847
2020/02/27 09:49:09 Image found : https://sb.scorecardresearch.com/p?c1=2&c2=20015427&cv=2.0&cj=1
2020/02/27 09:49:09 Parsing : https://golang.org
2020/02/27 09:49:09 Image found : /lib/godoc/images/go-logo-blue.svg
2020/02/27 09:49:09 Image found : /lib/godoc/images/cloud-download.svg
2020/02/27 09:49:09 Image found : /lib/godoc/images/footer-gopher.jpg
2020/02/27 09:49:09 Parsing : https://cdpn.io/caraya/fullpage/FckCd
2020/02/27 09:49:10 Video found : [<video id='video' controls="controls" preload='none'
width="600" poster="http://media.w3.org/2010/05/sintel/poster.png"]
2020/02/27 09:49:10 Source found : [<source id='mp4' src="http://media.w3.org/2010/05/sintel/trailer.mp4" type='video/mp4'/]
2020/02/27 09:49:10 Source found : [<source id='webm' src="http://media.w3.org/2010/05/sintel/trailer.webm" type='video/webm'/]
2020/02/27 09:49:10 Source found : [<source id='ogv' src="http://media.w3.org/2010/05/sintel/trailer.ogv" type='video/ogg'/]
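The output above prints whole `<video>` and `<source>` tags, and several image paths are relative (`/lib/godoc/images/go-logo-blue.svg`) or protocol-relative (`//pixel.quantserve.com/...`). Before a file can actually be downloaded, the `src` value has to be pulled out of the tag and resolved against the page URL. The following is a rough sketch of one way to do that with the standard `net/url` package. The helper name `resolveSources`, the combined pattern and the hand-written HTML string are assumptions for illustration and are not part of the original program.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
	"regexp"
)

// srcRegExp captures the src attribute of <img>, <video> and <source> tags.
var srcRegExp = regexp.MustCompile(`<(?:img|video|source)[^>]+\bsrc=["']([^"']+)["']`)

// resolveSources is a hypothetical helper: it extracts every src value
// from html and resolves it against pageURL to get an absolute URL.
func resolveSources(pageURL, html string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}

	var links []string
	for _, match := range srcRegExp.FindAllStringSubmatch(html, -1) {
		ref, err := url.Parse(match[1])
		if err != nil {
			continue // skip src values that do not parse as URLs
		}
		// ResolveReference handles absolute, root-relative and
		// protocol-relative references.
		links = append(links, base.ResolveReference(ref).String())
	}
	return links, nil
}

func main() {
	// Hand-written stand-in for a downloaded page, loosely based on
	// the tags shown in the sample output.
	html := `<img src="/lib/godoc/images/go-logo-blue.svg">
<img src="//pixel.quantserve.com/pixel/p-31iz6hfFutd16.gif">
<video id='video' controls="controls" preload='none'>
<source id='mp4' src="http://media.w3.org/2010/05/sintel/trailer.mp4" type='video/mp4'/>
</video>`

	links, err := resolveSources("https://golang.org", html)
	if err != nil {
		log.Fatal(err)
	}
	for _, link := range links {
		fmt.Println("Resolved :", link)
	}
}
```

The `<video>` tag in this stand-in has no `src` attribute of its own, so only the `<img>` and `<source>` tags contribute links.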
See also : Golang : How to extract links from web page ?
By Adam Ng(黃武俊)