Golang : Find duplicate files with filepath.Walk
Sometimes we download a lot of files into a directory and, although the files have different names, some of them could be duplicates of the same file. This small Golang program scans a target directory and builds a map keyed by each file's SHA-512 hash. If two files have an identical SHA-512 hash, then they are essentially the same.
package main

import (
	"crypto/sha512"
	"fmt"
	"os"
	"path/filepath"
)

// files maps a SHA-512 hash to the first path seen with that content.
var files = make(map[[sha512.Size]byte]string)

// checkDuplicate is called by filepath.Walk for every entry under the target directory.
func checkDuplicate(path string, info os.FileInfo, err error) error {
	if err != nil {
		fmt.Println(err) // report the problem and keep walking
		return nil
	}

	if info.IsDir() { // skip directory
		return nil
	}

	// os.ReadFile (Go 1.16+) replaces the deprecated ioutil.ReadFile
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println(err)
		return nil
	}

	hash := sha512.Sum512(data) // get the file sha512 hash

	if v, ok := files[hash]; ok {
		fmt.Printf("%q is a duplicate of %q\n", path, v)
	} else {
		files[hash] = path // store in map for comparison
	}

	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Printf("USAGE : %s <target_directory> \n", os.Args[0])
		os.Exit(1) // wrong usage is an error, so exit with a non-zero status
	}

	dir := os.Args[1] // get the target directory

	err := filepath.Walk(dir, checkDuplicate)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}
Sample output :
"/Users/sweetlogic/Applications/.localized" is a duplicate of "/Users/.localized"
"/Users/sweetlogic/Desktop/.localized" is a duplicate of "/Users/.localized"
"/Users/sweetlogic/Desktop/01.jpg" is a duplicate of "/Users/sweetlogic/01.jpg"
"/Users/sweetlogic/Desktop/02.jpg" is a duplicate of "/Users/sweetlogic/02.jpg"
"/Users/sweetlogic/Desktop/03.jpg" is a duplicate of "/Users/sweetlogic/03.jpg"
See also : Generate checksum for a file in Go
By Adam Ng