Golang : Find duplicate files with filepath.Walk




Sometimes we download a lot of files into a directory, and although the files have different names, some of them could be duplicates of the same file. This small Golang program walks a target directory with filepath.Walk, computes the SHA-512 hash of every file and stores it in a map keyed by that hash. If two files have the same SHA-512 hash, they are, for all practical purposes, identical.

 package main

 import (
 	"crypto/sha512"
 	"fmt"
 	"os"
 	"path/filepath"
 )

 // files maps each SHA-512 digest to the first path seen with that content.
 var files = make(map[[sha512.Size]byte]string)

 // checkDuplicate is the filepath.WalkFunc called for every entry under the target directory.
 func checkDuplicate(path string, info os.FileInfo, err error) error {
 	if err != nil {
 		fmt.Println(err) // report the error (e.g. permission denied) and keep walking
 		return nil
 	}
 	if !info.Mode().IsRegular() { // skip directories, symlinks, named pipes and devices
 		return nil
 	}

 	data, err := os.ReadFile(path) // os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+)
 	if err != nil {
 		fmt.Println(err)
 		return nil
 	}

 	hash := sha512.Sum512(data) // get the file's SHA-512 hash

 	if v, ok := files[hash]; ok {
 		fmt.Printf("%q is a duplicate of %q\n", path, v)
 	} else {
 		files[hash] = path // store in map for comparison
 	}

 	return nil
 }

 func main() {
 	if len(os.Args) != 2 {
 		fmt.Printf("USAGE : %s <target_directory>\n", os.Args[0])
 		os.Exit(1) // a non-zero exit status signals incorrect usage
 	}

 	dir := os.Args[1] // get the target directory

 	err := filepath.Walk(dir, checkDuplicate)
 	if err != nil {
 		fmt.Println(err)
 		os.Exit(1)
 	}
 }

Sample output :

"/Users/sweetlogic/Applications/.localized" is a duplicate of "/Users/.localized"

"/Users/sweetlogic/Desktop/.localized" is a duplicate of "/Users/.localized"

"/Users/sweetlogic/Desktop/01.jpg" is a duplicate of "/Users/sweetlogic/01.jpg"

"/Users/sweetlogic/Desktop/02.jpg" is a duplicate of "/Users/sweetlogic/02.jpg"

"/Users/sweetlogic/Desktop/03.jpg" is a duplicate of "/Users/sweetlogic/03.jpg"

  See also : Generate checksum for a file in Go





By Adam Ng


