Wednesday, December 25, 2013

Off Topic - A Tool For Deleting Duplicated Files

Here is a command line tool I wrote a few days ago because I couldn't find an application good enough to help me clean my HDDs. The problem was that I had many duplicated files: the same pictures stored in different folders for back-up, mp3s scattered in various locations, personal documents. In short, stuff that had gathered up to 10 years of dust while being copied from one place to another.

Here is the source code:

How to compile and run:
  • Copy all 3 files into a folder. Open the terminal and $cd to that folder.
  • $mkdir -p ./ro/alexandrugris/diskutil
  • $mv ./*.java ./ro/alexandrugris/diskutil/
  • $javac ./ro/alexandrugris/diskutil/*.java
  • $jar -cfm diskutil.jar Manifest.txt ./ro/alexandrugris/diskutil/*.class
  • # double check that derby.jar is in the same directory
  • $java -jar diskutil.jar --help

  • --prepareDb  -> builds the file database structure
  • --updateDb folder1 [folder2]...  -> indexes folder1, folder2, etc.
  • --locate word1 [word2]... -> locates files whose name contains word1, word2,... in any order
  • --usage [min size MB] -> lists directories by size, in descending order
  • --duplicates [min size KB] [filetype1] [filetype2] ... -> finds the duplicates
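
The post does not say how --duplicates decides that two files are the same, so the following is only a minimal sketch of one common approach, not the tool's actual code. It assumes duplicates are found by first grouping files by size and then by a SHA-256 hash of their content, and it scans the folder directly instead of reading an index from the database. The class and method names (DuplicateSketch, findDuplicates, sha256) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original tool.
class DuplicateSketch {

    // Returns groups of files under root that have identical content
    // and are at least minSizeBytes long.
    static List<List<Path>> findDuplicates(Path root, long minSizeBytes) throws IOException {
        // Step 1: bucket files by size; files of different sizes cannot be identical.
        Map<Long, List<Path>> bySize = new HashMap<>();
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path p : walk.filter(Files::isRegularFile).collect(Collectors.toList())) {
                long size = Files.size(p);
                if (size >= minSizeBytes) {
                    bySize.computeIfAbsent(size, k -> new ArrayList<>()).add(p);
                }
            }
        }

        // Step 2: inside each size bucket, group by content hash.
        List<List<Path>> result = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // a unique size means a unique file, no need to hash it
            }
            Map<String, List<Path>> byHash = new HashMap<>();
            for (Path p : sameSize) {
                byHash.computeIfAbsent(sha256(p), k -> new ArrayList<>()).add(p);
            }
            for (List<Path> group : byHash.values()) {
                if (group.size() > 1) {
                    result.add(group);
                }
            }
        }
        return result;
    }

    // Streams the file through SHA-256 and returns the digest as a hex string.
    static String sha256(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Grouping by size first keeps the expensive part (reading and hashing file content) limited to files that could actually be duplicates. A call such as DuplicateSketch.findDuplicates(Paths.get("/some/folder"), 1024) would return one list per set of identical files of at least 1 KB.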

How to use: 
  • Run the --duplicates command iteratively. The command shows an interactive menu for each duplicated file. Double check with the file manager which files it prompts you to delete.
  • Reindex and repeat the cycle a few times.
  • Here is a sample of the menu prompted by the --duplicates command:

File: convhull.png [0MB] 
 --> /Users/Alex/Applications/Octave/octave-3.6.4/doc/interpreter 
 --> /Users/Alex/Applications/Octave/octave-3.6.4/doc/interpreter/octave.html 
1. Delete file from left
2. Delete all files from left directory
3. Ignore all comparisons with left directory
4. Delete file from right
5. Delete all files from right directory
6. Ignore all comparisons with right directory
7. Delete recursive folder...
8. Ignore folder...
0. Move to next file (do nothing)

Two points: 
  • Left and right are actually the first and the second folder respectively :).
  • Commands that end with "..." require a directory path. For instance, in this case, entering "8 /Users/Alex/Applications" will instruct the tool to ignore, from then on, all the files that come from the "Applications" folder.
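
To illustrate the "Ignore folder..." behaviour, here is a minimal sketch of how such an ignore list could work. It is an assumption about the mechanism, not the tool's real code: it keeps the ignored folders in memory and treats a file as ignored when its path lies under one of them. The names IgnoreListSketch, ignoreFolder and isIgnored are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original tool.
class IgnoreListSketch {

    private final List<Path> ignoredFolders = new ArrayList<>();

    // Registers a folder; every file below it is skipped from now on.
    void ignoreFolder(String folder) {
        ignoredFolders.add(Paths.get(folder).toAbsolutePath().normalize());
    }

    // Returns true when the file lives in, or anywhere below, an ignored folder.
    boolean isIgnored(String file) {
        Path candidate = Paths.get(file).toAbsolutePath().normalize();
        for (Path folder : ignoredFolders) {
            // Path.startsWith compares whole path elements, so
            // "/Users/Alex/Applications2" does not match "/Users/Alex/Applications".
            if (candidate.startsWith(folder)) {
                return true;
            }
        }
        return false;
    }
}
```

Comparing Path objects rather than raw strings avoids the false matches that a plain String.startsWith check would produce for sibling folders sharing a name prefix.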

  • With the help of the tool I freed around 60 GB from my 150 GB laptop HDD and 80 GB from the 250 GB back-up HDD.

  • Except for basic run tests, I ran the tool only from within the debugger. :)
  • The tool has not been fully tested. I stopped working on it when my HDD was clean and my needs satisfied.  :)