Need 2 Bash scripts that compare, delete files

technosaurus · #1 Post by **technosaurus** » Sat 20 Dec 2008, 21:04

I need to write 2 scripts to create the ultimate puplet CD/DVD

The first compares 2 directories and deletes all files in <dirA> that are "different" from <dirB> (an md5 sum comparison would be fine, or maybe just name & file size).

The second compares 2 directories and deletes all files in <dirA> that are "the same" as in <dirB>

What is the best way to do this? Any suggestions?

Here is what it is for:
Merge zdrv_XXX.sfs (if applicable) with all files shared between all of the puplets (of the same major version), using dir2sfs zdrv_XXX. The shared files come from the first script.

Use the "phome=" boot parameter to access the pup_XXX.sfs files that come from the second script (dir2sfs of each folder).

#2 Post by **Pizzasgood** » Sun 21 Dec 2008, 00:41

Try these. WARNING: These will not work with spaces in the paths. That could be done, but I'm lazy.

If you do fix it to handle spaces, it can also get messed up if a filename or directory has the string " and " in it (that's and with a space on each side).

These assume that the first directory passed to it is the one you want to delete things from. So if you run them like this:
script <dirA> <dirB>
They'll delete the different or identical files from <dirA>.

Code:

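#!/bin/sh
# Delete every file in $1 that diff reports as identical to its copy in $2.
FILES=$(diff -qsr "$1" "$2")
# Keep the "Files A and B are identical" lines and reduce each one to the A path.
IDENTICALS="$(echo "$FILES" | grep 'are identical$' | sed 's/ are identical$//' | sed 's|^Files \(.*\) and .*|\1|')"
echo "Identical files"
echo "$IDENTICALS"
echo
# The unquoted expansion splits on whitespace, hence no spaces in paths.
for i in $IDENTICALS; do rm -rf "$i"; done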

Code:

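#!/bin/sh
# Delete everything in $1 that differs from, or has no counterpart in, $2.
FILES=$(diff -qsr "$1" "$2")
# Turn "Only in DIR: NAME" into DIR/NAME and "Files A and B differ" into A,
# then drop the entries under $2 (this assumes $2 is not a prefix of $1).
DIFFERENTS="$(echo "$FILES" | grep -v 'are identical$' | sed 's|^Only in \(.*\): \(.*\)|\1/\2|' | sed 's|^Files \(.*\) and .* differ$|\1|' | grep -v "^$2")"
echo "Different files"
echo "$DIFFERENTS"
echo
# The unquoted expansion splits on whitespace, hence no spaces in paths.
for i in $DIFFERENTS; do rm -rf "$i"; done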
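
For paths with spaces (or with " and " in them), one possible approach is to skip parsing diff's messages and compare file by file with cmp instead. This is only a sketch, not a drop-in replacement for the scripts above: prune_dir is a made-up name, it only looks at regular files (so empty directories are left behind), and it still assumes no newlines in filenames.

```shell
#!/bin/sh
# Sketch: delete files from dirA depending on how they compare with dirB.
# prune_dir is a hypothetical helper name, not part of the scripts above.
#   prune_dir same <dirA> <dirB>   delete files in dirA identical to dirB's copy
#   prune_dir diff <dirA> <dirB>   delete files in dirA that differ from, or
#                                  are missing in, dirB
prune_dir() {
  mode=$1
  dir_a=$2
  dir_b=$3
  # find prints paths relative to dirA, one per line; read -r keeps each
  # line whole, so spaces in names are harmless.
  (cd "$dir_a" && find . -type f) | while IFS= read -r rel; do
    # cmp -s is silent and succeeds only when the contents match byte for byte.
    if [ -f "$dir_b/$rel" ] && cmp -s "$dir_a/$rel" "$dir_b/$rel"; then
      state=same
    else
      state=diff
    fi
    if [ "$state" = "$mode" ]; then
      echo "Deleting $dir_a/$rel"
      rm -f "$dir_a/$rel"
    fi
  done
}
```

It would be called as `prune_dir same <dirA> <dirB>` or `prune_dir diff <dirA> <dirB>`, with the directory to delete from given first, the same as the scripts above.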

technosaurus · #3 Post by **technosaurus** » Sun 21 Dec 2008, 04:07

Thanks pizza' & thanks for the warning - looks like I have some more puplet downloading to do.

trapster · #4 Post by **trapster** » Sun 21 Dec 2008, 12:24

I like the identicals script. It will be handy for comparing new music files to my collection.
How do I change it to ignore the suffix so I can compare .mp3 to .ogg filenames?
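
If the goal is only to match on filenames while ignoring the suffix, a rough sketch could look like the following. The function name same_stem is made up for illustration, it only checks the top level of each directory, and it compares names, not contents (an .mp3 and an .ogg of the same song will never have identical contents).

```shell
#!/bin/sh
# Sketch: list files in dirA whose name, minus its last suffix, also exists
# in dirB with any suffix. same_stem is a hypothetical helper name.
same_stem() {
  dir_a=$1
  dir_b=$2
  for a in "$dir_a"/*; do
    [ -f "$a" ] || continue
    name=${a##*/}     # strip the directory part
    stem=${name%.*}   # strip the last suffix, e.g. ".mp3"
    # The quoted part is taken literally; only the trailing .* is a glob.
    for b in "$dir_b/$stem".*; do
      if [ -f "$b" ]; then
        echo "$a"
        break
      fi
    done
  done
}
```

For example, `same_stem new_music collection` would print `new_music/My Song.mp3` if `collection/My Song.ogg` exists (both directory names here are placeholders).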

#5 Post by **Pizzasgood** » Sun 21 Dec 2008, 19:16

I don't think a simple change will suffice. If you just want to find all identical files in any given spot(s), you could just take the md5sum of every file, then identify any duplicate sums.

Here's a script that seems to work, even with spaces. It can be passed a list of files and directories, and it will compare every file to see which ones are identical, regardless of name:

Code:

#!/bin/bash
#(bash rather than plain sh, because the script uses arrays)

#get a unique filename in /tmp
CHECKSUMS="$(mktemp /tmp/checksums_XXXXXX)" || exit 1

#put the md5sums of all files in the passed directories into the file
find "$@" -type f -exec md5sum "{}" + >> "$CHECKSUMS"

#loop through each md5sum in the file
NUM_MATCHES=0
MATCHES=()
for i in $(grep -o '^[^ ]*' "$CHECKSUMS"); do
  #grab all entries that have the same md5sum as the current one
  ENTRIES="$(grep "^$i" "$CHECKSUMS")"
  #parse out the filenames
  FILES="$(echo "$ENTRIES" | sed 's/^[0-9a-f]\{32\}\s*//')"
  #if there is more than one file with the same md5sum as the current one, and
  #the current one hasn't been used before, then list the files as being identical
  if [ "$(echo "$FILES" | grep -c '^')" -gt 1 ] && [ "$(echo "${MATCHES[*]}" | grep "$i")" = "" ]; then
    #and add the md5sum to the list of used ones in $MATCHES
    MATCHES[$NUM_MATCHES]=$i
    NUM_MATCHES=$((NUM_MATCHES+1))
    echo "These files are identical:"
    echo "$FILES"
    echo
  fi
done

#clean up
rm -f "$CHECKSUMS"

Code:

# ./Script .
These files are identical:
./asd fff
./sdd

These files are identical:
./q/c/d/FILE
./k/c/d/FILE
./b

These files are identical:
./c
./a

#
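
The "md5sum every file, then look for duplicate sums" idea can also be sketched as a single pipeline without a temp file. This is only an illustration: list_dupes is a made-up name, and it assumes md5sum's usual output layout (32 hex digits, two separator characters, then the path) and filenames without newlines or backslashes.

```shell
#!/bin/sh
# Sketch: print groups of files with identical md5sums, whatever their names.
# list_dupes is a hypothetical helper name.
list_dupes() {
  # Sorting puts equal checksums on adjacent lines; awk then prints every
  # run of two or more lines that share a checksum.
  find "$@" -type f -exec md5sum {} + | LC_ALL=C sort | awk '
    {
      sum = substr($0, 1, 32)   # the checksum
      name = substr($0, 35)     # the path, after the two separator characters
      if (sum == prev) {
        if (!in_group) {
          print "These files are identical:"
          print prevname
          in_group = 1
        }
        print name
      } else {
        if (in_group) print ""
        in_group = 0
      }
      prev = sum
      prevname = name
    }
    END { if (in_group) print "" }'
}
```

Like the script above, it takes any mix of files and directories, e.g. `list_dupes .`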

vtpup · #6 Post by **vtpup** » Tue 20 Jan 2009, 04:11

Just a note of clarification: the two scripts that technosaurus asked for in the first post are in the reverse order of the two scripts that Pizzasgood provided in the second post.

The script that leaves behind all the different files is Pizzasgood's first script, and the script that leaves behind all the identical files is his second script.

Thanks, of course, for both!

(old)Puppy Linux Discussion Forum
