Need 2 Bash scripts that compare, delete files

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#1 Post by technosaurus »

I need to write 2 scripts to create the ultimate puplet CD/DVD.

The first compares 2 directories and deletes all files in <dirA> that are "different" from <dirB> (an md5sum comparison would be fine, or maybe just name and file size).

The second compares 2 directories and deletes all files in <dirA> that are "the same" as in <dirB>.
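A minimal sketch of both scripts (not from this thread), assuming GNU coreutils (`md5sum`, `wc`) and a shell that supports `local` (bash, dash or busybox ash). It pairs each file in <dirA> with the file at the same relative path in <dirB>. `prune_dir`, `same_content` and `same_size` are hypothetical names. It reads `find` output one path per line, so spaces are fine but newlines in names are not, and it leaves behind any directories it empties.

```shell
#!/bin/sh
# Hypothetical helpers: a sketch, not a finished tool.

# same_content A B: succeed if both files exist and have the same md5sum
same_content() {
  [ -f "$1" ] && [ -f "$2" ] || return 1
  local sum_a sum_b
  sum_a=$(md5sum < "$1") && sum_b=$(md5sum < "$2") && [ "$sum_a" = "$sum_b" ]
}

# same_size A B: the cheaper "name & file size" test (the names already match,
# because both paths end in the same relative part)
same_size() {
  [ -f "$1" ] && [ -f "$2" ] || return 1
  [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ]
}

# prune_dir MODE DIR_A DIR_B
#   MODE=same       delete files in DIR_A that are identical to DIR_B's copy
#   MODE=different  delete files in DIR_A that differ from, or are missing in, DIR_B
prune_dir() {
  local mode dir_a dir_b
  mode=$1; dir_a=${2%/}; dir_b=${3%/}
  find "$dir_a" -type f | while IFS= read -r f; do
    rel=${f#"$dir_a"/}                 # path relative to DIR_A
    if same_content "$f" "$dir_b/$rel"; then
      match=same
    else
      match=different
    fi
    if [ "$match" = "$mode" ]; then
      rm -f -- "$f"
    fi
  done
}
```

`prune_dir different dirA dirB` corresponds to the first script described above and `prune_dir same dirA dirB` to the second; calling `same_size` instead of `same_content` inside `prune_dir` gives the quicker name-and-size check, at the cost of treating equal-sized but different files as identical.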

What is the best way to do this? Any suggestions?

Here is what it is for:
Merge the zdrv_XXX.sfs (if applicable) with all files shared by every puplet of the same major version, using dir2sfs zdrv_XXX (this step uses the output of the first script).
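Finding the files that every puplet shares can be sketched by intersecting checksum lists. This is an assumption about how one might do it, not the approach used in this thread: `common_files` is a hypothetical name, it needs GNU `md5sum`, `sort`, `comm`, `cut` and `mktemp`, and it counts a file as shared only when the same relative path has the same md5sum in every directory. Paths containing backslashes or newlines are not handled, because `md5sum` escapes them.

```shell
#!/bin/sh
# common_files DIR...  (hypothetical helper)
# Prints, as ./relative/path, every file that exists with identical content
# at the same relative path in all of the given directories.
common_files() {
  [ "$#" -ge 1 ] || return 1
  local tmp n d i
  tmp=$(mktemp -d) || return 1
  n=0
  for d in "$@"; do
    n=$((n + 1))
    # one "md5sum  ./path" line per file, sorted so that comm can compare the lists
    (cd "$d" && find . -type f -exec md5sum {} + | sort) > "$tmp/$n"
  done
  cp "$tmp/1" "$tmp/common"
  i=2
  while [ "$i" -le "$n" ]; do
    # keep only lines (same checksum AND same path) present in both lists
    comm -12 "$tmp/common" "$tmp/$i" > "$tmp/next"
    mv "$tmp/next" "$tmp/common"
    i=$((i + 1))
  done
  # drop the 32-character checksum and the two separator characters
  cut -c35- "$tmp/common"
  rm -rf "$tmp"
}
```

For example, `common_files pupA pupB pupC` would list the candidates for the merged zdrv, which could then be copied into a staging directory before running dir2sfs on it.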

Use the "phome=" boot parameter to access the pup_XXX.sfs files built from the second script's output (dir2sfs of each folder).
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

Pizzasgood
Posts: 6183
Joined: Wed 04 May 2005, 20:28
Location: Knoxville, TN, USA

#2 Post by Pizzasgood »

Try these. WARNING: These will not work with spaces in the paths. That could be done, but I'm lazy. ;) If you do fix it to handle spaces, it can also get messed up if a filename or directory has the string " and " in it (that's and with a space on each side).

These assume that the first directory passed to them is the one you want to delete things from. So if you run them like this:
script <dirA> <dirB>
They'll delete the different or identical files from <dirA>.

Code:

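#!/bin/sh
# Usage: script <dirA> <dirB>  (deletes files in <dirA> that are identical to their counterparts in <dirB>)
# -q: only report whether files differ, -s: also report identical files, -r: recurse into subdirectories
FILES=$(diff -qsr "$1" "$2")
# Keep the "Files X and Y are identical" lines and reduce each one to X (the path inside <dirA>)
IDENTICALS="$(echo "$FILES" | grep 'are identical$' | sed 's/ are identical$//' | sed 's|^Files \(.*\) and .*|\1|')"
echo "Identical files"
echo "$IDENTICALS"
echo
# Unquoted on purpose: the list is split on whitespace, which is why paths with spaces break
for i in $IDENTICALS; do rm -rf "$i"; done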
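If paths with spaces matter, one possible change (a sketch, not Pizzasgood's code) is to read diff's report line by line instead of splitting it on whitespace. `rm_identical` is a hypothetical name; it assumes GNU diff's "Files X and Y are identical" message, and a file name containing " and " can still be misparsed, because the sed expression cannot tell which " and " separates the two paths.

```shell
#!/bin/sh
# rm_identical DIR_A DIR_B  (hypothetical variant of the script above)
# Deletes from DIR_A every file that diff reports as identical to DIR_B's copy.
rm_identical() {
  local f
  diff -qsr "$1" "$2" |
    sed -n 's|^Files \(.*\) and .* are identical$|\1|p' |
    while IFS= read -r f; do
      echo "Removing identical file: $f"
      rm -f -- "$f"
    done
}
```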

Code:

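#!/bin/sh
# Usage: script <dirA> <dirB>  (deletes files in <dirA> that differ from, or are missing in, <dirB>)
FILES=$(diff -qsr "$1" "$2")
# Keep only "Only in D: N" and "Files X and Y differ" lines, so other diff messages are never split
# into words and handed to rm; turn them into D/N and X, then drop anything under <dirB>
DIFFERENTS="$(echo "$FILES" | grep -e '^Only in ' -e ' differ$' | sed 's|^Only in \(.*\): \(.*\)|\1/\2|' | sed 's|^Files \(.*\) and .* differ$|\1|' | grep -v "^$2")"
echo "Different files"
echo "$DIFFERENTS"
echo
# Unquoted on purpose: the list is split on whitespace, which is why paths with spaces break
for i in $DIFFERENTS; do rm -rf "$i"; done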
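The same line-by-line idea can be sketched for the second script. `rm_different` is a hypothetical name and assumes GNU diff's "Only in D: N" and "Files X and Y differ" messages. It acts only on those two kinds of line, so other messages (for example, a directory in one tree where the other has a regular file) are ignored rather than split into words and passed to `rm`, and it only deletes paths that start with <dirA>/.

```shell
#!/bin/sh
# rm_different DIR_A DIR_B  (hypothetical variant of the script above)
# Deletes from DIR_A whatever diff reports as missing in DIR_B or as differing.
rm_different() {
  local a b f
  a=${1%/}; b=${2%/}
  diff -qsr "$a" "$b" |
    sed -n -e 's|^Only in \(.*\): \(.*\)$|\1/\2|p' \
           -e 's|^Files \(.*\) and .* differ$|\1|p' |
    while IFS= read -r f; do
      # only touch paths inside DIR_A; "Only in DIR_B..." entries are skipped here
      case $f in
        "$a"/*)
          echo "Removing: $f"
          rm -rf -- "$f"
          ;;
      esac
    done
}
```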
[size=75]Between depriving a man of one hour from his life and depriving him of his life there exists only a difference of degree. --Muad'Dib[/size]
[img]http://www.browserloadofcoolness.com/sig.png[/img]

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#3 Post by technosaurus »

Thanks, pizza', and thanks for the warning; looks like I have some more puplet downloading to do.

trapster
Posts: 2117
Joined: Mon 28 Nov 2005, 23:14
Location: Maine, USA

#4 Post by trapster »

I like the identicals script. It will be handy for comparing new music files to my collection.
How do I change it to ignore the suffix so I can compare .mp3 to .ogg filenames?
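An .mp3 and an .ogg of the same song will not have the same md5sum, so comparing names is the practical route for this. A minimal sketch, assuming GNU `find`, `sed`, `sort`, `comm` and `mktemp`; `common_stems` is a hypothetical name. It strips the directory part and the last extension from every file name and prints the names found in both directories.

```shell
#!/bin/sh
# common_stems DIR1 DIR2  (hypothetical helper)
# Prints file names, minus directory and last extension, that occur in both
# directories, e.g. "song" for DIR1/song.mp3 and DIR2/rock/song.ogg.
common_stems() {
  local tmp
  tmp=$(mktemp -d) || return 1
  # strip everything up to the last "/", then the last ".extension"; sort for comm
  find "$1" -type f | sed 's|.*/||; s|\.[^.]*$||' | sort -u > "$tmp/1"
  find "$2" -type f | sed 's|.*/||; s|\.[^.]*$||' | sort -u > "$tmp/2"
  comm -12 "$tmp/1" "$tmp/2"
  rm -rf "$tmp"
}
```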
trapster
Maine, USA

Asus eeepc 1005HA PU1X-BK
Frugal install: Slacko
Currently using full install: DebianDog

Pizzasgood
Posts: 6183
Joined: Wed 04 May 2005, 20:28
Location: Knoxville, TN, USA

#5 Post by Pizzasgood »

I don't think a simple change will suffice. If you just want to find all identical files in any given spot(s), you could just take the md5sum of every file, then identify any duplicate sums.

Here's a script that seems to work, even with spaces. It can be passed a list of files and directories, and it will compare every file to see which ones are identical, regardless of name:

Code:

#!/bin/bash
#(uses bash arrays, $RANDOM and $(( )) arithmetic, so run it with bash rather than a plain sh)

#get a unique filename in /tmp
CHECKSUMS="/tmp/checksums_$RANDOM"
while [ -f "$CHECKSUMS" ]; do
  CHECKSUMS="/tmp/checksums_$RANDOM"
done

#put the md5sums of all files in the passed directories into the file
find "$@" -type f -exec md5sum "{}" + >> "$CHECKSUMS"

#loop through each md5sum in the file
NUM_MATCHES=0
MATCHES=""
for i in $(grep -o '^[^ ]*' "$CHECKSUMS"); do
  #grab all entries that have the same md5sum as the current one
  #(anchored, so a filename that happens to contain the sum does not match)
  ENTRIES="$(grep "^$i " "$CHECKSUMS")"
  #parse out the filenames
  FILES="$(echo "$ENTRIES" | sed 's/^[0-9a-f]\{32\}\s*//')"
  #if there is more than one file with the same md5sum as the current one, and
  #the current one hasn't been used before, then list the files as being identical
  if [ $( echo "$FILES" | grep -c '^') -gt 1 ] && [ "$(echo ${MATCHES[*]} | grep $i)" = "" ]; then
    #and add the md5sum to the list of used ones in $MATCHES
    MATCHES[$NUM_MATCHES]=$i
    NUM_MATCHES=$((NUM_MATCHES+1))
    echo "These files are identical:"
    echo "$FILES"
    echo
  fi
done

#clean up
rm -f "$CHECKSUMS"

Code:

# ./Script .
These files are identical:
./asd fff
./sdd

These files are identical:
./q/c/d/FILE
./k/c/d/FILE
./b

These files are identical:
./c
./a

#
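For comparison, GNU coreutils can do the grouping in one pipeline: sorting the `md5sum` lines puts equal sums next to each other, and `uniq -w32 --all-repeated=separate` prints only the groups whose first 32 characters (the checksum) repeat, with a blank line between groups. This is a sketch rather than a replacement for the script above; `find_dupes` is a hypothetical name, and `-w` and `--all-repeated` are GNU extensions.

```shell
#!/bin/sh
# find_dupes DIR...  (hypothetical helper)
# Prints groups of files with identical md5sums, one path per line,
# with a blank line between groups.
find_dupes() {
  # sort puts equal checksums next to each other; uniq -w32 compares only the
  # 32-character checksum and keeps every line of each repeated group;
  # cut drops the checksum and the two separator characters
  find "$@" -type f -exec md5sum {} + | sort | uniq -w32 --all-repeated=separate | cut -c35-
}
```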

vtpup
Posts: 1420
Joined: Thu 16 Oct 2008, 01:42
Location: Republic of Vermont

#6 Post by vtpup »

Just a note of clarification: the two scripts that Technosaurus asked for in the first post are in the reverse order of the two scripts that Pizzasgood provided in the second post.

The script that leaves behind all the different files is Pizzasgood's first script, and the script that leaves behind all the identical files is his second script.

Thanks, of course, for both!
