technosaurus

Joined: 18 May 2008 Posts: 4787 Location: Kingwood, TX
Posted: Sat 20 Dec 2008, 17:04 Post subject: Need 2 Bash scripts that compare, delete files
Subject description: for an ultimate puplet CD/DVD
I need to write 2 scripts to create the ultimate puplet CD/DVD.
The first compares 2 directories and deletes all files in <dirA> that are "different" from <dirB> (md5 sums would be fine, or maybe just name and file size).
The second compares 2 directories and deletes all files in <dirA> that are "the same" as in <dirB>.
What is the best way to do this? Any suggestions?
Here is what it is for:
Merge the zdrv_XXX.sfs (if applicable) with all files shared by all of the puplets (of the same major version) using dir2sfs zdrv_XXX; this uses the output of the first script.
Use the "phome=" boot parameter to access the pup_XXX.sfs files made from the second script's output (dir2sfs of each folder).
_________________ Check out my github repositories. I may eventually get around to updating my blogspot.
Pizzasgood

Joined: 04 May 2005 Posts: 6266 Location: Knoxville, TN, USA
Posted: Sat 20 Dec 2008, 20:41 Post subject:
Try these. WARNING: These will not work with spaces in the paths. That could be fixed, but I'm lazy. Even if you do fix them to handle spaces, they can still get confused if a filename or directory name contains the string " and " (that's "and" with a space on each side).
These assume that the first directory passed to them is the one you want to delete things from. So if you run them like this:
script <dirA> <dirB>
They'll delete the different or identical files from <dirA>.
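Code: | #!/bin/sh
#usage: script <dirA> <dirB>  (deletes files in <dirA> that are identical to their counterparts in <dirB>)
#-q: brief output, -s: also report identical files, -r: recurse into subdirectories
FILES=$(diff -qsr "$1" "$2")
#keep the "Files X and Y are identical" lines and cut them down to X, the <dirA> path
IDENTICALS="$(echo "$FILES" | grep 'are identical$' | sed 's/ are identical$//' | sed 's|^Files \(.*\) and .*|\1|')"
echo "Identical files"
echo "$IDENTICALS"
echo
#$IDENTICALS is deliberately unquoted so it splits into one path per word (hence no spaces allowed)
for i in $IDENTICALS; do rm -rf "$i"; done |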
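Code: | #!/bin/sh
#usage: script <dirA> <dirB>  (deletes files in <dirA> that differ from, or are missing in, <dirB>)
FILES=$(diff -qsr "$1" "$2")
#drop identical pairs, turn "Only in DIR: NAME" into DIR/NAME and "Files X and Y differ" into X,
#skip "File X is a directory while file Y is a regular file" style lines (they would split into stray words),
#and finally drop the paths that exist only in <dirB>
DIFFERENTS="$(echo "$FILES" | grep -v 'are identical$' | sed 's|^Only in \(.*\): \(.*\)|\1/\2|' | sed 's|^Files \(.*\) and .* differ$|\1|' | grep -v '^File ' | grep -v "^$2")"
echo "Different files"
echo "$DIFFERENTS"
echo
#rm -r because an "Only in" entry can be a whole subdirectory
for i in $DIFFERENTS; do rm -rf "$i"; done |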
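Another option, if spaces (or " and ") in paths matter: skip parsing diff's output and compare file by file. find hands the paths to a small sh -c loop as arguments, so they are never word-split, and cmp -s checks each file against the same relative path in <dirB>. This is only a sketch assuming POSIX find, sh and cmp; delete_identical and delete_different are made-up names, and like the scripts above it only ever deletes from the first directory.

```shell
#!/bin/sh
# Hypothetical helpers: delete_identical / delete_different <dirA> <dirB>
# Both only ever delete regular files under <dirA>; <dirB> is only read.

# Delete files in <dirA> whose counterpart in <dirB> has identical contents.
delete_identical() {
    # Paths arrive as positional parameters, so spaces and " and " are safe.
    find "${1%/}" -type f -exec sh -c '
        a=$1 b=$2; shift 2
        for f in "$@"; do
            rel=${f#"$a"/}          # path relative to <dirA>
            if [ -f "$b/$rel" ] && cmp -s "$f" "$b/$rel"; then
                echo "identical: $f"
                rm -f -- "$f"
            fi
        done
    ' sh "${1%/}" "${2%/}" {} +
}

# Delete files in <dirA> that differ from, or are missing in, <dirB>.
delete_different() {
    find "${1%/}" -type f -exec sh -c '
        a=$1 b=$2; shift 2
        for f in "$@"; do
            rel=${f#"$a"/}
            if [ ! -f "$b/$rel" ] || ! cmp -s "$f" "$b/$rel"; then
                echo "different: $f"
                rm -f -- "$f"
            fi
        done
    ' sh "${1%/}" "${2%/}" {} +
}
```

After sourcing it, `delete_identical <dirA> <dirB>` behaves like the first script and `delete_different <dirA> <dirB>` like the second. One difference from the diff version: a subdirectory that exists only in <dirA> is emptied rather than removed, and emptied directories stay behind; with GNU find they could be cleaned up with `find <dirA> -mindepth 1 -type d -empty -delete`. If name and file size are good enough, as suggested in the first post, the `cmp -s` test could be swapped for a comparison of `wc -c` on the two files, which can be cheaper but will treat a same-size change as identical.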
_________________ Between depriving a man of one hour from his life and depriving him of his life there exists only a difference of degree. --Muad'Dib

technosaurus

Joined: 18 May 2008 Posts: 4787 Location: Kingwood, TX
Posted: Sun 21 Dec 2008, 00:07 Post subject:
Thanks pizza' & thanks for the warning - looks like I have some more puplet downloading to do.
trapster

Joined: 28 Nov 2005 Posts: 2106 Location: Maine, USA
Posted: Sun 21 Dec 2008, 08:24 Post subject:
I like the identicals script. It will be handy for comparing new music files to my collection.
How do I change it to ignore the suffix so I can compare .mp3 to .ogg filenames?
_________________ trapster
Maine, USA
Asus eeepc 1005HA PU1X-BK
Frugal install: Slacko
Currently using full install: DebianDog
Pizzasgood

Joined: 04 May 2005 Posts: 6266 Location: Knoxville, TN, USA
Posted: Sun 21 Dec 2008, 15:16 Post subject:
I don't think a simple change will suffice. If you just want to find all identical files in one or more places, you could take the md5sum of every file and then look for duplicate sums.
Here's a script that seems to work, even with spaces. It can be passed a list of files and directories, and it will compare every file to see which ones are identical, regardless of name:
Code: | #!/bin/bash
#needs bash rather than plain sh: it uses an array
#get a unique temporary file in /tmp
CHECKSUMS="$(mktemp /tmp/checksums_XXXXXX)" || exit 1
#put the md5sums of all files in the passed directories into the file
find "$@" -type f -exec md5sum "{}" + >> "$CHECKSUMS"
#loop through each md5sum in the file
NUM_MATCHES=0
MATCHES=()
for i in $(grep -o '^[^ ]*' "$CHECKSUMS"); do
    #grab all entries that have the same md5sum as the current one
    #(anchored with ^ so a filename containing the sum can't match)
    ENTRIES="$(grep "^$i" "$CHECKSUMS")"
    #parse out the filenames
    FILES="$(echo "$ENTRIES" | sed 's/^[0-9a-f]\{32\}\s*//')"
    #if more than one file has the same md5sum as the current one, and
    #the current one hasn't been used before, then list the files as being identical
    if [ "$(echo "$FILES" | grep -c '^')" -gt 1 ] && [ "$(echo "${MATCHES[*]}" | grep "$i")" = "" ]; then
        #and add the md5sum to the list of used ones in $MATCHES
        MATCHES[$NUM_MATCHES]=$i
        NUM_MATCHES=$((NUM_MATCHES+1))
        echo "These files are identical:"
        echo "$FILES"
        echo
    fi
done
#clean up
rm -f "$CHECKSUMS" |
Code: | # ./Script .
These files are identical:
./asd fff
./sdd
These files are identical:
./q/c/d/FILE
./k/c/d/FILE
./b
These files are identical:
./c
./a
# |
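The loop above greps the whole checksum file once per file, which gets slow on a big tree. With GNU coreutils the grouping can instead be done by sorting the md5sum output and having uniq compare only the first 32 characters (the hash). A sketch assuming GNU uniq's -w and --all-repeated options; find_duplicates is a made-up name.

```shell
#!/bin/sh
# find_duplicates PATH...: print groups of files with identical contents,
# one blank-line-separated group per md5 sum (hypothetical helper name).
# Assumes GNU coreutils: uniq -w32 compares only the 32-char md5 hash.
find_duplicates() {
    find "$@" -type f -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate |
        cut -c35-    # drop "hash" plus the two separator characters
}
```

Each blank-line-separated group in the output is one set of identical files. Like the script above, it relies on md5sum's usual "hash, two spaces, name" line format, so names that md5sum has to escape (for example ones containing a backslash or newline) would throw off the fixed column offsets.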
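If what you actually want is the .mp3 versus .ogg name check from the question (same name, any extension), content checksums won't help, since two different encodings of a song never have the same bytes. A rough sketch that only looks at names: strip the directory and the last extension from every file, then print the names that occur more than once. same_names is a hypothetical helper, and it assumes no newlines in filenames.

```shell
#!/bin/sh
# same_names PATH...: print base names (directory and last extension removed)
# that occur more than once among the files found under PATH...
# Hypothetical helper; assumes filenames contain no newlines.
same_names() {
    find "$@" -type f |
        sed -e 's|.*/||' -e 's|\.[^.]*$||' |
        sort |
        uniq -d
}
```

For example, `same_names ~/new-music ~/music` would list `Song One` if both `Song One.ogg` and `Song One.mp3` exist under those (made-up) directories. Matching is case-sensitive and exact, so differently tagged or renamed copies will not be caught.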
vtpup

Joined: 15 Oct 2008 Posts: 1208 Location: Republic of Vermont
Posted: Tue 20 Jan 2009, 00:11 Post subject:
Just a note of clarification: the two scripts that Technosaurus asked for in the first post are in reverse order to the two scripts that Pizzasgood provided in the second post.
The script that leaves behind all the different files is Pizzasgood's first script, and the script that leaves behind all the identical files is his second script.
Thanks, of course, for both!