Speeding up the SnapMerge

For discussions about programming, programming questions/advice, and projects that don't really have anything to do with Puppy.
jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#16 Post by jamesbond »

jemimah wrote:OK, here is the situation that definitely gets you I/O errors:

File exists in Read-Only layer
Both file and its whiteout exist in Read-Write layer.

This does on occasion happen unintentionally with Puppy, and the only fix is to delete the offending whiteouts by hand.

So at the very least, it is necessary to check the save file.
Hmm, that's odd - this is not supposed to happen; I mean, I can't see the scenario that leads to it. If there is already a whiteout file on the pup_rw, creating a new file with the same name should automatically remove the whiteout file. But we know "impossible things" sometimes do happen :P

So perhaps we can get away with doing the test once every 20th boot or so? (And/or with some option to do a manual check if required - just like fsck.)
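
For reference, the conflicting state jemimah describes could be detected with a quick scan of the rw branch - a minimal sketch, assuming the branch is mounted at /initrd/pup_rw:

Code: Select all

RW=/initrd/pup_rw
find "$RW" -name '.wh.*' ! -name '.wh..wh.*' | while read -r W; do
	F="${W%/*}/${W##*/.wh.}"	# the file this whiteout hides
	[ -e "$F" ] && echo "conflict: $F and its whiteout both exist"
done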
jpeps wrote:I tried with bash/sh/ash in Lucid. All use gnu coreutils unless I specifically specify "busybox"
Busybox must be specifically compiled for this to happen ("exec prefers applet" and "standalone shell" must be enabled).
jpeps wrote:I've also had to manually remove whiteout files that get into the pupsave and prevent subsequent loading of files. For example, picpuz was separated out in an old remaster, although it's there in the present lupu-sfs. However, there's

/initrd/pup_rw/usr/share/pixmaps/.wh.picpuz.png
/initrd/pup_rw/usr/local/.wh.picpuz

..so files are missing, and it won't run. Delete the whiteouts, reboot, and all is well.
This is expected - that's what pfix=upgrade is supposed to do.

Technosaurus, I always thought the $((s+s)) expression was a bash-ism ... but to my surprise, it does work in ash. Bash has pretty good reference docs on gnu.org - does a similar doc exist for ash? (At least for busybox ash?)
EDIT: Found it here: http://linux.die.net/man/1/ash.
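
For example, this runs identically under bash and busybox ash - $(( )) is POSIX arithmetic expansion, not a bash extension:

Code: Select all

s=21
echo $((s+s))	# prints 42 in bash, busybox ash, and any POSIX sh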
Last edited by jamesbond on Mon 07 Feb 2011, 12:28, edited 1 time in total.
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#17 Post by technosaurus »

Then busybox must be compiled without the prefer-applets option.
A pity too - it can significantly speed up scripts, and broken ones should have a bash shebang anyway.
During testing I build a full busybox with both ash and hush, with ash aliased to bash and hush aliased to sh
... hush is much more compatible nowadays, so I may switch them eventually
... I try to make my scripts compatible with ash, hush, and bash, so this makes it easier to test them.
A script only gets a /bin/sh shebang if it works in all 3 (sh may be a symlink to any of them).
It's not really too difficult to make scripts compatible ... the dash shell being the exception.
Making them work with busybox applets isn't bad if there are good comments
- the busybox mailing list is helpful with sorting out bugs and missing features.
One of the busybox developers' complaints is that prefer-applets and nofork/noexec don't get tested/reported ... users just recompile with them disabled and don't report any issues. (It may become the default in an upcoming .0 unstable release.)
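
For reference, these appear to be the relevant busybox .config symbols (a sketch - names taken from busybox's own Config.in; verify against your busybox version):

Code: Select all

CONFIG_FEATURE_PREFER_APPLETS=y
CONFIG_FEATURE_SH_STANDALONE=y
CONFIG_FEATURE_SH_NOFORK=y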
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#18 Post by jpeps »

jamesbond wrote:
jpeps wrote: ..so files are missing, and it won't run. Delete the whiteouts, reboot, and all is well.
This is expected - that's what pfix=upgrade is supposed to do.
Hmm ... whiteouts somehow got created after moving regular files to an extra_sfs during a remaster - nothing under /usr. pfix=upgrade does nothing. Even pfix=clean doesn't get rid of them, so the original files don't load. They have to be found and manually deleted first if you want to continue with your old pupsave. I don't know why this is expected.

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#19 Post by jamesbond »

jpeps wrote:
jamesbond wrote:
jpeps wrote: ..so files are missing, and it won't run. Delete the whiteouts, reboot, and all is well.
This is expected - that's what pfix=upgrade is supposed to do.
Hmm ... whiteouts somehow got created after moving regular files to an extra_sfs during a remaster - nothing under /usr. pfix=upgrade does nothing. Even pfix=clean doesn't get rid of them, so the original files don't load. They have to be found and manually deleted first if you want to continue with your old pupsave. I don't know why this is expected.
Nah, sorry - I misread your original post. Yeah, the remaster shouldn't copy all these whiteout files.
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#20 Post by jamesbond »

Hmmm ... I'm close to something.

I have two scripts. Testing with tmpfs to tmpfs copy-down (just to check script performance - this is not real-world performance):
- s5 with utf8 locale = 2m34s; with a non-utf8 locale it's only 34s.
- s7 = 6s (locale doesn't matter much)

Test pupsave is 1GB live data, 820MB used, 17,000+ files.
s7 is much faster but doesn't cope very well when the savefile is almost full (because it copies before deleting).
s5 copes better because it deletes before copying, but it's slower.
Neither script checks lower layers (SFS layers) - as I posted earlier, I don't see why it's necessary.
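
In case anyone wants to reproduce the locale effect, a rough harness (s5/s7 are the scripts above; LANG=C forces the byte-oriented, non-utf8 code paths):

Code: Select all

export LANG=en_US.UTF-8; time ./s5	# utf8 locale - slow string handling
export LANG=C; time ./s5		# non-utf8 locale - much faster here
time ./s7				# locale makes little difference for s7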

I have an idea to combine the best of s5 and s7 - but that will have to wait until tomorrow.

Note: the scripts use ash, thanks to technosaurus.
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#21 Post by technosaurus »

jamesbond wrote:s7 is much faster but doesn't cope very well when the savefile is almost full (because it copies before deleting)
mv -f ???

Note: I never use a save file - just trying to help where I can without my puppypc available - sorry for the suggestions without real code

When I look at it, I see the possibility of combining all 3 loops into 1 and recursing the tree only once, using a function that calls itself for directories.

oversimplified version

Code: Select all

this_function() {
	case "$1" in
		blacklisted|dir|list) return;;	# skip blacklisted names
	esac
	dirstuff_here_now
	for x in "$1"/*; do
		if [ -d "$x" ]; then
			dirstuff_moved_out_of_loop
			this_function "$x" &	# handle subdirs in background threads
		else
			do_other_stuff
		fi
	done
}
...
Something like this can do several directories at once in separate threads, and it does not need find because it uses shell builtins to check each file.

But really, the order of the checks is important: the first check should cover most cases so that further checks are unnecessary.
The only easy way I can think of to make the .wh. check faster is to treat the filename as an array of characters, such that ${x:0:4} is equal to .wh. (cool, eh?). Since this is a simple string comparison that does not have to access the file (slow), it should come first, then the dir check (to start another thread sooner if necessary), then links (because they only need one check). In outline (a concrete sketch follows):

dirstuff first
for x in ....
if .wh* do .wh stuff
else if dir recursively call this function
else if link do link stuff
else (must be a file) ... do file stuff
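
A minimal sketch of that ordering (the handle_* names are placeholders, not code from the actual scripts; one tweak - the symlink test is moved before the dir test, because [ -d ] follows symlinks):

Code: Select all

for x in *; do
	if [ "${x:0:4}" = ".wh." ]; then	# cheap string test first - no file access needed
		handle_whiteout "$x"
	elif [ -L "$x" ]; then			# links need only one check
		handle_link "$x"
	elif [ -d "$x" ]; then			# recurse into dirs in a background thread
		this_function "$x" &
	else					# anything left is a regular file
		handle_file "$x"
	fi
done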

Recursion always used to give me a headache, so let me know if anything is not clear and I will try to clarify.

EDIT: thinking about the .wh. file issues, the substrings could also be used to check for real files - ${x:4} would be the name of the real file, such that
[ -e "${DIR}/${x:4}" ] && echo "file ${x:4} exists in ${DIR}"
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#22 Post by jamesbond »

technosaurus wrote:
jamesbond wrote:s7 is much faster but doesn't cope very well when the savefile is almost full (because it copies before deleting)
mv -f ???
No, because it's checking for different files in different layers.
the only easy way I can think of to make the .wh. check faster is to treat the filename as an array of characters, such that ${x:0:4} is equal to .wh. (cool, eh?)
I did that - and again, I was pleasantly surprised that it works in both bash and ash. I'm not sure I understand the rest of your point, though.

Got the s8 script I hinted at in the previous post. Speed is the same as s7. I was about to post it here, but I just noticed a big hole in my scripts - I forgot to treat opaque dirs properly. I'm just so close :evil: But I'll be back ...
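
(For context: aufs marks an opaque dir with a special .wh..wh..opq entry inside it, which hides everything at the same path in the lower layers - the path below is just an example:)

Code: Select all

touch /initrd/pup_rw/some/dir/.wh..wh..opq	# makes 'dir' opaque - lower layers no longer show through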
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#23 Post by jpeps »

It's interesting watching the whiteout list grow. I erased them all after doing a pfix=clean this morning. I'll have to think about why I need them.

Current:

/initrd/pup_rw/.wh..wh.orph
/initrd/pup_rw/.wh..wh.aufs
/initrd/pup_rw/var/local/icons/.wh..wh..opq
/initrd/pup_rw/dev/usb/.wh.lp0
/initrd/pup_rw/.wh..wh.plnk
/initrd/pup_rw/etc/.wh.windowmanager.openbox
/initrd/pup_rw/usr/lib/openoffice.org3/share/uno_packages/cache/uno_packages/.wh.SH1dMc_
/initrd/pup_rw/usr/lib/openoffice.org3/share/uno_packages/cache/uno_packages/.wh.SH1dMc

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#24 Post by jamesbond »

Tadaa ... s9 (version 9 of the script). Same performance as s7. Comment out the echoes to reduce verbosity. This code is for the copy-down only - I'm not sure what else snapmergepuppy does. Perhaps I'll try it out later - but meanwhile, anyone is welcome to try it.

Code: Select all

#!/bin/ash
# jamesbond 2011 - GPLv3
# s9 - improved s8 with bugfix for dir opaque
# check whiteout from tmpfs and puprw, make room before doing rsync
# Note: works for AUFS only
#  0m6secs (lang utf8)

# change these two variables. Do not use trailing slash.
TMPFS=/mnt/layer1     # real location is /initrd/pup_rw
PUPSAVE=/mnt/layer2   # real location is /initrd/pup_ro1

################# main ###################
# check for new whiteouts - remove them from pupsave
echo "deleting newly deleted files"
find "$TMPFS" | sed '
# dont process .wh..wh.orph
/\.wh\.\.wh\.orph/ d
# dont process .wh..wh.plnk
/\.wh\.\.wh\.plnk/ d
# dont process .wh..wh.aufs
/\.wh\.\.wh\.aufs/ d
# dont process .wh..wh..opq
/\.wh\.\.wh\.\.opq/ d
# process whiteout files
/\.wh\./ p
# and delete anything else
d' | while read -r FILE; do
	#echo $FILE					# $FILE is TMPFS_WHITEOUT
	FULLNAME="${FILE#$TMPFS}"
	#echo $FULLNAME
	BASE="${FULLNAME%/*}"
	#echo $BASE
	LEAF="${FULLNAME##*/}"
	#echo $LEAF
	#echo $BASE/$LEAF
	
	PUPSAVE_FILE="${PUPSAVE}${BASE}/${LEAF:4}"	
	echo "Deleting $PUPSAVE_FILE"
	rm -rf "$PUPSAVE_FILE"		# delete the file/dir if it's there

done

# check for old whiteouts - remove them from pupsave
echo "deleting old whiteouts"
find "$PUPSAVE" | sed '
# dont process .wh..wh.orph
/\.wh\.\.wh\.orph/ d
# dont process .wh..wh.plnk
/\.wh\.\.wh\.plnk/ d
# dont process .wh..wh.aufs
/\.wh\.\.wh\.aufs/ d
# dont process .wh..wh..opq
/\.wh\.\.wh\.\.opq/ d
# process whiteout files
/\.wh\./ p
# and delete anything else
d' | while read -r FILE; do
	#echo $FILE					# $FILE is PUPSAVE_WHITEOUT
	FULLNAME="${FILE#$PUPSAVE}"
	#echo $FULLNAME
	BASE="${FULLNAME%/*}"
	#echo $BASE
	LEAF="${FULLNAME##*/}"
	#echo $LEAF
	#echo $BASE/$LEAF
	
	TMPFS_FILE="${TMPFS}${BASE}/${LEAF:4}"
	#echo $TMPFS_FILE

	# delete whiteout only if a new file/dir has been created in the tmpfs layer
	if [ -e "$TMPFS_FILE" -o -L "$TMPFS_FILE" ]; then
		# if TMPFS_FILE is a dir, we need to add diropq when remove its pupsave whiteout
		[ -d "$TMPFS_FILE" ] &&	touch "$TMPFS_FILE/.wh..wh..opq"
		echo "Deleting whiteout $FILE"
		rm -f "$FILE"
	fi
done

# by now we should be consistent - so rsync everything
# and cleanup tmpfs if rsync is successful
echo rsync-ing
if rsync -a "$TMPFS"/ "$PUPSAVE"; then
	find "$TMPFS" -maxdepth 1 | sed '
	# dont process the first line - thats our tmpfs mountpoint
	1 d
	# dont process .wh..wh.orph
	/\.wh\.\.wh\.orph/ d
	# dont process .wh..wh.plnk
	/\.wh\.\.wh\.plnk/ d
	# dont process .wh..wh.aufs
	/\.wh\.\.wh\.aufs/ d' | while read -r FILE; do
		rm -rf "$FILE"
	done
else
	Xdialog --infobox "Your save file is full, please copy important items manually elsewhere." 0 0 10000
fi
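
To point the script at a live system, the two variables would be set to the real aufs branches noted in the script's comments (back up your pupsave before experimenting):

Code: Select all

TMPFS=/initrd/pup_rw
PUPSAVE=/initrd/pup_ro1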
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

Q5sys
Posts: 1105
Joined: Thu 11 Dec 2008, 19:49

#25 Post by Q5sys »

jamesbond wrote:Tadaa ... s9 (version 9 of the script). Same performance as s7. Comment out the echoes to reduce verbosity. This code is for the copy-down only - I'm not sure what else snapmergepuppy does. Perhaps I'll try it out later - but meanwhile, anyone is welcome to try it.
How much faster do you estimate this is over the default way of doing things? I'm quite impressed by everyone's work in this thread.

jemimah
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL

#26 Post by jemimah »

I'm not sure if it's safe to delete items out of the RAM layer.

Barry has this

Code: Select all

#flock -x -n "$N" -c rm -f "$N" #remove if file not in use
But it's commented out, so I guess it didn't work. It also brings up the point that if the file is open, moving it and then deleting it may cause corruption.

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#27 Post by jamesbond »

If that's the case, we can drop the "delete tmpfs" code and leave the rsync alone. But then the tmpfs won't be freed even after copy-down - is this the expected behaviour?
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

jemimah
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL
Contact:

#28 Post by jemimah »

jamesbond wrote:If that's the case, we can drop the "delete tmpfs" code and leave the rsync alone. But then the tmpfs won't be freed even after copy-down - is this the expected behaviour?
That's how the current script works. It would be neat if we did figure out how to remove those unused files. Maybe someone wants to ask Barry if he remembers what the issue was?

Maybe we could just run lsof and get the list that way.
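
A sketch of that approach (assuming lsof is available - and it is still racy, as noted below):

Code: Select all

# list files under the tmpfs layer that are currently held open
lsof +D /initrd/pup_rw 2>/dev/null | awk 'NR>1 {print $9}' | sort -u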

jemimah
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL
Contact:

#29 Post by jemimah »

I suppose there will always be some delay between checking whether the file is open and actually deleting it, during which the file may become open. I guess that's what the "flock" is for.

Wikipedia says this about flock.
Both flock and fcntl have quirks that occasionally puzzle programmers more familiar with other operating systems.
Mandatory locks have no effect on the unlink function. As a result, certain programs may, effectively, circumvent mandatory locking. The authors of Advanced Programming in the UNIX Environment (Second Edition) observed that the ed editor did so (page 456).
Seems like a complicated problem.

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#30 Post by jamesbond »

In the usual situation, deleting (= unlinking) a file or dir while it's open is not an issue at all. The name is removed and new processes can't see it, but processes that already have it open can still use it through their existing file handles. Only when all processes holding open handles have terminated is the space freed and the file/dir finally removed. So in this case I don't see the need to use flock. I did read an article arguing that using flock while unlinking defeats the very purpose of flock: http://world.std.com/~swmcd/steven/tech/flock.html
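
The classic demonstration of that behaviour (the /tmp path is just an example):

Code: Select all

echo hello > /tmp/demo.txt
exec 3< /tmp/demo.txt	# open a read handle on the file
rm /tmp/demo.txt	# the name is gone - new processes can't see it
cat <&3			# but the open handle still reads "hello"
exec 3<&-		# close the handle - now the space is actually freed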

That's only true for userspace apps, though, and since aufs is kernel-mode code ... I'm not sure how true it is there. But I've been playing with deletion for a while (a very short while), and it doesn't seem to have any adverse effects for me. Perhaps that's because I'm not running it off my rootfs, though.

Using lsof won't help much - it seems that on an ordinary day (browsing, etc.) there are a lot of dirs held open, which means they can't be deleted. May as well not do deletion at all - it's not worth the effort.
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#31 Post by jamesbond »

Q5sys wrote:
jamesbond wrote:Tadaa ... s9 (version 9 of the script). Same performance as s7. Comment out the echoes to reduce verbosity. This code is for the copy-down only - I'm not sure what else snapmergepuppy does. Perhaps I'll try it out later - but meanwhile, anyone is welcome to try it.
How much faster do you estimate this is over the default way of doing things? I'm quite impressed by everyone's work in this thread.
Until this is really merged into a puplet for testing, no one can tell for sure, unfortunately. Benchmarks don't always translate into real-world performance :oops:
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#32 Post by technosaurus »

Umm... are we trying to manually do what aubrsync does?
http://aufs.sourceforge.net/aufs2/brsync/README.txt
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#33 Post by jamesbond »

technosaurus wrote:Umm... are we trying to manually do what aubrsync does?
http://aufs.sourceforge.net/aufs2/brsync/README.txt
Hahaha yes !!! Good find technosaurus :)

EDIT: Incidentally, that script also uses rsync ... so we're on the right track (when trying to re-invent the wheel, that is) :D
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#34 Post by technosaurus »

I really have no clue how the save-to-CD/DVD parts work, but...
http://freshmeat.net/projects/rdiff-backup/
Seems like it could be sensible?
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

jemimah
Posts: 4307
Joined: Wed 26 Aug 2009, 19:56
Location: Tampa, FL

#35 Post by jemimah »

Here is the actual script.
Attachments
aubrsync.gz
(3.2 KiB) Downloaded 489 times
