Is it unionfs bug or Puppy bug?

Posted: Sat 03 Mar 2007, 04:35
by andrei
I think that I found a bug
and suggest something like this to correct it. (This came up in our
previous discussion with PaulBx1 and GuestToo about 1187 files.)

In the file /usr/sbin/snapmergepuppy somewhere after the line
echo "Merging $SNAP onto $BASE.", would it be possible
to add something like this:

find . -name '.wh.__dir_opaque' | sed -e 's/\/\.wh\.__dir_opaque$//' |
while read -r N; do rm -r "$BASE/$N" ; done

I am not sure this is the right thing to do, but I think when merging
the puppy filesystems we need to force the complete removal of
those directories in pup_ro1 which were whited out in pup_rw.

Because we have the following unwanted thing happening, and I think that
this is a bug, and a nasty one.

DESCRIPTION OF THE BUG

Execute the command:
mkdir /root/bug; echo "first file" >/root/bug/a1.txt
Wait for 30 minutes until Puppy merges its filesystems, and then execute this command:
rm -r /root/bug; mkdir /root/bug; echo "second file" >/root/bug/a2.txt
Now /root/bug contains a single file a2.txt, and a1.txt is gone. This is how it should be, because when we said rm -r /root/bug we removed the directory /root/bug with all its contents.

Wait for another 30 minutes, and you see that even after the merging /root/bug still has only one file, a2.txt.

But if you now reboot Puppy, then after the reboot /root/bug
will have two files a1.txt and a2.txt. This should not be happening!
I suspect that this is not a unionfs bug but actually a Puppy bug.
Perhaps the modification suggested above would correct this, but I have
not thought hard enough about this. Could an expert help?
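
The scenario can be replayed without unionfs, using plain directories in place of the layers. This is only a sketch: the temporary directories are made up, and both merge steps are assumptions about what snapmergepuppy does (copy every non-whiteout file to $BASE, then strip ".wh." from each whiteout name and delete that path in $BASE).

```shell
#!/bin/sh
# Simulation only: plain directories stand in for pup_rw (SNAP) and pup_ro1 (BASE).
WORK=$(mktemp -d)
SNAP="$WORK/pup_rw"
BASE="$WORK/pup_ro1"

# After the first merge, a1.txt already sits in pup_ro1.
mkdir -p "$BASE/root/bug"
echo "first file" > "$BASE/root/bug/a1.txt"

# pup_rw after: rm -r /root/bug; mkdir /root/bug; echo "second file" > a2.txt
mkdir -p "$SNAP/root/bug"
: > "$SNAP/root/bug/.wh.__dir_opaque"
echo "second file" > "$SNAP/root/bug/a2.txt"

(
  cd "$SNAP" || exit 1
  # Assumed copy step: every file that is not a whiteout goes to BASE.
  find . -mount -type f ! -name '.wh.*' | sed -e 's/^\.\///' |
  while read -r N; do
    mkdir -p "$BASE/$(dirname "$N")"
    cp "$N" "$BASE/$N"
  done
  # Assumed whiteout step: strip ".wh." and delete that path in BASE.
  find . -mount -type f -name '.wh.*' | sed -e 's/^\.\///;s/\.wh\.//' |
  while read -r N; do
    rm -rf "$BASE/$N"
  done
)

# pup_rw lives in RAM, so after a reboot only BASE is left to look at.
ls "$BASE/root/bug"
```

The listing shows both a1.txt and a2.txt. The whiteout step tries to delete $BASE/root/bug/__dir_opaque, which does not exist, so the old a1.txt is never touched.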

Posted: Sun 04 Mar 2007, 02:59
by sunburnt
Kind of looks like what mksquashfs does with duplicate files.
I think mksquashfs has a command argument to deal with this.

This utility isn't building a Squash file properly it seems... not good!

Posted: Sun 04 Mar 2007, 03:35
by andrei
Why squash? I thought pup_ro is saved to pup_save.2fs, which is ext2. Why did you mention the squash filesystem?

I am tempted to try myself the patch which I proposed, but I am kind of scared of modifying the Puppy system file...

Posted: Sun 04 Mar 2007, 08:22
by sunburnt
Because mksquashfs handles duplicate files like that (I think) unless told not to.

But it probably doesn't relate to what you're dealing with.

Posted: Mon 05 Mar 2007, 06:36
by andrei
Also, in the very end of snapmergepuppy there is a block of code:

# Handle Whiteouts
# adding -mount here also...
find . -mount \( -regex '.*/\.wh\.[^/]*' -type f \) | sed -e 's/\.\///;s/\.wh\.//' |
while read N
do
rm -rf "$BASE/$N"
done

This seems wrong, because what if our .wh file screens a file in pup_214.sfs? The code, as it is now, just removes from pup_ro the FILENAME for which there is .wh.FILENAME in pup_rw (if I got it right). But what if the FILENAME actually came from the squash? (See the example at the end.)
Perhaps we should check for that possibility and do instead something like this:

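find . -mount \( -regex '.*/\.wh\.[^/]*' -type f \) | sed -e 's/\.\///' |
while read -r N
do
  # M is the path which the whiteout file N screens
  M=$(echo "$N" | sed -e 's/\.wh\.//')
  if [ -e "$BASE/$M" ]; then
    # the screened file is in pup_ro: remove it there
    rm -rf "$BASE/$M"
  else
    # nothing to remove in pup_ro: keep the whiteout for the next session
    cp "$N" "$BASE/$N"
  fi
done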

This way, if .wh was screening a file in pup_214.sfs, it will then go
into pup_ro and continue its useful service in the next session (rather than
being lost at the time of shutdown)

As of now we have the following unwanted effect.
rm /sbin/reiserfsck
which reiserfsck

(nothing) But after the reboot:
which reiserfsck
/sbin/reiserfsck

It reappears. I guess this is not what we want.
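
The proposed loop can be tried on plain directories first. This is a sketch under assumptions: the temporary directories and the file notes.txt are made up, and the mkdir -p line is an addition so that cp has a target directory.

```shell
#!/bin/sh
# Simulation only: plain directories stand in for pup_rw (SNAP) and pup_ro1 (BASE).
WORK=$(mktemp -d)
SNAP="$WORK/pup_rw"
BASE="$WORK/pup_ro1"
mkdir -p "$SNAP/sbin" "$SNAP/root" "$BASE/sbin" "$BASE/root"

# Case 1: whiteout for a file that exists only in the squash (nothing in BASE).
: > "$SNAP/sbin/.wh.reiserfsck"
# Case 2: whiteout for a file that an earlier merge already wrote to BASE.
echo "old" > "$BASE/root/notes.txt"
: > "$SNAP/root/.wh.notes.txt"

(
  cd "$SNAP" || exit 1
  find . -mount \( -regex '.*/\.wh\.[^/]*' -type f \) | sed -e 's/^\.\///' |
  while read -r N; do
    M=$(echo "$N" | sed -e 's/\.wh\.//')   # path screened by the whiteout
    if [ -e "$BASE/$M" ]; then
      rm -rf "$BASE/$M"                    # screened file is in BASE: delete it
    else
      mkdir -p "$BASE/$(dirname "$N")"     # added here: make sure the target exists
      cp "$N" "$BASE/$N"                   # keep the whiteout for the next session
    fi
  done
)

ls -A "$BASE/sbin" "$BASE/root"
```

In case 1 the whiteout .wh.reiserfsck ends up in $BASE/sbin, so the file stays hidden after a reboot. In case 2 notes.txt is deleted from $BASE and the whiteout is dropped, which is only right if no squash holds a copy of the same file.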

Posted: Thu 08 Mar 2007, 09:05
by andrei
OK, test pilots wanted!

Looks like I was able to solve my problems with the Puppy filesystem. I modified the file /usr/sbin/snapmergepuppy.
This business turned out to be more complicated than I thought; there are many subtle points. At least I now understand why this problem is so difficult.
I am attaching the modified file /usr/sbin/snapmergepuppy for testing. (As snapmergepuppy.tar)
I tested it on my computer. It solved the problems described in this thread, and also the problems with cleaning the SeaMonkey cache.
This script is experimental and potentially VERY DANGEROUS, because it operates on the filesystem. It is for testing purposes only. If you want to test this new script, please read instructions and comments, which are inside the script.

Does it make sense to continue working on this?
What are the future plans concerning the Puppy filesystem?
Should we move to aufs, or continue improving the use of unionfs?

Posted: Thu 08 Mar 2007, 09:41
by Sage
I already asked BK about aufs on his developer blog and he replied.

Posted: Thu 08 Mar 2007, 10:09
by sunburnt
Sage; And what was his reply... or a link to the blog item?
I asked him about AUFS also, but I haven't heard from him.

andrei; AUFS works pretty much the same as UnionFS.
It's supposed to be better according to the developers.

But unioned on /, will it swap union branches without crashing?
It may have the same problem with it that UnionFS does.
Even if it can't swap branches, it should still replace UnionFS.

Posted: Thu 08 Mar 2007, 12:35
by Sage
puppyos.net/news, but you'll need to scan back a month or more.

Posted: Tue 20 Mar 2007, 03:00
by PaulBx1
He just answered about it again. Aufs won't be in the next release (the one after CE), but perhaps later, probably if there is enough lobbying. It doesn't hurt to develop expertise in it so that Barry is not left figuring out everything himself...

Posted: Mon 26 Mar 2007, 14:08
by Dougal
Great job, andrei.

I only have one question, starting back in Barry's original code:

Why do we use "find" with the fancy regular expression, getting only files of the type DIR/.wh.FILE?
What if there are whiteout files in the root directory of the partition/pup_save?
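
One way to check the second question is to run the same find expression on a scratch directory. This is only a sketch and assumes GNU find (for -regex); the directory and file names are made up. Since "find ." prints every path with a leading ./, a whiteout sitting directly in the top directory comes out as ./.wh.NAME, which still contains the / that the expression asks for.

```shell
#!/bin/sh
WORK=$(mktemp -d)
mkdir -p "$WORK/sub"
: > "$WORK/.wh.top"          # whiteout directly in the top directory
: > "$WORK/sub/.wh.nested"   # whiteout one level down
: > "$WORK/sub/plain"        # ordinary file, must not be listed

# Same expression as in snapmergepuppy, run from the top of the tree.
FOUND=$(cd "$WORK" && find . -mount \( -regex '.*/\.wh\.[^/]*' -type f \))
echo "$FOUND"
```

Both whiteouts are listed and the ordinary file is not, so the expression also covers the top directory as long as find is started with "." as its argument.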

from sfs

Posted: Mon 26 Mar 2007, 22:52
by raffy
Andrei:
The code, as it is now, just removes from pup_ro the FILENAME for which there is .wh.FILENAME in pup_rw (if I got it right). But if the FILENAME actually came from the squash?
Yes, it is actually a problem, and while using Puppy unmodified, a utility to automatically remove those whiteout codes is needed.

So this will be the other side of your scripts, to make sure they don't exist in the sfs. I understand you've done this above, but pointing out your final advice will help people who build squash files for Puppy. Thanks!

Posted: Wed 28 Mar 2007, 21:02
by andrei
Dear Raffy,
as you suggested, let me write here the summary of what the new SMP(=snapmergepuppy) does.
First of all, remember that there are two types of the whiteout files.

The first type, which we will call "regular" whiteout files, have the names of the type ".wh.FILENAME".
The purpose of these "regular" whiteout files is to screen some files on the lower layer of the union
from being visible to the user. For example, there is a file /sbin/reiserfsck in pup_214.sfs
(which is mounted as /initrd/pup_ro2). Suppose I decided to erase it, because I think that I never
use it. I execute the command: rm /sbin/reiserfsck and after that the file is gone!
But the file could not be actually erased, because pup_214.sfs is mounted read-only, so we
strictly speaking cannot remove it. What happened is unionfs created a file .wh.reiserfsck
in /initrd/pup_rw/sbin/ which is a regular whiteout file. It now screens /initrd/pup_ro2/sbin/reiserfsck
and that's why the file appears removed.
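
The screening rule for a regular whiteout can be written down as a small lookup over the layers, top first. This is a simplified sketch, not what unionfs does internally: the function is_visible and the temporary directories are made up, and it ignores whited-out parent directories and dir-opaque files.

```shell
#!/bin/sh
# is_visible PATH LAYER...   (layers are listed top first)
# Simplified: looks only at a whiteout next to the file itself.
is_visible() {
  path=$1; shift
  dir=$(dirname "$path")
  name=$(basename "$path")
  for layer in "$@"; do
    [ -e "$layer/$path" ] && return 0            # found before any whiteout
    [ -e "$layer/$dir/.wh.$name" ] && return 1   # whiteout hides lower layers
  done
  return 1
}

WORK=$(mktemp -d)
RW="$WORK/pup_rw"
RO2="$WORK/pup_ro2"
mkdir -p "$RW/sbin" "$RO2/sbin"
: > "$RO2/sbin/reiserfsck"

is_visible sbin/reiserfsck "$RW" "$RO2" && BEFORE=visible || BEFORE=hidden
: > "$RW/sbin/.wh.reiserfsck"        # what "rm /sbin/reiserfsck" leaves behind
is_visible sbin/reiserfsck "$RW" "$RO2" && AFTER=visible || AFTER=hidden
echo "before rm: $BEFORE, after rm: $AFTER"
# prints: before rm: visible, after rm: hidden
```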

The second type of files have the name .wh.__dir_opaque and their purpose is to screen the contents
of the directory which exists (or may exist) on some lower level of the union. For example, I never use
the folder /root/spot so suppose I decided to remove it. I execute: rm -r /root/spot and now /root/spot
is "removed" (what actually happens is that the file .wh.spot appears in /initrd/pup_rw/root/, and because
of that file we do not see /root/spot anymore). Now, after I removed /root/spot, suppose that I want
to make a new directory /root/spot, for some other purpose. I say: mkdir /root/spot and I see
that the empty directory /root/spot was created! What actually happened is this: the "regular" whiteout
file /initrd/pup_rw/root/.wh.spot was removed, and instead we have a directory created
/initrd/pup_rw/root/spot with the dir-opaque file in it: /initrd/pup_rw/root/spot/.wh.__dir_opaque
So, the original /root/spot contained a file readme.txt, but we do not see it anymore,
because /initrd/pup_rw/root/spot/.wh.__dir_opaque screens from our view whatever is in the
directory /initrd/pup_ro2/root/spot/
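
The effect of a dir-opaque file on a directory listing can be sketched the same way. Again this is a simplification with made-up names (list_dir, the temporary directories): it stops at the first layer that holds .wh.__dir_opaque and does not apply regular whiteouts to lower layers.

```shell
#!/bin/sh
# list_dir DIR LAYER...   (layers are listed top first)
# Simplified: honours .wh.__dir_opaque only; .wh.* names are just filtered out.
list_dir() {
  dir=$1; shift
  for layer in "$@"; do
    [ -d "$layer/$dir" ] || continue
    ls -A "$layer/$dir" | grep -v '^\.wh\.'
    # An opaque marker means: nothing below this layer shows through.
    if [ -e "$layer/$dir/.wh.__dir_opaque" ]; then break; fi
  done | sort -u
}

WORK=$(mktemp -d)
RW="$WORK/pup_rw"
RO2="$WORK/pup_ro2"
mkdir -p "$RO2/root/spot"
echo "hello" > "$RO2/root/spot/readme.txt"

PLAIN=$(list_dir root/spot "$RW" "$RO2")     # before anything is deleted
mkdir -p "$RW/root/spot"                      # state after rm -r and mkdir
: > "$RW/root/spot/.wh.__dir_opaque"
OPAQUE=$(list_dir root/spot "$RW" "$RO2")
echo "without marker: [$PLAIN]  with marker: [$OPAQUE]"
# prints: without marker: [readme.txt]  with marker: []
```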

To summarize, there are two types of the whiteout files: "regular" and "dir-opaque".
Remember that pup_rw is writable (it is in the RAM) and pup_ro1 is essentially
pup_save.2fs (on which snapmergepuppy(=SMP) writes changes every 30min) and pup_ro2, pup_ro3
are typically the read-only squashes.
The new SMP does the following:

# 0. First pass through the "regular" .wh. files.
# Look for all .wh. files in /initrd/pup_rw and for each found file:
# 0.1. See if it screens anything in /pup_ro1 and if it does then
# completely remove the file or directory which it screens
# 0.2. See if it screens a file or directory in some squash
# 0.3. If the conclusion is that this .wh. file screens a file in /pup_ro1
# but does not screen anything else, then remove it
# 1. Deal with .wh.__dir_opaque files. We look for all dir-opaque files
# in /initrd/pup_rw and for each found .wh.__dir_opaque:
# 1.1. Look if it screens a directory in /pup_ro1 (which is pup_save.2fs)
# If it does, completely remove the directory in /pup_ro1
# which it screens (after that the dir-opaque may become
# "redundant", see 1.3 below).
# 1.2. Look if it screens a directory in some squash.
# If it does, call it a "useful dir-opaque" and copy
# this useful dir-opaque to /pup_ro1 so that it is preserved for
# the next session.
# 1.3. If a dir-opaque does not screen a directory in a squash,
# declare it redundant and do nothing with it; it will be removed
# at the next step:
# 1.4. Remove all .wh.__dir_opaque files from pup_rw
# 2. Deal with the "regular" .wh. files. Look for all .wh. files
# in /initrd/pup_rw which are left after part 0 and for each found file:
# 2.1. Determine if it screens a file or directory in some squash
# and if it does, then copy this .wh. file to /pup_ro1
# for storage, so it persists after the reboot
# 2.2. If cannot determine what .wh. file screens, then call it
# a "strange" .wh. file and do nothing to it
# 2.3. Remove all the .wh. files which are not strange from pup_rw
# 3. Copy all files except for .wh. and dir-opaque from /pup_rw to /pup_ro
# as it was done by the previous version of snapmergepuppy.
# (This last part is unchanged)
#
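
Part 1 of the list can be sketched as follows. This is one reading of steps 1.1 to 1.4 and not the code from the attached script; the function name merge_dir_opaque, its arguments and the temporary directories are made up for the illustration.

```shell
#!/bin/sh
# merge_dir_opaque SNAP BASE SQUASH...
# SNAP stands for pup_rw, BASE for pup_ro1, SQUASH... for the mounted squashes.
merge_dir_opaque() {
  snap=$1; base=$2; shift 2
  find "$snap" -mount -type f -name '.wh.__dir_opaque' |
  while read -r marker; do
    rel=${marker#"$snap"/}            # e.g. root/spot/.wh.__dir_opaque
    dir=$(dirname "$rel")             # e.g. root/spot
    [ "$dir" = . ] && continue        # never touch the top of BASE
    rm -rf "$base/$dir"               # 1.1: drop the screened directory
    for sq in "$@"; do                # 1.2: does any squash have this directory?
      if [ -d "$sq/$dir" ]; then
        mkdir -p "$base/$dir"
        cp "$marker" "$base/$dir/.wh.__dir_opaque"   # useful: keep it in BASE
        break
      fi
    done
    rm -f "$marker"                   # 1.3/1.4: the marker leaves pup_rw
  done
}

WORK=$(mktemp -d)
SNAP="$WORK/pup_rw"; BASE="$WORK/pup_ro1"; SQ="$WORK/pup_ro2"
mkdir -p "$SNAP/root/spot" "$SNAP/root/bug" "$BASE/root/bug" "$SQ/root/spot"
: > "$SNAP/root/spot/.wh.__dir_opaque"     # screens a directory in the squash
: > "$SNAP/root/bug/.wh.__dir_opaque"      # screens a directory in BASE only
echo "first file" > "$BASE/root/bug/a1.txt"
echo "hello" > "$SQ/root/spot/readme.txt"

merge_dir_opaque "$SNAP" "$BASE" "$SQ"
find "$BASE" "$SNAP" -name '.wh.__dir_opaque'
```

After the call, the only dir-opaque file left is $BASE/root/spot/.wh.__dir_opaque, and the stale $BASE/root/bug is gone, ready for part 3 to copy the new contents over.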

As for Dougal's question, I think that perhaps I did not quite understand the question. There is an important issue, namely what to do with those whiteout files which end up in pup_save.2fs (=pup_ro1). Is this what you are asking about? I suspect that you were asking something else? Anyway, let me discuss this issue.
Suppose you scan pup_ro1 for the whiteout files which do not screen anything, and declare them redundant. Would it be correct to remove them? Probably not, for the following reason.
We want Puppy to be able to boot with varying sets of squash files. So, for example, on Saturday and Sunday you want to have devx_214.sfs mounted on the union, but Monday through Friday you boot without devx_214.sfs.
If you "deleted" some file in devx_214, say on Sunday, then you created in pup_save a whiteout file which will be redundant Monday through Friday but useful on Saturday and Sunday.
A script deleting the redundant whiteouts in pup_save would therefore delete that whiteout on Monday, and you would have a problem when you need it next Saturday.
For this reason, I would vote against removing the redundant whiteouts from pup_ro1, because they might only seem redundant.

This is related to the more general question. How can we consistently manage the whiteout files in the situation when we change the content of the squash files from one boot to another?
I would propose a kind of a compromise solution. Let us do the following:
--1. Never remove a whiteout file .wh.FILE from pup_ro1 just because it
seems redundant. But:
--2. When a new squash is detected on booting, the init script should go
through all the files and directories in the new squash and remove all the
whiteout files in pup_ro1 which would screen any file or directory
in the new squash. Something like this:
find new_squash/ | while read N ; do rm /initrd/pup_ro1/(possible whiteout corresponding to $N) ; done
Notice that this should be done by the init script, not SMP.
Even this would not be totally consistent, but rather a compromise solution. Because suppose that the whiteout was screening some file in some old squash, and then we added the new squash. After the init script removed that whiteout file (because it screens the new squash) you will have the contents of the new squash now mixed with the contents of the old squash, which was previously "deleted" but now will reappear because you removed the whiteout.
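
Step --2 could look roughly like this. It is only a sketch of the idea, not code from the init script: the function name drop_screening_whiteouts, its two arguments and the temporary directories are made up, and it treats a dir-opaque file in pup_ro1 as a whiteout for the matching directory in the new squash.

```shell
#!/bin/sh
# drop_screening_whiteouts NEW_SQUASH SAVE
# NEW_SQUASH: mount point of the newly detected squash
# SAVE:       mount point of pup_ro1 (pup_save.2fs)
drop_screening_whiteouts() {
  new=$1; save=$2
  ( cd "$new" && find . ) | sed -e 's/^\.\///' |
  while read -r N; do
    [ "$N" = . ] && continue
    # regular whiteout that would hide $N
    rm -f "$save/$(dirname "$N")/.wh.$(basename "$N")"
    # dir-opaque file that would hide the contents of directory $N
    if [ -d "$new/$N" ]; then rm -f "$save/$N/.wh.__dir_opaque"; fi
  done
}

WORK=$(mktemp -d)
NEW="$WORK/new_squash"; SAVE="$WORK/pup_ro1"
mkdir -p "$NEW/sbin" "$SAVE/sbin"
: > "$NEW/sbin/reiserfsck"
: > "$SAVE/sbin/.wh.reiserfsck"   # screens a file in the new squash
: > "$SAVE/sbin/.wh.e2fsck"       # screens nothing in the new squash

drop_screening_whiteouts "$NEW" "$SAVE"
ls -A "$SAVE/sbin"
```

Only .wh.e2fsck is left in $SAVE/sbin, in line with rule --1: a whiteout that does not touch the new squash is kept even if it looks redundant.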

Is it possible to deal with the whiteout files consistently, if the content of the squashes changes from boot to boot?
Does this problem have a reasonable solution?
I think most likely we will have to look for a compromise of some sort.
After all, "deleting" a file in the squash is not something we do very often.

Posted: Thu 29 Mar 2007, 05:35
by BarryK
andrei wrote:
I would propose a kind of a compromise solution. Let us do the following:
--1. Never remove a whiteout file .wh.FILE from pup_ro1 just because it
seems redundant. But:
--2. When a new squash is detected on booting, the init script should go
through all the files and directories in the new squash and remove all the
whiteout files in pup_ro1 which would screen any file or directory
in the new squash. Something like this:
find new_squash/ | while read N ; do rm /initrd/pup_ro1/(possible whiteout corresponding to $N) ; done
Notice that this should be done by the init script, not SMP.
Even this would not be totally consistent, but rather a compromise solution. Because suppose that the whiteout was screening some file in some old squash, and then we added the new squash. After the init script removed that whiteout file (because it screens the new squash) you will have the contents of the new squash now mixed with the contents of the old squash, which was previously "deleted" but now will reappear because you removed the whiteout.

The init script currently has the compromise solution that you describe, or rather, what I understand you to be describing. I also haven't figured out what to do about changing SFS layers, or what the consequences will be.