Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 

The time now is Sat 23 Aug 2014, 05:42
All times are UTC - 4
 Forum index » House Training » Bugs ( Submit bugs )
Shutdown hang when CIFS shares are mounted and potential fix
Moderators: Flash, Ian, JohnMurga
ldolse

Joined: 23 Oct 2009
Posts: 366

Posted: Fri 04 Nov 2011, 06:29    Post subject:  Shutdown hang when CIFS shares are mounted and potential fix

I've been having problems with many recent Puppy builds I've been using hanging during shutdown/reboot. I finally decided to track it down and found that it was related to rc.shutdown's handling of stray mounted CIFS filesystems. Specifically it was hanging on the final line of this block of code during the end of the shutdown:
Code:
STRAYPARTL="`echo "$MNTDPARTS" |grep -v "/dev/pts" |grep -v "/proc" |grep -v "/sys" |grep -v "tmpfs" |grep -v "rootfs" |grep -v 'on / ' | grep -v "/dev/root" | grep -v "usbfs" | grep -v "unionfs" | grep -v "/initrd"`"
STRAYPARTD="`echo $STRAYPARTL | cut -f 1 -d " " | tr "\n" " "`"
for ONESTRAY in $STRAYPARTD
do
 echo "Unmounting $ONESTRAY..."
 #091117 weird bug, no processes but when run this, x restarts...
 xFUSER="`fuser -m $ONESTRAY 2>/dev/null`" #091117 do this first, seems to fix it.
I'm guessing fuser is hanging because the network was already taken down much earlier in rc.shutdown. I'm not sure why this is occurring lately - the last Puppy I used heavily in the same way was Turbopup, so either this bug has been around since change #091117 or perhaps there is something different about fuser and/or how it works with newer kernels - I've mostly (maybe always) been using dpups when I've seen this.


Here is my proposed fix:
Code:
   # unmount network shares before taking down the network
   for MOUNTPOINT in `mount -l | grep '^//' | cut -d ' ' -f 3`
   do
      umount -f "$MOUNTPOINT"
   done


That goes just before the lines in rc.shutdown where the network is taken down:
Code:
   #100301 brought down below call to 'stop' service scripts, needed for lamesmbxplorer.
   #bring down network interfaces (prevents shutdown sometimes)...
   [ "`pidof wpa_supplicant`" != "" ] && wpa_cli terminate #100309 kills any running wpa_supplicant.
   if [ "`grep 'net-setup.sh' /usr/local/bin/defaultconnect`" = "" ];then #see connectwizard and connectwizard_2nd.
      for ONENETIF in `ifconfig | grep -E '^wifi[0-9]|^wlan[0-9]|^eth[0-9]' | cut -f 1 -d ' ' | tr '\n' ' '`
       do
         ifconfig $ONENETIF down 2> /dev/null
          [ "`iwconfig | grep "^${ONENETIF}" | grep "ESSID"`" != "" ] && iwconfig $ONENETIF essid off #100309
          dhcpcd --release $ONENETIF 2>/dev/null #100309
      done
   else
      /etc/rc.d/rc.network stop
   fi
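
For anyone who wants to check the parsing step of the proposed fix without touching real mounts, here is a minimal sketch. The names net_mountpoints and plan_net_umounts are hypothetical (they are not in rc.shutdown). The sketch assumes `mount` prints lines of the form "device on mountpoint type ..." and that mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper, not part of rc.shutdown: read `mount`-style lines on
# stdin and print the mount point (3rd space-separated field) of each network
# share, i.e. each line whose device field starts with "//".
# Assumption: mount points contain no spaces.
net_mountpoints() {
    grep '^//' | cut -d ' ' -f 3
}

# Dry run: print the umount commands instead of executing them.
plan_net_umounts() {
    net_mountpoints | while read -r MOUNTPOINT
    do
        echo "umount -f $MOUNTPOINT"
    done
}
```

Running `mount | plan_net_umounts` on a live system would only list the commands, so the output can be inspected before anything is actually unmounted.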
gcmartin

Joined: 14 Oct 2005
Posts: 4220
Location: Earth

Posted: Fri 04 Nov 2011, 23:19

Thanks. I had been using a script to umount those LAN mounts. Hope this is seen elsewhere.
_________________
Get ACTIVE Create Circles; Do those good things which benefit people's needs!
We are all related ... Its time to show that we know this!
3 Different Puppy Search Engine or use DogPile
BarryK
Puppy Master


Joined: 09 May 2005
Posts: 7047
Location: Perth, Western Australia

Posted: Fri 04 Nov 2011, 23:38

ldolse,
One problem, you are showing a code snippet from an old version of rc.shutdown.

You should get the rc.shutdown out of the latest Woof, or a Puppy built from recent Woof, such as Racy, Wary or Slacko.

The particular section of code that you have shown now looks like this:

Code:
#091117 110928 if partition mounted, when choose shutdown, pc rebooted. found that param given to fuser must be mount-point, not /dev/*...
STRAYPARTL="`echo "$MNTDPARTS" | grep ' /mnt/' |grep -v -E '/dev/pts|/proc|/sys|tmpfs|rootfs|on / |/dev/root|usbfs|unionfs|aufs|/initrd'`"
STRAYPARTD="`echo "$STRAYPARTL" | cut -f 1 -d ' ' | tr '\n' ' '`"
STRAYMNT="`echo "$STRAYPARTL" | cut -f 3 -d ' ' | tr '\n' ' '`"
for ONESTRAY in $STRAYMNT
do
 #echo "`eval_gettext \"Unmounting \\\${ONESTRAY}...\"`"
 echo "Unmounting $ONESTRAY..."
 xFUSER="`fuser -m $ONESTRAY 2>/dev/null`"
 [ "$xFUSER" != "" ] && fuser -k -m $ONESTRAY 2>/dev/null
 killzombies #v3.99
 sync
 umount -r $ONESTRAY
done


...which might perhaps have solved the problem. It did solve another shutdown problem.

_________________
http://bkhome.org/news/
ldolse

Joined: 23 Oct 2009
Posts: 366

Posted: Sat 05 Nov 2011, 08:57

I'll give the latest rc.shutdown a shot. I see that the call to "fuser -m" is still in the latest you pasted, and that was the line that was causing the issue - anyway will test with Racy/Wary and confirm again - I see that you're passing a bit different info to fuser.

Hadn't tried with the very latest Woof because for some reason it wasn't building a working dpup for me, and I hadn't got around to figuring out why - I've also been making changes to rc.shutdown and other scripts for my puplet, so each time I sync to a new Woof I've got to manually merge those changes. Will Woof's version control system actually let me branch changes ala Bazaar/git? I haven't tried to test that out so wasn't sure.
ldolse

Joined: 23 Oct 2009
Posts: 366

Posted: Sat 05 Nov 2011, 09:22

Ok, just tested. With the change to rc.shutdown that you pasted, it now converts //<hostname>/<sharename> to the actual mount point and passes the mount point to fuser, per the changelog. However, it still hangs on fuser.


If I let it sit for several minutes (roughly 5 or so) it seems like fuser eventually gives up and the shutdown continues/completes, which is another reason why I think the network being down is likely the reason fuser fails, and the timeout is just really long.
gcmartin

Joined: 14 Oct 2005
Posts: 4220
Location: Earth

Posted: Sat 05 Nov 2011, 14:54

That's one of the issues encountered at shutdown. Sometimes either the remote is down or the LAN is down, and a smooth shutdown then needs some force. When everything (LAN) is in place, the shutdown script runs without issue. But ...

Hmmm.... how best to handle this condition..... (AND if so, could there be a script/menu/network option to force umount of remote/all resources when necessary?) This way, whether the need arises while running or during shutdown, the remote resources could be "un-connected" from the running system.

Hope this helps

ldolse

Joined: 23 Oct 2009
Posts: 366

Posted: Sat 05 Nov 2011, 15:15

The shutdown script intentionally shuts down the LAN early because on some hardware variants it apparently can cause a hang if the network interfaces aren't explicitly taken down - this behavior has been around forever.

The command calling fuser (which is what hangs) was added toward the end of 2009 to fix an unrelated problem. I don't believe the unmount itself is the issue; I think it's the attempt to gather information about a mount point whose network is already down. Puplets using shutdown scripts from before this fix don't exhibit the hang, e.g. Turbopup and likely other 4.2 variants.

The fix I proposed is to unmount network shares before taking down the network. This allows both original fixes to operate for their respective purposes and eliminates this behavior which is apparently a regression that's been around for quite some time. An alternate fix would be to exclude any mountpoint starting with '//' from being passed to the fuser command.
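
As a rough illustration of that alternate fix, here is a minimal sketch of the "starts with '//'" test on its own. The names is_net_share and fuser_plan are hypothetical, and the second function only reports what would happen; it calls neither fuser nor umount.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a device field looks like a network share
# (//host/share), the pattern the alternate fix would exclude from fuser.
is_net_share() {
    case "$1" in
        //*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Dry run: report what the shutdown loop would do for one device field,
# without calling fuser or umount.
fuser_plan() {
    if is_net_share "$1"; then
        echo "skip fuser: $1"
    else
        echo "run fuser: $1"
    fi
}
```

Using `case` avoids spawning grep for every entry, which is a small design choice rather than a requirement.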

I only noticed it because I use network shares pretty religiously, and this is one of the main snags I hit when migrating from Turbopup to the latest and greatest puplets.
BarryK
Puppy Master


Joined: 09 May 2005
Posts: 7047
Location: Perth, Western Australia

Posted: Sun 06 Nov 2011, 04:52

Ok, I have implemented something, your suggested alternative solution. See if that does the job. Attached.
Attachment: rc.shutdown.gz (8.76 KB)

ldolse

Joined: 23 Oct 2009
Posts: 366

Posted: Sun 06 Nov 2011, 09:10

Thanks, I gave it a shot and I've got good news and bad news.

The good news is the update allows shutdown to get past the fuser command, so that particular change works as expected.

The bad news is it still hangs, it just hangs later on the final command in the shutdown script:
Code:
busybox umount -ar > /dev/null 2>&1


That command hasn't changed at all since the 4.2 days, but I believe busybox has changed several times. The dpup I'm using is running busybox version 1.17.2. It hangs for 5 minutes again, just like the hang with fuser. Not sure if busybox can be called in a different way, or if unmounting the network shares before taking down the network is the only way to get rid of the hang. I tried adding -f and -l just to see if it helped, no dice.
ldolse

Joined: 23 Oct 2009
Posts: 366

Posted: Sun 06 Nov 2011, 09:27

I take it back about the problem not existing in 4.2 puppies, but it's not nearly as pronounced/obvious there. I just re-tested with Turbopup to get a better idea of what happens. The older code executes a umount for each ONESTRAY item without running fuser, so in those older scripts the share would have been unmounted before getting to the final busybox umount command.

It turns out the hang actually seems to be present when that umount command is executed as well, but it's not as obvious because the network timeout is much lower, maybe only around 1 minute (edit: probably only 30 seconds, just re-tested). Not sure where the timer is defined, if that's a lower-level kernel function or what.
BarryK
Puppy Master


Joined: 09 May 2005
Posts: 7047
Location: Perth, Western Australia

Posted: Mon 07 Nov 2011, 18:11

ldolse wrote:
Thanks, I gave it a shot and I've got good news and bad news.

The good news is the update allows shutdown to get past the fuser command, so that particular change works as expected.

The bad news is it still hangs, it just hangs later on the final command in the shutdown script:
Code:
busybox umount -ar > /dev/null 2>&1


That command hasn't changed at all since the 4.2 days, but I believe busybox has changed several times. The dpup I'm using is running busybox version 1.17.2. It hangs for 5 minutes again, just like the hang with fuser. Not sure if busybox can be called in a different way, or if unmounting the network shares before taking down the network is the only way to get rid of the hang. I tried adding -f and -l just to see if it helped, no dice.


Ok, I have put in your original solution. See attached.
Attachment: rc.shutdown.gz (8.86 KB)

Karl Godt


Joined: 20 Jun 2010
Posts: 3964
Location: Kiel,Germany

Posted: Tue 27 Dec 2011, 07:09

Quote:
--- /root/my-documents/tmp/rc.shutdown1 2011-12-27 11:43:00.000000000 +0100
+++ /root/my-documents/tmp/rc.shutdown2 2011-12-27 11:43:36.000000000 +0100
@@ -57,6 +57,7 @@
#110928 fixed, reboots when choose shutdown. very old bug, dates back to 2009.
#110928 modified i18n conversion, only for echo to /dev/console.
#111106 do not execute fuser if network share mount.
+#111107 ldolse: unmount network shares before taking down the network

#110923
. /usr/bin/gettext.sh # enables use of eval_gettext (several named variables) and ngettext (plurals)
@@ -180,6 +181,13 @@ if [ "$ACTIVE_INTERFACE" ];then
fi
fi

+#111107 ldolse: unmount network shares before taking down the network
+#(see 111106, need to do it sooner, but 111106 will remount read-only if failed to umount here)
+for MOUNTPOINT in `mount | grep '^//' | cut -d ' ' -f 3 | tr '\n' ' '`
+do
+ umount -f $MOUNTPOINT
+done
+
#v2.16 some packages have a service script that requires stopping...
for service_script in /etc/init.d/*
do

The above is the diff between the two rc.shutdown versions.
I cannot say anything about cifs and network shares.

#

BUT

rc.shutdown has got a new bug in the STRAYPARTSLIST :

due to
Code:
STRAYPARTandMNT="`echo "$STRAYPARTL" | cut -f 1,3 -d ' ' | tr ' ' '|' | tr '\n' ' '`"

the list would look like
"
/dev/sda1|/mnt/sda1
/dev/sda2|/mnt/sda2
/dev/sda3|/mnt/sda3"


The code goes further with
Code:
for ONESTRAY in $STRAYPARTandMNT
do
 FLAGCIFS="`echo -n ${ONESTRAY} | grep '^//'`"
 ONESTRAYMNT="`echo -n ${ONESTRAY} | cut -f 2 -d '|'`"
 #echo "`eval_gettext \"Unmounting \\\${ONESTRAY}...\"`"
 echo "Unmounting $ONESTRAY..."
 if [ "$FLAGCIFS" = "" ];then
  xFUSER="`fuser -m $ONESTRAY 2>/dev/null`"
  [ "$xFUSER" != "" ] && fuser -k -m $ONESTRAYMNT 2>/dev/null
 fi
 killzombies #v3.99
 sync

AND
Code:
 umount -r $ONESTRAY
done

THE problem is that ONESTRAY becomes "/dev/sda1|/mnt/sda1",
which, as far as I know, does not work as an argument to fuser or umount.

ONESTRAY should be either

"/dev/sda1" OR
"/mnt/sda1" OR probably
"/dev/sda1 /mnt/sda1" #not tested this third possibility

NOT /dev/sda1|/mnt/sda1 !!!

The problem is that the delimiter becomes a '|' (bar), not whitespace ('[[:space:]]').

I don't think that a directory or file named /dev/sda1|/mnt/sda1 exists in the /dev/ directory.
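
To make the point concrete, here is a minimal sketch of splitting one such combined entry back into its two halves. The function names stray_dev and stray_mnt are hypothetical; rc.shutdown itself uses cut -f 2 -d '|' for the mount point, and POSIX parameter expansion is swapped in here only because it needs no extra process.

```shell
#!/bin/sh
# Split one "device|mountpoint" entry, as built by STRAYPARTandMNT, into its
# two halves using POSIX parameter expansion.
# stray_dev and stray_mnt are hypothetical names used only for this sketch.
stray_dev() { printf '%s\n' "${1%%|*}"; }   # text before the first '|'
stray_mnt() { printf '%s\n' "${1#*|}"; }    # text after the first '|'

ONESTRAY='/dev/sda1|/mnt/sda1'
echo "device:     $(stray_dev "$ONESTRAY")"
echo "mountpoint: $(stray_mnt "$ONESTRAY")"
```

Either half on its own is something fuser or umount could accept; the combined string is not.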

[UNDER CONSTRUCTION]
[after cleaned up i think i will provide a correct diff in some time]

SOLUTION :
#diff -up /mnt/+JUMP-10+puppy_slacko_5.3.1.sfs/etc/rc.d/rc.shutdown /etc/rc.d/rc.shutdown

Code:
--- /mnt/+JUMP-10+puppy_slacko_5.3.1.sfs/etc/rc.d/rc.shutdown   2011-12-10 08:06:11.000000000 +0100
+++ /etc/rc.d/rc.shutdown   2011-12-26 22:39:32.000000000 +0100
@@ -523,14 +523,14 @@ do
  FLAGCIFS="`echo -n ${ONESTRAY} | grep '^//'`"
  ONESTRAYMNT="`echo -n ${ONESTRAY} | cut -f 2 -d '|'`"
  #echo "`eval_gettext \"Unmounting \\\${ONESTRAY}...\"`"
- echo "Unmounting $ONESTRAY..."
+ echo "Unmounting $ONESTRAY..." >/dev/console
  if [ "$FLAGCIFS" = "" ];then
   xFUSER="`fuser -m $ONESTRAY 2>/dev/null`"
   [ "$xFUSER" != "" ] && fuser -k -m $ONESTRAYMNT 2>/dev/null
  fi
  killzombies #v3.99
  sync
- umount -r $ONESTRAY
+ umount -r $ONESTRAYMNT
 done
 
 swapoff -a #works only if swaps are in mtab or ftab
@@ -539,7 +539,7 @@ STRAYPARTD="`cat /proc/swaps | grep "/de
 for ONESTRAY in $STRAYPARTD
 do
  #echo "`eval_gettext \"Swapoff \\\${ONESTRAY}\"`"
- echo "Swapoff $ONESTRAY"
+ echo "Swapoff $ONESTRAY" >/dev/console
  swapoff $ONESTRAY
 done
 sync


NOTE : THE IMPORTANT PART is

- umount -r $ONESTRAY
+ umount -r $ONESTRAYMNT

[edit]
there is still
xFUSER="`fuser -m $ONESTRAY 2>/dev/null`"
should also become
xFUSER="`fuser -m $ONESTRAYMNT 2>/dev/null`"

I also altered
ZOMBIES="`ps -H -A | grep '<defunct>' | sed -e 's/  /|/g' | grep -v '|||' | cut -f 1 -d ' ' | tr '\n' ' '`"
TO
Code:
 ZOMBIES="`ps -H -A | grep '<defunct>' | sed 's/^[[:blank:]]*//g' | cut -f 1 -d ' ' | tr '\n' ' '`"

[edit2]
Code:

 ZOMBIES="`ps -H -A | grep '<defunct>' | sed 's/^[[:blank:]]*//g' | cut -f 1 -d ' ' | sort -gr | tr '\n' ' '`"

which would kill all zombies with the highest pids first
[/edit2]
because i was getting a bunch of "kill: : arguments must be process or job ID" on the screen by the killzombies function
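
A minimal sketch of the PID extraction alone, using awk '{print $1}' instead of cut so that leading padding in the ps output cannot produce an empty field. zombie_pids is a hypothetical name, and the sketch does not look at the process hierarchy at all; it only lists every <defunct> entry, highest PID first.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -H -A`-style lines on stdin and print the PIDs
# of <defunct> entries, highest PID first, on one line.
# awk splits on runs of blanks, so leading padding never yields an empty field.
zombie_pids() {
    grep '<defunct>' | awk '{print $1}' | sort -rn | tr '\n' ' '
}
```

On a live system it would be fed as `ps -H -A | zombie_pids`; `sort -rn` stands in for `sort -gr` here on the assumption that PIDs are plain integers.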

[edit2]
AND in the ABSPUPHOME part !!

ABSPUPHOME=""
if [ "`busybox mount | grep "$ABSPUPHOME"`" != "" ];then


would always grep everything left mounted like /proc AND /sys .

AND
BADPIDS="`fuser -m $ABSPUPHOME 2>/dev/null`"
would be something like
BADPIDS="`fuser -m '' 2>/dev/null`"
AND because of directing the error output of "fuser" [--help] to /dev/null
this would be ok ,
BUT
also the killzombies function would run again .
[/edit2]
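
The empty-pattern behaviour is easy to check in isolation. Below is a minimal sketch with a guard; mounts_matching is a hypothetical helper name, and using grep -F for a fixed-string match is my assumption, not what rc.shutdown does.

```shell
#!/bin/sh
# Hypothetical helper: print the mount lines on stdin that mention a given
# path, but print nothing when the path is empty. Without the guard, grep
# with an empty pattern matches every line.
mounts_matching() {
    [ -n "$1" ] || return 0
    grep -F -- "$1"
}
```

With the guard in place, `busybox mount | mounts_matching "$ABSPUPHOME"` would print nothing when ABSPUPHOME is empty, so the fuser and killzombies steps that depend on it would be skipped.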

[/edit]

[edit2]
Ok : killzombies function wants only to grep parent-less zombies .

First minor problem :
busybox init --help
Init is the parent of all processes

If i assume that the toplevel parents are not meant like busybox init OR
"||2 ?||||00:00:00 kthreadd"
BUT
the second level parents
like
"|165 ?||||00:00:00| udevd"

these would have not been killed by filtering through grep -v '|||' .

HERE

" 3102 tty1|| 00:00:00| xwin"
" 3236 tty1|| 00:00:00|| xinit"
" 3237 tty4|| 00:01:53||| X"
" 3288 tty1|| 00:00:01||| jwm"
" 3362 tty1|| 00:00:15|||| pup_event_front"
"18890 tty1|| 00:00:00||||| sleep"
" 3353 tty1|| 00:00:00|||| jwm <defunct>"

only xwin and xinit would have been grep'd .

cut -f 1 -d ' ' would provide an empty space " " instead of "3102" OR "3236" .

I have other pids that would become like

"||5 ?||||00:00:00| kworker/u:0"
"| 11 ?||||00:00:00| khelper"
"|165 ?||||00:00:00| udevd"

Without looking for the '||||' which would've been filtered by grep -v '|||'
cut -f 1 -d " " would assign "||5" " " "|165" to the list of ZOMBIES to be killed .

Here the output of the kill command for '||5' :

kill '||5'
bash: kill: ||5: arguments must be process or job IDs

Now i've tinkered around with the ps output which is somewhat unusable like cat -n : nice for the eye but disgusting for usage in shell-scripts :
Code:
ZOMBIES="`ps -H -A |sed 's/^[[:blank:]]*//g;s/\([0-9]*\)\ \([[:alnum:][:punct:]]*\)\([[:blank:]]*\)\(.*\)/\1 \2 \4/g' | grep '<defunct>' | sed -e 's/  /|/g' | grep -v '|' | cut -f 1 -d ' ' | tr '\n' ' '`"

My explanation :
ps -A -H for
-A all processes including the ? tty ie session leaders
-H process hierarchy
sed 's/^[[:blank:]]*//g
because using cut -f 1 -d ' ' later instead of awk '{print $1}' would probably not grep a pid but a white space
s/\([0-9]*\)\ \([[:alnum:][:punct:]]*\)\([[:blank:]]*\)\(.*\)/\1 \2 \4/g'
should translate the formatted spaces between
19973 pts/6 00:00:00 ps-FULL
3236 tty1 00:00:00 xinit
into
22305 pts/6 00:00:00 ps-FULL
3236 tty1 00:00:00 xinit
by ignoring the group 3 \([[:blank:]]\) leaving everything behind this group unformatted (group 4) .
The hierarchy output uses two spaces per level to show the hierarchy steps.
These two spaces are now replaced by sed -e 's/  /|/g', as in the original.
Now every line not containing a '|' (bar) is a top-level parent:
"3236 tty1 00:00:00 xinit"
"3288 tty1 00:00:00| jwm"
"3362 tty1 00:00:07|| pup_event_front"
"24494 tty1 00:00:00||| sleep"
"2581 tty1 00:00:00|| jwm <defunct>"

In the above output "jwm <defunct>" would not have been killed, because the two bars '||' would indicate two parents.
[/edit2]

[/UNDER CONSTRUCTION]