[LU-8233] Spurious mounts remaining on client node(s) Created: 02/Jun/16  Updated: 13/Oct/21  Resolved: 13/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Frank Heckes (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: soak
Environment:

lola
build: https://build.hpdd.intel.com/job/lustre-master/3365/ (el6.7, x86_64)


Attachments: File console-lola-26.log.bz2     File lola-26-lustre-log-20160602-0504.bz2     File messages-lola-26.log.bz2    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

An error occurred during soak testing of build '20160512' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20160512 ).
DNE is disabled. The MDTs had been formatted using ldiskfs, the OSTs using zfs. The MDS nodes are configured in an active-active HA failover configuration.

After triggering an umount command for the Lustre FS, some of the clients don't complete the umount process successfully.
A spurious mount entry is still present and shows up in the output of the mount command and in /etc/mtab, although the FS itself isn't accessible anymore:

[root@lola-26 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
nfsd on /proc/fs/nfsd type nfsd (rw)
192.168.1.108@o2ib10:192.168.1.109@o2ib10:/soaked on /mnt/soaked type lustre (rw,user_xattr)
[root@lola-26 ~]# cat /etc/mtab
/dev/sda1 / ext3 rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
10.4.0.1:/export/scratch /scratch nfs rw,addr=10.4.0.1 0 0
10.4.0.1:/home /home nfs rw,addr=10.4.0.1 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0
192.168.1.108@o2ib10:192.168.1.109@o2ib10:/soaked /mnt/soaked lustre rw,user_xattr 0 0

[root@lola-26 ~]# ll /mnt/soaked/
total 0

Executing umount a second time 'clears' the mount status:

[root@lola-26 ~]# umount /mnt/soaked
umount: /mnt/soaked: not mounted
[root@lola-26 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
nfsd on /proc/fs/nfsd type nfsd (rw)
[root@lola-26 ~]# cat /etc/mtab 
/dev/sda1 / ext3 rw 0 0
proc /proc proc rw 0 0
sysfs /sys sysfs rw 0 0
devpts /dev/pts devpts rw,gid=5,mode=620 0 0
tmpfs /dev/shm tmpfs rw 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
10.4.0.1:/export/scratch /scratch nfs rw,addr=10.4.0.1 0 0
10.4.0.1:/home /home nfs rw,addr=10.4.0.1 0 0
nfsd /proc/fs/nfsd nfsd rw 0 0

Attached are the console log, messages log, and kernel debug log of the affected client node (lola-26).



 Comments   
Comment by Oleg Drokin [ 02/Jun/16 ]

So when this condition arises - was there an error printed by unmount?

It looks like the unmount itself worked OK; the /etc/mtab entry just wasn't updated for some reason.

Comment by Andreas Dilger [ 02/Jun/16 ]

This could be checked by seeing whether the filesystem still appears in /etc/mtab but is no longer listed in /proc/mounts.
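
For reference, a minimal version of that check (a sketch only; the mount point /mnt/soaked is taken from the output above - an entry that survives in /etc/mtab but is absent from /proc/mounts would confirm that only the userspace mount table is stale):

# kernel's view of active mounts
grep ' /mnt/soaked ' /proc/mounts || echo "not in /proc/mounts"
# userspace mount table maintained by mount/umount
grep ' /mnt/soaked ' /etc/mtab || echo "not in /etc/mtab"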

Comment by Andreas Dilger [ 02/Jun/16 ]

If this is hard to debug at the source, it might be possible to replace calls to "umount" with "strace -o /tmp/umount.$$ umount" so that we get a log of why the /etc/mtab update didn't happen.
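
One way to do this without editing the test scripts themselves would be a small wrapper placed ahead of the real binary in PATH (a sketch only; the wrapper location /usr/local/sbin/umount and the /tmp trace path are assumptions, not part of the soak framework):

#!/bin/sh
# /usr/local/sbin/umount: trace every umount invocation, one log per call.
# -f follows forked children; the real binary is called by absolute path
# so the wrapper does not invoke itself recursively.
exec strace -f -o "/tmp/umount.$$" /sbin/umount "$@"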

Comment by Frank Heckes (Inactive) [ 03/Jun/16 ]

Sorry, I forgot to attach the log files mentioned above yesterday.
The first 'umount' command was started at 'Jun 2 04:26' via pdsh for all client nodes. The umounts were blocked on the client nodes (process state 'D').
I'll try to reproduce the error with the current build '20160601' (https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20160601) with command tracing enabled.
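
('D' is the ps process state for uninterruptible sleep.) A quick way to spot such blocked umount processes and see where they are stuck might be (a sketch; sysrq must be enabled for the second command):

# list umount processes with their state and kernel wait channel
ps -eo pid,stat,wchan:32,args | grep '[u]mount'
# dump stack traces of all D-state tasks to the console/dmesg
echo w > /proc/sysrq-trigger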

Comment by Frank Heckes (Inactive) [ 03/Jun/16 ]

I also took debug logs on all server nodes. I didn't upload them because of the large amount of data. Please let me know if they are needed.

Comment by Frank Heckes (Inactive) [ 03/Jun/16 ]

I couldn't reproduce the error with build '20160601'.
