
Spurious mounts remaining on client node(s)

Details


    Description

      The error occurred during soak testing of build '20160512' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20160512 ).
      DNE is disabled. MDTs had been formatted using ldiskfs, OSTs using zfs. The MDS nodes are configured in an active-active HA failover configuration.

      After triggering an umount command for the Lustre FS, some of the clients don't complete the umount process successfully.
      A spurious mount is still present and can be displayed via the mount command and /etc/mtab. The FS itself isn't accessible anymore:

      [root@lola-26 ~]# mount
      /dev/sda1 on / type ext3 (rw)
      proc on /proc type proc (rw)
      sysfs on /sys type sysfs (rw)
      devpts on /dev/pts type devpts (rw,gid=5,mode=620)
      tmpfs on /dev/shm type tmpfs (rw)
      none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
      sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
      10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
      10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
      nfsd on /proc/fs/nfsd type nfsd (rw)
      192.168.1.108@o2ib10:192.168.1.109@o2ib10:/soaked on /mnt/soaked type lustre (rw,user_xattr)
      [root@lola-26 ~]# cat /etc/mtab
      /dev/sda1 / ext3 rw 0 0
      proc /proc proc rw 0 0
      sysfs /sys sysfs rw 0 0
      devpts /dev/pts devpts rw,gid=5,mode=620 0 0
      tmpfs /dev/shm tmpfs rw 0 0
      none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
      sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
      10.4.0.1:/export/scratch /scratch nfs rw,addr=10.4.0.1 0 0
      10.4.0.1:/home /home nfs rw,addr=10.4.0.1 0 0
      nfsd /proc/fs/nfsd nfsd rw 0 0
      192.168.1.108@o2ib10:192.168.1.109@o2ib10:/soaked /mnt/soaked lustre rw,user_xattr 0 0
      
      [root@lola-26 ~]# ll /mnt/soaked/
      total 0
      

      Executing umount a second time 'clears' the mount status:

      [root@lola-26 ~]# umount /mnt/soaked
      umount: /mnt/soaked: not mounted
      [root@lola-26 ~]# mount
      /dev/sda1 on / type ext3 (rw)
      proc on /proc type proc (rw)
      sysfs on /sys type sysfs (rw)
      devpts on /dev/pts type devpts (rw,gid=5,mode=620)
      tmpfs on /dev/shm type tmpfs (rw)
      none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
      sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
      10.4.0.1:/export/scratch on /scratch type nfs (rw,addr=10.4.0.1)
      10.4.0.1:/home on /home type nfs (rw,addr=10.4.0.1)
      nfsd on /proc/fs/nfsd type nfsd (rw)
      [root@lola-26 ~]# cat /etc/mtab 
      /dev/sda1 / ext3 rw 0 0
      proc /proc proc rw 0 0
      sysfs /sys sysfs rw 0 0
      devpts /dev/pts devpts rw,gid=5,mode=620 0 0
      tmpfs /dev/shm tmpfs rw 0 0
      none /proc/sys/fs/binfmt_misc binfmt_misc rw 0 0
      sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
      10.4.0.1:/export/scratch /scratch nfs rw,addr=10.4.0.1 0 0
      10.4.0.1:/home /home nfs rw,addr=10.4.0.1 0 0
      nfsd /proc/fs/nfsd nfsd rw 0 0
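
      A quick way to sweep the client nodes for such leftover entries after the cluster-wide umount might be a pdsh one-liner along these lines (a sketch only; lola-26 stands in for the full client node list):

      # Nodes that still list /mnt/soaked in /etc/mtab are printed with a
      # "<node>:" prefix by pdsh; nodes that unmounted cleanly stay silent.
      pdsh -w lola-26 'grep " /mnt/soaked " /etc/mtab'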
      

      Attached are the console log, messages, and kernel debug log of the affected client node (lola-26).


        Activity

          [LU-8233] Spurious mounts remaining on client node(s)

          heckes Frank Heckes (Inactive) added a comment -

          I couldn't reproduce the error with build '20160601'.

          heckes Frank Heckes (Inactive) added a comment -

          I also took debug logs on all server nodes. I didn't upload them because of the large amount of data. Please let me know if they are needed.

          heckes Frank Heckes (Inactive) added a comment -

          Sorry, I forgot to attach the log files mentioned above yesterday.
          The first 'umount' command was started at 'Jun 2 04:26' via pdsh for all client nodes. The umounts were blocked on the client nodes (state 'D').
          I'll try to reproduce the error with the current build '20160601' (https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20160601 ) with command tracing enabled.

          adilger Andreas Dilger added a comment -

          If there are problems debugging the source of this issue, it might be possible to replace calls to "umount" with "strace -o /tmp/umount.$$ umount" so that we get a log of why the /etc/mtab update didn't happen.
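
          One way to wire that in on the clients could be a small wrapper placed earlier in root's PATH (a sketch; the /usr/local/sbin location, and the assumption that umount is invoked via $PATH rather than by absolute path, are not from the ticket):

            # Create a wrapper named 'umount' that traces the real binary; on a
            # default EL6 install /usr/local/sbin precedes /sbin in root's PATH.
            printf '%s\n' '#!/bin/bash' \
                'exec strace -o /tmp/umount.$$ /sbin/umount "$@"' \
                > /usr/local/sbin/umount
            chmod +x /usr/local/sbin/umount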

          adilger Andreas Dilger added a comment -

          This could be checked by seeing whether the filesystem is only in /etc/mtab and not in /proc/mounts.
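
          For example, a check along these lines on an affected client would confirm that (a sketch only, not taken from the ticket's logs):

            # List mount points present in /etc/mtab but absent from /proc/mounts,
            # i.e. userspace bookkeeping left behind after the kernel unmount.
            comm -23 <(awk '{print $2}' /etc/mtab | sort) \
                     <(awk '{print $2}' /proc/mounts | sort)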
          green Oleg Drokin added a comment -

          So when this condition arises - was there an error printed from umount?

          It looks like the unmount worked OK, just /etc/mtab was not updated for some reason.

          People

            Assignee: wc-triage WC Triage
            Reporter: heckes Frank Heckes (Inactive)
            Votes: 0
            Watchers: 4
