LU-10851: parallel-scale-nfsv4 hangs on unmount


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.3, 2.10.4, 2.10.5, 2.10.6, 2.10.7, 2.11.0, 2.12.0, 2.12.1, 2.12.3, 2.12.5, 2.12.6, 2.13.0, 2.14.0, 2.15.0
    • Labels: None
    • Severity: 3

    Description

      parallel-scale-nfsv4 hangs on unmount after all tests have run. In the suite_log, the last thing we see is

      == parallel-scale-nfsv4 test complete, duration 2088 sec ============================================= 22:07:24 (1521868044)
       
      Unmounting NFS clients...
      CMD: trevis-8vm1,trevis-8vm2 umount -f /mnt/lustre
       
      Unexporting Lustre filesystem...
      CMD: trevis-8vm1,trevis-8vm2 chkconfig --list rpcidmapd 2>/dev/null |
                                          grep -q rpcidmapd && service rpcidmapd stop ||
                                          true
      CMD: trevis-8vm4 { [[ -e /etc/SuSE-release ]] &&
                                             service nfsserver stop; } ||
                                             service nfs stop
      CMD: trevis-8vm4 sed -i '/^lustre/d' /etc/exports
      CMD: trevis-8vm4 exportfs -v
      CMD: trevis-8vm4 grep -c /mnt/lustre' ' /proc/mounts
      Stopping client trevis-8vm4 /mnt/lustre (opts:-f)
      CMD: trevis-8vm4 lsof -t /mnt/lustre
      CMD: trevis-8vm4 umount -f /mnt/lustre 2>&1 
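
      For reference, the teardown that hangs can be replayed by hand on the NFS server node (trevis-8vm4 here), outside the test framework. A minimal sketch of the same sequence, assuming a RHEL-style nfs service name and /mnt/lustre as the re-exported Lustre mount:

      # Stop the NFS server (SLES uses the nfsserver service, RHEL uses nfs)
      { [[ -e /etc/SuSE-release ]] && service nfsserver stop; } || service nfs stop
      # Remove the Lustre entry from the export table and confirm it is gone
      sed -i '/^lustre/d' /etc/exports
      exportfs -v
      # List any processes still holding files open under the mount point
      lsof -t /mnt/lustre
      # Force-unmount the Lustre client mount; this is the command that
      # never returns in the logs above
      umount -f /mnt/lustre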
      

Looking at the console logs for vm4, MDS1, and MDS3, we see

      [ 2216.385890] Lustre: DEBUG MARKER: == parallel-scale-nfsv4 test complete, duration 2088 sec ============================================= 22:07:24 (1521868044)
      [ 2216.698201] Lustre: DEBUG MARKER: { [[ -e /etc/SuSE-release ]] &&
      [ 2216.698201]                                service nfsserver stop; } ||
      [ 2216.698201]                                service nfs stop
      [ 2216.805093] nfsd: last server has exited, flushing export cache
      [ 2216.819487] Lustre: DEBUG MARKER: sed -i '/^lustre/d' /etc/exports
      [ 2216.885266] Lustre: DEBUG MARKER: exportfs -v
      [ 2216.945098] Lustre: DEBUG MARKER: grep -c /mnt/lustre' ' /proc/mounts
      [ 2216.982526] Lustre: DEBUG MARKER: lsof -t /mnt/lustre
      [ 2217.170422] Lustre: DEBUG MARKER: umount -f /mnt/lustre 2>&1
      [ 2217.192827] Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
      [ 2217.193476] LustreError: 410:0:(file.c:205:ll_close_inode_openhandle()) lustre-clilmv-ffff880060b4e800: inode [0x200000406:0x3c1b:0x0] mdc close failed: rc = -108
      [ 2217.218709] Lustre: 4066:0:(llite_lib.c:2676:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.4.84@tcp:/lustre/fid: [0x200000406:0x3e42:0x0]/ may get corrupted (rc -108)
      [ 2217.218732] Lustre: 4066:0:(llite_lib.c:2676:ll_dirty_page_discard_warn()) lustre: dirty page discard: 10.9.4.84@tcp:/lustre/fid: [0x200000406:0x3e7b:0x0]/ may get corrupted (rc -108)
      …
      [ 5541.474664]
      [ 5541.474667] umount          D 0000000000000000     0   410    409 0x00000000
      [ 5541.474669]  ffff88004365fda8 ffff88004365fde0 ffff880048e5ce00 ffff880043660000
      [ 5541.474670]  ffff88004365fde0 000000010013feb9 ffff88007fc10840 0000000000000000
      [ 5541.474671]  ffff88004365fdc0 ffffffff81612a95 ffff88007fc10840 ffff88004365fe68
      [ 5541.474672] Call Trace:
      [ 5541.474674]  [<ffffffff81612a95>] schedule+0x35/0x80
      [ 5541.474677]  [<ffffffff81615851>] schedule_timeout+0x161/0x2d0
      [ 5541.474689]  [<ffffffffa1457cc7>] ll_kill_super+0x77/0x150 [lustre]
      [ 5541.474723]  [<ffffffffa09f3a94>] lustre_kill_super+0x34/0x40 [obdclass]
      [ 5541.474734]  [<ffffffff8120cf5f>] deactivate_locked_super+0x3f/0x70
      [ 5541.474742]  [<ffffffff812283fb>] cleanup_mnt+0x3b/0x80
      [ 5541.474745]  [<ffffffff8109d198>] task_work_run+0x78/0x90
      [ 5541.474748]  [<ffffffff8107b5cf>] exit_to_usermode_loop+0x91/0xc2
      [ 5541.474760]  [<ffffffff81003ae5>] syscall_return_slowpath+0x85/0xa0
      [ 5541.474768]  [<ffffffff81616ca7>] int_ret_from_sys_call+0x25/0x9f
      [ 5541.476903] DWARF2 unwinder stuck at int_ret_from_sys_call+0x25/0x9f
      [ 5541.476904] 
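
      The rc = -108 in the messages above is -ESHUTDOWN: the MDT0000 import had already been set INACTIVE, the client could not flush its dirty pages (hence the dirty page discard warnings), and umount then blocks in ll_kill_super as shown in the trace. When the hang recurs, something like the following should capture the blocked-task stacks and the Lustre debug log for triage (a sketch; the output path and the extra debug mask are assumptions):

      # Dump the stacks of all uninterruptible (D-state) tasks to the console
      echo w > /proc/sysrq-trigger
      # Widen the Lustre debug mask and dump the kernel debug buffer to a file
      lctl set_param debug=+vfstrace
      lctl dk /tmp/lu-10851-debug.log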
      

We see this problem with unmount on the master and b2_10 branches for SLES12 SP2 and SP3 testing only.

Logs for the test suite failures are at

      https://testing.whamcloud.com/test_sets/4bce5a66-2f2f-11e8-9e0e-52540065bddc 

      https://testing.whamcloud.com/test_sets/103f280e-2fac-11e8-b3c6-52540065bddc

      https://testing.whamcloud.com/test_sets/044a75f0-2eba-11e8-b6a0-52540065bddc
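
      For anyone trying to reproduce this locally, the suite is driven from lustre/tests. A sketch, assuming a working multi-node test configuration with the NFS server and client roles set:

      cd lustre/tests
      # run the NFSv4 re-export suite end to end, including the
      # unexport/unmount teardown that hangs in the reports above
      sh parallel-scale-nfsv4.sh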


People

    Assignee: Qian Yingjin (qian_wc)
    Reporter: James Nunez (jamesanunez) (Inactive)
