LU-3896: racer test hung: /mnt/lustre2 is still busy, wait one second


    Description

      After the racer test completed, the run hung while unmounting /mnt/lustre2 on the clients:

      == racer test complete, duration 1006 sec == 00:42:55 (1378194175)
      CMD: wtm-29vm3 /usr/sbin/lctl get_param -n version
      CMD: wtm-29vm3 /usr/sbin/lctl conf_param lustre.quota.mdt=ug
      CMD: wtm-29vm3 /usr/sbin/lctl conf_param lustre.quota.ost=ug
      Stopping clients: wtm-29vm5,wtm-29vm6.rosso.whamcloud.com /mnt/lustre2 (opts:)
      CMD: wtm-29vm5,wtm-29vm6.rosso.whamcloud.com running=\$(grep -c /mnt/lustre2' ' /proc/mounts);
      if [ \$running -ne 0 ] ; then
      echo Stopping client \$(hostname) /mnt/lustre2 opts:;
      lsof /mnt/lustre2 || need_kill=no;
      if [ x != x -a x\$need_kill != xno ]; then
          pids=\$(lsof -t /mnt/lustre2 | sort -u);
          if [ -n \"\$pids\" ]; then
                   kill -9 \$pids;
          fi
      fi;
      while umount  /mnt/lustre2 2>&1 | grep -q busy; do
          echo /mnt/lustre2 is still busy, wait one second && sleep 1;
      done;
      fi
      Stopping client wtm-29vm6.rosso.whamcloud.com /mnt/lustre2 opts:
      Stopping client wtm-29vm5.rosso.whamcloud.com /mnt/lustre2 opts:
      COMMAND   PID USER   FD   TYPE      DEVICE SIZE/OFF               NODE NAME
      dd       3713 root    1w   REG 1273,181606  7769088 450359979933698183 /mnt/lustre/racer1/6 (deleted)
      dd      10024 root    1w   REG 1273,181606 99767296 450359979950473219 /mnt/lustre2/racer1/2 (deleted)
      dd      10031 root    1w   REG 1273,181606  9411584 450359979967250450 /mnt/lustre2/racer1/12
      dd      10059 root    1w   REG 1273,181606 85909504 432345581440991292 /mnt/lustre2/racer/6
      dd      10062 root    1w   REG 1273,181606 56402944 432345581440991267 /mnt/lustre/racer/8
      dd      10074 root    1w   REG 1273,181606 47879168 432345581457768455 /mnt/lustre/racer/9
      dd      10076 root    1w   REG 1273,181606 45958144 432345581407436811 /mnt/lustre2/racer/13
      dd      10084 root    1w   REG 1273,181606 47879168 432345581457768455 /mnt/lustre/racer/9
      dd      10130 root    1w   REG 1273,181606 18804736 468374378443178004 /mnt/lustre/racer2/13
      dd      10173 root    1w   REG 1273,181606 55358464 486388776969437193 /mnt/lustre/racer3/6
      dd      10421 root    1w   REG 1273,181606  2441216 486388776952660004 /mnt/lustre/racer3/10
      dd      10450 root    1w   REG 1273,181606 50086912 468374378476732457 /mnt/lustre/racer2/0
      dd      10456 root    1w   REG 1273,181606 31876096 468374378443178023 /mnt/lustre/racer2/13 (deleted)
      dd      10457 root    1w   REG 1273,181606 64463872 486388776952660005 /mnt/lustre/racer3/19
      dd      10527 root    1w   REG 1273,181606  2441216 486388776952660004 /mnt/lustre2/racer3/10
      dd      10581 root    1w   REG 1273,181606  7365632 468374378459955230 /mnt/lustre2/racer2/11
      dd      11360 root    1w   REG 1273,181606 57447424 450359979967250475 /mnt/lustre2/racer1/10
      cat     11906 root    1w   REG 1273,181606 10503168 486388776935882790 /mnt/lustre2/racer3/2 (deleted)
      cat     11906 root    3r   REG 1273,181606 39769088 486388776986214418 /mnt/lustre2/racer3/5
      dd      11947 root    1w   REG 1273,181606 10503168 486388776935882790 /mnt/lustre2/racer3/2 (deleted)
      cat     12675 root    1w   REG 1273,181606  8477696 450359979933696110 /mnt/lustre/racer1/1 (deleted)
      cat     12675 root    3r   REG 1273,181606  8478720 450359979933696048 /mnt/lustre/racer1/5
      dd      15141 root    1w   REG 1273,181606 58491904 468374378476732558 /mnt/lustre2/racer2/3 (deleted)
      dd      21547 root    1w   REG 1273,181606 32379904 468374378476734932 /mnt/lustre2/racer2/5
      dd      29365 root    1w   REG 1273,181606 61625344 450359979916919106 /mnt/lustre/racer1/0
      /mnt/lustre2 is still busy, wait one second
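The listing above mixes paths under /mnt/lustre and /mnt/lustre2, all on the same device. To see only the processes that hold files under the mount point being unmounted, saved lsof output can be filtered by path. The following is a minimal sketch, not part of the test framework: the function name list_holders is hypothetical, and it assumes the default lsof column layout shown above, where NAME is the ninth field.

```shell
#!/bin/sh
# list_holders MOUNT (hypothetical helper)
# Reads lsof output on stdin and prints the unique "COMMAND PID" pairs
# whose NAME column is MOUNT itself or a path below it.
list_holders() {
    awk -v m="$1" '
        # Skip the header line; field 9 is NAME in the default lsof layout.
        $1 != "COMMAND" && ($9 == m || index($9, m "/") == 1) {
            print $1, $2
        }' | sort -u
}

# Example (assumes the lsof output was saved to a file named lsof.out):
# list_holders /mnt/lustre2 < lsof.out
```

Matching on the mount point followed by a slash keeps /mnt/lustre paths out of a /mnt/lustre2 query and the reverse.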
      
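In the command as logged above, the kill branch is guarded by `[ x != x -a ... ]`, which is always false, so the processes listed by lsof are never killed. The `while umount ... | grep -q busy` loop then has no exit other than umount ceasing to report "busy", which matches the endless "still busy" message. The following is a minimal sketch of a bounded variant of that loop. It is an illustration only, not the test framework's code: the function name retry_while_busy and the RETRY_INTERVAL variable are hypothetical.

```shell
#!/bin/sh
# retry_while_busy MAX_TRIES COMMAND [ARGS...] (hypothetical helper)
# Re-runs COMMAND while its output contains "busy", as the logged loop does,
# but gives up after MAX_TRIES busy results instead of looping forever.
# As in the logged loop, any result that does not mention "busy" (success or
# a different error) ends the loop.
retry_while_busy() {
    max_tries=$1
    shift
    tries=0
    while "$@" 2>&1 | grep -q busy; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "still busy after $tries attempts, giving up" >&2
            return 1
        fi
        # Seconds between attempts; defaults to 1 as in the logged loop.
        sleep "${RETRY_INTERVAL:-1}"
    done
    return 0
}

# Example: try for roughly 30 seconds, then fail so the caller can collect
# debug information instead of hanging.
# retry_while_busy 30 umount /mnt/lustre2 || echo "umount timed out"
```

A bounded loop does not fix the underlying hang, but it turns a silent stall into a test failure with a clear message.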

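      The console log on client wtm-29vm6 showed a cat process in uninterruptible sleep (state D), blocked in a read while waiting to take a semaphore for reading: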

      00:42:04:cat           D 0000000000000000     0  4282   2714 0x00000080
      00:42:04: ffff880062aefc90 0000000000000082 0000000000000246 000000100000000d
      00:42:04: ffffffff00000050 ffffffffa06b3390 ffff88006c91e1e8 ffff88006c91ca28
      00:42:04: ffff880065a2f058 ffff880062aeffd8 000000000000fb88 ffff880065a2f058
      00:42:04:Call Trace:
      00:42:04: [<ffffffffa06b3390>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
      00:42:04: [<ffffffffa09a9fa0>] ? lov_io_init_raid0+0x3e0/0x800 [lov]
      00:42:04: [<ffffffff81510695>] rwsem_down_failed_common+0x95/0x1d0
      00:42:04: [<ffffffff81510826>] rwsem_down_read_failed+0x26/0x30
      00:42:04: [<ffffffff81283844>] call_rwsem_down_read_failed+0x14/0x30
      00:42:04: [<ffffffff8150fd24>] ? down_read+0x24/0x30
      00:42:04: [<ffffffffa0a7182d>] ll_file_io_generic+0x29d/0x600 [lustre]
      00:42:04: [<ffffffffa0a71ccf>] ll_file_aio_read+0x13f/0x2c0 [lustre]
      00:42:04: [<ffffffffa0a7256c>] ll_file_read+0x16c/0x2a0 [lustre]
      00:42:04: [<ffffffff81181a95>] vfs_read+0xb5/0x1a0
      00:42:04: [<ffffffff81181bd1>] sys_read+0x51/0x90
      00:42:04: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
      00:42:04: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      
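The cat task in the trace above is in state D (uninterruptible sleep). Because signals are not delivered to a task while it is in that state, finding such tasks is a useful first step when an unmount reports "busy". The following is a minimal sketch for listing them. The function name list_blocked is hypothetical, and the input format (PID, state, command name per line) is an assumption chosen to match the `ps` invocation in the usage comment.

```shell
#!/bin/sh
# list_blocked (hypothetical helper)
# Reads "PID STAT COMMAND" lines on stdin and prints "PID COMMAND" for every
# task whose state starts with D (uninterruptible sleep).
list_blocked() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Example on a live client; the trailing "=" suppresses the ps column headers:
# ps -eo pid=,stat=,comm= | list_blocked
```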

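      The stack trace on client wtm-29vm5 showed dd (PID 3713, which also appears in the lsof output above) sleeping in cl_lock_state_wait while enqueueing a lock for a write: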

      dd            S 0000000000000000     0  3713      1 0x00000080
       ffff880042549bd8 0000000000000086 ffffffffa0581e5c ffff8800679bf5b8
       0000000000000002 ffff880067ac4040 ffff880042549be8 ffffffffa055d35c
       ffff880067ac45f8 ffff880042549fd8 000000000000fb88 ffff880067ac45f8
      Call Trace:
       [<ffffffffa055d35c>] ? cl_lock_trace0+0x11c/0x130 [obdclass]
       [<ffffffffa0554825>] ? cl_env_info+0x15/0x20 [obdclass]
       [<ffffffffa03d86fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
       [<ffffffffa056171a>] cl_lock_state_wait+0x1aa/0x320 [obdclass]
       [<ffffffff81063410>] ? default_wake_function+0x0/0x20
       [<ffffffffa0561f0b>] cl_enqueue_locked+0x15b/0x1f0 [obdclass]
       [<ffffffffa0562a8e>] cl_lock_request+0x7e/0x270 [obdclass]
       [<ffffffffa0567d2c>] cl_io_lock+0x3cc/0x560 [obdclass]
       [<ffffffffa0567f62>] cl_io_loop+0xa2/0x1b0 [obdclass]
       [<ffffffffa09df9e0>] ll_file_io_generic+0x450/0x600 [lustre]
       [<ffffffffa09e0922>] ll_file_aio_write+0x142/0x2c0 [lustre]
       [<ffffffffa09e0c0c>] ll_file_write+0x16c/0x2a0 [lustre]
       [<ffffffff81181368>] vfs_write+0xb8/0x1a0
       [<ffffffff81181c61>] sys_write+0x51/0x90
       [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
       [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      

      Maloo report: https://maloo.whamcloud.com/test_sets/66e0f40c-1649-11e3-aa2a-52540035b04c

            People

              wc-triage WC Triage
              yujian Jian Yu