Lustre LU-3217: Client hung in cl_io_loop/cl_io_lock path

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • None
    • Lustre 2.4.0
    • None
    • Cray XE compute node client running with SLES11 SP1 or SP2
    • 2
    • 7850

    Description

      A Lustre 2.3.63 client node appeared to deadlock and hang, causing the node to lose its heartbeat. The client OS is SLES11 SP1 or SP2.

      PID: 9665 TASK: ffff880105662100 CPU: 16 COMMAND: "read2_01"
      #0 [ffff880105a2bb38] schedule at ffffffff812db5e5
      #1 [ffff880105a2bbd0] libcfs_debug_msg at ffffffffa017bd81
      #2 [ffff880105a2bc30] cl_lock_trace0 at ffffffffa02d4063
      #3 [ffff880105a2bcd0] cl_lock_mutex_tail at ffffffffa02d43ad
      #4 [ffff880105a2bcf0] cl_lock_mutex_get at ffffffffa02d5ba2
      #5 [ffff880105a2bd20] cl_lock_release at ffffffffa02d6ba1
      #6 [ffff880105a2bd50] cl_lock_link_fini at ffffffffa02ddc52
      #7 [ffff880105a2bd80] cl_io_unlock at ffffffffa02dde25
      #8 [ffff880105a2bdc0] cl_io_loop at ffffffffa02deb55
      #9 [ffff880105a2bdf0] ll_file_io_generic at ffffffffa07a9978
      #10 [ffff880105a2be60] ll_file_aio_write at ffffffffa07a9d61
      #11 [ffff880105a2beb0] ll_file_write at ffffffffa07ab422
      #12 [ffff880105a2bf10] vfs_write at ffffffff81117f3b
      #13 [ffff880105a2bf40] sys_write at ffffffff81118105
      #14 [ffff880105a2bf80] system_call_fastpath at ffffffff8100305b
      RIP: 0000000020013000 RSP: 00007fffffffa368 RFLAGS: 00010246
      RAX: 0000000000000001 RBX: ffffffff8100305b RCX: fefefefefefefeff
      RDX: 0000000000000019 RSI: 00000000400e1ce0 RDI: 0000000000000004
      RBP: 00007fffffffa4f0 R8: 00000000400e1ce0 R9: 6165722f74736574
      R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000020020dc0
      R13: 0000000020020d80 R14: 0000000000000000 R15: 0000000000000004
      ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b

      ....

      PID: 9655 TASK: ffff880105bc1820 CPU: 6 COMMAND: "read2_01"
      #0 [ffff880105d63b38] schedule at ffffffff812db5e5
      #1 [ffff880105d63bd0] libcfs_debug_msg at ffffffffa017bd81
      #2 [ffff880105d63c30] our_vma at ffffffffa07e3914
      #3 [ffff880105d63c60] vvp_io_rw_lock at ffffffffa0802282
      #4 [ffff880105d63d30] vvp_io_write_lock at ffffffffa0802636
      #5 [ffff880105d63d40] cl_io_lock at ffffffffa02de535
      #6 [ffff880105d63dc0] cl_io_loop at ffffffffa02deb2a
      #7 [ffff880105d63df0] ll_file_io_generic at ffffffffa07a9978
      #8 [ffff880105d63e60] ll_file_aio_write at ffffffffa07a9d61
      #9 [ffff880105d63eb0] ll_file_write at ffffffffa07ab422
      #10 [ffff880105d63f10] vfs_write at ffffffff81117f3b
      #11 [ffff880105d63f40] sys_write at ffffffff81118105
      #12 [ffff880105d63f80] system_call_fastpath at ffffffff8100305b
      RIP: 0000000020013000 RSP: 00007fffffffa368 RFLAGS: 00010246
      RAX: 0000000000000001 RBX: ffffffff8100305b RCX: fefefefefefefeff
      RDX: 0000000000000019 RSI: 00000000400e1ce0 RDI: 0000000000000004
      RBP: 00007fffffffa4f0 R8: 00000000400e1ce0 R9: 6165722f74736574
      R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000020020dc0
      R13: 0000000020020d80 R14: 0000000000000000 R15: 0000000000000004
      ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
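
When comparing backtraces like the two above, it can help to line up the call chains programmatically and see where they diverge. The following is only a minimal sketch, assuming the text is the output of the crash utility's `bt` command in the format pasted here; the helper names `parse_backtraces` and `common_callers` are hypothetical and not part of crash or Lustre.

```python
import re

# Matches a frame line such as:
#   #3 [ffff880105d63c60] vvp_io_rw_lock at ffffffffa0802282
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")
PID_RE = re.compile(r"PID:\s*(\d+)")


def parse_backtraces(text):
    """Return {pid: [function names, innermost frame first]}."""
    stacks = {}
    current = None
    for line in text.splitlines():
        pid_match = PID_RE.search(line)
        if pid_match:
            current = int(pid_match.group(1))
            stacks[current] = []
            continue
        frame_match = FRAME_RE.search(line)
        if frame_match and current is not None:
            stacks[current].append(frame_match.group(3))
    return stacks


def common_callers(stack_a, stack_b):
    """Return the frames both stacks share, starting from the outermost."""
    shared = []
    for frame_a, frame_b in zip(reversed(stack_a), reversed(stack_b)):
        if frame_a != frame_b:
            break
        shared.append(frame_a)
    return shared


# Frames copied from the two backtraces in this report (registers omitted).
sample = """
PID: 9665 TASK: ffff880105662100 CPU: 16 COMMAND: "read2_01"
#0 [ffff880105a2bb38] schedule at ffffffff812db5e5
#1 [ffff880105a2bbd0] libcfs_debug_msg at ffffffffa017bd81
#2 [ffff880105a2bc30] cl_lock_trace0 at ffffffffa02d4063
#3 [ffff880105a2bcd0] cl_lock_mutex_tail at ffffffffa02d43ad
#4 [ffff880105a2bcf0] cl_lock_mutex_get at ffffffffa02d5ba2
#5 [ffff880105a2bd20] cl_lock_release at ffffffffa02d6ba1
#6 [ffff880105a2bd50] cl_lock_link_fini at ffffffffa02ddc52
#7 [ffff880105a2bd80] cl_io_unlock at ffffffffa02dde25
#8 [ffff880105a2bdc0] cl_io_loop at ffffffffa02deb55
#9 [ffff880105a2bdf0] ll_file_io_generic at ffffffffa07a9978
#10 [ffff880105a2be60] ll_file_aio_write at ffffffffa07a9d61
#11 [ffff880105a2beb0] ll_file_write at ffffffffa07ab422
#12 [ffff880105a2bf10] vfs_write at ffffffff81117f3b
#13 [ffff880105a2bf40] sys_write at ffffffff81118105
#14 [ffff880105a2bf80] system_call_fastpath at ffffffff8100305b

PID: 9655 TASK: ffff880105bc1820 CPU: 6 COMMAND: "read2_01"
#0 [ffff880105d63b38] schedule at ffffffff812db5e5
#1 [ffff880105d63bd0] libcfs_debug_msg at ffffffffa017bd81
#2 [ffff880105d63c30] our_vma at ffffffffa07e3914
#3 [ffff880105d63c60] vvp_io_rw_lock at ffffffffa0802282
#4 [ffff880105d63d30] vvp_io_write_lock at ffffffffa0802636
#5 [ffff880105d63d40] cl_io_lock at ffffffffa02de535
#6 [ffff880105d63dc0] cl_io_loop at ffffffffa02deb2a
#7 [ffff880105d63df0] ll_file_io_generic at ffffffffa07a9978
#8 [ffff880105d63e60] ll_file_aio_write at ffffffffa07a9d61
#9 [ffff880105d63eb0] ll_file_write at ffffffffa07ab422
#10 [ffff880105d63f10] vfs_write at ffffffff81117f3b
#11 [ffff880105d63f40] sys_write at ffffffff81118105
#12 [ffff880105d63f80] system_call_fastpath at ffffffff8100305b
"""

stacks = parse_backtraces(sample)
shared = common_callers(stacks[9665], stacks[9655])
print("shared callers (outermost first):", shared)
for pid, frames in stacks.items():
    # The first frame past the shared callers is where this task diverges.
    print(pid, "diverges at", frames[len(frames) - len(shared) - 1])
```

Run against the frames in this report, the sketch shows both tasks sharing the path from system_call_fastpath down to cl_io_loop, then diverging at cl_io_unlock (PID 9665) and cl_io_lock (PID 9655). It only compares function names and says nothing about which locks are held.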

      A dump is available at:

      ftp.cray.com:/outbound/mas63-sp1-down.tar.bz2

          People

            jay Jinshan Xiong (Inactive)
            wang Wally Wang (Inactive)