Details
Bug
Resolution: Cannot Reproduce
Major
None
Lustre 2.4.0
None
Cray XE compute node client running with SLES11 SP1 or SP2
2
7850
Description
A Lustre 2.3.63 client node appeared to deadlock and hang, causing the node to lose its heartbeat. The client OS is SLES11 SP1 or SP2.
PID: 9665 TASK: ffff880105662100 CPU: 16 COMMAND: "read2_01"
#0 [ffff880105a2bb38] schedule at ffffffff812db5e5
#1 [ffff880105a2bbd0] libcfs_debug_msg at ffffffffa017bd81
#2 [ffff880105a2bc30] cl_lock_trace0 at ffffffffa02d4063
#3 [ffff880105a2bcd0] cl_lock_mutex_tail at ffffffffa02d43ad
#4 [ffff880105a2bcf0] cl_lock_mutex_get at ffffffffa02d5ba2
#5 [ffff880105a2bd20] cl_lock_release at ffffffffa02d6ba1
#6 [ffff880105a2bd50] cl_lock_link_fini at ffffffffa02ddc52
#7 [ffff880105a2bd80] cl_io_unlock at ffffffffa02dde25
#8 [ffff880105a2bdc0] cl_io_loop at ffffffffa02deb55
#9 [ffff880105a2bdf0] ll_file_io_generic at ffffffffa07a9978
#10 [ffff880105a2be60] ll_file_aio_write at ffffffffa07a9d61
#11 [ffff880105a2beb0] ll_file_write at ffffffffa07ab422
#12 [ffff880105a2bf10] vfs_write at ffffffff81117f3b
#13 [ffff880105a2bf40] sys_write at ffffffff81118105
#14 [ffff880105a2bf80] system_call_fastpath at ffffffff8100305b
RIP: 0000000020013000 RSP: 00007fffffffa368 RFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffffff8100305b RCX: fefefefefefefeff
RDX: 0000000000000019 RSI: 00000000400e1ce0 RDI: 0000000000000004
RBP: 00007fffffffa4f0 R8: 00000000400e1ce0 R9: 6165722f74736574
R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000020020dc0
R13: 0000000020020d80 R14: 0000000000000000 R15: 0000000000000004
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
....
PID: 9655 TASK: ffff880105bc1820 CPU: 6 COMMAND: "read2_01"
#0 [ffff880105d63b38] schedule at ffffffff812db5e5
#1 [ffff880105d63bd0] libcfs_debug_msg at ffffffffa017bd81
#2 [ffff880105d63c30] our_vma at ffffffffa07e3914
#3 [ffff880105d63c60] vvp_io_rw_lock at ffffffffa0802282
#4 [ffff880105d63d30] vvp_io_write_lock at ffffffffa0802636
#5 [ffff880105d63d40] cl_io_lock at ffffffffa02de535
#6 [ffff880105d63dc0] cl_io_loop at ffffffffa02deb2a
#7 [ffff880105d63df0] ll_file_io_generic at ffffffffa07a9978
#8 [ffff880105d63e60] ll_file_aio_write at ffffffffa07a9d61
#9 [ffff880105d63eb0] ll_file_write at ffffffffa07ab422
#10 [ffff880105d63f10] vfs_write at ffffffff81117f3b
#11 [ffff880105d63f40] sys_write at ffffffff81118105
#12 [ffff880105d63f80] system_call_fastpath at ffffffff8100305b
RIP: 0000000020013000 RSP: 00007fffffffa368 RFLAGS: 00010246
RAX: 0000000000000001 RBX: ffffffff8100305b RCX: fefefefefefefeff
RDX: 0000000000000019 RSI: 00000000400e1ce0 RDI: 0000000000000004
RBP: 00007fffffffa4f0 R8: 00000000400e1ce0 R9: 6165722f74736574
R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000020020dc0
R13: 0000000020020d80 R14: 0000000000000000 R15: 0000000000000004
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
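Both tasks shown above are in schedule(), called from libcfs_debug_msg(), and differ in the frames further out. When a dump contains many such tasks, a comparison like that can be scripted. The following is only a minimal sketch, assuming the backtraces have been saved as text in the format shown above; the helper names `parse_backtraces` and `common_innermost` are hypothetical and the sample is abbreviated from the two traces in this ticket. It does not diagnose the hang, it only groups frames.

```python
import re
from collections import defaultdict

# A frame line in the pasted backtraces looks like:
#   #1 [ffff880105a2bbd0] libcfs_debug_msg at ffffffffa017bd81
FRAME_RE = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")
PID_RE = re.compile(r"PID:\s*(\d+)")


def parse_backtraces(text):
    """Return {pid: [function names, innermost frame first]}."""
    traces = defaultdict(list)
    pid = None
    for line in text.splitlines():
        m = PID_RE.search(line)
        if m:
            pid = int(m.group(1))  # a new task header starts a new trace
            continue
        m = FRAME_RE.search(line)
        if m and pid is not None:
            traces[pid].append(m.group(3))
    return dict(traces)


def common_innermost(traces):
    """Return the run of innermost frames shared by every task."""
    shared = []
    for frames in zip(*traces.values()):
        if len(set(frames)) != 1:
            break  # the stacks diverge at this depth
        shared.append(frames[0])
    return shared


# Abbreviated from the two traces in this ticket (innermost frames only).
SAMPLE = """\
PID: 9665 TASK: ffff880105662100 CPU: 16 COMMAND: "read2_01"
#0 [ffff880105a2bb38] schedule at ffffffff812db5e5
#1 [ffff880105a2bbd0] libcfs_debug_msg at ffffffffa017bd81
#2 [ffff880105a2bc30] cl_lock_trace0 at ffffffffa02d4063
PID: 9655 TASK: ffff880105bc1820 CPU: 6 COMMAND: "read2_01"
#0 [ffff880105d63b38] schedule at ffffffff812db5e5
#1 [ffff880105d63bd0] libcfs_debug_msg at ffffffffa017bd81
#2 [ffff880105d63c30] our_vma at ffffffffa07e3914
"""

if __name__ == "__main__":
    traces = parse_backtraces(SAMPLE)
    for pid, frames in traces.items():
        print(pid, " <- ".join(frames))
    print("shared innermost frames:", common_innermost(traces))
```

Run against the full backtrace text, the shared innermost frames give a quick way to group tasks that are blocked at the same place before looking at each stack individually.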
A dump is available at:
ftp.cray.com:/outbound/mas63-sp1-down.tar.bz2