Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.8.0
-
None
-
Hyperion/SWL
-
3
-
9223372036854775807
Description
Running SWL clients timeout and LBUG after server (OST) failover
Aug 31 11:22:37 iwc98 kernel: Lustre: 5465:0:(client.c:2014:ptlrpc_expire_one_request()) Skipped 8 previous similar messages Aug 31 11:23:04 iwc98 kernel: Lustre: lustre-OST000a-osc-ffff8808713e3c00: Connection restored to lustre-OST000a (at 192.168.120.24@o2ib) Aug 31 11:33:32 iwc98 kernel: Lustre: 5457:0:(client.c:2014:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1441045957/real 1441045957] req@ffff8806e1ec2080 x1510687803291588/t0(0) o8->stzfs-OST000a-osc-ffff880871117c00@192.168.120.23@o2ib:28/4 lens 520/544 e 0 to 1 dl 1441046012 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1 Aug 31 11:33:32 iwc98 kernel: Lustre: 5457:0:(client.c:2014:ptlrpc_expire_one_request()) Skipped 32 previous similar messages Aug 31 11:34:55 iwc98 kernel: LustreError: 11-0: lustre-OST000a-osc-ffff8808713e3c00: operation ost_write to node 192.168.120.24@o2ib failed: rc = -107 Aug 31 11:34:55 iwc98 kernel: Lustre: lustre-OST000a-osc-ffff8808713e3c00: Connection to lustre-OST000a (at 192.168.120.24@o2ib) was lost; in progress operations using this service will wait for recovery to complete Aug 31 11:34:55 iwc98 kernel: Lustre: Skipped 2 previous similar messages Aug 31 11:35:19 iwc98 kernel: LustreError: 167-0: lustre-OST000a-osc-ffff8808713e3c00: This client was evicted by lustre-OST000a; in progress operations using this service will fail. Aug 31 11:35:22 iwc98 kernel: LustreError: 107401:0:(vvp_io.c:1076:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: Aug 31 11:35:22 iwc98 kernel: LustreError: 107401:0:(vvp_io.c:1076:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: Aug 31 11:35:22 iwc98 kernel: LustreError: 107399:0:(vvp_io.c:1076:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: Aug 31 11:35:22 iwc98 kernel: LustreError: 107399:0:(vvp_io.c:1076:vvp_io_write_start()) ASSERTION( vio->vui_iocb->ki_pos == pos ) failed: Aug 31 11:35:22 iwc98 kernel: LustreError: 107399:0:(vvp_io.c:1076:vvp_io_write_start()) LBUG Aug 31 11:35:22 iwc98 kernel: LustreError: 107399:0:(vvp_io.c:1076:vvp_io_write_start()) LBUG Aug 31 11:35:22 iwc98 kernel: Pid: 107399, comm: ior Aug 31 11:35:22 iwc98 kernel: Aug 31 11:35:22 iwc98 kernel: Call Trace: Aug 31 11:35:22 iwc98 kernel: [<ffffffffa0470875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] Aug 31 11:35:22 iwc98 kernel: [<ffffffffa0470e77>] lbug_with_loc+0x47/0xb0 [libcfs] Aug 31 11:35:22 iwc98 kernel: [<ffffffffa0a5f27c>] vvp_io_write_start+0x5bc/0x5c0 [lustre] Aug 31 11:35:22 iwc98 kernel: [<ffffffffa047cbf1>] ? libcfs_debug_msg+0x41/0x50 [libcfs] Aug 31 11:35:22 iwc98 kernel: [<ffffffffa060444a>] cl_io_start+0x6a/0x140 [obdclass] Aug 31 11:35:22 iwc98 kernel: [<ffffffffa0608994>] cl_io_loop+0xb4/0x1b0 [obdclass] Aug 31 11:35:22 iwc98 kernel: [<ffffffffa0a077b7>] ll_file_io_generic+0x317/0xab0 [lustre] Aug 31 11:35:22 iwc98 kernel: [<ffffffffa0a09c9b>] ll_file_aio_write+0x20b/0x860 [lustre] Aug 31 11:35:22 iwc98 kernel: [<ffffffffa0a0a41b>] ll_file_write+0x12b/0x260 [lustre] Aug 31 11:35:22 iwc98 kernel: [<ffffffff8118e3a8>] vfs_write+0xb8/0x1a0 Aug 31 11:35:22 iwc98 kernel: [<ffffffff8118ed71>] sys_write+0x51/0x90 Aug 31 11:35:22 iwc98 kernel: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b