Details
- Bug
- Resolution: Fixed
- Critical
- Lustre 2.6.0, Lustre 2.7.0
- 3
- 14750
Description
Running racer with 2 clients and MDSCOUNT=1 on 2.5.60-90-g37432a8 plus http://review.whamcloud.com/#/c/5936/, I see the following LBUG when restarting a crashed OST while some clients are still mounted.
[ 230.089707] Lustre: Skipped 75 previous similar messages
[ 231.775205] Lustre: 2151:0:(client.c:1924:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1404323793/real 1404323793] req@ffff8801f78fc110 x1472540086110788/t0(0) o400->lustre-OST0001-osc-MDT0000@0@lo:28/4 lens 224/224 e 1 to 1 dl 1404323837 ref 1 fl Rpc:X/c0/ffffffff rc 0/-1
[ 237.775938] Lustre: lustre-OST0001: Denying connection for new client cc64d6dc-4180-e700-9f7e-ce147524a8f0 (at 0@lo), waiting for all 4 known clients (2 recovered, 1 in progress, and 1 evicted) to recover in 0:36
[ 237.781858] Lustre: Skipped 3 previous similar messages
[ 242.801254] LustreError: 2880:0:(ldlm_lib.c:2253:target_queue_recovery_request()) ASSERTION( req->rq_export->exp_lock_replay_needed ) failed:
[ 242.805102] LustreError: 2880:0:(ldlm_lib.c:2253:target_queue_recovery_request()) LBUG
[ 242.807953] Pid: 2880, comm: ll_ost00_007
[ 242.809274]
[ 242.809276] Call Trace:
[ 242.810585] [<ffffffffa02b98c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[ 242.812764] [<ffffffffa02b9ec7>] lbug_with_loc+0x47/0xb0 [libcfs]
[ 242.814689] [<ffffffffa064ea0c>] target_queue_recovery_request+0xbac/0xc10 [ptlrpc]
[ 242.816347] [<ffffffffa06e122f>] tgt_handle_recovery+0x38f/0x520 [ptlrpc]
[ 242.817666] [<ffffffffa06e6b8d>] tgt_request_handle+0x18d/0xad0 [ptlrpc]
[ 242.818987] [<ffffffffa0699e31>] ptlrpc_main+0xcf1/0x1880 [ptlrpc]
[ 242.820261] [<ffffffffa0699140>] ? ptlrpc_main+0x0/0x1880 [ptlrpc]
[ 242.821440] [<ffffffff8109eab6>] kthread+0x96/0xa0
[ 242.822360] [<ffffffff8100c30a>] child_rip+0xa/0x20
[ 242.823303] [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40
[ 242.824390] [<ffffffff8100bb10>] ? restore_args+0x0/0x30
[ 242.825391] [<ffffffff8109ea20>] ? kthread+0x0/0xa0
[ 242.826315] [<ffffffff8100c300>] ? child_rip+0x0/0x20
[ 242.827283]