Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.10.0
-
Soak cluster version=lustre: 2.9.52_73_gb5c4f03
-
3
Description
Soak is running and performing a successful OSS failover; partitions are being recovered:
2017-03-30 00:47:10,761:fsmgmt.fsmgmt:INFO Next recovery check in 15s...
2017-03-30 00:47:32,586:fsmgmt.fsmgmt:DEBUG Recovery Result Record: {'soak-4': {'soaked-OST000f': 'RECOVERING', 'soaked-OST000e': 'INACTIVE', 'soaked-OST0008': 'INACTIVE', 'soaked-OST0009': 'RECOVERING', 'soaked-OST0002': 'INACTIVE', 'soaked-OST0003': 'RECOVERING', 'soaked-OST0015': 'COMPLETE', 'soaked-OST0014': 'INACTIVE'}}
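The Recovery Result Record in the DEBUG line above looks like a Python dict literal, so it can be grouped by recovery state for a quicker read. This is a minimal sketch, not part of the soak framework: the function name `summarize_recovery` and the marker-based split are assumptions made for illustration.

```python
import ast
from collections import defaultdict

# DEBUG line copied from the fsmgmt log above.
LOG_LINE = (
    "2017-03-30 00:47:32,586:fsmgmt.fsmgmt:DEBUG Recovery Result Record: "
    "{'soak-4': {'soaked-OST000f': 'RECOVERING', 'soaked-OST000e': 'INACTIVE', "
    "'soaked-OST0008': 'INACTIVE', 'soaked-OST0009': 'RECOVERING', "
    "'soaked-OST0002': 'INACTIVE', 'soaked-OST0003': 'RECOVERING', "
    "'soaked-OST0015': 'COMPLETE', 'soaked-OST0014': 'INACTIVE'}}"
)


def summarize_recovery(log_line):
    """Group targets by recovery state for each node (hypothetical helper)."""
    marker = "Recovery Result Record: "
    # Everything after the marker is assumed to be a valid dict literal.
    record = ast.literal_eval(log_line.split(marker, 1)[1])
    summary = {}
    for node, targets in record.items():
        by_state = defaultdict(list)
        for target, state in targets.items():
            by_state[state].append(target)
        summary[node] = {state: sorted(names) for state, names in by_state.items()}
    return summary


if __name__ == "__main__":
    for node, states in summarize_recovery(LOG_LINE).items():
        for state, names in sorted(states.items()):
            print(node, state, ", ".join(names))
```

For this record, soaked-OST0009 is among the targets still in RECOVERING on soak-4, which is the same target the client reports as reconnected shortly afterwards.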
A single client hits an LBUG after some partitions have recovered:
Mar 30 00:47:33 soak-36 kernel: Lustre: soaked-OST0009-osc-ffff88085b72c000: Connection restored to 192.168.1.104@o2ib10 (at 192.168.1.104@o2ib10)
.....
Mar 30 00:48:31 soak-36 kernel: LustreError: 4753:0:(recover.c:157:ptlrpc_replay_next()) ASSERTION( !list_empty(&req->rq_cli.cr_unreplied_list) ) failed:
Mar 30 00:48:31 soak-36 kernel: LustreError: 4753:0:(recover.c:157:ptlrpc_replay_next()) LBUG
Mar 30 00:48:31 soak-36 kernel: Pid: 4753, comm: ptlrpcd_rcv
Mar 30 00:48:31 soak-36 kernel: #012Call Trace:
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa092c7f3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa092c861>] lbug_with_loc+0x41/0xb0 [libcfs]
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa0d31c87>] ptlrpc_replay_next+0x447/0x450 [ptlrpc]
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa0d55682>] ptlrpc_import_recovery_state_machine+0x1d2/0xbc0 [ptlrpc]
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa0d2a2ff>] ptlrpc_replay_interpret+0x17f/0x7d0 [ptlrpc]
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa0d2e0b5>] ptlrpc_check_set.part.23+0x425/0x1dd0 [ptlrpc]
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa0d2fabb>] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa0d5bb8b>] ptlrpcd_check+0x4db/0x5d0 [ptlrpc]
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa0d5bf3b>] ptlrpcd+0x2bb/0x560 [ptlrpc]
Mar 30 00:48:31 soak-36 kernel: [<ffffffff810c4fd0>] ? default_wake_function+0x0/0x20
Mar 30 00:48:31 soak-36 kernel: [<ffffffffa0d5bc80>] ? ptlrpcd+0x0/0x560 [ptlrpc]
Mar 30 00:48:32 soak-36 kernel: [<ffffffff810b064f>] kthread+0xcf/0xe0
Mar 30 00:48:32 soak-36 kernel: [<ffffffff810b0580>] ? kthread+0x0/0xe0
Mar 30 00:48:32 soak-36 kernel: [<ffffffff81696958>] ret_from_fork+0x58/0x90
Mar 30 00:48:32 soak-36 kernel: [<ffffffff810b0580>] ? kthread+0x0/0xe0
Mar 30 00:48:32 soak-36 kernel:
Mar 30 00:48:32 soak-36 kernel: Kernel panic - not syncing: LBUG
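When triaging several client logs, it can help to pull the failing assertion and its source location out of the LustreError line. The sketch below does that with a regular expression fitted to the single assertion line in this report; the pattern and the helper name `parse_assertion` are illustrative assumptions and may not match other LustreError formats.

```python
import re

# Assertion line copied from the client console log above.
SAMPLE = (
    "Mar 30 00:48:31 soak-36 kernel: LustreError: 4753:0:"
    "(recover.c:157:ptlrpc_replay_next()) "
    "ASSERTION( !list_empty(&req->rq_cli.cr_unreplied_list) ) failed:"
)

# Pattern derived from the sample line only (an assumption, not a Lustre spec).
ASSERT_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.*) \) failed"
)


def parse_assertion(log_line):
    """Return pid, file, line, function and expression, or None if no match."""
    match = ASSERT_RE.search(log_line)
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    info["line"] = int(info["line"])
    return info


if __name__ == "__main__":
    print(parse_assertion(SAMPLE))
```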
vmcore-dmesg.txt is attached. The full crash dump is available on soak-36.
Patch landed for 2.10.