[LU-11900] MDT hit (ldlm_lib.c:1595:target_finish_recovery()) LBUG during recovery Created: 29/Jan/19 Updated: 16/Jan/22 Resolved: 16/Jan/22 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sarah Liu | Assignee: | Mikhail Pershin |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server and client: 2.10.6_34_gb5ad8a0 |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
One MDT hit the following LBUG during recovery on soak after running for 15 hours, it looks like
[ 1119.601893] sd 0:0:0:44: rdac: array soak-netapp5660-1, ctlr 0, queueing MODE_SELECT command
[ 1120.258577] sd 0:0:0:44: rdac: array soak-netapp5660-1, ctlr 0, MODE_SELECT completed
[ 1120.288628] Lustre: soaked-MDT0002: Recovery over after 3:19, of 28 clients 28 recovered and 57 were evicted.
[ 1120.354278] LustreError: 14264:0:(ldlm_lib.c:1593:target_finish_recovery()) soaked-MDT0002: Recovery queues ( lock ) are not empty
[ 1120.367393] LustreError: 14264:0:(ldlm_lib.c:1595:target_finish_recovery()) LBUG
[ 1120.375656] Pid: 14264, comm: tgt_recover_2 3.10.0-957.el7_lustre.x86_64 #1 SMP Mon Jan 7 20:06:41 UTC 2019
[ 1120.386533] Call Trace:
[ 1120.389272] [<ffffffffc09e67cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 1120.396592] [<ffffffffc09e687c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 1120.403512] [<ffffffffc0d35479>] target_recovery_thread+0x1359/0x1370 [ptlrpc]
[ 1120.411766] [<ffffffffb00c1c31>] kthread+0xd1/0xe0
[ 1120.417229] [<ffffffffb0774c37>] ret_from_fork_nospec_end+0x0/0x39
[ 1120.424239] [<ffffffffffffffff>] 0xffffffffffffffff
[ 1120.429817] Kernel panic - not syncing: LBUG
[ 1120.434583] CPU: 27 PID: 14264 Comm: tgt_recover_2 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7_lustre.x86_64 #1
[ 1120.448260] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 1120.460781] Call Trace:
[ 1120.463512] [<ffffffffb0761dc1>] dump_stack+0x19/0x1b
[ 1120.469239] [<ffffffffb075b4d0>] panic+0xe8/0x21f
[ 1120.474590] [<ffffffffc09e68cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 1120.481517] [<ffffffffc0d35479>] target_recovery_thread+0x1359/0x1370 [ptlrpc]
[ 1120.489698] [<ffffffffc0d34120>] ? replay_request_or_update.isra.21+0x8c0/0x8c0 [ptlrpc]
[ 1120.498846] [<ffffffffb00c1c31>] kthread+0xd1/0xe0
[ 1120.504291] [<ffffffffb00c1b60>] ? insert_kthread_work+0x40/0x40
[ 1120.511094] [<ffffffffb0774c37>] ret_from_fork_nospec_begin+0x21/0x21
[ 1120.518380] [<ffffffffb00c1b60>] ? insert_kthread_work+0x40/0x40
[ 0.000000] Initializing cgroup subsys cpuset |
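For triage, the LBUG site and the LustreError message that precedes it can be pulled out of a console log mechanically. The following is only a rough sketch, not part of Lustre or its tooling: the function name `find_lbugs`, the regular expression, and the assumption that the log has one message per line are all illustrative. The sample input is the two LustreError lines from the trace above.

```python
import re

# Assumed shape of a Lustre console error, based on the lines in this ticket:
#   "LustreError: <pid>:<n>:(<file>:<line>:<func>()) <message>"
LUSTRE_ERROR = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)"
)


def find_lbugs(log_text):
    """Return (file, line, func, preceding_message) for each LBUG line.

    preceding_message is the last non-LBUG LustreError logged by the same
    pid, or None if there was none. Hypothetical helper for illustration.
    """
    results = []
    previous = {}  # pid -> last non-LBUG LustreError message
    for raw in log_text.splitlines():
        match = LUSTRE_ERROR.search(raw)
        if not match:
            continue
        pid = match.group("pid")
        msg = match.group("msg").strip()
        if msg == "LBUG":
            results.append(
                (
                    match.group("file"),
                    int(match.group("line")),
                    match.group("func"),
                    previous.get(pid),
                )
            )
        else:
            previous[pid] = msg
    return results


SAMPLE = """\
[ 1120.354278] LustreError: 14264:0:(ldlm_lib.c:1593:target_finish_recovery()) soaked-MDT0002: Recovery queues ( lock ) are not empty
[ 1120.367393] LustreError: 14264:0:(ldlm_lib.c:1595:target_finish_recovery()) LBUG
"""

if __name__ == "__main__":
    for entry in find_lbugs(SAMPLE):
        print(entry)
```

Pairing each LBUG with the previous error from the same pid is a design choice for this sketch: in the trace above, the "Recovery queues ( lock ) are not empty" message is what explains why `target_finish_recovery()` asserted.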
| Comments |
| Comment by Peter Jones [ 30/Jan/19 ] |
|
Mike, could you please advise? Thanks, Peter |
| Comment by Oleg Drokin [ 31/Jan/19 ] |
|
This is caused by https://review.whamcloud.com/#/c/33977/ landing. I pushed a revert to https://review.whamcloud.com/#/c/34149/ |
| Comment by Peter Jones [ 01/Feb/19 ] |
|
hongchao.zhang, do you understand why this might affect only b2_10 and not master? |
| Comment by Hongchao Zhang [ 01/Feb/19 ] |
|
The patch https://review.whamcloud.com/#/c/33977/ depends on the patch https://review.whamcloud.com/#/c/34027/ |
| Comment by Hongchao Zhang [ 23/Mar/19 ] |
|
The dependency patch https://review.whamcloud.com/#/c/34027/ has landed on b2_10 |
| Comment by Peter Jones [ 23/Mar/19 ] |
|
I think that this can be closed as cannot repro now, right? |