[LU-11483] replay-dual test_25: ofd_lvbo_init()) ASSERTION( env ) failed Created: 08/Oct/18 Updated: 24/Sep/19 Resolved: 13/Nov/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | soak |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c9780dee-cb39-11e8-ad90-52540065bddc

test_25 failed with the following error:

trevis-13vm6 crashed during replay-dual test_25

[ 1322.092883] LustreError: Skipped 1 previous similar message
[ 1322.119196] Lustre: lustre-OST0000: Connection restored to (at 10.9.4.146@tcp)
[ 1327.099384] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.4.152@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 1327.101153] LustreError: Skipped 5 previous similar messages
[ 1327.124964] Lustre: lustre-OST0000: Connection restored to (at 10.9.4.146@tcp)
[ 1327.185451] Lustre: lustre-OST0000: Recovery over after 0:19, of 3 clients 3 recovered and 0 were evicted.
[ 1327.188206] LustreError: 2588:0:(ofd_lvb.c:95:ofd_lvbo_init()) ASSERTION( env ) failed:
[ 1327.189159] LustreError: 2588:0:(ofd_lvb.c:95:ofd_lvbo_init()) LBUG
[ 1327.189473] Lustre: lustre-OST0000: deleting orphan objects from 0x0:9699 to 0x0:9729
[ 1327.190483] Pid: 2588, comm: tgt_recover_0 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Sep 13 05:07:47 UTC 2018
[ 1327.191428] Call Trace:
[ 1327.191691] [<ffffffffc09937cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 1327.192374] [<ffffffffc099387c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 1327.192994] [<ffffffffc1196a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
[ 1327.193598] [<ffffffffc0d27b70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
[ 1327.194512] [<ffffffffc0cf9748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
[ 1327.195228] [<ffffffffc0d414ea>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
[ 1327.195898] [<ffffffffc0cff175>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[ 1327.196572] [<ffffffffc0d006d9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
[ 1327.197347] [<ffffffffc0d00c96>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[ 1327.198080] [<ffffffffc099ff30>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[ 1327.198793] [<ffffffffc09a32c5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[ 1327.199533] [<ffffffffc0d00cdc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[ 1327.200367] [<ffffffffc0d1362a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
[ 1327.201102] [<ffffffff8e8bb621>] kthread+0xd1/0xe0
[ 1327.201600] [<ffffffff8ef205f7>] ret_from_fork_nospec_end+0x0/0x39
[ 1327.202240] [<ffffffffffffffff>] 0xffffffffffffffff
[ 1327.202817] Kernel panic - not syncing: LBUG

Seems to be introduced by the newly landed https://review.whamcloud.com/#/c/32832/

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV |
| Comments |
| Comment by Gerrit Updater [ 09/Oct/18 ] |
|
Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33321 |
| Comment by Sarah Liu [ 30/Oct/18 ] |
|
Hit the same bug when running soak for about 24 hours on tag-2.11.56

OSS console:

CentOS Linux 7 (Core)
Kernel 3.10.0-862.14.4.el7_lustre.x86_64 on an x86_64

soak-6 login: [ 270.853311] LNet: HW NUMA nodes: 2, HW CPU cores: 32, npartitions: 2
[ 270.864242] alg: No test for adler32 (adler32-zlib)
[ 271.762756] Lustre: Lustre: Build Version: 2.11.56_15_g70a01a6
[ 272.033075] LNet: Using FMR for registration
[ 272.050159] LNet: Added LNI 192.168.1.106@o2ib [8/256/0/180]
[ 279.577583] Lustre: soaked-OST0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[ 284.516536] Lustre: soaked-OST0002: Will be in recovery for at least 2:30, or until 29 clients reconnect
[ 284.527337] Lustre: soaked-OST0002: Connection restored to 192.168.1.109@o2ib (at 192.168.1.109@o2ib)
[ 284.537750] Lustre: Skipped 1 previous similar message
[ 285.208917] Lustre: soaked-OST0002: Connection restored to 7c9d2971-d34c-0d6b-546b-c6ac4b8c9bba (at 192.168.1.136@o2ib)
[ 285.221079] Lustre: Skipped 4 previous similar messages
[ 286.880038] Lustre: soaked-OST0002: Connection restored to a22d33ec-ba2b-7004-9946-ec38ad99c77c (at 192.168.1.125@o2ib)
[ 286.892196] Lustre: Skipped 8 previous similar messages
[ 288.958850] Lustre: soaked-OST0002: Connection restored to ed7247ba-e047-2382-fd0f-3accbe45b774 (at 192.168.1.123@o2ib)
[ 288.971007] Lustre: Skipped 10 previous similar messages
[ 289.442626] Lustre: soaked-OST0002: Recovery over after 0:05, of 29 clients 29 recovered and 0 were evicted.
[ 289.456061] LustreError: 14242:0:(ofd_lvb.c:95:ofd_lvbo_init()) ASSERTION( env ) failed:
[ 289.465251] LustreError: 14242:0:(ofd_lvb.c:95:ofd_lvbo_init()) LBUG
[ 289.471636] Lustre: soaked-OST0002: deleting orphan objects from 0x0:8573426 to 0x0:8573446
[ 289.475536] Lustre: soaked-OST0002: deleting orphan objects from 0x380000400:9764388 to 0x380000400:9764413
[ 289.480361] Lustre: soaked-OST0002: deleting orphan objects from 0x380000401:7271384 to 0x380000401:7271409
[ 289.484430] Lustre: soaked-OST0002: deleting orphan objects from 0x380000402:6323931 to 0x380000402:6323953
[ 289.514576] Pid: 14242, comm: tgt_recover_2 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Fri Oct 12 14:51:33 UTC 2018
[ 289.526098] Call Trace:
[ 289.526098] Call Trace:
[ 289.528865] [<ffffffffc0a397cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 289.536240] [<ffffffffc0a3987c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 289.543208] [<ffffffffc1175a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
[ 289.550088] [<ffffffffc0f56c70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
[ 289.558573] [<ffffffffc0f28748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
[ 289.566427] [<ffffffffc0f7061a>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
[ 289.573731] [<ffffffffc0f2e255>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[ 289.581199] [<ffffffffc0f2f7b9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
[ 289.589039] [<ffffffffc0f2fd76>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[ 289.596510] [<ffffffffc0a45e60>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[ 289.604656] [<ffffffffc0a491f5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[ 289.612785] [<ffffffffc0f2fdbc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[ 289.621327] [<ffffffffc0f4270a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
[ 289.629467] [<ffffffff94cbdf21>] kthread+0xd1/0xe0
[ 289.634968] [<ffffffff953255f7>] ret_from_fork_nospec_end+0x0/0x39
[ 289.642025] [<ffffffffffffffff>] 0xffffffffffffffff
[ 289.647642] Kernel panic - not syncing: LBUG
[ 289.652420] CPU: 29 PID: 14242 Comm: tgt_recover_2 Tainted: P OE ------------ 3.10.0-862.14.4.el7_lustre.x86_64 #1
[ 289.665288] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[ 289.679115] Call Trace:
[ 289.682925] [<ffffffff95313754>] dump_stack+0x19/0x1b
[ 289.689722] [<ffffffff9530d29f>] panic+0xe8/0x21f
[ 289.696140] [<ffffffffc0a398cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[ 289.704101] [<ffffffffc1175a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
[ 289.711978] [<ffffffffc0f56c70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
[ 289.721401] [<ffffffffc0f56670>] ? ldlm_server_blocking_ast+0xa40/0xa40 [ptlrpc]
[ 289.730805] [<ffffffffc0f28748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
[ 289.739637] [<ffffffffc0f7061a>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
[ 289.747895] [<ffffffffc0b83011>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[ 289.756896] [<ffffffffc0f2e351>] ? ldlm_run_ast_work+0x1d1/0x3a0 [ptlrpc]
[ 289.765584] [<ffffffff94dfbbfd>] ? kmem_cache_alloc_node_trace+0x11d/0x210
[ 289.774366] [<ffffffffc0b82ee9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[ 289.783337] [<ffffffffc0f286a0>] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc]
[ 289.792409] [<ffffffffc0f66e60>] ? ptlrpc_prep_set+0xc0/0x260 [ptlrpc]
[ 289.800767] [<ffffffffc0f2e255>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[ 289.809103] [<ffffffffc0f2f7b9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
[ 289.817803] [<ffffffffc0f2fd76>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[ 289.826097] [<ffffffffc0a45e60>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[ 289.835067] [<ffffffffc0f2fd50>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
[ 289.844317] [<ffffffffc0f2fd50>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
[ 289.853548] [<ffffffffc0a491f5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[ 289.862536] [<ffffffffc0f2fdbc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[ 289.871884] [<ffffffffc0f4270a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
[ 289.880872] [<ffffffffc0f41c90>] ? replay_request_or_update.isra.23+0x8c0/0x8c0 [ptlrpc]
[ 289.890894] [<ffffffff94cbdf21>] kthread+0xd1/0xe0
[ 289.897249] [<ffffffff94cbde50>] ? insert_kthread_work+0x40/0x40
[ 289.904937] [<ffffffff953255f7>] ret_from_fork_nospec_begin+0x21/0x21
[ 289.913075] [<ffffffff94cbde50>] ? insert_kthread_work+0x40/0x40
[ 289.920842] Kernel Offset: 0x13c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) |
| Comment by Gerrit Updater [ 13/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33321/ |
| Comment by Peter Jones [ 13/Nov/18 ] |
|
Landed for 2.12 |