[LU-11483] replay-dual test_25: ofd_lvbo_init() ASSERTION( env ) failed Created: 08/Oct/18  Updated: 24/Sep/19  Resolved: 13/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug Priority: Critical
Reporter: Maloo Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: soak

Issue Links:
Duplicate
is duplicated by LU-11629 MDS panic under load - lu_context_key... Resolved
Related
is related to LU-12034 env allocation in ptlrpc_set_wait() c... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c9780dee-cb39-11e8-ad90-52540065bddc

test_25 failed with the following error:

trevis-13vm6 crashed during replay-dual test_25
[ 1322.092883] LustreError: Skipped 1 previous similar message
[ 1322.119196] Lustre: lustre-OST0000: Connection restored to  (at 10.9.4.146@tcp)
[ 1327.099384] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.4.152@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
[ 1327.101153] LustreError: Skipped 5 previous similar messages
[ 1327.124964] Lustre: lustre-OST0000: Connection restored to  (at 10.9.4.146@tcp)
[ 1327.185451] Lustre: lustre-OST0000: Recovery over after 0:19, of 3 clients 3 recovered and 0 were evicted.
[ 1327.188206] LustreError: 2588:0:(ofd_lvb.c:95:ofd_lvbo_init()) ASSERTION( env ) failed: 
[ 1327.189159] LustreError: 2588:0:(ofd_lvb.c:95:ofd_lvbo_init()) LBUG
[ 1327.189473] Lustre: lustre-OST0000: deleting orphan objects from 0x0:9699 to 0x0:9729
[ 1327.190483] Pid: 2588, comm: tgt_recover_0 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Sep 13 05:07:47 UTC 2018
[ 1327.191428] Call Trace:
[ 1327.191691]  [<ffffffffc09937cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 1327.192374]  [<ffffffffc099387c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 1327.192994]  [<ffffffffc1196a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
[ 1327.193598]  [<ffffffffc0d27b70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
[ 1327.194512]  [<ffffffffc0cf9748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
[ 1327.195228]  [<ffffffffc0d414ea>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
[ 1327.195898]  [<ffffffffc0cff175>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[ 1327.196572]  [<ffffffffc0d006d9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
[ 1327.197347]  [<ffffffffc0d00c96>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[ 1327.198080]  [<ffffffffc099ff30>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[ 1327.198793]  [<ffffffffc09a32c5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[ 1327.199533]  [<ffffffffc0d00cdc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[ 1327.200367]  [<ffffffffc0d1362a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
[ 1327.201102]  [<ffffffff8e8bb621>] kthread+0xd1/0xe0
[ 1327.201600]  [<ffffffff8ef205f7>] ret_from_fork_nospec_end+0x0/0x39
[ 1327.202240]  [<ffffffffffffffff>] 0xffffffffffffffff
[ 1327.202817] Kernel panic - not syncing: LBUG

This appears to have been introduced by the newly landed https://review.whamcloud.com/#/c/32832/
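
For context, the assertion at ofd_lvb.c:95 is the env check at the top of ofd_lvbo_init(): "ASSERTION( env ) failed" corresponds to an LASSERT(env) on the environment argument. Below is a minimal, hypothetical reconstruction of the failing situation; the function signature and body are assumptions, only the LASSERT() macro and the call chain shown in the stack trace above are taken from the report.

/* Hypothetical sketch of the check that fires at ofd_lvb.c:95. */
static int ofd_lvbo_init(const struct lu_env *env, struct ldlm_resource *res)
{
	LASSERT(env);	/* env == NULL here -> LBUG -> kernel panic */

	/* ... rest of the LVB initialization using env ... */
	return 0;
}

/*
 * Call path from the stack trace (outermost first); the recovery thread
 * reprocesses locks without a lu_env set up, so env arrives as NULL:
 *
 *   target_recovery_thread()
 *     ldlm_reprocess_recovery_done()
 *       ldlm_run_ast_work() -> ptlrpc_set_wait()
 *         ldlm_work_cp_ast_lock() -> ldlm_server_completion_ast()
 *           ofd_lvbo_init(NULL, res)      <-- ASSERTION( env ) failed
 */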

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
replay-dual test_25 - trevis-13vm6 crashed during replay-dual test_25



 Comments   
Comment by Gerrit Updater [ 09/Oct/18 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33321
Subject: LU-11483 ofd: ofd_lvbo_init() to create env
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0a446a0c8b32d83b16b7729e074bd24be973ef26
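
For reference, the approach named in the patch subject ("ofd_lvbo_init() to create env") would look roughly like the sketch below. This is a hedged illustration rather than the actual patch: the exact signature and surrounding code are assumptions, while lu_env_init()/lu_env_fini() and the LCT_DT_THREAD tag are the standard Lustre primitives for setting up a thread environment on the fly.

/* Hypothetical sketch of the fix approach: if the caller (e.g. the recovery
 * thread via ldlm_server_completion_ast()) did not supply an environment,
 * build a temporary one instead of asserting. */
static int ofd_lvbo_init(const struct lu_env *env, struct ldlm_resource *res)
{
	struct lu_env _env;
	int rc = 0;

	if (env == NULL) {
		/* no environment from the caller: set up a local one */
		rc = lu_env_init(&_env, LCT_DT_THREAD);
		if (rc != 0)
			return rc;
		env = &_env;
	}

	/* ... original LVB initialization using env ... */

	if (env == &_env)
		lu_env_fini(&_env);
	return rc;
}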

Comment by Sarah Liu [ 30/Oct/18 ]

Hit the same bug when running soak for about 24 hours on tag-2.11.56

OSS console

CentOS Linux 7 (Core)
Kernel 3.10.0-862.14.4.el7_lustre.x86_64 on an x86_64

soak-6 login: [  270.853311] LNet: HW NUMA nodes: 2, HW CPU cores: 32, npartitions: 2
[  270.864242] alg: No test for adler32 (adler32-zlib)
[  271.762756] Lustre: Lustre: Build Version: 2.11.56_15_g70a01a6
[  272.033075] LNet: Using FMR for registration
[  272.050159] LNet: Added LNI 192.168.1.106@o2ib [8/256/0/180]
[  279.577583] Lustre: soaked-OST0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
[  284.516536] Lustre: soaked-OST0002: Will be in recovery for at least 2:30, or until 29 clients reconnect
[  284.527337] Lustre: soaked-OST0002: Connection restored to 192.168.1.109@o2ib (at 192.168.1.109@o2ib)
[  284.537750] Lustre: Skipped 1 previous similar message
[  285.208917] Lustre: soaked-OST0002: Connection restored to 7c9d2971-d34c-0d6b-546b-c6ac4b8c9bba (at 192.168.1.136@o2ib)
[  285.221079] Lustre: Skipped 4 previous similar messages
[  286.880038] Lustre: soaked-OST0002: Connection restored to a22d33ec-ba2b-7004-9946-ec38ad99c77c (at 192.168.1.125@o2ib)
[  286.892196] Lustre: Skipped 8 previous similar messages
[  288.958850] Lustre: soaked-OST0002: Connection restored to ed7247ba-e047-2382-fd0f-3accbe45b774 (at 192.168.1.123@o2ib)
[  288.971007] Lustre: Skipped 10 previous similar messages
[  289.442626] Lustre: soaked-OST0002: Recovery over after 0:05, of 29 clients 29 recovered and 0 were evicted.
[  289.456061] LustreError: 14242:0:(ofd_lvb.c:95:ofd_lvbo_init()) ASSERTION( env ) failed: 
[  289.465251] LustreError: 14242:0:(ofd_lvb.c:95:ofd_lvbo_init()) LBUG
[  289.471636] Lustre: soaked-OST0002: deleting orphan objects from 0x0:8573426 to 0x0:8573446
[  289.475536] Lustre: soaked-OST0002: deleting orphan objects from 0x380000400:9764388 to 0x380000400:9764413
[  289.480361] Lustre: soaked-OST0002: deleting orphan objects from 0x380000401:7271384 to 0x380000401:7271409
[  289.484430] Lustre: soaked-OST0002: deleting orphan objects from 0x380000402:6323931 to 0x380000402:6323953
[  289.514576] Pid: 14242, comm: tgt_recover_2 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Fri Oct 12 14:51:33 UTC 2018
[  289.526098] Call Trace:
[  289.528865]  [<ffffffffc0a397cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[  289.536240]  [<ffffffffc0a3987c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[  289.543208]  [<ffffffffc1175a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
[  289.550088]  [<ffffffffc0f56c70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
[  289.558573]  [<ffffffffc0f28748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
[  289.566427]  [<ffffffffc0f7061a>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
[  289.573731]  [<ffffffffc0f2e255>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[  289.581199]  [<ffffffffc0f2f7b9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
[  289.589039]  [<ffffffffc0f2fd76>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[  289.596510]  [<ffffffffc0a45e60>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[  289.604656]  [<ffffffffc0a491f5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[  289.612785]  [<ffffffffc0f2fdbc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[  289.621327]  [<ffffffffc0f4270a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
[  289.629467]  [<ffffffff94cbdf21>] kthread+0xd1/0xe0
[  289.634968]  [<ffffffff953255f7>] ret_from_fork_nospec_end+0x0/0x39
[  289.642025]  [<ffffffffffffffff>] 0xffffffffffffffff
[  289.647642] Kernel panic - not syncing: LBUG
[  289.652420] CPU: 29 PID: 14242 Comm: tgt_recover_2 Tainted: P           OE  ------------   3.10.0-862.14.4.el7_lustre.x86_64 #1
[  289.665288] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[  289.679115] Call Trace:
[  289.682925]  [<ffffffff95313754>] dump_stack+0x19/0x1b
[  289.689722]  [<ffffffff9530d29f>] panic+0xe8/0x21f
[  289.696140]  [<ffffffffc0a398cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[  289.704101]  [<ffffffffc1175a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
[  289.711978]  [<ffffffffc0f56c70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
[  289.721401]  [<ffffffffc0f56670>] ? ldlm_server_blocking_ast+0xa40/0xa40 [ptlrpc]
[  289.730805]  [<ffffffffc0f28748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
[  289.739637]  [<ffffffffc0f7061a>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
[  289.747895]  [<ffffffffc0b83011>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
[  289.756896]  [<ffffffffc0f2e351>] ? ldlm_run_ast_work+0x1d1/0x3a0 [ptlrpc]
[  289.765584]  [<ffffffff94dfbbfd>] ? kmem_cache_alloc_node_trace+0x11d/0x210
[  289.774366]  [<ffffffffc0b82ee9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
[  289.783337]  [<ffffffffc0f286a0>] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc]
[  289.792409]  [<ffffffffc0f66e60>] ? ptlrpc_prep_set+0xc0/0x260 [ptlrpc]
[  289.800767]  [<ffffffffc0f2e255>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
[  289.809103]  [<ffffffffc0f2f7b9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
[  289.817803]  [<ffffffffc0f2fd76>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
[  289.826097]  [<ffffffffc0a45e60>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
[  289.835067]  [<ffffffffc0f2fd50>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
[  289.844317]  [<ffffffffc0f2fd50>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
[  289.853548]  [<ffffffffc0a491f5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
[  289.862536]  [<ffffffffc0f2fdbc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
[  289.871884]  [<ffffffffc0f4270a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
[  289.880872]  [<ffffffffc0f41c90>] ? replay_request_or_update.isra.23+0x8c0/0x8c0 [ptlrpc]
[  289.890894]  [<ffffffff94cbdf21>] kthread+0xd1/0xe0
[  289.897249]  [<ffffffff94cbde50>] ? insert_kthread_work+0x40/0x40
[  289.904937]  [<ffffffff953255f7>] ret_from_fork_nospec_begin+0x21/0x21
[  289.913075]  [<ffffffff94cbde50>] ? insert_kthread_work+0x40/0x40
[  289.920842] Kernel Offset: 0x13c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Comment by Gerrit Updater [ 13/Nov/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33321/
Subject: LU-11483 ldlm: ofd_lvbo_init() and mdt_lvbo_fill() create env
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2339e1b3b69048d65eed1eaa46b307f9116300ee

Comment by Peter Jones [ 13/Nov/18 ]

Landed for 2.12
