
replay-dual test_25: ofd_lvbo_init()) ASSERTION( env ) failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.12.0
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for Oleg Drokin <green@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/c9780dee-cb39-11e8-ad90-52540065bddc

      test_25 failed with the following error:

      trevis-13vm6 crashed during replay-dual test_25
      [ 1322.092883] LustreError: Skipped 1 previous similar message
      [ 1322.119196] Lustre: lustre-OST0000: Connection restored to  (at 10.9.4.146@tcp)
      [ 1327.099384] LustreError: 137-5: lustre-OST0001_UUID: not available for connect from 10.9.4.152@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
      [ 1327.101153] LustreError: Skipped 5 previous similar messages
      [ 1327.124964] Lustre: lustre-OST0000: Connection restored to  (at 10.9.4.146@tcp)
      [ 1327.185451] Lustre: lustre-OST0000: Recovery over after 0:19, of 3 clients 3 recovered and 0 were evicted.
      [ 1327.188206] LustreError: 2588:0:(ofd_lvb.c:95:ofd_lvbo_init()) ASSERTION( env ) failed: 
      [ 1327.189159] LustreError: 2588:0:(ofd_lvb.c:95:ofd_lvbo_init()) LBUG
      [ 1327.189473] Lustre: lustre-OST0000: deleting orphan objects from 0x0:9699 to 0x0:9729
      [ 1327.190483] Pid: 2588, comm: tgt_recover_0 3.10.0-862.9.1.el7_lustre.x86_64 #1 SMP Thu Sep 13 05:07:47 UTC 2018
      [ 1327.191428] Call Trace:
      [ 1327.191691]  [<ffffffffc09937cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [ 1327.192374]  [<ffffffffc099387c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [ 1327.192994]  [<ffffffffc1196a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
      [ 1327.193598]  [<ffffffffc0d27b70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
      [ 1327.194512]  [<ffffffffc0cf9748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
      [ 1327.195228]  [<ffffffffc0d414ea>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
      [ 1327.195898]  [<ffffffffc0cff175>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
      [ 1327.196572]  [<ffffffffc0d006d9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
      [ 1327.197347]  [<ffffffffc0d00c96>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
      [ 1327.198080]  [<ffffffffc099ff30>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
      [ 1327.198793]  [<ffffffffc09a32c5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
      [ 1327.199533]  [<ffffffffc0d00cdc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
      [ 1327.200367]  [<ffffffffc0d1362a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
      [ 1327.201102]  [<ffffffff8e8bb621>] kthread+0xd1/0xe0
      [ 1327.201600]  [<ffffffff8ef205f7>] ret_from_fork_nospec_end+0x0/0x39
      [ 1327.202240]  [<ffffffffffffffff>] 0xffffffffffffffff
      [ 1327.202817] Kernel panic - not syncing: LBUG
      

      This seems to have been introduced by the recently landed change https://review.whamcloud.com/#/c/32832/

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      replay-dual test_25 - trevis-13vm6 crashed during replay-dual test_25

          Activity

            [LU-11483] replay-dual test_25: ofd_lvbo_init()) ASSERTION( env ) failed
            pjones Peter Jones added a comment -

            Landed for 2.12

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33321/
            Subject: LU-11483 ldlm ofd_lvbo_init() and mdt_lvbo_fill() create env
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2339e1b3b69048d65eed1eaa46b307f9116300ee

            sarah Sarah Liu added a comment -

            Hit the same bug when running soak for about 24 hours on tag-2.11.56

            OSS console

            CentOS Linux 7 (Core)
            Kernel 3.10.0-862.14.4.el7_lustre.x86_64 on an x86_64
            
            soak-6 login: [  270.853311] LNet: HW NUMA nodes: 2, HW CPU cores: 32, npartitions: 2
            [  270.864242] alg: No test for adler32 (adler32-zlib)
            [  271.762756] Lustre: Lustre: Build Version: 2.11.56_15_g70a01a6
            [  272.033075] LNet: Using FMR for registration
            [  272.050159] LNet: Added LNI 192.168.1.106@o2ib [8/256/0/180]
            [  279.577583] Lustre: soaked-OST0002: Imperative Recovery enabled, recovery window shrunk from 300-900 down to 150-900
            [  284.516536] Lustre: soaked-OST0002: Will be in recovery for at least 2:30, or until 29 clients reconnect
            [  284.527337] Lustre: soaked-OST0002: Connection restored to 192.168.1.109@o2ib (at 192.168.1.109@o2ib)
            [  284.537750] Lustre: Skipped 1 previous similar message
            [  285.208917] Lustre: soaked-OST0002: Connection restored to 7c9d2971-d34c-0d6b-546b-c6ac4b8c9bba (at 192.168.1.136@o2ib)
            [  285.221079] Lustre: Skipped 4 previous similar messages
            [  286.880038] Lustre: soaked-OST0002: Connection restored to a22d33ec-ba2b-7004-9946-ec38ad99c77c (at 192.168.1.125@o2ib)
            [  286.892196] Lustre: Skipped 8 previous similar messages
            [  288.958850] Lustre: soaked-OST0002: Connection restored to ed7247ba-e047-2382-fd0f-3accbe45b774 (at 192.168.1.123@o2ib)
            [  288.971007] Lustre: Skipped 10 previous similar messages
            [  289.442626] Lustre: soaked-OST0002: Recovery over after 0:05, of 29 clients 29 recovered and 0 were evicted.
            [  289.456061] LustreError: 14242:0:(ofd_lvb.c:95:ofd_lvbo_init()) ASSERTION( env ) failed: 
            [  289.465251] LustreError: 14242:0:(ofd_lvb.c:95:ofd_lvbo_init()) LBUG
            [  289.471636] Lustre: soaked-OST0002: deleting orphan objects from 0x0:8573426 to 0x0:8573446
            [  289.475536] Lustre: soaked-OST0002: deleting orphan objects from 0x380000400:9764388 to 0x380000400:9764413
            [  289.480361] Lustre: soaked-OST0002: deleting orphan objects from 0x380000401:7271384 to 0x380000401:7271409
            [  289.484430] Lustre: soaked-OST0002: deleting orphan objects from 0x380000402:6323931 to 0x380000402:6323953
            [  289.514576] Pid: 14242, comm: tgt_recover_2 3.10.0-862.14.4.el7_lustre.x86_64 #1 SMP Fri Oct 12 14:51:33 UTC 2018
            [  289.526098] Call Trace:
            [  289.526098] Call Trace:
            [  289.528865]  [<ffffffffc0a397cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
            [  289.536240]  [<ffffffffc0a3987c>] lbug_with_loc+0x4c/0xa0 [libcfs]
            [  289.543208]  [<ffffffffc1175a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
            [  289.550088]  [<ffffffffc0f56c70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
            [  289.558573]  [<ffffffffc0f28748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
            [  289.566427]  [<ffffffffc0f7061a>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
            [  289.573731]  [<ffffffffc0f2e255>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
            [  289.581199]  [<ffffffffc0f2f7b9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
            [  289.589039]  [<ffffffffc0f2fd76>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
            [  289.596510]  [<ffffffffc0a45e60>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
            [  289.604656]  [<ffffffffc0a491f5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
            [  289.612785]  [<ffffffffc0f2fdbc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
            [  289.621327]  [<ffffffffc0f4270a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
            [  289.629467]  [<ffffffff94cbdf21>] kthread+0xd1/0xe0
            [  289.634968]  [<ffffffff953255f7>] ret_from_fork_nospec_end+0x0/0x39
            [  289.642025]  [<ffffffffffffffff>] 0xffffffffffffffff
            [  289.647642] Kernel panic - not syncing: LBUG
            [  289.652420] CPU: 29 PID: 14242 Comm: tgt_recover_2 Tainted: P           OE  ------------   3.10.0-862.14.4.el7_lustre.x86_64 #1
            [  289.665288] Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
            [  289.679115] Call Trace:
            [  289.682925]  [<ffffffff95313754>] dump_stack+0x19/0x1b
            [  289.689722]  [<ffffffff9530d29f>] panic+0xe8/0x21f
            [  289.696140]  [<ffffffffc0a398cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
            [  289.704101]  [<ffffffffc1175a3b>] ofd_lvbo_init+0x70b/0x800 [ofd]
            [  289.711978]  [<ffffffffc0f56c70>] ldlm_server_completion_ast+0x600/0x9b0 [ptlrpc]
            [  289.721401]  [<ffffffffc0f56670>] ? ldlm_server_blocking_ast+0xa40/0xa40 [ptlrpc]
            [  289.730805]  [<ffffffffc0f28748>] ldlm_work_cp_ast_lock+0xa8/0x1d0 [ptlrpc]
            [  289.739637]  [<ffffffffc0f7061a>] ptlrpc_set_wait+0x7a/0x8d0 [ptlrpc]
            [  289.747895]  [<ffffffffc0b83011>] ? lprocfs_counter_sub+0xc1/0x130 [obdclass]
            [  289.756896]  [<ffffffffc0f2e351>] ? ldlm_run_ast_work+0x1d1/0x3a0 [ptlrpc]
            [  289.765584]  [<ffffffff94dfbbfd>] ? kmem_cache_alloc_node_trace+0x11d/0x210
            [  289.774366]  [<ffffffffc0b82ee9>] ? lprocfs_counter_add+0xf9/0x160 [obdclass]
            [  289.783337]  [<ffffffffc0f286a0>] ? ldlm_work_gl_ast_lock+0x3a0/0x3a0 [ptlrpc]
            [  289.792409]  [<ffffffffc0f66e60>] ? ptlrpc_prep_set+0xc0/0x260 [ptlrpc]
            [  289.800767]  [<ffffffffc0f2e255>] ldlm_run_ast_work+0xd5/0x3a0 [ptlrpc]
            [  289.809103]  [<ffffffffc0f2f7b9>] __ldlm_reprocess_all+0x129/0x380 [ptlrpc]
            [  289.817803]  [<ffffffffc0f2fd76>] ldlm_reprocess_res+0x26/0x30 [ptlrpc]
            [  289.826097]  [<ffffffffc0a45e60>] cfs_hash_for_each_relax+0x250/0x450 [libcfs]
            [  289.835067]  [<ffffffffc0f2fd50>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
            [  289.844317]  [<ffffffffc0f2fd50>] ? ldlm_lock_mode_downgrade+0x320/0x320 [ptlrpc]
            [  289.853548]  [<ffffffffc0a491f5>] cfs_hash_for_each_nolock+0x75/0x1c0 [libcfs]
            [  289.862536]  [<ffffffffc0f2fdbc>] ldlm_reprocess_recovery_done+0x3c/0x110 [ptlrpc]
            [  289.871884]  [<ffffffffc0f4270a>] target_recovery_thread+0xa7a/0x1370 [ptlrpc]
            [  289.880872]  [<ffffffffc0f41c90>] ? replay_request_or_update.isra.23+0x8c0/0x8c0 [ptlrpc]
            [  289.890894]  [<ffffffff94cbdf21>] kthread+0xd1/0xe0
            [  289.897249]  [<ffffffff94cbde50>] ? insert_kthread_work+0x40/0x40
            [  289.904937]  [<ffffffff953255f7>] ret_from_fork_nospec_begin+0x21/0x21
            [  289.913075]  [<ffffffff94cbde50>] ? insert_kthread_work+0x40/0x40
            [  289.920842] Kernel Offset: 0x13c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
            

            gerrit Gerrit Updater added a comment -

            Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33321
            Subject: LU-11483 ofd: ofd_lvbo_init() to create env
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0a446a0c8b32d83b16b7729e074bd24be973ef26
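
            For readers unfamiliar with the pattern named in the patch subject ("ofd_lvbo_init() to create env"), here is a minimal user-space sketch of the general idea: when the caller passes no environment, the callee builds a temporary one instead of asserting. This is not the code from the patch. Every name below (demo_env, demo_env_init, demo_env_fini, demo_lvbo_init, demo_local_env_count) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an execution environment (not Lustre's real type). */
struct demo_env {
	int initialized;
};

/* Hypothetical helper: prepare an environment for use. */
static int demo_env_init(struct demo_env *env)
{
	env->initialized = 1;
	return 0;
}

/* Hypothetical helper: release an environment. */
static void demo_env_fini(struct demo_env *env)
{
	env->initialized = 0;
}

/* Counts how many times the callee had to build its own environment. */
static int demo_local_env_count;

/*
 * Hypothetical callee. Instead of asserting that the caller supplied an
 * environment, it falls back to a locally created one and releases it
 * before returning.
 */
static int demo_lvbo_init(struct demo_env *env)
{
	struct demo_env local_env;
	int created_locally = 0;
	int rc;

	if (env == NULL) {
		rc = demo_env_init(&local_env);
		if (rc != 0)
			return rc;
		env = &local_env;
		created_locally = 1;
		demo_local_env_count++;
	}

	/* Work that needs a valid environment would go here. */
	rc = env->initialized ? 0 : -1;

	/* Only tear down what this function created itself. */
	if (created_locally)
		demo_env_fini(&local_env);
	return rc;
}
```

            In this sketch a caller that already has an environment passes it through untouched, while a caller with none (as the recovery thread in the traces above appears to be) passes NULL and still gets a successful return.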

            People

              Assignee: bzzz Alex Zhuravlev
              Reporter: maloo Maloo